Web Scraping using Python | DS-1
In this blog we will see what web scraping is, how it works, and which libraries are required, and we will perform web scraping using the BeautifulSoup library.
Web Scraping
Web scraping is the process of gathering information from websites. Some websites don’t like it when automatic scrapers gather their data, while others don’t mind.
If you are scraping a page for educational purposes then you’re unlikely to have any problems. Still, it’s a good idea to do some research on your own and make sure that you’re not violating the Terms of Service of the website or company. Some websites disallow scraping, so it is better to check the robots.txt file of that particular website first.
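The robots.txt check can also be done from Python. The following is a minimal sketch using the standard library's `urllib.robotparser`; the sample rules, the `example.com` URLs, and the helper name `is_allowed` are assumptions made for illustration, not the rules of any real website.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_lines, url, user_agent="*"):
    """Return True if the given robots.txt lines permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules that are already in memory
    return parser.can_fetch(user_agent, url)

# Hypothetical robots.txt content, used here only as an illustration.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
""".splitlines()

blocked = is_allowed(SAMPLE_ROBOTS, "https://example.com/private/page")
allowed = is_allowed(SAMPLE_ROBOTS, "https://example.com/search?q=iphone")
print("private page allowed:", blocked)
print("search page allowed:", allowed)
```

For a live website, `RobotFileParser` can instead be pointed at the site's own file with `set_url(...)` followed by `read()`. Note that robots.txt only covers crawler rules; the Terms of Service still need to be read separately.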
Libraries used in Web Scraping
· Beautiful Soup
· Selenium
· Pandas
Now follow these steps for web scraping:
Installing Libraries:
pip install beautifulsoup4
pip install selenium
pip install pandas
Importing Libraries in code:
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
Webdriver:
To get the content from a website with Selenium, we need to provide a webdriver for the browser we are using. As I am using Google Colab for this hands-on, I use the code below to set up headless Chrome and fetch the data from the website using chromedriver.
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
# Create a single driver; chromedriver must be discoverable on the PATH.
# `options=` replaces the deprecated `chrome_options=` keyword.
driver = webdriver.Chrome(options=chrome_options)
Note: If you are using PyCharm, you can download chromedriver from this link according to your Chrome version (to check your Chrome version, go to Settings > About Chrome).
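Before creating the driver, it can help to confirm that the chromedriver executable is actually visible to Python. This is a small sketch using only the standard library; the helper name `find_chromedriver` is hypothetical, and it assumes the executable is called `chromedriver` and is expected on the PATH.

```python
import shutil

def find_chromedriver(name="chromedriver"):
    """Return the full path of the driver executable, or None if it is not on the PATH."""
    return shutil.which(name)

path = find_chromedriver()
if path is None:
    print("chromedriver was not found on the PATH")
else:
    print("chromedriver found at:", path)
```

If the result is `None`, either add the folder containing chromedriver to the PATH or place the executable next to your script.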
Getting website content:
Now, we have to copy the URL of the page that we want to scrape. In this example we have used iPhone data from Flipkart.
driver.get("https://www.flipkart.com/search?q=IPHONE&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off")
content = driver.page_source
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(content, 'html.parser')
Creating empty lists for our data:
products=[]
prices=[]
features=[]
ratings=[]
Appending Data to List:
Now, as we have chosen a shopping website, we will gather information like name, price, features, and ratings. To find where each value lives in the page, right-click on the web page and click on Inspect.
Now find the classes that hold the name, price, features, and rating. For example, in our case the name is in the class ‘_4rR01T’.
Now append the data using this code:
# Each product card is an <a> tag with the class '_1fQZEK'
for a in soup.find_all('a', href=True, attrs={'class': '_1fQZEK'}):
    name = a.find('div', attrs={'class': '_4rR01T'})
    price = a.find('div', attrs={'class': '_30jeq3 _1_WHN1'})
    feature = a.find('div', attrs={'class': 'fMghEO'})
    rating = a.find('div', attrs={'class': '_3LWZlK'})
    # Store the text of each element in its list
    products.append(name.text)
    prices.append(price.text)
    features.append(feature.text)
    ratings.append(rating.text)
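One thing to watch for in the loop above: `find` returns `None` when a product card has no matching element (for example, a product with no rating yet), and calling `.text` on `None` raises an error. The following is a minimal sketch of a safer lookup. The sample HTML and the helper name `text_or_none` are assumptions for illustration; the class names simply mirror the ones used above and may not match the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the second product has no rating element.
SAMPLE_HTML = """
<a class="_1fQZEK" href="/p/1">
  <div class="_4rR01T">Phone A</div>
  <div class="_30jeq3 _1_WHN1">₹49,999</div>
  <div class="fMghEO">64 GB ROM</div>
  <div class="_3LWZlK">4.6</div>
</a>
<a class="_1fQZEK" href="/p/2">
  <div class="_4rR01T">Phone B</div>
  <div class="_30jeq3 _1_WHN1">₹59,999</div>
  <div class="fMghEO">128 GB ROM</div>
</a>
"""

def text_or_none(parent, css_class):
    """Return the stripped text of the first matching div, or None if it is absent."""
    tag = parent.find('div', attrs={'class': css_class})
    return tag.get_text(strip=True) if tag is not None else None

sample_soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
rows = []
for a in sample_soup.find_all('a', href=True, attrs={'class': '_1fQZEK'}):
    rows.append({
        'name': text_or_none(a, '_4rR01T'),
        'price': text_or_none(a, '_30jeq3 _1_WHN1'),
        'feature': text_or_none(a, 'fMghEO'),
        'rating': text_or_none(a, '_3LWZlK'),
    })

for row in rows:
    print(row)
```

With this approach a missing value becomes `None` instead of stopping the whole scrape, and all four columns stay the same length.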
Creating a Data Frame and exporting to CSV:
Now, we will create a data frame from our lists, giving each column a name, using the Pandas library.
df = pd.DataFrame({'Product Name': products, 'Price': prices, 'Feature': features, 'Rating': ratings})
print(df.head())
Exporting the data to a CSV file:
df.to_csv('products.csv', index=False, encoding='utf-8')
By downloading the CSV file we can see the output.
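The exported file can also be checked without leaving Python. This is a rough sketch using the standard `csv` module; the helper name `preview_csv` is hypothetical, and it assumes the file was written as `products.csv` with UTF-8 encoding as in the step above.

```python
import csv
import os

def preview_csv(path, n=5):
    """Return the first n data rows of a CSV file as a list of dicts keyed by column name."""
    with open(path, newline='', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        # zip stops as soon as n rows have been read
        return [row for _, row in zip(range(n), reader)]

# Only preview the file if the export step has already been run.
if os.path.exists('products.csv'):
    for row in preview_csv('products.csv'):
        print(row)
```

Every value comes back as a string here, so prices and ratings would still need converting before doing any arithmetic on them.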
GitHub Link:
Thank you!