Web Scraping using Python | DS-1

In this blog we will see what web scraping is, how it works, which libraries are required, and then perform web scraping using the BeautifulSoup library.

Web Scraping

If you are scraping a page for educational purposes then you’re unlikely to have any problems. Still, it’s a good idea to do some research on your own and make sure that you’re not violating the Terms of Service of the website or company. Some websites prohibit or restrict scraping, so check the robots.txt file of the website, which lists the paths that automated clients are asked not to access, before you start.
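The robots.txt check can also be done in code. Below is a minimal sketch using Python's built-in urllib.robotparser module. The robots.txt content, the "my-scraper" user agent and the example.com URLs are made up for illustration, so the example runs offline; they are not taken from any real website.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs without a network.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout
Allow: /search
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "my-scraper" is an assumed user-agent name for this sketch.
print(is_allowed(SAMPLE_ROBOTS_TXT, "my-scraper", "https://example.com/search?q=iphone"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "my-scraper", "https://example.com/checkout"))
```

For a live website you would typically call parser.set_url() with the address of the site's robots.txt and then parser.read() instead of parse(). Keep in mind that robots.txt is only one signal; the Terms of Service still apply.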

Libraries used in Web Scraping

· BeautifulSoup

· Selenium

· Pandas

Now follow these steps for web scraping:

Installing Libraries:

pip install beautifulsoup4
pip install selenium
pip install pandas

Importing Libraries in code:

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

Webdriver:

from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')  # run Chrome without opening a window
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
# Current Selenium releases take the options= keyword; older code used chrome_options=
driver = webdriver.Chrome(options=chrome_options)

Note: If you are using PyCharm, you can download ChromeDriver from this link according to your Chrome version (to check your Chrome version, go to Settings > About Chrome).

Getting website content:

driver.get("https://www.flipkart.com/search?q=IPHONE&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off")
content = driver.page_source
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(content, 'html.parser')

Creating empty lists for our data:

products=[]
prices=[]
features=[]
ratings=[]

Appending Data to List:

Now inspect the page and find the classes of the elements that hold the name, price, features and rating. For example, in our case the name is in an element with the class ‘_4rR01T’.

Now append the data using this code:

for a in soup.find_all('a', href=True, attrs={'class': '_1fQZEK'}):
    name = a.find('div', attrs={'class': '_4rR01T'})
    price = a.find('div', attrs={'class': '_30jeq3 _1_WHN1'})
    feature = a.find('div', attrs={'class': 'fMghEO'})
    rating = a.find('div', attrs={'class': '_3LWZlK'})
    # Store None when a field is missing so the four lists stay the same length
    products.append(name.text if name else None)
    prices.append(price.text if price else None)
    features.append(feature.text if feature else None)
    ratings.append(rating.text if rating else None)
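To try the extraction logic without opening a browser, the same idea can be run against a small piece of HTML. This is only a sketch: the markup, product names and prices below are invented placeholders that imitate the structure assumed in this post, and the class names are copied from the loop above, so they may not match the live website anymore. The helper names text_or_none and extract_rows are also made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the second product has no rating on purpose.
SAMPLE_HTML = """
<a class="_1fQZEK" href="/p/1">
  <div class="_4rR01T">Phone A</div>
  <div class="_30jeq3 _1_WHN1">₹50,000</div>
  <div class="fMghEO">64 GB ROM</div>
  <div class="_3LWZlK">4.5</div>
</a>
<a class="_1fQZEK" href="/p/2">
  <div class="_4rR01T">Phone B</div>
  <div class="_30jeq3 _1_WHN1">₹60,000</div>
  <div class="fMghEO">128 GB ROM</div>
</a>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


def extract_rows(html):
    """Collect one dict per product card found in the given HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("a", href=True, class_="_1fQZEK"):
        rows.append({
            "Product Name": text_or_none(card.find("div", class_="_4rR01T")),
            "Price": text_or_none(card.find("div", class_="_30jeq3")),
            "Feature": text_or_none(card.find("div", class_="fMghEO")),
            "Rating": text_or_none(card.find("div", class_="_3LWZlK")),
        })
    return rows


for row in extract_rows(SAMPLE_HTML):
    print(row)
```

Collecting one dict per product keeps each product's fields together, and a missing field (like the rating of the second product here) becomes None instead of raising an error.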

Creating a DataFrame and exporting to CSV:

df = pd.DataFrame({'Product Name': products, 'Price': prices, 'Feature': features, 'Rating': ratings})
print(df.head())

Exporting the data to a CSV file:

df.to_csv('products.csv', index=False, encoding='utf-8')

By downloading and opening the CSV file we can see the output.
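The CSV file can also be checked from Python by reading it back with pandas. The sketch below uses placeholder rows instead of real scraped results and writes to a temporary folder, so it does not touch your actual products.csv.

```python
import os
import tempfile

import pandas as pd

# Placeholder rows standing in for scraped data (not real results).
df = pd.DataFrame({
    "Product Name": ["Phone A", "Phone B"],
    "Price": ["₹50,000", "₹60,000"],
    "Feature": ["64 GB ROM", "128 GB ROM"],
    "Rating": ["4.5", None],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "products.csv")
    # Same call as in the post, only the path points to a temporary folder
    df.to_csv(csv_path, index=False, encoding="utf-8")
    # Read the file back to confirm that the rows and columns were written
    loaded = pd.read_csv(csv_path, encoding="utf-8")

print(loaded.shape)
print(list(loaded.columns))
```

If the shape and column names match what you expect, the export worked. Note that the prices are still text (with the currency symbol and commas), so they would need cleaning before any numeric analysis.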

GitHub Link:

Thank you!

student