Web Scraping using Python | DS-1

Web Scraping

Web scraping is the process of automatically gathering information from websites. Some websites object when automated scrapers collect their data, while others don’t mind.
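A common way for a site to state which pages automated clients may visit is its robots.txt file, and Python's standard library can check it with `urllib.robotparser`. The sketch below parses a hypothetical robots.txt for example.com offline. In practice you would call `set_url()` with the site's real robots.txt address and then `read()`. robots.txt is only one signal; the site's terms of use still apply.

```python
from urllib import robotparser

# Hypothetical robots.txt content, for illustration only
ROBOTS_TXT = """
User-agent: *
Disallow: /checkout
Allow: /
"""

rp = robotparser.RobotFileParser()
# parse() takes the file's lines; for a live site use rp.set_url(...) and rp.read()
rp.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="*"):
    """Return True if the parsed rules let `agent` fetch `url`."""
    return rp.can_fetch(agent, url)


print(allowed("https://www.example.com/search?q=phone"))
print(allowed("https://www.example.com/checkout/cart"))
```

Rules are checked in the order they appear, and the first matching path prefix wins, which is why `/checkout` is blocked even though `Allow: /` matches every path.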

Libraries used in Web Scraping

· Beautiful Soup
· Selenium
· pandas

Now follow these steps for web scraping:

Installing Libraries:

pip install beautifulsoup4
pip install selenium
pip install pandas

Importing Libraries in code:

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

Webdriver:

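Selenium loads pages in a real browser, which it controls through that browser's driver (ChromeDriver for Chrome). As I am using Google Colab for this hands-on, Chrome has to run headless, without a visible window, so I use the code below to start it with ChromeDriver and fetch data from the website. The --no-sandbox and --disable-dev-shm-usage flags are commonly needed when Chrome runs inside a container such as Colab.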

from selenium import webdriver

# Run Chrome headless (no visible window), since Colab has no display
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
# Selenium 4 takes the options through the `options` keyword;
# one driver instance is enough for the whole session
driver = webdriver.Chrome(options=chrome_options)

Getting website content:

Next, copy the URL of the page you want to scrape. In this example we use iPhone search results from Flipkart.

# Load the search page, then parse the HTML the browser rendered
driver.get("https://www.flipkart.com/search?q=IPHONE&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off")
content = driver.page_source
soup = BeautifulSoup(content, 'html.parser')
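To scrape a different search, it can help to build the URL instead of pasting it. Below is a minimal sketch using the standard library's `urllib.parse.urlencode`. `flipkart_search_url` is a hypothetical helper, and every parameter except `q` is simply copied from the URL above; what Flipkart does with those extra parameters is not covered here.

```python
from urllib.parse import urlencode


def flipkart_search_url(query):
    """Hypothetical helper: rebuild the search URL used above for any query."""
    params = {
        "q": query,
        # The remaining parameters are copied unchanged from the original URL
        "otracker": "search",
        "otracker1": "search",
        "marketplace": "FLIPKART",
        "as-show": "on",
        "as": "off",
    }
    # urlencode escapes the query, e.g. spaces become '+'
    return "https://www.flipkart.com/search?" + urlencode(params)


print(flipkart_search_url("iphone 15"))
```

You could then call `driver.get(flipkart_search_url('iphone 15'))` in place of the hard-coded URL.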

Creating empty lists for our data:

products=[]
prices=[]
features=[]
ratings=[]

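Appending Data to Lists:

Since we chose a shopping website, we will collect each product's name, price, features, and rating. To find the HTML elements that hold these values, right-click the web page and choose Inspect. The class names in the code below come from that inspection; names like these look auto-generated and may change when the site is updated, so check them again if the loop finds nothing.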

for a in soup.find_all('a', href=True, attrs={'class': '_1fQZEK'}):
    name = a.find('div', attrs={'class': '_4rR01T'})
    price = a.find('div', attrs={'class': '_30jeq3 _1_WHN1'})
    feature = a.find('div', attrs={'class': 'fMghEO'})
    rating = a.find('div', attrs={'class': '_3LWZlK'})
    # Skip product cards missing any field, so all four lists stay the same length
    if None in (name, price, feature, rating):
        continue
    products.append(name.text)
    prices.append(price.text)
    features.append(feature.text)
    ratings.append(rating.text)
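A variation on the loop above, offered as a sketch rather than the author's code, collects one dict per product card, so a card missing a field still yields a row with `None` in that column. It also uses CSS selectors via `select`/`select_one`: `div._30jeq3._1_WHN1` matches an element carrying both classes in any order, whereas `attrs={'class': '_30jeq3 _1_WHN1'}` compares against the exact class string. The HTML below is hypothetical markup reusing the article's class names; real pages are much larger.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the class names above, for illustration only
SAMPLE_HTML = """
<a class="_1fQZEK" href="/p/1">
  <div class="_4rR01T">Sample Phone A</div>
  <div class="_30jeq3 _1_WHN1">₹1,000</div>
  <div class="fMghEO">128 GB ROM</div>
  <div class="_3LWZlK">4.5</div>
</a>
<a class="_1fQZEK" href="/p/2">
  <div class="_4rR01T">Sample Phone B</div>
  <div class="_30jeq3 _1_WHN1">₹12,500</div>
  <div class="fMghEO">256 GB ROM</div>
</a>
"""


def text_of(parent, css):
    """Return the stripped text of the first element matching `css`, or None."""
    el = parent.select_one(css)
    return el.get_text(strip=True) if el else None


def extract_products(html):
    """Build one dict per product card; missing fields become None."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # [href] mirrors href=True in the original find_all call
    for card in soup.select("a._1fQZEK[href]"):
        rows.append({
            "Product Name": text_of(card, "div._4rR01T"),
            "Price": text_of(card, "div._30jeq3._1_WHN1"),
            "Feature": text_of(card, "div.fMghEO"),
            "Rating": text_of(card, "div._3LWZlK"),
        })
    return rows


for row in extract_products(SAMPLE_HTML):
    print(row)
```

A list of dicts like this can be passed straight to `pd.DataFrame(rows)`, and the dict keys become the column names.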

Creating a DataFrame and exporting to CSV:

Now we create a DataFrame from our lists with the pandas library, naming each column, and export it to a CSV file.

df = pd.DataFrame({'Product Name': products, 'Price': prices, 'Feature': features, 'Rating': ratings})
print(df.head())
df.to_csv('products.csv', index=False, encoding='utf-8')
# Close the browser once scraping is finished
driver.quit()
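The scraped columns are plain text, so Price and Rating cannot yet be sorted or averaged numerically. The sketch below is an optional cleaning step, assuming display strings such as `'₹1,000'` and `'4.5'`; `clean_products` is a hypothetical helper and the rows are made-up sample values. It writes to an in-memory buffer so it runs anywhere; in practice you would pass a filename such as `'products.csv'`.

```python
import io

import pandas as pd


def clean_products(df):
    """Return a copy with Price as integers and Rating as floats."""
    out = df.copy()
    # Remove every non-digit character (currency symbol, commas), then parse
    out["Price"] = pd.to_numeric(out["Price"].str.replace(r"\D", "", regex=True))
    # Missing or non-numeric ratings become NaN instead of raising an error
    out["Rating"] = pd.to_numeric(out["Rating"], errors="coerce")
    return out


# Hypothetical scraped rows, for illustration only
raw = pd.DataFrame({
    "Product Name": ["Sample Phone A", "Sample Phone B"],
    "Price": ["₹1,000", "₹12,500"],
    "Feature": ["128 GB ROM", "256 GB ROM"],
    "Rating": ["4.5", None],
})
clean = clean_products(raw)

buffer = io.StringIO()
clean.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Returning a copy leaves the raw scraped frame untouched, which makes it easy to compare against the cleaned values if a format on the site changes.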

GitHub Link:
