Scraping HTML Content using Python

To scrape data with Python, we use the BeautifulSoup package. The examples below parse with "lxml", so that parser is installed alongside it.

!pip install beautifulsoup4 lxml

As a first step, we import the packages and load the HTML page we want to scrape. Here I have used static HTML content written specifically for this scraping exercise.

#imports

import requests
from bs4 import BeautifulSoup
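The requests import is only needed when the page comes from the web rather than from a static string. A minimal sketch of that step, assuming a hypothetical helper named fetch_html and a placeholder URL (neither is part of this post):

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return response.text


# Example call; "https://example.com" is a placeholder, not a page from this post
# html_string = fetch_html("https://example.com")
```

The returned text can be passed to BeautifulSoup in the same way as a static HTML string.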

#html

The sample page ("HTML Sample") is stored in html_string. Rendered, it looks like this:

Doing Data Science with Python

Author: Eranda Kodagoda

This will help to perform various data science activities using Python

Modules

Title	Duration in minutes
Getting Started	20
Setting Up Environment	40
Extracting Data	30
Exploring and Processing Data	45
Building Productive Model	45
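The markup for the sample page is not shown, so here is one plausible reconstruction of html_string, not the author's original. The tag choices are assumptions inferred from the selectors used in the rest of the post (an h1 heading, p elements including one with id "description", and a table with id "module"); the id "author" and the h3 heading are guesses.

```python
# Hypothetical reconstruction of the sample page; tag and id choices are
# assumptions based on the selectors used in the rest of the post.
html_string = """
<!DOCTYPE html>
<html>
<head>
  <title>HTML Sample</title>
</head>
<body>
  <h1>Doing Data Science with Python</h1>
  <p id="author">Author: Eranda Kodagoda</p>
  <p id="description">This will help to perform various data science activities using Python</p>
  <h3>Modules</h3>
  <table id="module">
    <tr><th>Title</th><th>Duration in minutes</th></tr>
    <tr><td>Getting Started</td><td>20</td></tr>
    <tr><td>Setting Up Environment</td><td>40</td></tr>
    <tr><td>Extracting Data</td><td>30</td></tr>
    <tr><td>Exploring and Processing Data</td><td>45</td></tr>
    <tr><td>Building Productive Model</td><td>45</td></tr>
  </table>
</body>
</html>
"""
```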

To view the rendered HTML in a Jupyter notebook, run the lines below.

from IPython.display import display, HTML
display(HTML(html_string))

To parse the HTML with BeautifulSoup and print the parsed document, run the lines below.

ps = BeautifulSoup(html_string, "lxml")
print(ps)
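Printing the soup object shows the markup as parsed. For a more readable, indented view, prettify() can be used. The sketch below assumes a tiny stand-in document (sample_html, a made-up name) and Python's built-in "html.parser" so that it runs without lxml:

```python
from bs4 import BeautifulSoup

# Small stand-in document (an assumption; not the page used in this post)
sample_html = "<html><body><h1>Title</h1><p>Hello</p></body></html>"

# "html.parser" ships with Python; "lxml" must be installed separately
soup = BeautifulSoup(sample_html, "html.parser")

# prettify() returns the parse tree as an indented string
pretty = soup.prettify()
print(pretty)
```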

Find and extract content by HTML tags

# use the name parameter to select by tag name

body = ps.find(name="body")
print(body)

Extract the text contained in an HTML tag

# use text attribute to get the content of the tag

print(body.find(name="h1").text)

# find() returns only the first matching element; .text returns its content without the HTML tags

print(body.find(name="p").text)
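find() returns None when nothing matches, so calling .text on the result can fail with an AttributeError. A small sketch of a defensive lookup, using a made-up snippet and a hypothetical helper named text_of:

```python
from bs4 import BeautifulSoup

# Made-up snippet for illustration only
snippet = "<body><h1> Doing Data Science with Python </h1></body>"
soup = BeautifulSoup(snippet, "html.parser")


def text_of(parent, tag_name, default=""):
    """Return the stripped text of the first matching tag, or a default."""
    element = parent.find(name=tag_name)
    if element is None:
        return default
    # get_text(strip=True) removes leading and trailing whitespace
    return element.get_text(strip=True)


print(text_of(soup, "h1"))
print(text_of(soup, "h2", default="(no h2 found)"))
```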

# find all matching elements (find_all is the current name for findAll)

print(body.find_all(name="p"))

# get only the contents of the "p" elements by looping through each one

for p in body.find_all(name="p"):
    print(p.text)
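The loop over all "p" elements can also be written as a list comprehension when the texts are needed as data rather than printed. A minimal sketch, assuming a made-up two-paragraph snippet:

```python
from bs4 import BeautifulSoup

# Made-up snippet for illustration only
snippet = """
<body>
  <p id="author">Author: Eranda Kodagoda</p>
  <p id="description">A short description</p>
</body>
"""
soup = BeautifulSoup(snippet, "html.parser")

# find_all() returns a list-like result; it is empty if nothing matches
paragraphs = soup.find_all(name="p")
paragraph_texts = [p.get_text(strip=True) for p in paragraphs]
print(paragraph_texts)

# limit= stops searching after the given number of matches
first_only = soup.find_all(name="p", limit=1)
print(len(first_only))
```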

# add attributes in selection process

print(body.find(name="p", attrs={"id":"description"}))
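Besides the attrs dictionary, BeautifulSoup accepts attribute filters as keyword arguments and also supports CSS selectors through select_one(). The sketch below, on a made-up snippet, shows three ways to reach the same element:

```python
from bs4 import BeautifulSoup

# Made-up snippet for illustration only
snippet = """
<body>
  <p id="author">Author: Eranda Kodagoda</p>
  <p id="description">A short description</p>
</body>
"""
soup = BeautifulSoup(snippet, "html.parser")

# 1. attrs dictionary, as used in this post
by_attrs = soup.find(name="p", attrs={"id": "description"})
# 2. attribute passed as a keyword argument
by_keyword = soup.find("p", id="description")
# 3. CSS selector: a <p> element whose id is "description"
by_css = soup.select_one("p#description")

print(by_attrs.text)
print(by_keyword.text)
print(by_css.text)
```

The attrs form is the safest choice for attribute names that clash with Python keywords or parameter names, such as "class" or "name".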

# get the data contained in the table

body = ps.find(name="body")
module_table = body.find(name="table", attrs={"id": "module"})
for row in module_table.find_all(name="tr")[1:]:  # [1:] skips the header row
    cells = row.find_all(name="td")
    title = cells[0].text
    duration = cells[1].text
    print(title, duration)
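Instead of printing each row, the table can be collected into a list of dictionaries with the duration converted to an integer, which makes later processing easier. A minimal sketch, assuming a shortened copy of the module table and made-up key names (title, duration_minutes):

```python
from bs4 import BeautifulSoup

# Shortened copy of the module table, for illustration only
table_html = """
<table id="module">
  <tr><th>Title</th><th>Duration in minutes</th></tr>
  <tr><td>Getting Started</td><td>20</td></tr>
  <tr><td>Setting Up Environment</td><td>40</td></tr>
</table>
"""
soup = BeautifulSoup(table_html, "html.parser")
module_table = soup.find(name="table", attrs={"id": "module"})

modules = []
for row in module_table.find_all(name="tr")[1:]:  # skip the header row
    cells = row.find_all(name="td")
    modules.append({
        "title": cells[0].get_text(strip=True),
        # cell text is always a string, so convert it before doing arithmetic
        "duration_minutes": int(cells[1].get_text(strip=True)),
    })

total_minutes = sum(m["duration_minutes"] for m in modules)
print(modules)
print(total_minutes)
```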
