
Scraping HTML Content using Python

To scrape data with Python, we use the BeautifulSoup package (published on PyPI as beautifulsoup4).

!pip install beautifulsoup4 lxml requests

As a first step, we import the packages and load the HTML page we want to scrape. Here I use some static HTML content written specifically for this scraping example.

#imports

import requests
from bs4 import BeautifulSoup
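
The requests import is not exercised by the static example in this post. For a live page, a minimal sketch could look like the following. The helper names parse_page and fetch_page and the placeholder URL are assumptions made for illustration, and the built-in "html.parser" is used so the sketch runs without lxml.

```python
import requests
from bs4 import BeautifulSoup


def parse_page(html_text):
    """Parse an HTML string with Python's built-in parser."""
    return BeautifulSoup(html_text, "html.parser")


def fetch_page(url, timeout=10):
    """Download a page and return it as a BeautifulSoup object."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return parse_page(response.text)


# Usage (not executed here; "https://example.com" is a placeholder URL):
# soup = fetch_page("https://example.com")
# print(soup.title.text)
```

Keeping parsing separate from downloading means the same scraping code can be tried on a static string first, as this post does.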



#html: the sample page "HTML Sample", shown here as rendered

Doing Data Science with Python

Author: Eranda Kodagoda

This will help to perform various data science activities using python

Modules

Title | Duration in minutes
Getting Started | 20
Setting Up Environment | 40
Extracting Data | 30
Exploring and Processing Data | 45
Building Productive Model | 45

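
The markup behind the rendered sample is not shown here. Below is a minimal sketch of an html_string that would be consistent with the selectors used in the rest of this post. The ids "description" and "module" come from those lookups; the exact tag choices (h3 for "Modules", th for the header cells) and the id "author" are assumptions, so the real markup may differ.

```python
# Assumed reconstruction of the sample page; the original markup may differ.
html_string = """
<!DOCTYPE html>
<html>
<head>
  <title>HTML Sample</title>
</head>
<body>
  <h1>Doing Data Science with Python</h1>
  <p id="author">Author: Eranda Kodagoda</p>
  <p id="description">This will help to perform various data science activities using python</p>
  <h3>Modules</h3>
  <table id="module">
    <tr><th>Title</th><th>Duration in minutes</th></tr>
    <tr><td>Getting Started</td><td>20</td></tr>
    <tr><td>Setting Up Environment</td><td>40</td></tr>
    <tr><td>Extracting Data</td><td>30</td></tr>
    <tr><td>Exploring and Processing Data</td><td>45</td></tr>
    <tr><td>Building Productive Model</td><td>45</td></tr>
  </table>
</body>
</html>
"""
```
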
To view the rendered HTML in a Jupyter notebook, use IPython's display helpers (this step does not involve BeautifulSoup):

from IPython.display import display, HTML
display(HTML(html_string))

To parse the HTML with BeautifulSoup and print the parsed document, run the lines below:

ps=BeautifulSoup(html_string,"lxml")
print(ps)
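
Printing the soup object emits the markup largely as it was written. For a more readable view, prettify() returns an indented string with one tag per line. A small sketch on a shortened snippet (the snippet is a stand-in for the sample page, and "html.parser" is used so it runs without lxml):

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the sample page.
snippet = "<html><body><h1>Doing Data Science with Python</h1></body></html>"
soup = BeautifulSoup(snippet, "html.parser")

# prettify() returns a string with nested tags indented on separate lines.
pretty = soup.prettify()
print(pretty)
```
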

Find and extract content by HTML tags

# use the name parameter to select by tag name

body=ps.find(name="body")
print(body)

Extract the text contained in an HTML tag


# use text attribute to get the content of the tag

print(body.find(name="h1").text)

# find() returns only the first matching element; .text drops the tags and keeps the content

print(body.find(name="p").text)

# find all matching elements (find_all is the current name for findAll)

print(body.find_all(name="p"))

# get only the contents of the "p" elements by looping through each one

for p in body.find_all(name="p"):
    print(p.text)
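
The same loop can be written as a list comprehension when the texts are needed as data instead of printed output. This is a sketch on a hypothetical two-paragraph snippet; get_text(strip=True) also trims surrounding whitespace, which .text does not.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for the sample page.
snippet = (
    '<body><p id="author"> Author: Eranda Kodagoda </p>'
    '<p id="description">Sample description</p></body>'
)
soup = BeautifulSoup(snippet, "html.parser")

# Collect the trimmed text of every <p> element into a list.
paragraph_texts = [p.get_text(strip=True) for p in soup.find_all("p")]
print(paragraph_texts)
```
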

# narrow the selection by adding attributes

print(body.find(name="p", attrs={"id":"description"}))
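
As an alternative to attrs, BeautifulSoup also accepts CSS selectors through select_one() and select(). The sketch below runs both lookups on a hypothetical snippet and shows that find() returns None when nothing matches, which is worth checking before reading .text. The id "summary" is made up to demonstrate the missing case.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for the sample page.
snippet = (
    '<body><p id="author">Author: Eranda Kodagoda</p>'
    '<p id="description">Sample description</p></body>'
)
soup = BeautifulSoup(snippet, "html.parser")

by_attrs = soup.find("p", attrs={"id": "description"})  # attribute filter
by_css = soup.select_one("p#description")               # equivalent CSS selector
missing = soup.find("p", attrs={"id": "summary"})       # no such id, so None

print(by_attrs.text, by_css.text, missing)
```
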

# get the data contained in the table

body=ps.find(name="body")
module_table=body.find(name="table",attrs={"id":"module"})
# skip the first row, which holds the column headers
for row in module_table.find_all(name="tr")[1:]:
    cells=row.find_all(name="td")
    title=cells[0].text
    duration=cells[1].text
    print(title,duration)
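
To work with the table values instead of printing them, the rows can be collected into a list of dictionaries. The sketch below uses the module titles and durations from the sample table; the surrounding table markup (th header cells) is an assumption, and "html.parser" is used so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Table markup assumed for illustration; the values come from the sample page.
table_html = """
<table id="module">
  <tr><th>Title</th><th>Duration in minutes</th></tr>
  <tr><td>Getting Started</td><td>20</td></tr>
  <tr><td>Setting Up Environment</td><td>40</td></tr>
  <tr><td>Extracting Data</td><td>30</td></tr>
  <tr><td>Exploring and Processing Data</td><td>45</td></tr>
  <tr><td>Building Productive Model</td><td>45</td></tr>
</table>
"""
soup = BeautifulSoup(table_html, "html.parser")
module_table = soup.find("table", attrs={"id": "module"})

modules = []
for row in module_table.find_all("tr")[1:]:  # skip the header row
    cells = row.find_all("td")
    modules.append({
        "title": cells[0].get_text(strip=True),
        "duration": int(cells[1].get_text(strip=True)),  # minutes as an integer
    })

total_minutes = sum(m["duration"] for m in modules)
print(modules)
print(total_minutes)
```

Converting the duration to int at extraction time makes later arithmetic, such as the total above, straightforward.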
