Web Scraping (BeautifulSoup, Selectors, Pagination) - Python Tutorial for Beginners #33
Video by Taught by Celeste AI - AI Coding Coach
Web Scraping with BeautifulSoup: Fetching, Parsing, and Pagination in Python
Discover how to scrape web pages using Python by fetching HTML content with requests and parsing it with BeautifulSoup. Learn to extract data using tag searches and CSS selectors, and handle multi-page scraping through pagination links to collect comprehensive datasets.
Code
import requests
from bs4 import BeautifulSoup
from collections import Counter

# Extract all quote blocks from an already-parsed page
def scrape_quotes(soup):
    quotes = []
    # Find all quote blocks by their class
    for quote_div in soup.find_all('div', class_='quote'):
        text = quote_div.find('span', class_='text').get_text()
        author = quote_div.find('small', class_='author').get_text()
        tags = [tag.get_text() for tag in quote_div.select('.tags a.tag')]
        quotes.append({'text': text, 'author': author, 'tags': tags})
    return quotes

# Get the absolute URL of the next page from the pagination links, if any
def get_next_page(soup):
    next_button = soup.select_one('li.next a')
    if next_button:
        return 'http://quotes.toscrape.com' + next_button['href']
    return None

# Main scraping loop: fetch each page once, then follow pagination links
def scrape_all_quotes(start_url):
    all_quotes = []
    url = start_url
    while url:
        print(f'Scraping {url}')
        response = requests.get(url, timeout=10)
        soup = BeautifulSoup(response.text, 'html.parser')
        all_quotes.extend(scrape_quotes(soup))
        url = get_next_page(soup)
    return all_quotes

if __name__ == '__main__':
    start_url = 'http://quotes.toscrape.com'
    quotes = scrape_all_quotes(start_url)
    # Count the most common tags across all quotes
    all_tags = [tag for quote in quotes for tag in quote['tags']]
    tag_counts = Counter(all_tags)
    print(f'Total quotes scraped: {len(quotes)}')
    print('Most common tags:', tag_counts.most_common(5))
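The selectors above only make sense against the live site's markup, so here is a self-contained sketch that runs the same `find_all`/`select` calls on an inline HTML snippet. The snippet is an assumption that mimics the structure of quotes.toscrape.com, not the site's actual markup:

```python
from bs4 import BeautifulSoup

# Assumed snippet mimicking the structure of quotes.toscrape.com
html = '''
<div class="quote">
  <span class="text">"Be yourself."</span>
  <small class="author">Oscar Wilde</small>
  <div class="tags"><a class="tag">be-yourself</a><a class="tag">honesty</a></div>
</div>
<ul class="pager"><li class="next"><a href="/page/2/">Next</a></li></ul>
'''

soup = BeautifulSoup(html, 'html.parser')
quote_div = soup.find('div', class_='quote')                     # tag + class search
text = quote_div.find('span', class_='text').get_text()
tags = [a.get_text() for a in quote_div.select('.tags a.tag')]   # CSS selector
next_href = soup.select_one('li.next a')['href']                 # pagination link

print(text)       # "Be yourself."
print(tags)       # ['be-yourself', 'honesty']
print(next_href)  # /page/2/
```

Because no network request is involved, this is a handy way to test your selectors before pointing them at a real page.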
Key Points
- Use requests.get() to fetch HTML content from web pages.
- Parse HTML with BeautifulSoup to navigate and extract elements using find(), find_all(), and CSS selectors via select().
- Extract text and attributes cleanly to build structured data such as dictionaries.
- Handle pagination by locating "next" page links and iterating through multiple pages to scrape comprehensive datasets.
- Analyze scraped data with Python tools like collections.Counter to summarize and gain insights.
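As a quick illustration of the last point, collections.Counter can summarize tag frequencies without any scraping at all. The sample data below is made up, shaped like the dictionaries scrape_all_quotes() builds:

```python
from collections import Counter

# Hypothetical tag lists, shaped like the scraper's output
quotes = [
    {'tags': ['love', 'life']},
    {'tags': ['love']},
    {'tags': ['life', 'truth']},
]

# Flatten the per-quote tag lists, then count occurrences
all_tags = [tag for quote in quotes for tag in quote['tags']]
tag_counts = Counter(all_tags)

print(tag_counts.most_common(2))  # [('love', 2), ('life', 2)]
```

Counter.most_common(n) returns the n highest-count pairs, which is exactly what the main script uses to report the top five tags.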