Simple Web Scraping Example Using BeautifulSoup: Instructable Statistics With Python
by FuzzyPotato in Circuits > Computers
In this Instructable, we'll explore the world of web scraping using the BeautifulSoup library in Python. Our focus will be on extracting view statistics from an Instructable member's account.
Web scraping allows us to automatically gather data from websites, and with BeautifulSoup, we can easily navigate and extract information from the HTML structure of Instructables pages.
Specifically, we'll be retrieving the view counts from Instructables projects. By harnessing the power of web scraping, we can track project popularity, monitor growth, and gain valuable insights into the reach and impact of different creations.
With just a few lines of code, we'll unlock the ability to collect and analyze data from Instructables projects, opening up a world of possibilities for research, analysis, and tracking. Let's dive in and discover the exciting world of web scraping with BeautifulSoup and Instructables statistics!
Supplies
To scrape Instructables statistics using Python, you'll need:
- Windows computer with Python installed
- IDE (I'm using PyCharm)
- beautifulsoup4 Library
- requests Library
Install Required Libraries
We will start by installing the beautifulsoup4 and requests libraries.
- Open the PyCharm development environment (IDE).
- Navigate to File>Settings>Project>Python Interpreter
- Search for beautifulsoup4.
- Install the beautifulsoup4 library.
- Wait for the installation process to complete. You should see a message indicating a successful installation.
- Repeat for the requests library.
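If you want to double-check that both installs worked before moving on, here is a minimal sketch that asks Python whether it can locate each library. The helper name is_installed is my own invention for illustration. Note that beautifulsoup4 is imported under the name bs4.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the module (without importing it)."""
    return importlib.util.find_spec(module_name) is not None


# beautifulsoup4 is imported as "bs4"; requests keeps its own name.
for module_name in ("bs4", "requests"):
    status = "installed" if is_installed(module_name) else "missing"
    print(f"{module_name}: {status}")
```

If either line reports "missing", go back to the Python Interpreter settings and check that the library was added to the interpreter your project is using.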
Code
Now that we have the beautifulsoup4 and requests libraries installed, it's time to write the Python code that will scrape the Instructables view counts.
- Create a new Python script.
- Copy and paste the following code:
import requests
from bs4 import BeautifulSoup

# Replace FuzzyPotato with the username of interest.
account_url = "https://www.instructables.com/member/FuzzyPotato/instructables/"

# Download the member's project listing and parse it.
response = requests.get(account_url)
soup = BeautifulSoup(response.text, "html.parser")

# Each project is expected to be a div carrying these thumbnail classes.
instructables = soup.find_all("div", class_="thumbnail ible-thumbnail")
print("Instructables count:", len(instructables))

for instructable in instructables:
    # The title link holds both the project name and its relative URL.
    title_link = instructable.find("a", class_="title")
    title = title_link.text
    project_url = "https://www.instructables.com" + title_link["href"]

    # Open the project page and look for the view count element.
    project_response = requests.get(project_url)
    project_soup = BeautifulSoup(project_response.content, "html.parser")
    view_count_element = project_soup.find("p", class_="svg-views view-count")

    print(f"Title: {title}")
    print(f"URL: {project_url}")
    if view_count_element is None:
        # The page layout may differ, so report it instead of crashing.
        print("View Count: not available")
    else:
        view_count = int(view_count_element.text.strip().replace(",", ""))
        print(f"View Count: {view_count}")
    print("-------------------------")
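The int(...) conversion in the script expects plain digits, optionally with commas. Here is a minimal sketch of a more forgiving converter. The function name parse_view_count is hypothetical, and the idea that a count might be abbreviated as something like "1.2K" is purely my assumption; I have not confirmed that Instructables displays counts that way.

```python
import re

# Hypothetical helper, not part of the original script.
# Assumption: counts might be abbreviated with a K or M suffix.
SUFFIX_MULTIPLIERS = {"K": 1_000, "M": 1_000_000}


def parse_view_count(text):
    """Return the view count as an int, or None if the text is not a number."""
    cleaned = text.strip().replace(",", "")
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KkMm]?)", cleaned)
    if match is None:
        return None
    number, suffix = match.groups()
    multiplier = SUFFIX_MULTIPLIERS.get(suffix.upper(), 1)
    return int(round(float(number) * multiplier))


for sample in ("1,234", " 321 ", "1.2K", "n/a"):
    print(repr(sample), "->", parse_view_count(sample))
```

Returning None for unrecognised text lets the calling code decide whether to print a message or skip the project, rather than stopping with a ValueError.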
Extract and Display the Instructable Statistics
In the final step, we will run the code to scrape and display the view statistics from an Instructables member's projects. Here is how it will work:
- The code will start by requesting the member's projects page.
- Using BeautifulSoup, the code will navigate the HTML content and collect each project's title and link.
- The code will then open each project page and extract its view count.
- The code will display the extracted statistics to the user.
- If the view count element is not found, a message indicating its unavailability will be displayed.
By following these steps, you will successfully scrape and display the view statistics from an Instructables member's projects.
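To try the steps above without hitting the website, here is a small offline sketch. The HTML strings are made up by me to mimic the class names the script searches for, so they are not real Instructables pages, and the function names extract_view_count and collect_stats are hypothetical. It also shows the "not found" case from the last step.

```python
from bs4 import BeautifulSoup

# Made-up HTML that only mimics the class names used in the script.
LISTING_HTML = """
<div class="thumbnail ible-thumbnail"><a class="title" href="/Project-A/">Project A</a></div>
<div class="thumbnail ible-thumbnail"><a class="title" href="/Project-B/">Project B</a></div>
"""

# Stand-in for the project pages, keyed by relative URL (no network needed).
PROJECT_PAGES = {
    "/Project-A/": '<p class="svg-views view-count">1,234</p>',
    "/Project-B/": "<p>No statistics here</p>",
}


def extract_view_count(project_html):
    """Return the view count as an int, or None if the element is missing."""
    soup = BeautifulSoup(project_html, "html.parser")
    element = soup.find("p", class_="svg-views view-count")
    if element is None:
        return None
    return int(element.text.strip().replace(",", ""))


def collect_stats(listing_html, pages):
    """Return a list of (title, view_count) pairs for every project found."""
    soup = BeautifulSoup(listing_html, "html.parser")
    stats = []
    for card in soup.find_all("div", class_="thumbnail ible-thumbnail"):
        link = card.find("a", class_="title")
        stats.append((link.text.strip(), extract_view_count(pages[link["href"]])))
    return stats


for title, views in collect_stats(LISTING_HTML, PROJECT_PAGES):
    print(title, views if views is not None else "view count not available")
```

Keeping the parsing in small functions like these makes it easy to swap the sample strings for real responses from requests once everything behaves as expected.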
Feel free to expand on this code and explore more advanced scraping techniques or integrate it into your own projects. Happy tinkering!