How to Chart Your Instructable Statistics
by danionescu in Circuits > Software
1285 Views, 19 Favorites, 0 Comments
In this instructable we'll look at how you can generate graphs of your instructable statistics. We'll be tracking views, favorites, comments and popularity. I've defined popularity as favorites divided by views, expressed as a percentage.
I searched for this subject on instructables and found only one instructable that addresses it, but unfortunately its solution no longer works.
Instructables itself does not have any graphs for tracking your personal instructables, so I decided to build a python script to do just that for me and anyone else interested. It works by collecting data for your instructables and sending it to your personal thingspeak.com account (it's free). Thingspeak is great for receiving any kind of IoT data and plotting graphs.
I'll also try to explain step by step how the script works, how it parses the data and how it sends it to thingspeak. So if you're interested in the programming behind it, Step 3 is for you.
The prerequisites are:
- one or more personal instructables to track
- a thingspeak.com account (you can create one free)
- a development board such as a Raspberry Pi, C.H.I.P., or anything else running linux; your personal computer or a personal server where you have SSH access works too
- basic unix and programming knowledge; I'll try to explain everything step by step, but a basic understanding is required
Creating Thingspeak Account, and Channels
In this step we'll be creating a thingspeak account and one channel for every instructable we want to track.
ThingSpeak is an open source “Internet of Things” application and API to store and retrieve data from things using HTTP over the Internet or via a Local Area Network.
If you don't have an account, visit the account creation page. After submitting the simple registration form, we'll need to create a channel for every instructable that needs to be tracked.
A channel in thingspeak is a place where similar data can live, in our case views, favorites, comments and popularity. We'll use one channel to monitor all the relevant parameters of one instructable. Thingspeak provides an API to update channel data.
I've also attached a picture of how the channel create page should be filled.
The steps for creating a channel that holds the statistics of one instructable are:
1. log in using the username and password that you created previously
2. go to the channels page and click the "New Channel" button
3. enter a descriptive name for the instructable to be tracked, like: "Instructables: Automated windows shades"
4. check the field1, field2, field3 and field4 checkboxes
5. name field1 "Views", field2 "Favorites", field3 "Comments" and field4 "Popularity"
6. you can add a description if you like, and make the channel public (there is a checkbox on the page)
7. click the "Save channel" button
You can repeat this step for all of your instructables that you want to track.
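The four fields configured above are what the script fills in on every update. Below is a minimal sketch of how a metrics dictionary could be mapped onto those field names. It is written for Python 3 (the script in this instructable targets Python 2.7), `FIELD_MAP` and `build_update_params` are hypothetical names, and the key is a placeholder.

```python
from urllib.parse import urlencode

# Channel field -> metric name, matching the names entered on the channel page.
FIELD_MAP = {
    "field1": "views",
    "field2": "favorites",
    "field3": "comments",
    "field4": "popularity",
}


def build_update_params(write_api_key, metrics):
    """Return the form-encoded body for one channel update (hypothetical helper)."""
    params = {field: metrics[name] for field, name in FIELD_MAP.items()}
    params["key"] = write_api_key
    return urlencode(params)


body = build_update_params(
    "some_key",
    {"views": 12, "favorites": 3, "comments": 0, "popularity": 1.218},
)
print(body)
```

Keeping the mapping in one dictionary means a renamed or reordered channel field only has to be changed in one place.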
Server Environment Set Up Instructions and Configuration
The steps are:
1. checking dependencies and installing missing ones
2. downloading the project
3. configuring the project
4. adding an entry in the crontab for the script and saving logs to a file
1. checking dependencies and installing missing ones
- for this project we will need python 2.7.x
Issue this command in a terminal to find out your version:
python --version # it will output something like Python 2.7.9
If your version doesn't match then install it; on a Raspberry Pi or ubuntu you can issue:
sudo apt-get install -y python2.7
Then we'll need PIP, the python package manager:
sudo apt-get install python-pip
2. downloading the project
Download the attached "instructablesStatistics.tar.gz" file and copy it to its final location:
cd our_parent_folder_location
tar -xvzf instructablesStatistics.tar.gz
# this will extract the archive into a folder called instructablesStatistics
cd instructablesStatistics
sudo pip install requests==2.10.0
# the last line will install the requests library
import urllib, httplib, requests, re, json, time
import os.path
import config


class InstructablesParser:
    STATS_KEY_REGEX = "LogHit\(\'([A-Z0-9]*)\'"
    STATS_URL = 'https://www.instructables.com/json-api/getIbleStats?id={0}'

    def parse_raw_metrics(self, url):
        page_html = requests.get(url)
        found_matches = re.findall(self.STATS_KEY_REGEX, page_html.content)
        instructables_stats_key = found_matches[0]
        stats_data = requests.get(self.STATS_URL.format(instructables_stats_key))
        return json.loads(stats_data.content)


class Statistics:
    FILE_NAME = 'history.txt'
    DEFAULT_METRICS = {'views' : 0, 'favorites' : 0, 'comments' : 0, 'popularity' : 0}

    def get_since_last(self, id, latest_metrics):
        last_stats = self.__read_last_saved_data(id)
        if len(set(last_stats.items()) & set(self.DEFAULT_METRICS.items())) == 4:
            return last_stats
        stats = {}
        stats['views'] = latest_metrics['views'] - last_stats['views']
        stats['comments'] = latest_metrics['comments'] - last_stats['comments']
        stats['favorites'] = latest_metrics['favorites'] - last_stats['favorites']
        stats['popularity'] = round((latest_metrics['favorites'] / float(latest_metrics['views'])) * 100, 3)
        return stats

    def __read_last_saved_data(self, id = None):
        if not os.path.isfile(self.FILE_NAME):
            open(self.FILE_NAME, 'w').close()
        file = open(self.FILE_NAME, 'r+')
        line = file.readline()
        file.close()
        if len(line) < 2:
            if id != None:
                return self.DEFAULT_METRICS
            else:
                return {}
        data = json.loads(line)
        if id != None:
            if id in data:
                return data[id]
            else:
                return self.DEFAULT_METRICS
        return data

    def update_with_latest(self, id, latest_metrics):
        file = open(self.FILE_NAME, 'r')
        data = self.__read_last_saved_data()
        file.close()
        open(self.FILE_NAME, 'w').close()
        file = open(self.FILE_NAME, 'w')
        data[id] = latest_metrics
        file.write(json.dumps(data))
        file.close()


def post_metrics(key, metrics):
    headers = {"Content-type": "application/x-www-form-urlencoded", "Accept": "text/plain"}
    params = urllib.urlencode(
        {'field1': metrics['views'], 'field2': metrics['favorites'],
         'field3' : metrics['comments'], 'field4' :metrics['popularity'], 'key': key}
    )
    conn = httplib.HTTPConnection("api.thingspeak.com:80")
    try:
        conn.request("POST", "/update", params, headers)
        response = conn.getresponse()
        response.read()
        conn.close()
        return True
    except:
        return False


instrucatables_parser = InstructablesParser()
statistics = Statistics()
for crawl_item in config.crawl_list:
    raw_metrics = instrucatables_parser.parse_raw_metrics(crawl_item['url'])
    latest_metrics = statistics.get_since_last(crawl_item['id'], raw_metrics)
    print 'Latest metrics for instructable: " {0} " '.format(crawl_item['url'])
    print latest_metrics
    statistics.update_with_latest(crawl_item['id'], raw_metrics)
    status = post_metrics(crawl_item['write_api_key'], latest_metrics)
    if not status:
        print 'Could not get instructable {0} statistics'.format(crawl_item['url'])
    time.sleep(16)
3. configuring the project
For the configuration you need to know your full instructable urls, and a write API key for each of them.
In config.py I've created a demo configuration with two of my instructables:
crawl_list = [
    {
        'url' : 'https://www.instructables.com/id/Simple-and-Cheap-Phone-Controlled-Fireworks-Ignite/',
        'id' : '223292',
        'write_api_key': 'some_key'
    },
    {
        'url' : 'https://www.instructables.com/id/Automated-Windows-Shades/',
        'id' : '223001',
        'write_api_key': 'some_other_key'
    },
]
So for each instructable you want to parse, you have to specify the instructable url, the instructable id (it can be anything unique), and the "write api key".
To locate an api key do the following:
- go to https://thingspeak.com/channels
- click on your channel name (we created it earlier), in my case: "Instructables: Automated windows shades"
- go to "Api keys" tab
- copy the "Write API Key" and paste it into your config
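A typo in the configuration only shows up when the script runs, so it can help to check the list beforehand. The following is a small sketch of such a check in Python 3; `validate_crawl_list` and `REQUIRED_KEYS` are hypothetical names and are not part of the attached project.

```python
REQUIRED_KEYS = ("url", "id", "write_api_key")


def validate_crawl_list(crawl_list):
    """Return a list of problems; an empty list means the config looks usable."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(crawl_list):
        # Every entry needs a non-empty value for each required key.
        for key in REQUIRED_KEYS:
            if not item.get(key):
                problems.append("entry {0}: missing '{1}'".format(index, key))
        # The id is used as the key in the history file, so it must be unique.
        item_id = item.get("id")
        if item_id is not None:
            if item_id in seen_ids:
                problems.append("entry {0}: duplicate id '{1}'".format(index, item_id))
            seen_ids.add(item_id)
    return problems


demo = [
    {"url": "https://www.instructables.com/id/Automated-Windows-Shades/",
     "id": "223001", "write_api_key": "some_other_key"},
]
print(validate_crawl_list(demo))
```

The id matters more than it looks: two entries sharing an id would overwrite each other's saved totals.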
4. add an entry in the crontab for the script and save logs to a file
# log in to your server
# the following command will open crontab in your favorite editor
# note a small menu might ask you what editor to use
crontab -e -u your_user
# the following lines will create an empty log file, and set rights for it
echo "" > /your_log_location/instructables.txt
chmod 776 /your_log_location/instructables.txt
Add this to your crontab:
0 13 * * * python /our_parent_folder_location/instructablesStatistics/server.py >> /your_log_location/instructables.txt
Then save and exit crontab, and the schedule will be set.
The crontab entry uses python to run our script at 13:00 every day, and appends its output to "/your_log_location/instructables.txt".
Note: on the first run the script will upload 0 for all metrics, because it doesn't yet know what the metrics of the previous period were.
Detailed Code Explanations
This step is optional; you can skip it if you're not curious about how it works.
First let's talk about how the script works as a whole, and then we'll analyze the code bit by bit.
Here is a brief explanation of what happens in the script:
- read the config
- for every instructable defined in the config:
1. get the current totals from the instructables website
2. read the last totals from a file on disk
3. if there is no file on disk, create a file with the current totals
4. compute the difference between the current totals and the saved totals
5. send the differences (the statistics) to thingspeak using an api call
6. update the file with the current totals
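The flow above can be sketched without any network or file access by passing the fetching and posting steps in as functions. This is only an illustration in Python 3: `run_once`, `fetch_totals` and `post_stats` are hypothetical names, the history is a plain dictionary instead of a file, and popularity is left out to keep it short.

```python
METRIC_NAMES = ("views", "favorites", "comments")


def run_once(crawl_list, fetch_totals, post_stats, history):
    """Process every configured instructable once.

    fetch_totals(url) returns the current totals as a dict,
    post_stats(key, stats) returns True on success,
    history maps an id to the totals saved by the previous run.
    """
    results = {}
    for item in crawl_list:
        totals = fetch_totals(item["url"])
        previous = history.get(item["id"])
        if previous is None:
            # First run for this id: nothing to compare against, report zeros.
            stats = {name: 0 for name in METRIC_NAMES}
        else:
            stats = {name: totals[name] - previous[name] for name in METRIC_NAMES}
        history[item["id"]] = totals
        results[item["id"]] = post_stats(item["write_api_key"], stats)
    return results
```

Calling it twice with a fake `fetch_totals` that reports 10 views and then 15 views would post 0 views on the first call and 5 on the second.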
The current totals are available on the instructables website, but not in the initial page load. If you take a closer look, the page sometimes loads with an older version of the views and favorites (the number of comments is missing too), and the numbers are then quickly updated asynchronously by a script.
I checked the page using the Google Chrome network tab and saw a call to the following url: "https://www.instructables.com/json-api/getIbleStats?id=hashKey". The hash key was different for every instructable. So I looked for the hash key in the page source and found it there, then I created a regex to capture it.
The regex looks like this: "LogHit\(\'([A-Z0-9]*)\'". The call to this url returns the statistics in json format, like this: "{"views":3777,"favorites":46,"comments":0}", which is easy to interpret and use.
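To see the regex and the json parsing in isolation, here is a short Python 3 sketch that runs both on sample strings. The page fragment and the key in it are invented for illustration; only the json string is the one quoted above.

```python
import json
import re

STATS_KEY_REGEX = r"LogHit\('([A-Z0-9]*)'"

# Hypothetical page fragment; the key value is made up.
sample_html = "<script>LogHit('EXAMPLEKEY123');</script>"
# Response body in the shape quoted above.
sample_response = '{"views":3777,"favorites":46,"comments":0}'

# The capture group returns only the key, without the surrounding call.
stats_key = re.findall(STATS_KEY_REGEX, sample_html)[0]
stats = json.loads(sample_response)

print(stats_key)
print(stats["views"], stats["favorites"], stats["comments"])
```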
The instructable parser object:
class InstructablesParser:
    STATS_KEY_REGEX = "LogHit\(\'([A-Z0-9]*)\'"
    STATS_URL = 'https://www.instructables.com/json-api/getIbleStats?id={0}'

    def parse_raw_metrics(self, url):
        page_html = requests.get(url)
        found_matches = re.findall(self.STATS_KEY_REGEX, page_html.content)
        instructables_stats_key = found_matches[0]
        stats_data = requests.get(self.STATS_URL.format(instructables_stats_key))
        return json.loads(stats_data.content)
Right after the class declaration, two strings are defined in the class: one holds the regex used to find the statistics key, and the other is the statistics url.
There is also a method "parse_raw_metrics(self, url)" that receives one parameter, the instructable url, and returns the statistics data as a dictionary. To do that it first loads the instructable page into a string and runs the regex over it; the regex contains a capture group that captures the key. With the key fetched, a request is made to "STATS_URL" to retrieve the statistics as json. The piece of code "self.STATS_URL.format(instructables_stats_key)" replaces {0} inside "STATS_URL" with the key we found.
After the retrieval a dictionary is returned using "json.loads(stats_data.content)"
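One thing the method does not handle is a page where the key cannot be found: "found_matches[0]" then fails with an IndexError. A possible way to make that failure clearer, sketched in Python 3 with the hypothetical helper name `build_stats_url`:

```python
import re

STATS_KEY_REGEX = re.compile(r"LogHit\('([A-Z0-9]*)'")
STATS_URL = "https://www.instructables.com/json-api/getIbleStats?id={0}"


def build_stats_url(page_html):
    """Return the statistics url for a page, or raise ValueError if no key is found."""
    match = STATS_KEY_REGEX.search(page_html)
    if match is None or not match.group(1):
        raise ValueError("statistics key not found in page")
    # format() replaces {0} with the captured key.
    return STATS_URL.format(match.group(1))


# The key below is invented for illustration.
print(build_stats_url("LogHit('EXAMPLEKEY123')"))
```

A ValueError with a message is easier to spot in the cron log than a bare IndexError.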
Below we see the Statistics object:
class Statistics:
    FILE_NAME = 'history.txt'
    DEFAULT_METRICS = {'views' : 0, 'favorites' : 0, 'comments' : 0, 'popularity' : 0}

    def get_since_last(self, id, latest_metrics):
        ...

    def __read_last_saved_data(self, id = None):
        ...

    def update_with_latest(self, id, latest_metrics):
        ...
It contains two class attributes: "FILE_NAME" stores the name of the file which will hold the last instructable statistics, and "DEFAULT_METRICS" is a dictionary with all the metric keys set to 0.
Our object has the following methods:
* get_since_last(self, id, latest_metrics), this one gets the computed metrics, given the tutorial id from the config and the raw metrics parsed using the "InstructablesParser" class explained above.
def get_since_last(self, id, latest_metrics):
    last_stats = self.__read_last_saved_data(id)
    if len(set(last_stats.items()) & set(self.DEFAULT_METRICS.items())) == 4:
        return last_stats
    stats = {}
    stats['views'] = latest_metrics['views'] - last_stats['views']
    stats['comments'] = latest_metrics['comments'] - last_stats['comments']
    stats['favorites'] = latest_metrics['favorites'] - last_stats['favorites']
    stats['popularity'] = round((latest_metrics['favorites'] / float(latest_metrics['views'])) * 100, 3)
    return stats
- first we read the last saved statistics for our "id" as a dictionary and store it in the "last_stats" variable
- the big if condition checks whether the returned dictionary is identical to DEFAULT_METRICS (all 4 items match), which means nothing has been saved for this id yet; in that case the zeroed metrics are returned as they are
- views, comments and favorites are calculated as the difference between the corresponding keys of latest_metrics and last_stats
- popularity is calculated by dividing latest_metrics['favorites'] by latest_metrics['views']; the multiplication by 100 makes it a percentage, and then "round" is applied to keep only 3 decimals
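As a worked example of the arithmetic, here is a Python 3 sketch of the same calculation as a standalone function. `compute_since_last` is a hypothetical name, the "last" totals are invented, and the guard for zero views is an addition that the original method does not have.

```python
def compute_since_last(latest, last):
    """Return the change since the last run plus the current popularity in percent."""
    stats = {name: latest[name] - last[name] for name in ("views", "comments", "favorites")}
    if latest["views"] > 0:
        stats["popularity"] = round(latest["favorites"] / float(latest["views"]) * 100, 3)
    else:
        # Added guard: a brand new instructable with no views would divide by zero.
        stats["popularity"] = 0
    return stats


example = compute_since_last(
    {"views": 3777, "favorites": 46, "comments": 0},
    {"views": 3700, "favorites": 44, "comments": 0},
)
print(example)
```

With these numbers the result is 77 new views, 2 new favorites, 0 new comments and a popularity of 1.218 percent (46 / 3777 * 100). Note that popularity is computed from the totals, not from the differences.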
* __read_last_saved_data(self, id = None), this one is used internally to retrieve the last saved metrics from the file
- first it checks whether "history.txt" exists on disk; if not, it is created empty
- the file is opened for reading, one line is read into the variable "line", and the file is closed
- if the file is empty and the method is called with an id, we return the default metrics dictionary, that is: DEFAULT_METRICS = {'views' : 0, 'favorites' : 0, 'comments' : 0, 'popularity' : 0}
- if the file is empty and the method is called without an id, we return an empty dictionary {}
- then "line" is transformed into a dictionary with json.loads(line)
- if our method is called with an id and the id is present in the dictionary, then data[id] is returned; those are our metrics for the specified id
- if our method is called with an id and the id is not found, then the default metrics are returned
- if our method is called without an id, the whole dictionary is returned
def __read_last_saved_data(self, id = None):
    if not os.path.isfile(self.FILE_NAME):
        open(self.FILE_NAME, 'w').close()
    file = open(self.FILE_NAME, 'r+')
    line = file.readline()
    file.close()
    if len(line) < 2:
        if id != None:
            return self.DEFAULT_METRICS
        else:
            return {}
    data = json.loads(line)
    if id != None:
        if id in data:
            return data[id]
        else:
            return self.DEFAULT_METRICS
    return data
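The same lookup rules can be written more compactly. The sketch below is a Python 3 variant under the hypothetical name `read_history`; unlike the method above it does not create the file when it is missing, and it returns a copy of the defaults so that callers cannot change the shared dictionary by accident.

```python
import json
import os

DEFAULT_METRICS = {"views": 0, "favorites": 0, "comments": 0, "popularity": 0}


def read_history(path, item_id=None):
    """Return the saved totals for item_id, or the whole history when item_id is None."""
    data = {}
    if os.path.isfile(path):
        # The with block closes the file even if reading fails.
        with open(path) as handle:
            line = handle.readline()
        if len(line) >= 2:
            data = json.loads(line)
    if item_id is None:
        return data
    # Unknown ids (or an empty file) fall back to a copy of the defaults.
    return dict(data.get(item_id, DEFAULT_METRICS))
```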
* update_with_latest(self, id, latest_metrics), this one updates the file used by the Statistics object with the latest metrics
- we open the history file for reading
- use __read_last_saved_data() to read all the file contents
- close the file
- delete its contents with: open(self.FILE_NAME, 'w').close()
- reopen it for writing
- set the entry for the specified id to the latest metrics
- encode the dictionary as a json string and save it
- close the file
def update_with_latest(self, id, latest_metrics):
    file = open(self.FILE_NAME, 'r')
    data = self.__read_last_saved_data()
    file.close()
    open(self.FILE_NAME, 'w').close()
    file = open(self.FILE_NAME, 'w')
    data[id] = latest_metrics
    file.write(json.dumps(data))
    file.close()
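Because the file is emptied before the new contents are written, a run that is interrupted at the wrong moment could leave an empty history. One possible alternative, sketched in Python 3 with the hypothetical name `save_totals`, writes to a temporary file first and then swaps it into place:

```python
import json
import os
import tempfile


def save_totals(path, item_id, totals):
    """Store totals under item_id, keeping the entries of all other ids."""
    data = {}
    if os.path.isfile(path):
        with open(path) as handle:
            content = handle.read().strip()
        if content:
            data = json.loads(content)
    data[item_id] = totals
    # Write to a temporary file in the same directory, then swap it into place,
    # so an interrupted run does not leave a truncated history file.
    directory = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as handle:
        json.dump(data, handle)
        temp_name = handle.name
    os.replace(temp_name, path)
```

This is a design choice rather than a requirement; for a script that runs once a day the simpler version above is usually good enough.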
The post_metrics function:
This will publish the metrics to your thingspeak account:
- we define the headers to be sent in the headers dictionary
- we define the parameters as a dictionary, and then urlencode them along with the thingspeak write key
- we establish an http connection to "api.thingspeak.com:80" and post the data
- if there were no exceptions it returns True, otherwise False
def post_metrics(key, metrics):
    headers = {"Content-type": "application/x-www-form-urlencoded", "Accept": "text/plain"}
    params = urllib.urlencode(
        {'field1': metrics['views'], 'field2': metrics['favorites'],
         'field3' : metrics['comments'], 'field4' :metrics['popularity'], 'key': key}
    )
    conn = httplib.HTTPConnection("api.thingspeak.com:80")
    try:
        conn.request("POST", "/update", params, headers)
        response = conn.getresponse()
        response.read()
        conn.close()
        return True
    except:
        return False
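Note that the function returns True whenever the request completes, even if thingspeak rejected the data. As far as I know, thingspeak answers a stored update with the id of the new entry and a rejected one with 0; treat that as an assumption and check it against their documentation. Under that assumption, a response check could look like this Python 3 sketch (`update_accepted` is a hypothetical name):

```python
def update_accepted(status_code, body):
    """Return True when the response looks like a stored update.

    Assumption: a stored update answers with HTTP 200 and the new entry id,
    while a rejected update answers with "0".
    """
    if status_code != 200:
        return False
    # The body may arrive as bytes (raw response) or as text.
    text = body.decode("ascii", "replace") if isinstance(body, bytes) else body
    text = text.strip()
    return text.isdigit() and int(text) > 0


print(update_accepted(200, b"42"))
print(update_accepted(200, b"0"))
```

In post_metrics the two arguments would come from "response.status" and "response.read()".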
The main body:
- here we instantiate our objects, the instructable parser and the statistics object
- we loop through our config and for every defined instructable we do the following:
1. get the raw_metrics using the InstructablesParser
2. compute the latest_metrics using Statistics.get_since_last(id, raw_metrics)
3. save the raw totals for the next run using Statistics.update_with_latest(id, raw_metrics)
4. post the metrics to thingspeak using post_metrics(key, latest_metrics)
5. print a message if the request failed
6. sleep for 16 seconds (free accounts can make a request every 15 seconds); you can remove the sleep if you have a paid account
instrucatables_parser = InstructablesParser()
statistics = Statistics()
for crawl_item in config.crawl_list:
    raw_metrics = instrucatables_parser.parse_raw_metrics(crawl_item['url'])
    latest_metrics = statistics.get_since_last(crawl_item['id'], raw_metrics)
    print 'Latest metrics for instructable: " {0} " '.format(crawl_item['url'])
    print latest_metrics
    statistics.update_with_latest(crawl_item['id'], raw_metrics)
    status = post_metrics(crawl_item['write_api_key'], latest_metrics)
    if not status:
        print 'Could not get instructable {0} statistics'.format(crawl_item['url'])
    time.sleep(16)
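The fixed 16 second sleep also runs after the last instructable, where it is not needed. If that bothers you, the wait can be computed from the time of the previous post instead. A small Python 3 sketch, with the hypothetical name `seconds_to_wait` and the 16 second interval taken from the script above:

```python
MIN_INTERVAL = 16  # seconds, the pause used by the script above


def seconds_to_wait(last_post_time, now, min_interval=MIN_INTERVAL):
    """Return how long to sleep so that posts are at least min_interval apart."""
    if last_post_time is None:
        # Nothing has been posted yet, so there is no need to wait.
        return 0
    elapsed = now - last_post_time
    return max(0, min_interval - elapsed)


print(seconds_to_wait(None, 5.0))
print(seconds_to_wait(100.0, 110.0))
```

In the loop you would call time.sleep(seconds_to_wait(last, time.time())) before each post and store the current time in "last" after it.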
This is the end of our script. If you think that something is unclear, please post a comment and I'll try to improve the explanation.