How to Chart Your Instructable Statistics

by danionescu in Circuits > Software

In this instructable we'll look at how you can generate graphs of your instructable statistics. We'll be tracking views, favorites, comments and popularity. I've defined popularity as favorites expressed as a percentage of views.

I searched Instructables for this subject and found only one instructable that addresses it, but unfortunately its solution no longer works.

Instructables itself does not have any graphs for tracking your own instructables, so I decided to build a Python script that does just that for me and anyone else interested. It works by collecting data for your instructables and sending it to your personal thingspeak.com account (it's free). ThingSpeak is great for sending any kind of IoT data and plotting graphs.

I'll also try to explain step by step how the script works, how it parses the data and how it sends it to ThingSpeak. So if you're interested in the programming behind it, Step 3 is for you.

The prerequisites are:

- one or more personal instructables to track

- a thingspeak.com account (you can create one free)

- a development board like a Raspberry Pi or C.H.i.P, or anything else running Linux; you can even set it up on your personal computer or on a personal server where you have SSH access

- basic unix and programming knowledge; I'll try to explain everything step by step, but a basic understanding is required

Creating Thingspeak Account, and Channels

create-channel.png
Screenshot from 2017-02-08 21-55-25.png

In this step we'll create a ThingSpeak account and one channel for every instructable we want to track.

ThingSpeak is an open source “Internet of Things” application and API to store and retrieve data from things using HTTP over the Internet or via a Local Area Network.

If you don't have an account, visit the account creation page. After submitting the simple registration form, we'll need to create a channel for every instructable that needs to be tracked.

A channel in ThingSpeak is a place where similar data can live, in our case: views, favorites, comments and popularity. We'll be using one channel to monitor all relevant parameters of an instructable. ThingSpeak provides an API to update channel data.

I've also attached a picture of how the channel create page should be filled.

The steps for creating a channel for one instructable's statistics are:

1. log in using the username and password that you previously created

2. go to the channels page and click the "New Channel" button

3. enter a descriptive name for the instructable to be tracked, like: "Instructables: Automated windows shades"

4. check field1, field2, field3 and field4 checkboxes

5. name field1 to "Views", field2 to "Favorites", field3 to "Comments" and field4 to "Popularity"

6. you can add a description if you like, and make the channel public (there is a checkbox on the page)

7. click "Save channel" button

You can repeat these steps for every instructable that you want to track.

Server Environment Set Up Instructions and Configuration

The steps are:

1. checking dependencies and installing missing ones

2. downloading the project

3. configuring the project

4. add an entry in the crontab for the script and save logs to a file


1. checking dependencies and installing missing ones

- for this project we will need python 2.7.x

issue this command in a terminal to find out your version:

python --version
# it will output  something like
Python 2.7.9

If your version doesn't match, install it. On a Raspberry Pi or Ubuntu you can issue:

sudo apt-get install -y python2.7

Then we'll need pip, the python package manager:

sudo apt-get install python-pip

2. downloading the project

Download the attached "instructablesStatistics.tar.gz" file and copy it to its final location:

cd our_parent_folder_location
tar -xvzf instructablesStatistics.tar.gz 
#this will extract the archive in a folder called instructablesStatistics
cd instructablesStatistics
sudo pip install requests==2.10.0
#the last line will install the requests library

The script in the project looks like this:

import urllib, httplib, requests, re, json, time
import os.path
import config

class InstructablesParser:
    STATS_KEY_REGEX = "LogHit\(\'([A-Z0-9]*)\'"
    STATS_URL = 'https://www.instructables.com/json-api/getIbleStats?id={0}'

    def parse_raw_metrics(self, url):
        page_html = requests.get(url)
        found_matches = re.findall(self.STATS_KEY_REGEX, page_html.content)
        instructables_stats_key = found_matches[0]
        stats_data = requests.get(self.STATS_URL.format(instructables_stats_key))

        return json.loads(stats_data.content)

class Statistics:
    FILE_NAME = 'history.txt'
    DEFAULT_METRICS = {'views' : 0, 'favorites' : 0, 'comments' : 0, 'popularity' : 0}

    def get_since_last(self, id, latest_metrics):
        last_stats = self.__read_last_saved_data(id)
        if len(set(last_stats.items()) & set(self.DEFAULT_METRICS.items())) == 4:
            return last_stats
        stats = {}
        stats['views'] = latest_metrics['views'] - last_stats['views']
        stats['comments'] = latest_metrics['comments'] - last_stats['comments']
        stats['favorites'] = latest_metrics['favorites'] - last_stats['favorites']
        stats['popularity'] = round((latest_metrics['favorites'] / float(latest_metrics['views'])) * 100, 3)
        
        return stats

    def __read_last_saved_data(self, id = None):
        if not os.path.isfile(self.FILE_NAME):
            open(self.FILE_NAME, 'w').close()
        file = open(self.FILE_NAME, 'r+')
        line = file.readline()
        file.close()
        if len(line) < 2:
            if id != None:
                return self.DEFAULT_METRICS
            else:
                return {}
        data = json.loads(line)
        if id != None:
            if id in data:
                return data[id]
            else:
                return self.DEFAULT_METRICS

        return data

    def update_with_latest(self, id, latest_metrics):
        file = open(self.FILE_NAME, 'r')
        data = self.__read_last_saved_data()
        file.close()
        open(self.FILE_NAME, 'w').close()
        file = open(self.FILE_NAME, 'w')
        data[id] = latest_metrics
        file.write(json.dumps(data))
        file.close()

def post_metrics(key, metrics):
    headers = {"Content-type": "application/x-www-form-urlencoded", "Accept": "text/plain"}
    params = urllib.urlencode(
        {'field1': metrics['views'], 'field2': metrics['favorites'],
         'field3' : metrics['comments'], 'field4' :metrics['popularity'], 'key': key}
    )
    conn = httplib.HTTPConnection("api.thingspeak.com:80")
    try:
        conn.request("POST", "/update", params, headers)
        response = conn.getresponse()
        response.read()
        conn.close()
        return True
    except:
        return False

instrucatables_parser = InstructablesParser()
statistics = Statistics()

for crawl_item in config.crawl_list:
    raw_metrics = instrucatables_parser.parse_raw_metrics(crawl_item['url'])
    latest_metrics = statistics.get_since_last(crawl_item['id'], raw_metrics)
    print 'Latest metrics for instructable: " {0} " '.format(crawl_item['url'])
    print latest_metrics
    statistics.update_with_latest(crawl_item['id'], raw_metrics)
    status = post_metrics(crawl_item['write_api_key'], latest_metrics)
    if not status:
        print 'Could not get instructable {0} statistics'.format(crawl_item['url'])
    time.sleep(16)

3. configuring the project

For the configuration you need to know the full url of each instructable and a write API key for each of them.

In config.py I've created a demo configuration with two of my instructables:

crawl_list = [
    {
        'url' : 'https://www.instructables.com/id/Simple-and-Cheap-Phone-Controlled-Fireworks-Ignite/',
        'id' : '223292',
        'write_api_key': 'some_key'
    },
    {
        'url' : 'https://www.instructables.com/id/Automated-Windows-Shades/',
        'id' : '223001',
        'write_api_key': 'some_other_key'
    },
]

So for each instructable you want to track, you have to specify the instructable url, an instructable id (it can be anything, as long as it is unique), and the "write api key".
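
A typo in config.py would otherwise only show up when the script runs from cron, so a quick check of the list can help. The following is a minimal sketch written for Python 3 (the script itself targets Python 2.7); `validate_crawl_list` and `REQUIRED_KEYS` are hypothetical names and are not part of the project.

```python
REQUIRED_KEYS = ('url', 'id', 'write_api_key')  # the three keys used in config.py


def validate_crawl_list(crawl_list):
    """Return a list of readable problems; an empty list means the config looks fine."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(crawl_list):
        # Every entry needs a non-empty value for each required key.
        for key in REQUIRED_KEYS:
            if not item.get(key):
                problems.append('entry {0}: missing "{1}"'.format(index, key))
        # The id is used as the key in the history file, so it must be unique.
        item_id = item.get('id')
        if item_id in seen_ids:
            problems.append('entry {0}: duplicate id "{1}"'.format(index, item_id))
        elif item_id:
            seen_ids.add(item_id)
    return problems
```

Calling `validate_crawl_list(config.crawl_list)` before the main loop and printing the returned problems would be one way to use it.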

To locate an api key do the following:

- go to https://thingspeak.com/channels

- click on your channel name (we created it earlier), in my case: "Instructables: Automated windows shades"

- go to the "API Keys" tab

- copy the "Write API Key" and paste it into your config

4. add an entry in the crontab for the script and save logs to a file

# log in to your server
# the following command will open crontab in your favorite editor
# note a small menu might ask you what editor to use
crontab -e -u your_user

# the following lines will create an empty log file, and set rights for it
echo "" > /your_log_location/instructables.txt
chmod 776 /your_log_location/instructables.txt

Add this to your crontab:

0 13 * * *  python /our_parent_folder_location/instructablesStatistics/server.py >> /your_log_location/instructables.txt

Then save and exit the editor, and the schedule will be set.

The crontab line uses python to run our script at 13:00 every day and appends its output to "/your_log_location/instructables.txt".
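
To make the schedule part of the crontab line easier to read, here is a small sketch (Python 3) that splits a crontab line into the five standard schedule fields and the command. `describe_cron_line` and `CRON_FIELDS` are hypothetical helper names used only for illustration.

```python
CRON_FIELDS = ('minute', 'hour', 'day_of_month', 'month', 'day_of_week')


def describe_cron_line(line):
    """Split a crontab line into its five schedule fields and the command."""
    # Split on whitespace at most five times: five fields, the rest is the command.
    parts = line.split(None, 5)
    schedule = dict(zip(CRON_FIELDS, parts[:5]))
    command = parts[5] if len(parts) > 5 else ''
    return schedule, command


schedule, command = describe_cron_line(
    '0 13 * * *  python /our_parent_folder_location/instructablesStatistics/server.py'
)
print(schedule['hour'], schedule['minute'])
```

With minute 0 and hour 13, and "*" in the remaining fields, the command runs once a day at 13:00.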

Note: On the first run the script will upload 0 for all metrics, because it doesn't know what the metrics of the previous period were.

Detailed Code Explanations

This step is optional; you can skip it if you're not curious about how the script works.

First let's look at how the script works as a whole, and then we'll analyze the code bit by bit.

Here is a brief outline of what happens in the script:

read config
for every instructable defined in the config
    get the current totals from the instructable website
    read the last totals from a file on disk
    if no file on disk
        create a file with current totals
    compute the difference between current totals and saved totals
    send the differences (statistics) to thingspeak using an api call
    update the file with current totals
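
The outline above can be sketched as a single function. This is a simplified Python 3 illustration, not the script itself: the network and file access are passed in as functions so the flow can be tried without a connection, popularity is left out for brevity, and `run_once` with all its parameter names is hypothetical.

```python
def run_once(crawl_list, fetch_totals, history, post, pause):
    """Process every configured instructable once.

    fetch_totals(url) -> dict with the current totals
    history           -> dict mapping id to the totals saved on the previous run
    post(key, stats)  -> True on success
    pause()           -> called after each item to respect the rate limit
    """
    for item in crawl_list:
        current = fetch_totals(item['url'])
        previous = history.get(item['id'])
        if previous is None:
            # First run for this id: nothing to compare against, so report zeros.
            stats = {name: 0 for name in current}
        else:
            stats = {name: current[name] - previous[name] for name in current}
        if not post(item['write_api_key'], stats):
            print('Could not post statistics for {0}'.format(item['url']))
        # Remember the current totals for the next run.
        history[item['id']] = current
        pause()
    return history
```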

The current totals are available on the Instructables website, but not in the initial page load. If you look closely, the page sometimes loads with older view and favorite counts (and with the comment count missing), and the numbers are then quickly updated asynchronously by a script.

I checked the page using the Google Chrome network tab and saw a call to the following url: "https://www.instructables.com/json-api/getIbleStats?id=hashKey". The hash key was different for every instructable. So I looked for the hash key in the page source, found it there, and created a regex to capture it.

The regex looks like this: "LogHit\(\'([A-Z0-9]*)\'". The call to the url returns the statistics in JSON format, like this: "{"views":3777,"favorites":46,"comments":0}", which is easy to interpret and use.
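
To see the regex and the JSON parsing in isolation, here is a small Python 3 sketch that works on sample strings instead of a live page. The HTML fragment and the key "E1ABC23XYZ" are made up for illustration, the JSON reply is the sample quoted above, and `extract_stats_key` is a hypothetical helper. The endpoint is the one observed at the time of writing and may have changed since.

```python
import json
import re

STATS_KEY_REGEX = r"LogHit\('([A-Z0-9]*)'"
STATS_URL = 'https://www.instructables.com/json-api/getIbleStats?id={0}'

# Hypothetical fragment of page source; a real page and key will differ.
sample_html = "<script>LogHit('E1ABC23XYZ');</script>"
# Sample reply copied from the text above.
sample_reply = '{"views":3777,"favorites":46,"comments":0}'


def extract_stats_key(html):
    """Return the first key captured by the regex, or None when nothing matches."""
    matches = re.findall(STATS_KEY_REGEX, html)
    return matches[0] if matches else None


key = extract_stats_key(sample_html)
stats_url = STATS_URL.format(key)   # {0} is replaced with the captured key
stats = json.loads(sample_reply)    # JSON text becomes a dictionary
print(key, stats_url, stats['views'])
```

Returning None when the regex finds nothing avoids the IndexError that `found_matches[0]` would raise if the page layout changes.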

The instructable parser object:

class InstructablesParser:
    STATS_KEY_REGEX = "LogHit\(\'([A-Z0-9]*)\'"
    STATS_URL = 'https://www.instructables.com/json-api/getIbleStats?id={0}'

    def parse_raw_metrics(self, url):
        page_html = requests.get(url)
        found_matches = re.findall(self.STATS_KEY_REGEX, page_html.content)
        instructables_stats_key = found_matches[0]
        stats_data = requests.get(self.STATS_URL.format(instructables_stats_key))

        return json.loads(stats_data.content)

After the class declaration, two strings are defined in the class: one holds the regex used to parse the statistics key, and the other is the statistics url.

There is also a method "parse_raw_metrics(self, url)" that receives one parameter, the instructable url, and returns the statistics data as a dictionary. To do that it first loads the instructable page and runs the regex over its content; the regex contains a capture group that captures the key. With the key extracted, a request is made to "STATS_URL" to retrieve the statistics as JSON. The piece of code "self.STATS_URL.format(instructables_stats_key)" replaces {0} inside "STATS_URL" with the key we found.

After the retrieval, the JSON is converted into a dictionary with "json.loads(stats_data.content)" and returned.

Below we see the Statistics object:

class Statistics:
    FILE_NAME = 'history.txt'
    DEFAULT_METRICS = {'views' : 0, 'favorites' : 0, 'comments' : 0, 'popularity' : 0}

    def get_since_last(self, id, latest_metrics):
	...
    def __read_last_saved_data(self, id = None):
	...
    def update_with_latest(self, id, latest_metrics):
	...

It contains two class attributes: "FILE_NAME" stores the name of the file that holds the last saved instructable statistics, and "DEFAULT_METRICS" is a dictionary with all the metric keys set to 0.

Our object has the following methods:

* get_since_last(self, id, latest_metrics) returns the computed metrics, given the instructable id from the config and the raw metrics parsed using the "InstructablesParser" class explained above.

def get_since_last(self, id, latest_metrics):
    last_stats = self.__read_last_saved_data(id)
    if len(set(last_stats.items()) & set(self.DEFAULT_METRICS.items())) == 4:
        return last_stats
    stats = {}
    stats['views'] = latest_metrics['views'] - last_stats['views']
    stats['comments'] = latest_metrics['comments'] - last_stats['comments']
    stats['favorites'] = latest_metrics['favorites'] - last_stats['favorites']
    stats['popularity'] = round((latest_metrics['favorites'] / float(latest_metrics['views'])) * 100, 3)

    return stats

- first we read the last saved statistics for our "id" as a dictionary and store it in the "last_stats" variable

- the big if condition checks whether the returned dictionary is identical to DEFAULT_METRICS, which means nothing has been saved for this id yet; in that case the zeroed dictionary is returned as is

- views, comments and favorites are calculated as the difference between the corresponding keys of latest_metrics and last_stats

- popularity is calculated by dividing latest_metrics['favorites'] by latest_metrics['views']; the result is multiplied by 100 to make it a percentage, and then "round" is applied to keep only 3 decimals
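
The calculation can be tried on its own with the following Python 3 sketch. It adds one thing the original method does not have, a guard for an instructable with 0 views, which would otherwise cause a division by zero. `popularity` and `since_last` are hypothetical standalone functions, not methods of the Statistics class.

```python
def popularity(favorites, views):
    """Favorites as a percentage of views, rounded to 3 decimals; 0.0 when there are no views."""
    if views == 0:
        return 0.0
    # In Python 3 "/" is always float division, so no float() call is needed.
    return round(favorites / views * 100, 3)


def since_last(latest, last):
    """Differences for the three counters, plus popularity from the latest totals."""
    stats = {name: latest[name] - last[name]
             for name in ('views', 'favorites', 'comments')}
    stats['popularity'] = popularity(latest['favorites'], latest['views'])
    return stats
```

Note that popularity is computed from the latest totals, not from the differences, exactly as in the bullet above.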

* __read_last_saved_data(self, id = None) is used internally to retrieve the last saved metrics from the file

- the first thing is to check whether "history.txt" exists on disk; if not, it is created empty

- the file is opened for reading, and one line read into the variable "line" then closed

- if the file is empty and the method is called with an id, we return the default metrics dictionary, that is: DEFAULT_METRICS = {'views' : 0, 'favorites' : 0, 'comments' : 0, 'popularity' : 0}

- if the file is empty and the method is called without an id we return an empty dictionary {}

- then the "line" is transformed into a dictionary with json.loads(line)

- if our method is called with an id and the id is present in the dictionary (we just transformed it into a dictionary) then the data[id] key is returned, that will be our metrics for the specified id

- if our method is called with an id and the id is not found, then default metrics are returned

- if our method is called without an id, all keys are returned

def __read_last_saved_data(self, id = None):
    if not os.path.isfile(self.FILE_NAME):
        open(self.FILE_NAME, 'w').close()
    file = open(self.FILE_NAME, 'r+')
    line = file.readline()
    file.close()
    if len(line) < 2:
        if id != None:
            return self.DEFAULT_METRICS
        else:
            return {}

    data = json.loads(line)
    if id != None:
        if id in data:
            return data[id]
        else:
            return self.DEFAULT_METRICS

    return data
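
For comparison, the same lookup could be written more compactly in Python 3. This is only a sketch under a few assumptions: it is a standalone function instead of a method, it does not create the file when it is missing, and it returns a copy of the defaults so the shared dictionary cannot be modified by accident. The name `read_last_saved_data` and its parameters are chosen for illustration.

```python
import copy
import json
import os

DEFAULT_METRICS = {'views': 0, 'favorites': 0, 'comments': 0, 'popularity': 0}


def read_last_saved_data(file_name, item_id=None):
    """Return saved metrics for item_id, or the whole history when item_id is None."""
    data = {}
    if os.path.isfile(file_name):
        # "with" closes the file automatically, even if reading fails.
        with open(file_name, 'r') as history_file:
            content = history_file.read().strip()
        if content:
            data = json.loads(content)
    if item_id is None:
        return data
    # Unknown id (or empty file): fall back to a copy of the defaults.
    return data.get(item_id, copy.deepcopy(DEFAULT_METRICS))
```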


* update_with_latest(self, id, latest_metrics) this updates the file used by the Statistics object with the latest metrics

- we open the history file for reading

- use __read_last_saved_data() to read all file contents

- close the file

- delete its contents with: open(self.FILE_NAME, 'w').close()

- reopen it for writing

- set the entry for the given id to the latest metrics

- encode the dictionary as a JSON string and save it

- close the file

def update_with_latest(self, id, latest_metrics):
    file = open(self.FILE_NAME, 'r')
    data = self.__read_last_saved_data()
    file.close()
    # truncate the history file, then reopen it for writing
    open(self.FILE_NAME, 'w').close()
    file = open(self.FILE_NAME, 'w')
    # store the latest metrics under this instructable's id
    data[id] = latest_metrics
    file.write(json.dumps(data))
    file.close()
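
The open, truncate and reopen sequence can be shortened. The sketch below (Python 3) shows one possible alternative, not the script's own approach: it writes the new history to a temporary file and then swaps it in with `os.replace`, so an interrupted run does not leave a half-written history file. The function is standalone and the ".tmp" suffix is an arbitrary choice.

```python
import json
import os


def update_with_latest(file_name, item_id, latest_metrics):
    """Store latest_metrics under item_id, keeping the entries of other ids."""
    data = {}
    if os.path.isfile(file_name):
        with open(file_name, 'r') as history_file:
            content = history_file.read().strip()
        if content:
            data = json.loads(content)
    data[item_id] = latest_metrics
    # Write the complete history to a temporary file first...
    temp_name = file_name + '.tmp'
    with open(temp_name, 'w') as temp_file:
        temp_file.write(json.dumps(data))
    # ...then replace the old history file with it in one step.
    os.replace(temp_name, file_name)
    return data
```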

The post_metrics function:

This will publish the metrics to your thingspeak account.

- we define the headers to be sent in the "headers" dictionary

- we define the parameters as a dictionary, including the thingspeak write key, and urlencode them

- we open an HTTP connection to "api.thingspeak.com:80" and post the data

- if there were no exceptions it returns True, otherwise False

def post_metrics(key, metrics):
    headers = {"Content-type": "application/x-www-form-urlencoded", "Accept": "text/plain"}
    params = urllib.urlencode(
        {'field1': metrics['views'], 'field2': metrics['favorites'],
         'field3' : metrics['comments'], 'field4' :metrics['popularity'], 'key': key}
    )
    conn = httplib.HTTPConnection("api.thingspeak.com:80")
    try:
        conn.request("POST", "/update", params, headers)
        response = conn.getresponse()
        response.read()
        conn.close()
        return True
    except:
        return False
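
On Python 3 the modules used here have moved: `urllib.urlencode` became `urllib.parse.urlencode` and `httplib` became `http.client`. A rough Python 3 equivalent could look like the sketch below. It differs from the original in three ways that are my own choices: the body is built in a separate hypothetical helper `build_update_body`, the connection is always closed in a `finally` block, and success means the server answered with HTTP 200 instead of "no exception was raised".

```python
import http.client
import urllib.parse

# Channel fields as named in the channel setup step.
FIELD_NAMES = {'views': 'field1', 'favorites': 'field2',
               'comments': 'field3', 'popularity': 'field4'}


def build_update_body(key, metrics):
    """URL-encode the four metrics and the write key, as post_metrics does."""
    params = {field: metrics[name] for name, field in FIELD_NAMES.items()}
    params['key'] = key
    return urllib.parse.urlencode(params)


def post_metrics(key, metrics, timeout=10):
    """POST the metrics; return True when the server answers with HTTP 200."""
    headers = {'Content-type': 'application/x-www-form-urlencoded',
               'Accept': 'text/plain'}
    conn = http.client.HTTPConnection('api.thingspeak.com', 80, timeout=timeout)
    try:
        conn.request('POST', '/update', build_update_body(key, metrics), headers)
        response = conn.getresponse()
        response.read()
        return response.status == 200
    except (OSError, http.client.HTTPException):
        # Network problems and malformed replies both count as a failed post.
        return False
    finally:
        conn.close()
```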

The main body:

- here we instantiate our objects: the instructable parser and the statistics object

- we loop through our config and for every defined instructable we do the following:

1. get the raw_metrics using the InstructablesParser

2. compute the latest_metrics using Statistics.get_since_last(id, raw_metrics) and print them

3. save the raw_metrics to the history file using Statistics.update_with_latest(id, raw_metrics)

4. post the metrics to thingspeak using post_metrics(key, latest_metrics)

5. print a message if the request failed

6. sleep for 16 seconds (free accounts can make a request every 15 seconds); you can remove the sleep if you have a paid account

instrucatables_parser = InstructablesParser()
statistics = Statistics()

for crawl_item in config.crawl_list:
    raw_metrics = instrucatables_parser.parse_raw_metrics(crawl_item['url'])
    latest_metrics = statistics.get_since_last(crawl_item['id'], raw_metrics)
    print 'Latest metrics for instructable: " {0} " '.format(crawl_item['url'])
    print latest_metrics
    statistics.update_with_latest(crawl_item['id'], raw_metrics)
    status = post_metrics(crawl_item['write_api_key'], latest_metrics)
    if not status:
        print 'Could not get instructable {0} statistics'.format(crawl_item['url'])
    time.sleep(16)
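
The loop above also sleeps after the last instructable, which only makes the run 16 seconds longer than needed. If that matters, the pause can be placed between items instead, as in this Python 3 sketch. `process_all` and its parameters are hypothetical; the `sleep` argument exists only so the function can be tried without actually waiting.

```python
import time


def process_all(items, handle, delay_seconds=16, sleep=time.sleep):
    """Call handle(item) for each item, pausing between items but not after the last one."""
    results = []
    for position, item in enumerate(items):
        if position > 0:
            # Wait before every item except the first one.
            sleep(delay_seconds)
        results.append(handle(item))
    return results
```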

This is the end of our script. If you think that something is unclear, please post a comment and I'll try to improve the explanation.