DIY AI Study Focus Assistant
by SousaRodrigo0



Ever curious about how much time you're really spending studying, or how often you're getting distracted? That's why I built this project: an intelligent prototype that detects if you're focused, distracted, or not at your desk. It only records your actual study time, so it's easy to track and improve your habits. I built it using a Raspberry Pi, a camera, and some clever coding, so you can keep an eye on your progress right from your laptop. Perfect for anyone wanting to make the most out of their study time!
Supplies

For this project I used the following supplies; some are essential, others are optional depending on your needs:
- Raspberry Pi 5 kit
- USB Camera
- LCD Display
- Freenove Projects Board (Optional)
- Button
- Cables and resistor
- RGB LED (Optional)
- Passive Buzzer (Optional)
I did not use one, but you might need a breadboard if you do not have a projects board.
Freenove Projects Kit for Raspberry Pi 5 4 B 3 B+ 400 Zero 2 W
The projects board costs 60€ and comes with everything you need except the extra button, resistor, and cables, which you will need to buy separately.
Raspberry Pi 5 - 8Gb - Starter Pack - wit
The Raspberry Pi 5 kit costs 113,18€.
A suitable USB camera costs between 15€ and 30€ on Amazon or at a local electronics store like MediaMarkt.
A kit with multiple bundles costs around 10€.
Gathering Data


To begin, you have to obtain many pictures of people in different situations: bent over studying, spaced out, away from the desk, or standing up. There are several ways of doing it: you can either use pictures from open datasets on the internet or film yourself or some friends and then split the video into separate frames.
If you go the video route, you can use some basic Python to extract frames from your videos. That will give you a collection of images that you can then use for your model. Make sure the images are clear and that they show the person in the correct state, whether they're focused, distracted, or not at their desk.
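If you want a starting point, here is a rough sketch of how to pull frames out of a video with OpenCV. The file name, output folder, and the "keep one frame in fifteen" interval are just placeholder assumptions; adjust them to your footage.

```python
# Sketch: split a recorded video into individual frames with OpenCV.
import cv2
import os

VIDEO_PATH = "study_session.mp4"   # your recorded video (placeholder name)
OUTPUT_DIR = "frames"              # where extracted images are saved
EVERY_N = 15                       # keep one frame out of every 15

os.makedirs(OUTPUT_DIR, exist_ok=True)
cap = cv2.VideoCapture(VIDEO_PATH)

count = saved = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if count % EVERY_N == 0:
        cv2.imwrite(os.path.join(OUTPUT_DIR, f"frame_{saved:05d}.jpg"), frame)
        saved += 1
    count += 1

cap.release()
print(f"Saved {saved} frames to {OUTPUT_DIR}/")
```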
You should also include some edge cases, for example when someone's glancing at a book over on the side but isn't necessarily distracted. Those examples enable the model to figure out the distinction between actual distractions and someone simply looking away for a second.
Annotating the Data

Once you have all your images, the next step is to organize them into folders. I made three separate folders, one for each class: "distracted", "studying", and "away". That's called folder-structure annotation, and it's a common way to annotate data for classification models.
When you've finished sorting the images, you can upload the whole dataset folder into Roboflow. It will then automatically assign each picture to a class based on its folder name.
Alternatively, you can upload the images to Roboflow and assign the labels manually yourself.
After uploading and annotating your data, you’ll want to create a new version of your dataset. Here, you get to pick which augmentations to use. I kept it pretty simple for my model:
- Flip: Horizontal
- Rotation: Between -15° and +15°
- Hue: Between -20° and +20°
- Brightness: Between -25% and +25%
- Blur: Up to 2.5px
When you’re happy with your settings, you can create the version and then download your dataset. You have two options: either download a zip file to your computer, or copy the download code and use it in your training script to download the data directly in the right directory. I recommend using the download code, since it’s a bit quicker and you don’t have to worry about moving files around manually.
This way, your data is ready to use for training your model right away.
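For reference, the download code Roboflow gives you looks roughly like the snippet below (classification datasets use the "folder" format). The API key, workspace, project name, and version number are placeholders that you copy from your own Roboflow project.

```python
# Sketch: download a Roboflow classification dataset straight into your training environment.
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")                       # placeholder key
project = rf.workspace("your-workspace").project("study-focus")  # placeholder names
dataset = project.version(1).download("folder")             # "folder" = classification layout
print(dataset.location)                                     # path where the images ended up
```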
Training the Model

Once you’ve downloaded your Roboflow dataset or copied the download code, with the images already preprocessed and augmented, you’ll need to train your model. For my project, I trained everything on Google Colab, which made it easy to use a powerful GPU and keep everything running smoothly. The only drawback is that Colab runs on credits, which cost 10 euros per 100 credits. With 100 credits you can train your model two or more times on an A100 GPU, depending on your code.
I decided to use a TinyViT model for this project. It’s a vision transformer that’s small but really accurate, and it comes pretrained on ImageNet, so you don't need a big dataset. To make the training process easier and more organized, I used PyTorch Lightning. Wrapping the model in a LightningModule simplified the code and kept the training pipeline tidy.
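To give an idea of what that looks like, here is a minimal sketch of a LightningModule wrapping a pretrained TinyViT from timm. The model variant, learning rate, and number of classes here are assumptions, not the exact settings from my final model.

```python
# Sketch: a PyTorch Lightning classifier around a pretrained TinyViT backbone.
import timm
import torch
import pytorch_lightning as pl
from torch import nn

class FocusClassifier(pl.LightningModule):
    def __init__(self, num_classes=3, lr=1e-4):
        super().__init__()
        self.save_hyperparameters()
        # Pretrained TinyViT with a fresh head for the three focus classes (assumed variant)
        self.model = timm.create_model("tiny_vit_5m_224", pretrained=True, num_classes=num_classes)
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        images, labels = batch
        loss = self.criterion(self(images), labels)
        self.log("train_loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        images, labels = batch
        logits = self(images)
        loss = self.criterion(logits, labels)
        acc = (logits.argmax(dim=1) == labels).float().mean()
        self.log_dict({"val_loss": loss, "val_acc": acc})

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.hparams.lr)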
To squeeze out even more accuracy, I used Optuna for hyperparameter tuning. This let me automatically test different settings for things like learning rate, batch size, and optimizer, so I didn’t have to guess what would work best. After all the trials, Optuna gave me the best combination of parameters, which I then used for my final model.
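A stripped-down version of that tuning loop could look like the sketch below. The search ranges are assumptions, and train_and_evaluate() is a hypothetical helper; in practice it would build the LightningModule with the suggested settings, train it, and return the validation accuracy.

```python
# Sketch: Optuna search over learning rate and batch size.
import optuna

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-3, log=True)          # assumed range
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])
    # train_and_evaluate() is a placeholder: train the model and return validation accuracy
    return train_and_evaluate(lr=lr, batch_size=batch_size)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```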
You can definitely try other models like ResNet or EfficientNet if you want, but after comparing a bunch of them, I found that TinyViT was not only the most accurate for my task but also small enough to run on a Raspberry Pi without any trouble. That made it perfect for my project, where I needed good performance but also wanted to keep things lightweight for edge deployment.
Downloads
Raspberry Pi

Raspberry Pi Setup
To make your project work, you’ll need at least two essential components connected to your Raspberry Pi: an LCD screen and a button. Of course, you’ll also need a USB camera for video input. While these are the minimum requirements, you can enhance your setup with additional components like an RGB LED and a buzzer, both of which I included in my prototype, though they’re optional for basic functionality.
Buzzer Types:
- Passive buzzer: Can play different tones and melodies since you control the frequency.
- Active buzzer: Only produces a single tone when powered and can’t play melodies.
For my prototype, I used the Freenove Project Board for Raspberry Pi. This board simplifies wiring by providing labeled GPIO pins and convenient breakout connectors for components like the LCD and buttons. If you’re not using a project board, you’ll need to connect your components directly to the Raspberry Pi’s GPIO pins, either using a breadboard or by wiring them straight to the Pi.
Software and Connectivity
For controlling the hardware, you can use either the lgpio or RPi.GPIO Python libraries. My prototype uses RPi.GPIO, but lgpio is a good and more modern alternative, especially for newer Raspberry Pi models.
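As a simple illustration, a button and one channel of the RGB LED could be handled like this with RPi.GPIO. The pin numbers are assumptions, so match them to your own wiring.

```python
# Sketch: read a push button and drive one LED channel with RPi.GPIO.
import RPi.GPIO as GPIO
import time

BUTTON_PIN = 17   # assumed BCM pin for the push button
LED_PIN = 27      # assumed BCM pin for one RGB LED channel

GPIO.setmode(GPIO.BCM)
GPIO.setup(BUTTON_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP)
GPIO.setup(LED_PIN, GPIO.OUT)

try:
    while True:
        # With the internal pull-up, a pressed button reads LOW
        if GPIO.input(BUTTON_PIN) == GPIO.LOW:
            GPIO.output(LED_PIN, GPIO.HIGH)   # LED on while the button is held
        else:
            GPIO.output(LED_PIN, GPIO.LOW)
        time.sleep(0.05)
finally:
    GPIO.cleanup()
```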
I also added a Bluetooth Low Energy (BLE) connection from the Raspberry Pi to my laptop for wireless communication and monitoring. I used it to send data from the Raspberry Pi to the laptop to store in the local database for metrics, and to control the Raspberry Pi system from the Gradio interface on the laptop. Additionally, I used Flask to stream the live camera feed from the Raspberry Pi to my laptop, making it easy to view the video in the Gradio interface.
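The camera streaming can be done with the usual Flask MJPEG pattern, roughly like the sketch below. The port and camera index are assumptions.

```python
# Sketch: stream the USB camera from the Pi as an MJPEG feed with Flask and OpenCV.
import cv2
from flask import Flask, Response

app = Flask(__name__)
camera = cv2.VideoCapture(0)  # first USB camera (assumed index)

def generate_frames():
    while True:
        ok, frame = camera.read()
        if not ok:
            break
        ok, buffer = cv2.imencode(".jpg", frame)
        if not ok:
            continue
        # Each JPEG becomes one part of a multipart HTTP response
        yield (b"--frame\r\n"
               b"Content-Type: image/jpeg\r\n\r\n" + buffer.tobytes() + b"\r\n")

@app.route("/video_feed")
def video_feed():
    return Response(generate_frames(),
                    mimetype="multipart/x-mixed-replace; boundary=frame")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```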
Project Board:
- The Freenove Project Board streamlines the connection of multiple components and is compatible with most Raspberry Pi models.
- Freenove Projects Kit for Raspberry Pi 5 4 B 3 B+ 400 Zero 2 W
Code Libraries:
- Use RPi.GPIO or lgpio for hardware control.
Connectivity:
- BLE for wireless communication
- Flask for live video streaming to a laptop or other device
Interface



For my prototype, I built a Gradio interface in Python to display user metrics and to control the Raspberry Pi straight from my laptop. I went with Gradio because it makes it easy to create interactive dashboards, and I customized the look with some CSS to match my project’s vibe.
I also set up FastAPI to connect the interface to a database. That way, all the predictions and values coming from the Raspberry Pi get stored and can be used to show different types of metrics in the dashboard. Everything’s running locally for now—so no cloud or external servers involved.
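As a rough idea of the backend, a single FastAPI endpoint that writes predictions from the Pi into a local SQLite database could look like this. The route, table layout, and field names are my own placeholders, not the exact schema I used.

```python
# Sketch: local FastAPI endpoint that stores incoming predictions in SQLite.
import sqlite3
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Prediction(BaseModel):
    timestamp: str
    label: str        # "studying", "distracted", or "away"
    confidence: float

@app.post("/predictions")
def save_prediction(pred: Prediction):
    conn = sqlite3.connect("focus.db")  # local database file (assumed name)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predictions (timestamp TEXT, label TEXT, confidence REAL)"
    )
    conn.execute(
        "INSERT INTO predictions VALUES (?, ?, ?)",
        (pred.timestamp, pred.label, pred.confidence),
    )
    conn.commit()
    conn.close()
    return {"status": "ok"}
```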
The interface is split into a few different sections:
- Last session metrics: Shows how things went during the previous study session.
- Today’s metrics: Gives a quick overview of today’s activity.
- Weekly metrics: Lets you see trends over the past week.
- Global metrics: Shows overall stats since the project started.
- Real-time metrics: Updates live while the timer is running, so you can see your progress as it happens.
For the charts and graphs, I used Plotly, which makes it simple to create nice-looking, interactive visualizations right inside the Gradio app. All in all, the setup lets me monitor and control everything from my laptop, and it’s all running smoothly on my local network.
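To give a feel for the dashboard side, here is a minimal Gradio sketch that fetches metrics from a local FastAPI endpoint and plots them with Plotly. The endpoint URL and the JSON field names are assumptions.

```python
# Sketch: a small Gradio dashboard that pulls metrics from the local API and plots them.
import gradio as gr
import plotly.graph_objects as go
import requests

API_URL = "http://localhost:8000/metrics/today"  # hypothetical FastAPI endpoint

def load_today_chart():
    data = requests.get(API_URL).json()  # assumed shape: {"labels": [...], "minutes": [...]}
    fig = go.Figure(go.Bar(x=data["labels"], y=data["minutes"]))
    fig.update_layout(title="Today's focus time (minutes)")
    return fig

with gr.Blocks() as demo:
    gr.Markdown("## Study Focus Dashboard")
    chart = gr.Plot()
    refresh = gr.Button("Refresh")
    refresh.click(load_today_chart, outputs=chart)   # manual refresh
    demo.load(load_today_chart, outputs=chart)       # populate on page load

demo.launch()
```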
Maker Part

In the maker section of this project, you build an enclosure that houses all the components, such as the camera, Raspberry Pi, and project board. There are no restrictions here: you can build it any way you'd like. You might laser cut a plain box, 3D print a custom case, or even put together your own prototype from items you have at home. Just make sure your enclosure protects all the components.
I created a simple box out of 4 mm multiplex wood with a laser cutter. I also put a small notice on the box telling users to keep the prototype at least 20 cm away from them, so the camera has enough distance to get a clear view of the user.
In order to create a box similar to mine using a laser cutter, you can go to a website called makercase.com. Here, you can personalize your box by selecting the suitable size and material thickness you want. Once you design your box, save the file and open it in Inkscape or any other program in order to further edit and customise it.
Note: Red lines are for cutting and blue lines are for engraving. The lines should be 0.025 mm thick.
Once you assemble your physical model and insert all the pieces, you will want to secure them in place. I simply used electrical tape and wood glue to hold some of my pieces to the box, but you can use screws or whatever you have available if you prefer.