AI-Powered Hand Sign Recognition System
by Wiktoria Przykucka



This project is an AI-powered system that recognizes ASL (American Sign Language) hand signs in real time using a USB webcam and a Raspberry Pi.
It uses MediaPipe to extract 21 hand keypoints, which are then classified by a trained deep learning model (MLP) into letters A–Z. The result is displayed on a 7” touchscreen, with RGB LEDs indicating system status and a push button to confirm the selected letter. The setup is enclosed in a 3D-printed frame with a wooden base for stability and aesthetics.
Hand tracking, the touchscreen interface, and all physical I/O run on the Raspberry Pi itself; predictions can run on the Pi from the saved model or be offloaded to a laptop over a direct Ethernet link (see the live demo steps), keeping recognition fast and interactive.
I created this project to explore how AI and physical computing can support real-time communication. ASL is a visual language that works perfectly with machine learning, and I was curious whether a Raspberry Pi combined with a webcam would be enough to build a working recognition system. It was also a fun way to connect artificial intelligence with accessibility and education.
You can explore the full source code, model, and scripts here:
Supplies

Core Components
- Raspberry Pi 5
Display & Vision
- Freenove 7-inch Touchscreen (HDMI video, USB touch)
- USB Webcam
Input & Output
- RGB Indicator LEDs – 12 mm Mounting (GUUZI)
- Momentary Push Button (GUUZI)
- Resistor Kit – 1/4W Metal Film (used: 220Ω for LEDs, 470Ω for button)
- Jumper Wires / GPIO Connectors
Mounting & Structure
- 3D Printed Custom Frame – designed to house the Pi, screen, camera, button, and LEDs
- Wooden Base – 210 mm wide for stability and vibration damping
Total cost: €247.25, plus the 3D-printed frame
Overview & Understanding the Logic

Before diving into the details of the build, let’s take a quick look at how the whole system works together. The diagram below shows the complete flow of data and hardware interactions.
Here’s a step-by-step breakdown:
- Camera (detecting hand)
  - A USB webcam connected to the Raspberry Pi captures the hand in real time. The image is processed using MediaPipe, which extracts 21 keypoints representing the hand pose.
- Raspberry Pi 5
  - Acts as the central hub. It:
    - Collects the keypoints from the camera
    - Sends them to the laptop over a socket connection
    - Receives back the predicted ASL letter
    - Controls the LED indicators, push button, and display output
- Laptop (AI Processing)
  - The laptop runs the trained MLP classification model. It:
    - Accepts hand keypoints from the Pi
    - Classifies them into the corresponding ASL letter (A–Z)
    - Sends the prediction back to the Pi
- RGB LEDs
  - Red LED → idle state
  - Yellow LED → countdown/processing
  - Green LED → prediction confirmed
- Push Button (Switch)
  - Lets the user confirm a prediction or reset the process for the next letter.
- Touchscreen (Screen Output)
  - Displays the live camera feed, system feedback (e.g., “Too Close”, “Letter: A”), and the sequence of predicted letters.
- Power Supply
  - Powers both the Raspberry Pi and the laptop, ensuring a stable, continuous system runtime.
Collecting Data

To collect the dataset, I wrote a Python script using OpenCV and MediaPipe that captures real-time video from a USB webcam and extracts 21 hand landmarks as (x, y) coordinates using the MediaPipe Hand Tracking model.
Each detected hand is saved as a row in a .csv file, containing:
- 42 features → x0–x20, y0–y20
- A label → the ASL letter being shown (A–Z)
The labeling is fully manual and interactive:
- Press a letter key (A–Z) to set the label
- Press c to toggle data collection
- Press q to quit the script
Using this method, I collected over 120,000 samples across all letters, creating a diverse and robust dataset directly from live camera input. This data was later used to train the AI classification models.
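Below is a minimal sketch of what that collection loop can look like, assuming the mediapipe and opencv-python packages are installed. The output file name and the use of Shift+letter to set the label (so that c and q stay free as controls) are illustrative choices, not necessarily identical to my script; the full version is in the source code linked above.

```python
# Data-collection sketch (illustrative): 21 MediaPipe hand landmarks -> 42 features + label per row.
# Assumptions: output file "asl_keypoints.csv"; labels set with Shift+letter so 'c'/'q' stay controls.
import csv
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.7)
cap = cv2.VideoCapture(0)
label, collecting = None, False

with open("asl_keypoints.csv", "a", newline="") as f:
    writer = csv.writer(f)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if collecting and label and results.multi_hand_landmarks:
            lm = results.multi_hand_landmarks[0].landmark
            # 42 features: x0..x20 followed by y0..y20, then the current letter label
            writer.writerow([p.x for p in lm] + [p.y for p in lm] + [label])
        cv2.putText(frame, f"label={label}  collecting={collecting}",
                    (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
        cv2.imshow("collect", frame)
        key = cv2.waitKey(1) & 0xFF
        if key == ord("q"):                      # quit the script
            break
        elif key == ord("c"):                    # toggle data collection on/off
            collecting = not collecting
        elif ord("A") <= key <= ord("Z"):        # Shift+letter sets the label A-Z
            label = chr(key)

cap.release()
cv2.destroyAllWindows()
```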
Training Model

To train the AI model, I used a dataset of hand keypoints collected using my custom Python script with OpenCV and MediaPipe. The script captured the (x, y) coordinates of 21 hand landmarks for all 26 ASL letters, with each sample labeled manually during collection.
I experimented with two different models:
- RandomForestClassifier
- MLPClassifier (neural network)
Both models were trained on 80% of the dataset and evaluated on the remaining 20%. The MLP model, built using a scikit-learn pipeline with StandardScaler and a two-layer architecture (100 and 50 neurons), achieved excellent performance with an accuracy close to 1.0.
Although such high accuracy raised concerns about potential overfitting, live testing confirmed that the model remained fast, responsive, and accurate under real-time conditions. A confusion matrix was also used to verify prediction consistency across all letters.
Finally, the trained models were saved using joblib, making it easy to deploy and run predictions directly on the Raspberry Pi.
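The snippet below sketches that training setup under the assumptions just described (80/20 split, StandardScaler followed by an MLP with 100 and 50 hidden neurons, joblib export). The CSV path, random seed, and iteration count are placeholders.

```python
# Training sketch: expects the CSV layout from the collection step
# (42 feature columns followed by the letter label, no header row).
import joblib
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

data = pd.read_csv("asl_keypoints.csv", header=None)
X, y = data.iloc[:, :42].values, data.iloc[:, 42].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# StandardScaler + two-layer MLP (100 and 50 neurons)
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=500, random_state=42))
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print(confusion_matrix(y_test, pred))           # per-letter consistency check

joblib.dump(model, "asl_mlp.joblib")            # saved pipeline for deployment
```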
Live Demo (testing)

For live testing, I developed a Python script that uses a webcam to capture real-time video and predict ASL letters based on hand keypoints.
The script loads the previously trained MLP model and uses MediaPipe to detect and extract 21 hand landmarks (x, y coordinates). It then calculates the bounding box and surface area of the detected hand to check whether the hand is too close, too far, or in an ideal position for prediction.
If the hand is correctly positioned, the model predicts the corresponding ASL letter and displays it directly on the screen. The system also provides real-time visual feedback, including:
- Bounding boxes around the hand
- Text messages like “Go back”, “Come closer”, or the predicted letter (e.g., “Letter: A”)
- Helpful cues for adjusting hand distance and position
This setup ensures an intuitive and user-friendly experience. The testing phase demonstrated that the model delivers fast and accurate predictions, confirming its effectiveness in real-world conditions.
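A simplified sketch of that live-testing loop is shown below. The bounding-box area thresholds used for the "too close / too far" check and the model file name are illustrative values, not the exact ones from my script.

```python
# Live-testing sketch: distance check via the hand bounding-box area, then prediction.
# The area thresholds and the model file name are illustrative assumptions.
import cv2
import joblib
import mediapipe as mp

model = joblib.load("asl_mlp.joblib")
hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.7)
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    msg = "Show your hand"
    if results.multi_hand_landmarks:
        lm = results.multi_hand_landmarks[0].landmark
        xs, ys = [p.x * w for p in lm], [p.y * h for p in lm]
        x1, y1, x2, y2 = int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys))
        cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)
        area = (x2 - x1) * (y2 - y1)                 # rough hand surface area in pixels
        if area > 0.40 * w * h:                      # hand fills too much of the frame
            msg = "Go back"
        elif area < 0.05 * w * h:                    # hand too small / too far away
            msg = "Come closer"
        else:
            features = [[p.x for p in lm] + [p.y for p in lm]]
            msg = f"Letter: {model.predict(features)[0]}"
    cv2.putText(frame, msg, (10, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("ASL live demo", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```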
Adding All Components for Raspi




To enhance interactivity and provide clear system feedback, I connected RGB LEDs, a push button, and a 7-inch touchscreen to the Raspberry Pi.
Each LED indicates a specific system state:
- Red – idle
- Yellow – countdown or waiting
- Green – letter confirmed
The LEDs are wired to GPIO pins through 220Ω resistors and are controlled using the RPi.GPIO Python library, synchronized with the recognition logic.
The momentary push button is connected via a 470Ω resistor and configured with an internal pull-down resistor in code. It allows the user to confirm a detected letter and trigger a new recognition cycle, giving control over input selection and improving usability.
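A minimal sketch of the LED and button handling with RPi.GPIO is shown below; the BCM pin numbers are placeholders and should be matched to your own wiring (LEDs through 220Ω resistors, button through 470Ω as described above).

```python
# GPIO sketch for the three status LEDs and the confirm button (RPi.GPIO, BCM numbering).
# The pin numbers below are hypothetical - change them to match your wiring.
import RPi.GPIO as GPIO

RED, YELLOW, GREEN, BUTTON = 17, 27, 22, 23     # placeholder BCM pins

GPIO.setmode(GPIO.BCM)
GPIO.setup([RED, YELLOW, GREEN], GPIO.OUT, initial=GPIO.LOW)
GPIO.setup(BUTTON, GPIO.IN, pull_up_down=GPIO.PUD_DOWN)   # internal pull-down, as in the text

def show_state(state):
    """Light exactly one LED: 'idle' (red), 'countdown' (yellow), 'confirmed' (green)."""
    GPIO.output(RED, state == "idle")
    GPIO.output(YELLOW, state == "countdown")
    GPIO.output(GREEN, state == "confirmed")

def button_pressed():
    """True while the confirm button is held down."""
    return GPIO.input(BUTTON) == GPIO.HIGH

try:
    show_state("idle")
    # ... the main recognition loop calls show_state() and polls button_pressed() ...
finally:
    GPIO.cleanup()
```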
The 7-inch Freenove touchscreen is connected via HDMI for display and USB for touch input. It serves as the main visual interface, showing the camera feed, predicted letter, status messages like “Too Close” or “Letter: A”, and the full constructed word.
With these components in place, the Raspberry Pi forms a compact, interactive front end for real-time ASL recognition, with model inference running either locally or on a connected laptop, as shown in the next step.
Live Demo Raspi

The Raspberry Pi acts as the client, capturing hand keypoints in real time using MediaPipe. These 21 keypoints (x, y) are serialized into a list of 42 float values and sent over a direct Ethernet connection to the laptop.
On the other end, the laptop runs the server, which receives the keypoints, classifies them using a pre-trained MLP model, and sends the predicted ASL letter back to the Pi. This setup offloads the computationally heavier model inference to the laptop, keeping the Raspberry Pi responsive and lightweight.
To ensure smooth communication, I used a fixed IP address and port and implemented a simple message protocol to keep data flow synchronized and real-time. This architecture gave me better performance while maintaining a clean separation between data collection (on Pi) and prediction (on laptop).
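The sketch below shows one way to implement that exchange; the fixed IP address, port, and newline-delimited comma-separated message format are assumptions standing in for the actual protocol.

```python
# server.py - runs on the laptop: receives 42 floats per line, replies with a letter.
# The IP/port and the newline-delimited CSV message format are illustrative assumptions.
import socket
import joblib

model = joblib.load("asl_mlp.joblib")
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("0.0.0.0", 5005))
srv.listen(1)
conn, addr = srv.accept()
print("Pi connected:", addr)

buf = b""
while True:
    data = conn.recv(4096)
    if not data:
        break
    buf += data
    while b"\n" in buf:
        line, buf = buf.split(b"\n", 1)
        feats = [float(v) for v in line.decode().split(",")]   # 42 keypoint values
        letter = model.predict([feats])[0]
        conn.sendall((letter + "\n").encode())                  # send prediction back

# Client side on the Pi (inside the capture loop), assuming the laptop's fixed IP:
#   sock = socket.create_connection(("192.168.1.2", 5005))
#   sock.sendall((",".join(f"{v:.6f}" for v in keypoints) + "\n").encode())
#   letter = sock.makefile().readline().strip()
```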
Case



To make the project stable, compact, and visually appealing, I designed and 3D printed a custom case that holds all key components: the Raspberry Pi, 7-inch touchscreen, USB camera, RGB LEDs, and a push button.
The case was created in a 3D modeling program, with careful measurements to ensure a perfect fit. I included 12 mm holes in the front panel for mounting the LEDs, button, and webcam module, so that everything could be easily installed and accessible. The rear part of the case has space and cutouts for the Raspberry Pi, allowing full access to ports and proper cable routing.
I printed the case using a dual-color filament in a blue and pink gradient, simply because I liked the aesthetic and wanted the device to reflect my personal style. Of course, the case can be printed in any color or material to match different preferences.
The final result is a solid, ergonomic design that neatly houses all electronics in one clean and portable unit.
Final Result
Common Issues

Tips for Better Model Accuracy
While the model works well in basic conditions, here are a few essential tips that can help you improve accuracy, avoid overfitting, and make the system more robust for real-life use.
Use More Than One Hand During Training
One of the most common problems in hand gesture recognition is overfitting to your own hand. If the dataset contains only your hand gestures, the model will likely learn the exact proportions, motion, and finger positions that are unique to you. As a result, it might perform very poorly when someone else uses the system.
How to fix it:
Involve as many people as you can in the data collection process. Ask your friends, classmates, or family members to perform each ASL letter. Even small differences in hand size, finger shape, movement speed, or gesture style add important variability to your dataset. This makes the model more general and helps it adapt to hands it has never seen before.
You don’t need thousands of samples from each person—even 100–200 well-labeled examples per person per letter can dramatically improve model performance.
Additional Tips for Improving the Model
- Ensure balanced data
  - Try to collect a similar number of samples for each letter. Otherwise, the model may become biased toward more common classes.
- Collect in different lighting conditions
  - Bright rooms, shadows, and backlit environments will affect detection. Training with multiple conditions improves robustness.
- Be consistent in gesture shape
  - When recording data, make sure each letter follows the ASL standard. Consistency helps the model focus on important visual cues.
- Visualize and clean your dataset
  - Occasionally inspect the .csv file and remove any corrupted or misclassified entries. Clean data leads to better models (see the sketch after this list).
- Try different models
  - While MLP works well, you can also experiment with other classifiers like Random Forest, SVM, or even convolutional networks if using images.
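As a starting point for the balance and cleaning tips above, here is a quick health-check sketch, assuming the CSV layout from the collection step; the file names and the "far outside normalized range" bounds are placeholders.

```python
# Dataset health check: class balance per letter, plus a simple row filter.
# Assumes 42 feature columns followed by the label, no header row; file names are placeholders.
import pandas as pd

data = pd.read_csv("asl_keypoints.csv", header=None)
labels = data.iloc[:, 42]

print(labels.value_counts().sort_index())        # samples per letter (balance check)

# Drop rows with missing values or coordinates far outside MediaPipe's normalized range
features = data.iloc[:, :42]
bad = data.isna().any(axis=1) | (features < -0.2).any(axis=1) | (features > 1.2).any(axis=1)
print(f"removing {bad.sum()} suspicious rows")
data[~bad].to_csv("asl_keypoints_clean.csv", header=False, index=False)
```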
Congratulations!

If you’ve made it this far — congrats!
You’ve just built your own AI-powered ASL hand sign recognition system from scratch. That’s no small achievement!
Along the way, you’ve combined:
- Computer vision (MediaPipe),
- Machine learning (MLP model),
- Hardware integration (Raspberry Pi, touchscreen, LEDs, button),
- And a custom-designed, 3D-printed physical setup.
You’ve also created a real-time, interactive, and fully standalone system that can help support communication, learning, and accessibility. That’s awesome.
If you build your own version, remix it, or improve it — I’d love to see it.
Leave a comment or share a photo!