AI-Powered Hand Sign Recognition System
by Wiktoria Przykucka



This project is an AI-powered system that recognizes ASL (American Sign Language) hand signs in real time using a USB webcam and a Raspberry Pi.
It uses MediaPipe to extract 21 hand keypoints, which are then classified by a trained deep learning model (MLP) into letters A–Z. The result is displayed on a 7” touchscreen, with RGB LEDs indicating system status and a push button to confirm the selected letter. The setup is enclosed in a 3D-printed frame with a wooden base for stability and aesthetics.
Hand tracking, the touchscreen interface, and all physical I/O run on the Raspberry Pi itself; predictions can run on the Pi from the saved model or be offloaded to a laptop over a direct Ethernet link (see the live demo steps), keeping recognition fast and interactive.
I created this project to explore how AI and physical computing can support real-time communication. ASL is a visual language that works perfectly with machine learning, and I was curious whether a Raspberry Pi combined with a webcam would be enough to build a working recognition system. It was also a fun way to connect artificial intelligence with accessibility and education.
You can explore the full source code, model, and scripts here:
Supplies

Core Components
- Raspberry Pi 5
Display & Vision
- Freenove 7-inch Touchscreen (HDMI video, USB touch)
- USB Webcam
Input & Output
- RGB Indicator LEDs – 12 mm Mounting (GUUZI)
- Momentary Push Button (GUUZI)
- Resistor Kit – 1/4W Metal Film (used: 220Ω for LEDs, 470Ω for button)
- Jumper Wires / GPIO Connectors
Mounting & Structure
- 3D Printed Custom Frame – designed to house the Pi, screen, camera, button, and LEDs
- Wooden Base – 210 mm wide for stability and vibration damping
Total cost: €247.25, plus the 3D-printed frame
Overview & Understanding the Logic

Before diving into the details of the build, let’s take a quick look at how the whole system works together. The diagram below shows the complete flow of data and hardware interactions.
Here’s a step-by-step breakdown:
- Camera (detecting hand)
  - A USB webcam connected to the Raspberry Pi captures the hand in real time. The image is processed using MediaPipe, which extracts 21 keypoints representing the hand pose.
- Raspberry Pi 5
  - Acts as the central hub. It:
    - Collects the keypoints from the camera
    - Sends them to the laptop over a socket connection
    - Receives back the predicted ASL letter
    - Controls the LED indicators, push button, and display output
- Laptop (AI Processing)
  - The laptop runs the trained MLP classification model. It:
    - Accepts hand keypoints from the Pi
    - Classifies them into the corresponding ASL letter (A–Z)
    - Sends the prediction back to the Pi
- RGB LEDs
  - Red LED → idle state
  - Yellow LED → countdown/processing
  - Green LED → prediction confirmed
- Push Button (Switch)
  - Lets the user confirm a prediction or reset the process for the next letter.
- Touchscreen (Screen Output)
  - Displays the live camera feed, system feedback (e.g., “Too Close”, “Letter: A”), and the sequence of predicted letters.
- Power Supply
  - Powers both the Raspberry Pi and the laptop, ensuring a stable, continuous system runtime.
Collecting Data

To collect the dataset, I wrote a Python script using OpenCV and MediaPipe that captures real-time video from a USB webcam and extracts 21 hand landmarks as (x, y) coordinates using the MediaPipe Hand Tracking model.
Each detected hand is saved as a row in a .csv file, containing:
- 42 features → x0–x20, y0–y20
- A label → the ASL letter being shown (A–Z)
The labeling is fully manual and interactive:
- Press a letter key (A–Z) to set the label
- Press c to toggle data collection
- Press q to quit the script
Using this method, I collected over 120,000 samples across all letters, creating a diverse and robust dataset directly from live camera input. This data was later used to train the AI classification models.
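Below is a minimal sketch of what that collection loop can look like, assuming the mediapipe and opencv-python packages are installed. The output file name and the use of Shift+letter to set the label (so that c and q stay free as controls) are illustrative choices, not necessarily identical to my script; the full version is in the source code linked above.

```python
# Data-collection sketch (illustrative): 21 MediaPipe hand landmarks -> 42 features + label per row.
# Assumptions: output file "asl_keypoints.csv"; labels set with Shift+letter so 'c'/'q' stay controls.
import csv
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.7)
cap = cv2.VideoCapture(0)
label, collecting = None, False

with open("asl_keypoints.csv", "a", newline="") as f:
    writer = csv.writer(f)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if collecting and label and results.multi_hand_landmarks:
            lm = results.multi_hand_landmarks[0].landmark
            # 42 features: x0..x20 followed by y0..y20, then the current letter label
            writer.writerow([p.x for p in lm] + [p.y for p in lm] + [label])
        cv2.putText(frame, f"label={label}  collecting={collecting}",
                    (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
        cv2.imshow("collect", frame)
        key = cv2.waitKey(1) & 0xFF
        if key == ord("q"):                      # quit the script
            break
        elif key == ord("c"):                    # toggle data collection on/off
            collecting = not collecting
        elif ord("A") <= key <= ord("Z"):        # Shift+letter sets the label A-Z
            label = chr(key)

cap.release()
cv2.destroyAllWindows()
```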
Training Model

To train the AI model, I used a dataset of hand keypoints collected using my custom Python script with OpenCV and MediaPipe. The script captured the (x, y) coordinates of 21 hand landmarks for all 26 ASL letters, with each sample labeled manually during collection.
I experimented with two different models:
- RandomForestClassifier
- MLPClassifier (neural network)
Both models were trained on 80% of the dataset and evaluated on the remaining 20%. The MLP model, built using a scikit-learn pipeline with StandardScaler and a two-layer architecture (100 and 50 neurons), achieved excellent performance with an accuracy close to 1.0.
Although such high accuracy raised concerns about potential overfitting, live testing confirmed that the model remained fast, responsive, and accurate under real-time conditions. A confusion matrix was also used to verify prediction consistency across all letters.
Finally, the trained models were saved using joblib, making it easy to deploy and run predictions directly on the Raspberry Pi.
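The snippet below sketches that training setup under the assumptions just described (80/20 split, StandardScaler followed by an MLP with 100 and 50 hidden neurons, joblib export). The CSV path, random seed, and iteration count are placeholders.

```python
# Training sketch: expects the CSV layout from the collection step
# (42 feature columns followed by the letter label, no header row).
import joblib
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

data = pd.read_csv("asl_keypoints.csv", header=None)
X, y = data.iloc[:, :42].values, data.iloc[:, 42].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# StandardScaler + two-layer MLP (100 and 50 neurons)
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=500, random_state=42))
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print(confusion_matrix(y_test, pred))           # per-letter consistency check

joblib.dump(model, "asl_mlp.joblib")            # saved pipeline for deployment
```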
Live Demo (testing)

For live testing, I developed a Python script that uses a webcam to capture real-time video and predict ASL letters based on hand keypoints.
The script loads the previously trained MLP model and uses MediaPipe to detect and extract 21 hand landmarks (x, y coordinates). It then calculates the bounding box and surface area of the detected hand to check whether the hand is too close, too far, or in an ideal position for prediction.
If the hand is correctly positioned, the model predicts the corresponding ASL letter and displays it directly on the screen. The system also provides real-time visual feedback, including:
- Bounding boxes around the hand
- Text messages like “Go back”, “Come closer”, or the predicted letter (e.g., “Letter: A”)
- Helpful cues for adjusting hand distance and position
This setup ensures an intuitive and user-friendly experience. The testing phase demonstrated that the model delivers fast and accurate predictions, confirming its effectiveness in real-world conditions.
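A simplified sketch of that live-testing loop is shown below. The bounding-box area thresholds used for the "too close / too far" check and the model file name are illustrative values, not the exact ones from my script.

```python
# Live-testing sketch: distance check via the hand bounding-box area, then prediction.
# The area thresholds and the model file name are illustrative assumptions.
import cv2
import joblib
import mediapipe as mp

model = joblib.load("asl_mlp.joblib")
hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.7)
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    msg = "Show your hand"
    if results.multi_hand_landmarks:
        lm = results.multi_hand_landmarks[0].landmark
        xs, ys = [p.x * w for p in lm], [p.y * h for p in lm]
        x1, y1, x2, y2 = int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys))
        cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)
        area = (x2 - x1) * (y2 - y1)                 # rough hand surface area in pixels
        if area > 0.40 * w * h:                      # hand fills too much of the frame
            msg = "Go back"
        elif area < 0.05 * w * h:                    # hand too small / too far away
            msg = "Come closer"
        else:
            features = [[p.x for p in lm] + [p.y for p in lm]]
            msg = f"Letter: {model.predict(features)[0]}"
    cv2.putText(frame, msg, (10, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("ASL live demo", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```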
Adding All Components for Raspi




To enhance interactivity and provide clear system feedback, I connected RGB LEDs, a push button, and a 7-inch touchscreen to the Raspberry Pi.
Each LED indicates a specific system state:
- Red – idle
- Yellow – countdown or waiting
- Green – letter confirmed
The LEDs are wired to GPIO pins through 220Ω resistors and are controlled using the RPi.GPIO Python library, synchronized with the recognition logic.
The momentary push button is connected via a 470Ω resistor and configured with an internal pull-down resistor in code. It allows the user to confirm a detected letter and trigger a new recognition cycle, giving control over input selection and improving usability.
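A minimal sketch of the LED and button handling with RPi.GPIO is shown below; the BCM pin numbers are placeholders and should be matched to your own wiring (LEDs through 220Ω resistors, button through 470Ω as described above).

```python
# GPIO sketch for the three status LEDs and the confirm button (RPi.GPIO, BCM numbering).
# The pin numbers below are hypothetical - change them to match your wiring.
import RPi.GPIO as GPIO

RED, YELLOW, GREEN, BUTTON = 17, 27, 22, 23     # placeholder BCM pins

GPIO.setmode(GPIO.BCM)
GPIO.setup([RED, YELLOW, GREEN], GPIO.OUT, initial=GPIO.LOW)
GPIO.setup(BUTTON, GPIO.IN, pull_up_down=GPIO.PUD_DOWN)   # internal pull-down, as in the text

def show_state(state):
    """Light exactly one LED: 'idle' (red), 'countdown' (yellow), 'confirmed' (green)."""
    GPIO.output(RED, state == "idle")
    GPIO.output(YELLOW, state == "countdown")
    GPIO.output(GREEN, state == "confirmed")

def button_pressed():
    """True while the confirm button is held down."""
    return GPIO.input(BUTTON) == GPIO.HIGH

try:
    show_state("idle")
    # ... the main recognition loop calls show_state() and polls button_pressed() ...
finally:
    GPIO.cleanup()
```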
The 7-inch Freenove touchscreen is connected via HDMI for display and USB for touch input. It serves as the main visual interface, showing the camera feed, predicted letter, status messages like “Too Close” or “Letter: A”, and the full constructed word.
With these components in place, the Raspberry Pi forms a compact, interactive front end for real-time ASL recognition, with model inference running either locally or on a connected laptop, as shown in the next step.
Live Demo Raspi

The Raspberry Pi acts as the client, capturing hand keypoints in real time using MediaPipe. These 21 keypoints (x, y) are serialized into a list of 42 float values and sent over a direct Ethernet connection to the laptop.
On the other end, the laptop runs the server, which receives the keypoints, classifies them using a pre-trained MLP model, and sends the predicted ASL letter back to the Pi. This setup offloads the computationally heavier model inference to the laptop, keeping the Raspberry Pi responsive and lightweight.
To ensure smooth communication, I used a fixed IP address and port and implemented a simple message protocol to keep data flow synchronized and real-time. This architecture gave me better performance while maintaining a clean separation between data collection (on Pi) and prediction (on laptop).
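The sketch below shows one way to implement that exchange; the fixed IP address, port, and newline-delimited comma-separated message format are assumptions standing in for the actual protocol.

```python
# server.py - runs on the laptop: receives 42 floats per line, replies with a letter.
# The IP/port and the newline-delimited CSV message format are illustrative assumptions.
import socket
import joblib

model = joblib.load("asl_mlp.joblib")
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("0.0.0.0", 5005))
srv.listen(1)
conn, addr = srv.accept()
print("Pi connected:", addr)

buf = b""
while True:
    data = conn.recv(4096)
    if not data:
        break
    buf += data
    while b"\n" in buf:
        line, buf = buf.split(b"\n", 1)
        feats = [float(v) for v in line.decode().split(",")]   # 42 keypoint values
        letter = model.predict([feats])[0]
        conn.sendall((letter + "\n").encode())                  # send prediction back

# Client side on the Pi (inside the capture loop), assuming the laptop's fixed IP:
#   sock = socket.create_connection(("192.168.1.2", 5005))
#   sock.sendall((",".join(f"{v:.6f}" for v in keypoints) + "\n").encode())
#   letter = sock.makefile().readline().strip()
```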
Case



To make the project stable, compact, and visually appealing, I designed and 3D printed a custom case that holds all key components: the Raspberry Pi, 7-inch touchscreen, USB camera, RGB LEDs, and a push button.
The case was created in a 3D modeling program, with careful measurements to ensure a perfect fit. I included 12 mm holes in the front panel for mounting the LEDs, button, and webcam module, so that everything could be easily installed and accessible. The rear part of the case has space and cutouts for the Raspberry Pi, allowing full access to ports and proper cable routing.
I printed the case using a dual-color filament in a blue and pink gradient, simply because I liked the aesthetic and wanted the device to reflect my personal style. Of course, the case can be printed in any color or material to match different preferences.
The final result is a solid, ergonomic design that neatly houses all electronics in one clean and portable unit.
Final Result
Common Issues

Tips for Better Model Accuracy
While the model works well in basic conditions, here are a few essential tips that can help you improve accuracy, avoid overfitting, and make the system more robust for real-life use.
Use More Than One Hand During Training
One of the most common problems in hand gesture recognition is overfitting to your own hand. If the dataset contains only your hand gestures, the model will likely learn the exact proportions, motion, and finger positions that are unique to you. As a result, it might perform very poorly when someone else uses the system.
How to fix it:
Involve as many people as you can in the data collection process. Ask your friends, classmates, or family members to perform each ASL letter. Even small differences in hand size, finger shape, movement speed, or gesture style add important variability to your dataset. This makes the model more general and helps it adapt to hands it has never seen before.
You don’t need thousands of samples from each person—even 100–200 well-labeled examples per person per letter can dramatically improve model performance.
Additional Tips for Improving the Model
- Ensure balanced data
  - Try to collect a similar number of samples for each letter. Otherwise, the model may become biased toward more common classes.
- Collect in different lighting conditions
  - Bright rooms, shadows, and backlit environments will affect detection. Training with multiple conditions improves robustness.
- Be consistent in gesture shape
  - When recording data, make sure each letter follows the ASL standard. Consistency helps the model focus on important visual cues.
- Visualize and clean your dataset
  - Occasionally inspect the .csv file and remove any corrupted or misclassified entries. Clean data leads to better models (see the sketch after this list).
- Try different models
  - While MLP works well, you can also experiment with other classifiers like Random Forest, SVM, or even convolutional networks if using images.
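As a starting point for the balance and cleaning tips above, here is a quick health-check sketch, assuming the CSV layout from the collection step; the file names and the "far outside normalized range" bounds are placeholders.

```python
# Dataset health check: class balance per letter, plus a simple row filter.
# Assumes 42 feature columns followed by the label, no header row; file names are placeholders.
import pandas as pd

data = pd.read_csv("asl_keypoints.csv", header=None)
labels = data.iloc[:, 42]

print(labels.value_counts().sort_index())        # samples per letter (balance check)

# Drop rows with missing values or coordinates far outside MediaPipe's normalized range
features = data.iloc[:, :42]
bad = data.isna().any(axis=1) | (features < -0.2).any(axis=1) | (features > 1.2).any(axis=1)
print(f"removing {bad.sum()} suspicious rows")
data[~bad].to_csv("asl_keypoints_clean.csv", header=False, index=False)
```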
Congratulations!

If you’ve made it this far — congrats!
You’ve just built your own AI-powered ASL hand sign recognition system from scratch. That’s no small achievement!
Along the way, you’ve combined:
- Computer vision (MediaPipe),
- Machine learning (MLP model),
- Hardware integration (Raspberry Pi, touchscreen, LEDs, button),
- And a custom-designed, 3D-printed physical setup.
You’ve also created a real-time, interactive, and fully standalone system that can help support communication, learning, and accessibility. That’s awesome.
If you build your own version, remix it, or improve it — I’d love to see it.
Leave a comment or share a photo!