AI Gesture Recognition Car With Computer Vision and Raspberry Pi: Part 1 in Building a Showcase

by AnRobot in Circuits > Raspberry Pi




My teammate and I used this robot (which we decorated to look like The Lorax) in a recent showcase, as it is one of many robots we created to put together a robotic performance. Be on the lookout for more Instructables to recreate our Lorax in Wonderland performance, and gain many new skills along the way.

This human gesture recognition car is controlled with your fingers, with each gesture responsible for a different function. While I used the gesture recognition model to control a car, you can use the technology for any project you desire.

The gesture recognition was accomplished using OpenCV and a pre-trained machine learning model: MediaPipe's hand landmark model. This allows the camera to track the position of each fingertip relative to the other landmarks on the hand, based on their locations on a coordinate plane.


Skill Level:

  • Familiarity with the Raspberry Pi
  • Familiarity with circuit components like ultrasonic sensors and motor drivers
  • A working understanding of Python

Supplies

  1. Raspberry Pi 4B with the latest version of Raspberry Pi OS installed: link
  2. Monitor
  3. Wired keyboard and mouse (optional: if you are familiar with VNC/remote access of the Pi, you won't need these)
  4. Raspberry Pi camera (link)
  5. Raspberry Pi camera tripod (optional): link

If you choose to make the robot into a mobile car:

  1. Ultrasonic sensor
  2. Uninterruptible Power Supply (UPS) (a mobile robot running resource-intensive CV needs more than one power source)
  3. L298N motor driver
  4. 2-wheel-drive chassis to support the car (link)
  5. Optional: you can also design your own chassis like I did, by drilling through an acrylic sheet and soldering wires, but that requires more hardware skills
  6. 9V battery pack

Optional: If you want to see whether the camera sensed your hand gesture while the robot is running "headless", connect an LED.

Why We Are Using 2 Different Power Supplies


The robot uses both an Uninterruptible Power Supply (UPS) and an L298N motor driver. The reason is power management: the Raspberry Pi needs a steady supply to run resource-intensive machine-learning models, while the motors are powered separately so the robot can stay mobile.


The UPS is powered by a lithium battery pack and acts as a backup power source for the Raspberry Pi.

Install Latest RaspberryPi OS 64-bit

You can set up the Pi using the official instructions.

Connect and Enable Picamera


On the Raspberry Pi desktop, go to Main menu -> Preferences -> Raspberry Pi Configuration -> Interfaces -> Camera -> select Enabled -> OK.

After this, reboot your Pi.


You can find in-depth instructions on setting up the Pi camera in Raspberry Pi's official documentation.
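As an optional sanity check (my own quick test, not part of the original guide), you can try grabbing a single frame from Python; depending on your OS version and camera stack you may need the picamera2 library instead of OpenCV's VideoCapture:

import cv2

cap = cv2.VideoCapture(0)         # assumes the camera shows up as video device 0
ok, frame = cap.read()
print("Got a frame:", ok, frame.shape if ok else "no frame")
cap.release()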

Install Mediapipe Packages

1. Use the following command to install MediaPipe:

 $ python -m pip install mediapipe

You need to have a monitor connected to see the video stream.

2. Now, install the dependencies for MediaPipe's pre-trained hand landmark model.

Clone the contents of this GitHub folder to your Raspberry Pi.


Then run these commands in a terminal on the Raspberry Pi:

cd mediapipe/examples/hand_landmarker/raspberry_pi

sh setup.sh
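To confirm the install worked, a quick import check from Python is enough (just a convenience check, not a required step):

import mediapipe as mp

print(mp.__version__)                     # should print the installed version number
print(mp.solutions.hands is not None)     # the Hands solution used later should be available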

Coding: Understanding Computer Vision


Understanding the code:

We are using the hand landmark model from MediaPipe, which consists of two pre-trained modules: a hand landmark detection model and a palm detection model. Together they assign coordinate points to 21 locations on the hand, positioned relative to one another.


We used these coordinate points (which are given as (x, y, z) values) to create Python arrays for the tips of the fingers. In our take on the code, we determine whether a finger is "up" or "down" based on whether the tip of the finger sits above or below the other landmarks on that finger in the frame.

The z in each triple is the landmark depth, with the wrist as the origin of the depth axis.


The tips and middle joints of the fingers correspond to the following landmark indices, arranged in the following arrays:

tip = [8, 12, 16, 20]       # landmark indices of the index, middle, ring, and pinky fingertips
mid = [6, 10, 14, 18]       # landmark indices of the corresponding middle (PIP) joints
tipname = [8, 12, 16, 20]
midname = [6, 10, 14, 18]
fingers = []                # later filled with a 1 (up) or 0 (down) for each finger
finger = []
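If you prefer named constants over raw index numbers, MediaPipe exposes a HandLandmark enum whose values match the arrays above. The short sketch below is only an illustration of that correspondence, not part of our robot code:

import mediapipe as mp

hl = mp.solutions.hands.HandLandmark
# fingertip indices -> the "tip" array above
print(hl.INDEX_FINGER_TIP.value, hl.MIDDLE_FINGER_TIP.value, hl.RING_FINGER_TIP.value, hl.PINKY_TIP.value)   # 8 12 16 20
# middle (PIP) joint indices -> the "mid" array above
print(hl.INDEX_FINGER_PIP.value, hl.MIDDLE_FINGER_PIP.value, hl.RING_FINGER_PIP.value, hl.PINKY_PIP.value)   # 6 10 14 18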

Recognizing Hand Gestures


In order to recognize different hand gestures using computer vision, we use the values in the arrays above (the tip values and the mid values) to interpret the hand.

The algorithm compares the y coordinates of these points: if the tip of a finger is above its middle joint in the frame, the count for that finger is 1; otherwise it is 0.

Then, in order to recognize a gesture, we created the if statements shown in the second image. Each one assigns a gesture name (e.g. "all up" means all fingers are raised), which we can then use to control the robotic car.
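As a rough sketch of that logic (our full, tested code is linked in the Code Reference step; the function below is a simplified stand-in), this turns a list of 21 landmark points into per-finger up/down counts and then a gesture name. Note that in image coordinates y grows downward, so a raised fingertip normally has a smaller y value than its middle joint; flip the comparison if your coordinates are mirrored.

tip = [8, 12, 16, 20]    # index, middle, ring, pinky fingertips
mid = [6, 10, 14, 18]    # middle (PIP) joints of the same fingers

def classify(landmarks):
    # landmarks: list of 21 (x, y) points in image coordinates, y grows downward
    fingers = []
    for t, m in zip(tip, mid):
        fingers.append(1 if landmarks[t][1] < landmarks[m][1] else 0)   # tip above joint -> finger is up
    up = sum(fingers)
    if up == 4:
        return "all up"       # e.g. drive forward
    elif up == 2:
        return "two up"       # e.g. reset
    elif up == 1:
        return "one up"       # e.g. do a dance
    return "no gesture"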

Setting Machine Learning Parameters

The parameters we set for this project include sensing only one hand in the camera's frame, treating the input as a continuous video stream rather than unrelated still images (static_image_mode=False), and a minimum tracking confidence and detection confidence of 0.7:

with handsModule.Hands(static_image_mode=False, min_detection_confidence=0.7, min_tracking_confidence=0.7, max_num_hands=1) as hands:

The minimum detection and tracking confidences range from 0 to 1 and set how confident the model must be before it reports or keeps tracking a hand. With a value of 0.7, results below 70% confidence are discarded.


A confidence threshold that is too low means a hand is "detected" very frequently (even when nothing is present), whereas one that is too high is highly selective and may miss real hands. You may experiment with different confidence values to see what meets your purposes best.
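For context, here is a minimal sketch of how those parameters fit into a live video loop using the classic MediaPipe Hands solution (the camera index and window handling are assumptions; your setup may differ):

import cv2
import mediapipe as mp

handsModule = mp.solutions.hands
draw = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)                                  # assumes camera index 0
with handsModule.Hands(static_image_mode=False, min_detection_confidence=0.7,
                       min_tracking_confidence=0.7, max_num_hands=1) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))   # MediaPipe expects RGB
        if results.multi_hand_landmarks:
            for hand in results.multi_hand_landmarks:
                draw.draw_landmarks(frame, hand, handsModule.HAND_CONNECTIONS)
        cv2.imshow("Hand landmarks", frame)                # requires a connected monitor
        if cv2.waitKey(1) & 0xFF == ord('q'):              # press q to quit
            break
cap.release()
cv2.destroyAllWindows()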

Setting Gestures

For the purpose of controlling a car, different gestures must trigger different tasks. For example, in our case, we made 4 fingers up = go forward, 1 finger up = do a dance, and 2 fingers up = reset gesture.


The reset gesture is particularly important: it resets all control variables to 0 and lets us run the sequence again (without having to do it manually from a monitor).


OPTIONAL: I added different LED blink patterns to indicate whether a hand gesture was detected, since the robot is headless during its performance (i.e. it runs on its own once started, without a computer attached). The reset gesture (which occurs when I raise 2 fingers) is not programmed to move the robot, only to light the LED.
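Below is a hedged sketch of how a recognized gesture can be routed to an action and an LED blink; the gesture names, GPIO pin number, and commented-out helper functions are placeholders, not our exact wiring or code:

import time
import RPi.GPIO as GPIO

LED_PIN = 21                      # placeholder BCM pin; match it to your own wiring
GPIO.setmode(GPIO.BCM)
GPIO.setup(LED_PIN, GPIO.OUT)

def blink(times):
    # quick visual feedback while the robot runs headless
    for _ in range(times):
        GPIO.output(LED_PIN, GPIO.HIGH)
        time.sleep(0.2)
        GPIO.output(LED_PIN, GPIO.LOW)
        time.sleep(0.2)

def handle_gesture(gesture):
    if gesture == "all up":       # 4 fingers up -> drive forward
        blink(1)
        # go_forward()            # hypothetical locomotion helper
    elif gesture == "one up":     # 1 finger up -> do a dance
        blink(3)
        # dance()                 # hypothetical
    elif gesture == "two up":     # 2 fingers up -> reset control variables, LED only
        blink(2)
        # reset_state()           # hypothetical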

Code Reference

Refer to a copy of the entire code for the gesture recognition robot here. It has been tested and run many times.


Note: This code includes the element of locomotion as well.

Locomotion


In the diagram above, you can see how to connect the two DC motors to the Raspberry Pi in order to make your robot mobile.

A big part of controlling the 2WD chassis is using the GPIO library's Pulse Width Modulation (PWM), which lets us increase and decrease the power delivered to each motor instead of simply switching it "off" or "on".
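As a minimal sketch of that idea using the RPi.GPIO library (the pin numbers are placeholders for wherever you wired one channel of the L298N's IN1/IN2 and ENA):

import time
import RPi.GPIO as GPIO

IN1, IN2, ENA = 23, 24, 25        # placeholder BCM pins for one motor channel of the L298N
GPIO.setmode(GPIO.BCM)
GPIO.setup([IN1, IN2, ENA], GPIO.OUT)

GPIO.output(IN1, GPIO.HIGH)       # set the motor's direction
GPIO.output(IN2, GPIO.LOW)

pwm = GPIO.PWM(ENA, 1000)         # 1 kHz PWM on the enable pin
pwm.start(0)                      # start with the motor stopped
pwm.ChangeDutyCycle(60)           # ~60% power: more than off, less than full speed
time.sleep(2)                     # drive for a couple of seconds
pwm.ChangeDutyCycle(0)            # coast to a stop
pwm.stop()
GPIO.cleanup()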



Adding an Ultrasonic Sensor


The ultrasonic sensor detects walls or obstacles in front of the robot once it starts moving (to prevent crashes).

You may change the logic of what the robot does when an obstacle is sensed within a certain distance to fit your own purposes.
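A typical way to read an HC-SR04 style ultrasonic sensor looks like the sketch below (the trigger/echo pins are placeholders, and remember the echo pin should go through a voltage divider down to 3.3 V):

import time
import RPi.GPIO as GPIO

TRIG, ECHO = 17, 27               # placeholder BCM pins
GPIO.setmode(GPIO.BCM)
GPIO.setup(TRIG, GPIO.OUT)
GPIO.setup(ECHO, GPIO.IN)

def distance_cm():
    # send a 10 microsecond trigger pulse
    GPIO.output(TRIG, GPIO.HIGH)
    time.sleep(0.00001)
    GPIO.output(TRIG, GPIO.LOW)

    # time how long the echo pin stays high
    start = end = time.time()
    while GPIO.input(ECHO) == 0:
        start = time.time()
    while GPIO.input(ECHO) == 1:
        end = time.time()

    # sound travels ~34300 cm/s; halve it for the round trip
    return (end - start) * 34300 / 2

if distance_cm() < 20:            # example threshold: react if something is closer than 20 cm
    print("Obstacle ahead - stop the motors")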

Calibration

Now that all the components are assembled, it's time to test the wheels! Test the code many times in order to tune the hand recognition, the distance kept from obstacles, and the direction the wheels spin.

The wheels in particular, this being a DIY robot, will need to be calibrated so the car drives as straight as possible.
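One simple way to trim out a drift (only a sketch; the numbers and the left_pwm/right_pwm objects are assumptions based on the PWM setup from the Locomotion step) is to scale each wheel's duty cycle by its own factor that you tune on a test drive:

LEFT_TRIM = 1.00                  # tune these by driving along a straight line
RIGHT_TRIM = 0.93                 # e.g. slow the faster wheel down slightly

def set_speed(left_pwm, right_pwm, speed):
    # speed is a duty cycle from 0-100; each wheel gets its own trim factor
    left_pwm.ChangeDutyCycle(min(100, speed * LEFT_TRIM))
    right_pwm.ChangeDutyCycle(min(100, speed * RIGHT_TRIM))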

Add More!

While this code sticks to three gestures, you can play around and add code for more! For example, a thumbs up or thumbs down could initiate some other form of locomotion or action on the robot.


You can also try implementing other machine learning models such as posture recognition, color recognition, or any other AI model. Happy Building!!