Real-Time Push-Up & Squat Recognition With Form and Deepness Scoring

by Mark Lenis


This project is a complete AI-powered fitness tracker built from scratch. It recognizes exercises (push-ups and squats) in real time using a webcam, counts repetitions, evaluates form and depth using pose estimation, and displays feedback on a small LCD screen.

The system is optimized for low latency and minimal hardware overhead, running real-time inference and visual feedback with only a webcam, Raspberry Pi, and a display module.

Supplies


Hardware

  1. Raspberry Pi 5 with active cooling (fan or heatsink)
  2. Logitech C270 USB webcam
  3. LCD1602 I2C display module
  4. Jumper wires
  5. 5V/3A USB-C power supply for Raspberry Pi
  6. MicroSD card (16GB or more)

Software and Tools

  1. Python 3 (on Pi and laptop)
  2. OpenCV for image processing
  3. MediaPipe or MoveNet for pose estimation
  4. YOLOv11 (Ultralytics) for exercise classification
  5. Roboflow for dataset annotation
  6. Jupyter Notebook for training
  7. Git + VS Code or any preferred IDE

Define the Project Goal and Requirements

The goal is to create a real-time exercise recognition system that:

  1. Detects squats and push-ups
  2. Counts repetitions
  3. Evaluates form and depth
  4. Displays feedback on an LCD1602

Tools required: Raspberry Pi 5 (139.90€), Logitech C270 webcam (24.99€), LCD1602 I2C display (8.95€), Python, YOLOv11, and a pose model (MediaPipe, MoveNet, BlazePose, ...).

Collect the Dataset

Gathering Data:

  1. Collected a variety of push-up and squat images from public datasets on Kaggle and Roboflow.
  2. Recorded additional custom footage with a webcam to increase variation and improve generalization.

Labeling:

  1. Labeled the data in Roboflow where needed.
  2. Defined two classes: pushup and squat.
  3. Exported the labeled data in YOLO classification format with a consistent folder structure and naming: train/val/test splits, each containing a /pushup and a /squat subfolder (a quick layout check is sketched below).
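
Before training, it is worth sanity-checking the exported layout. This is a minimal sketch; the dataset root path and the .jpg extension are assumptions, not values from the project.

from pathlib import Path

dataset = Path('datasets/exercise_cls')  # assumed export location

# expected layout: train/val/test, each with a pushup/ and squat/ subfolder
for split in ('train', 'val', 'test'):
    for cls in ('pushup', 'squat'):
        folder = dataset / split / cls
        count = len(list(folder.glob('*.jpg'))) if folder.is_dir() else 0
        print(f'{split}/{cls}: {count} images')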


Train YOLOv11 Classifier

Environment Setup:

  1. Used a Jupyter Notebook for step-by-step monitoring.
  2. Installed dependencies and used CUDA for faster training.

Model Training:

from ultralytics import YOLO

model = YOLO('yolo11n-cls.pt')  # lightweight nano classifier for faster inference
                                # (Ultralytics names YOLO11 weights without the 'v')
model.train(data=data_path, epochs=40, imgsz=640)

Export and Test:

  1. Saved the best model as exercise_classifier.pt.
  2. Verified performance on test images and the confusion matrix (see the snippet after this step).
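
Verification can be done with the built-in Ultralytics val() and predict calls. A minimal sketch, assuming the exported checkpoint name above, a hypothetical test image, and that the training dataset is still available for val():

from ultralytics import YOLO

model = YOLO('exercise_classifier.pt')   # the exported best checkpoint

metrics = model.val()                    # evaluates on the val split and saves
                                         # a confusion matrix under the runs/ dir
result = model('test_pushup.jpg')[0]     # classify one held-out test image
print(result.names[result.probs.top1],   # predicted class name
      float(result.probs.top1conf))      # and its confidence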


Write Core System Components

Server (Laptop):

  1. Handles classification and pose analysis.
  2. Receives compressed frames from the client.
  3. Runs inference with the YOLO model and the pose model.
  4. Sends back the exercise name, rep count, form score, and depth analysis (a minimal server loop is sketched after this list).

Client (Raspberry Pi):

  1. Captures frames using the webcam.
  2. Sends them to the server.
  3. Receives feedback and displays it on the LCD.
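
The write-up does not spell out the wire format, so the sketch below assumes one: each JPEG frame is prefixed with a 4-byte big-endian length, and the server answers with a length-prefixed UTF-8 feedback string. The port, checkpoint name, and reply contents are illustrative, not the project's exact protocol.

import socket
import struct

import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO('exercise_classifier.pt')

def recv_exact(conn, n):
    """Read exactly n bytes or raise if the client disconnects."""
    buf = b''
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('client disconnected')
        buf += chunk
    return buf

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('0.0.0.0', 5000))   # port is an assumption
server.listen(1)
conn, addr = server.accept()

while True:
    size = struct.unpack('>I', recv_exact(conn, 4))[0]   # frame length header
    jpeg = recv_exact(conn, size)                        # compressed frame
    frame = cv2.imdecode(np.frombuffer(jpeg, np.uint8), cv2.IMREAD_COLOR)
    result = model(frame, verbose=False)[0]              # classify the frame
    label = result.names[result.probs.top1]              # 'pushup' or 'squat'
    reply = label.encode('utf-8')                        # real server adds reps/form/depth
    conn.sendall(struct.pack('>I', len(reply)) + reply)
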
Camera Handler Module:

  1. Designed to handle video capture using OpenCV (a possible shape for the module follows below).
  2. Includes error logging, resolution control, and compression.
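
One way the module could look, using OpenCV's VideoCapture; the class name, default resolution, and JPEG quality are assumptions for the sketch.

import logging

import cv2

class CameraHandler:
    """Captures frames, controls resolution, and compresses them to JPEG."""

    def __init__(self, index=0, width=640, height=480, jpeg_quality=70):
        self.cap = cv2.VideoCapture(index)
        self.cap.set(cv2.CAP_PROP_FRAME_WIDTH, width)
        self.cap.set(cv2.CAP_PROP_FRAME_HEIGHT, height)
        self.jpeg_quality = jpeg_quality
        if not self.cap.isOpened():
            logging.error('Could not open camera %d', index)

    def read_jpeg(self):
        """Return one frame as JPEG bytes, or None on failure."""
        ok, frame = self.cap.read()
        if not ok:
            logging.error('Frame capture failed')
            return None
        ok, jpeg = cv2.imencode('.jpg', frame,
                                [cv2.IMWRITE_JPEG_QUALITY, self.jpeg_quality])
        return jpeg.tobytes() if ok else None

    def release(self):
        self.cap.release()
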
LCD Display Logic:

  1. Created a dedicated class for LCD control (sketched below).
  2. Displayed the current rep count, detected exercise, and form score in real time.
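
A possible wrapper using the RPLCD library over I2C; the PCF8574 expander and 0x27 address are common defaults for LCD1602 backpacks, not confirmed values from this build.

from RPLCD.i2c import CharLCD

class LCDFeedback:
    """Shows exercise name, rep count, and form score on a 16x2 display."""

    def __init__(self, address=0x27):
        self.lcd = CharLCD('PCF8574', address, cols=16, rows=2)

    def show(self, exercise, reps, form_score):
        self.lcd.clear()
        self.lcd.write_string(f'{exercise[:11]} x{reps}')  # row 0: exercise + reps
        self.lcd.cursor_pos = (1, 0)
        self.lcd.write_string(f'Form: {form_score}%')      # row 1: form score

# usage: LCDFeedback().show('pushup', 7, 85)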


Implement the Analysis Engine

YOLO Classification:

  1. Used the YOLO model to detect whether the person is doing a push-up or a squat in each frame.
  2. Smoothed the results over frame intervals to improve stability.

Pose Estimation with MediaPipe (or MoveNet):

  1. Extracted keypoints such as the hips, knees, and shoulders.
  2. Wrote mathematical functions to calculate joint angles (e.g., knee bend) and vertical displacement to determine depth (the angle math is sketched after this list).
  3. Smoothed the rep-detection signal with a Savitzky-Golay filter, with rep logic based on angle thresholds and movement patterns.
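
The angle calculation reduces to a dot product between the two limb vectors at the joint, and SciPy provides the Savitzky-Golay filter. A sketch with made-up example values:

import numpy as np
from scipy.signal import savgol_filter

def joint_angle(a, b, c):
    """Angle at b (degrees) for the triple a-b-c, e.g. hip-knee-ankle
    keypoints from the pose model give the knee-bend angle."""
    a, b, c = np.asarray(a, float), np.asarray(b, float), np.asarray(c, float)
    ba, bc = a - b, c - b
    cos = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc) + 1e-9)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# smooth a short history of knee angles before applying thresholds
knee_angles = [172, 165, 140, 110, 95, 93, 115, 150, 170]  # example values only
smoothed = savgol_filter(knee_angles, window_length=5, polyorder=2)
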
Repetition Counter:

  1. Implemented state-machine logic to count a rep only when the full motion is completed (a minimal version is sketched below).
  2. Applied filters to avoid miscounts caused by jitter or false poses.
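
The state machine boils down to two thresholds and a direction flag; the threshold values here are illustrative, not the tuned ones from the project.

def count_reps(angles, down_thresh=100.0, up_thresh=160.0):
    """Count a rep only on a complete up -> down -> up cycle of the
    (smoothed) joint angle, ignoring jitter between the thresholds."""
    state, reps = 'up', 0
    for angle in angles:
        if state == 'up' and angle < down_thresh:
            state = 'down'                    # descended deep enough
        elif state == 'down' and angle > up_thresh:
            state, reps = 'up', reps + 1      # full extension: one rep
    return reps

print(count_reps([170, 120, 95, 98, 150, 168, 90, 175]))  # -> 2
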
Output Integration:

  1. Combined the results (exercise type + form score + depth) and sent them back to the Raspberry Pi client.
  2. Displayed the feedback on the LCD in real time.

Final Testing and Debugging

  1. Verified network communication over TCP.
  2. Tested webcam capture speed and server processing latency (a quick capture-latency check is sketched below).
  3. Confirmed that the LCD updates with the correct rep count and exercise form feedback.
  4. Adjusted the squat and push-up thresholds based on live testing.
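
For item 2, capture-plus-encode time can be measured locally on the Pi before involving the network; the frame count and JPEG quality below are arbitrary choices for the sketch.

import time

import cv2

cap = cv2.VideoCapture(0)
times = []
for _ in range(100):
    t0 = time.perf_counter()
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imencode('.jpg', frame, [cv2.IMWRITE_JPEG_QUALITY, 70])
    times.append(time.perf_counter() - t0)
cap.release()
if times:
    print(f'avg capture+encode: {1000 * sum(times) / len(times):.1f} ms')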