Dancing Game Using Custom Pose Detection Model in Python

Hello everyone!

I'm excited to share the journey of making my first ever AI project as a first-year student in the Creative Tech and AI course at HOWEST University in Kortrijk. I created a dancing game that uses a custom pose detection AI model in Python.

In this game, a choreography video plays on the laptop screen, and the player needs to mimic the dance moves in front of a webcam. The application uses a pose detection model to compare the player's moves with the original choreography and calculates a similarity score. At the end of the dance, the player receives a performance score as a percentage from 0 to 100%.

I’m thrilled to take you through my entire process of building this game! It was my first project of this kind, and I can’t wait to share my experiences and insights with you all.

If you’re interested in creating a similar project, here are a few prerequisites:

Proficiency in Python
Basic understanding of AI

If you’re new to Python or AI, this project might be a bit advanced, so I recommend starting with the basics first.

Additionally, I added some extra features to my project, such as:

Connecting to an LCD display via Raspberry Pi
Creating a custom box for the webcam

These extras are optional, but they add a fun challenge if you’re up for it!

Supplies

For the main application, you only need your laptop and a webcam! You can use a built-in one, but I bought this NIVEOLI Full HD 1080P Web-Camera with Microphone.

For connecting an LCD display, I additionally used:

Raspberry Pi 5 - 8GB Starter pack
Freenove Projects Kit for Raspberry Pi 4 B 3 B+ 400 - you do not need the whole kit, I used it solely for the LCD display.
SANDISK MicroSDHC Ultra 32 GB - but any microSD card would be enough for this project.

For making a box for my webcam, I needed:

PMMA sheet of 3 mm 600x450mm - bought it in the Industrial Design Center's shop of my college.
Red polyester fabric 150x50cm
Pack of decorative pearls - you can find those in any art supply store!

My total for this project was around 282 euros, but keep in mind that it can be much cheaper if you skip the extras. To show how much each product cost, I have attached my Bill of Materials file, so you can check it out if you need to!

Downloads

VIOLA_NGUYEN_BOM_bill-of-materials-2.pdf

Collecting Data

First things first, we need a pose detection model to use for the game. You can use any already existing ready-to-use model, such as one from MediaPipe, OpenPose, or YOLO, but I had to train my own model for this school project. To do that, I retrained a YOLO model using custom data. If you want to read how to do that from the official YOLO website, you can click here.

I aimed to gather a small dataset of people in motion, preferably dancing, with around 4000-5000 photos. To do that, I started searching for some datasets on Kaggle. I found a Mini Human Dataset, which consists of 3045 photos in COCO format.

COCO is a huge dataset used for object detection, key point detection and other computer vision tasks. The pretrained YOLOv8 pose model I started from was trained on COCO, but since the full dataset is enormous (330K pictures), it was too big for my project. The Kaggle dataset is a smaller version of COCO, making it just right for my needs.

One last thing I had to do was to gather more data to hit my goal, and label it myself afterwards. Since I was in a dance team a couple of years ago, I used videos from my training as a source of data.

Annotating Data

To annotate my data, I used the free version of Roboflow, a great tool for computer vision tasks.

TIP! You can also use Roboflow Universe to search for datasets.

I started by creating a new project and setting up a key points template (see attached images). I followed the COCO template, which includes 17 key points for accurate pose estimation.
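
For reference, here is a small sketch of the standard 17 COCO key points in their usual order, plus a helper that pairs each left key point with its right counterpart (handy when an augmentation flips images horizontally). The helper name flip_indices is made up for this example and is not part of Roboflow or YOLO.

```python
# The 17 COCO key points, in the order used by COCO-style pose labels.
COCO_KEYPOINTS = [
    "nose",
    "left_eye", "right_eye",
    "left_ear", "right_ear",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
    "left_hip", "right_hip",
    "left_knee", "right_knee",
    "left_ankle", "right_ankle",
]


def flip_indices(names):
    """For each key point, return the index of its mirror-image partner.

    After a horizontal flip a left wrist becomes a right wrist,
    while the nose stays the nose.
    """
    index_of = {name: i for i, name in enumerate(names)}
    flipped = []
    for name in names:
        if name.startswith("left_"):
            partner = "right_" + name[len("left_"):]
        elif name.startswith("right_"):
            partner = "left_" + name[len("right_"):]
        else:
            partner = name  # no left/right twin
        flipped.append(index_of[partner])
    return flipped


print(len(COCO_KEYPOINTS))           # 17
print(flip_indices(COCO_KEYPOINTS))  # [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]
```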

Next, I uploaded some videos that I thought would be useful for training. Roboflow lets you choose the sample frequency for the videos you upload, which is really handy.

From there, I began applying the key points template to all the images. It might sound simple: just move the key points to the right spots. But remember, manually preparing data for a pose estimator is a detailed and time-consuming process that needs precision and patience! It took me over 20 hours to annotate around 750 images.

Then I generated a dataset and applied augmentation to increase the number of photos in the final dataset and make my model more robust to noise and different lighting.

Since my initial dataset was relatively small, I occasionally went back to annotate more data to make the final model even better.

Merging Datasets

I wanted to combine my own annotated data with an existing dataset I found on Kaggle, but I ran into a small issue. I needed to train the model using YOLO, which meant I needed all my data in the YOLOv8 format. However, the Kaggle dataset was only available in COCO format. When I tried converting the format using online tools, key points that were initially marked as deleted would randomly reappear, messing up the entire label document. This problem had me stuck for a few days. :(

The solution, however, was surprisingly simple! I used the COCO converter that comes with the Ultralytics YOLO package to successfully convert and merge my datasets. After the conversion, I manually organized all the labels and images into the correct folders and then trained my first model on this data.
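
To give an idea of what such a conversion does under the hood, here is a minimal sketch in plain Python. It is not the converter I used: coco_to_yolo_pose is a hypothetical helper that assumes a single "person" class and the usual COCO layout (bbox as x, y, width, height in pixels, key points as flat x, y, visibility triples). The YOLO pose label puts everything on one line, normalized by the image size.

```python
def coco_to_yolo_pose(annotation, img_width, img_height, class_id=0):
    """Turn one COCO person annotation into one YOLO pose label line (sketch)."""
    # COCO boxes are top-left corner + size in pixels;
    # YOLO wants the box centre + size, normalised to 0..1.
    x_min, y_min, box_w, box_h = annotation["bbox"]
    values = [
        (x_min + box_w / 2) / img_width,
        (y_min + box_h / 2) / img_height,
        box_w / img_width,
        box_h / img_height,
    ]

    kpts = annotation["keypoints"]  # flat list: x1, y1, v1, x2, y2, v2, ...
    for i in range(0, len(kpts), 3):
        x, y, v = kpts[i:i + 3]
        if v == 0:
            # Key point was not labelled: keep it as zeros so it stays "deleted".
            values += [0.0, 0.0, 0]
        else:
            values += [x / img_width, y / img_height, int(v)]

    fields = [str(class_id)]
    for value in values:
        # Visibility flags stay integers, coordinates get six decimals.
        fields.append(str(value) if isinstance(value, int) else f"{value:.6f}")
    return " ".join(fields)


example = {"bbox": [100, 50, 200, 400], "keypoints": [320, 240, 2, 0, 0, 0]}
print(coco_to_yolo_pose(example, img_width=640, img_height=480))
# 0 0.312500 0.520833 0.312500 0.833333 0.500000 0.500000 2 0.000000 0.000000 0
```

The example uses only two key points to keep the output short; a real COCO annotation carries all 17.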

Model Training

The first thing I did was import the necessary libraries, namely the YOLO class from the ultralytics package.

I imported torch as well to be able to train the model on a GPU, which makes the whole process faster! Be aware that if you want to use a GPU, you need a PyTorch build that supports CUDA on your system. You can learn more on the official website.

from ultralytics import YOLO 
import torch # To train the model on GPU

Training the model itself requires just three lines of code! :)

model = YOLO('yolov8n-pose.pt')  # load the pretrained YOLOv8 nano pose model
model.to('cuda')  # move the model to the GPU for faster training
results = model.train(data='path/to/your/data.yaml', epochs=100)  # replace the path with your own data.yaml

The epochs parameter sets how many times the model runs over the entire dataset. There are a ton of other parameters you can apply, which you can read more about here.
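
One of those parameters is device. As a rough sketch (not part of my original code), a small helper like the hypothetical pick_device below can check whether a CUDA GPU is visible before training, so the script can fall back to the CPU instead of crashing on a machine without one.

```python
def pick_device():
    """Return 0 (first GPU) when CUDA is usable, otherwise 'cpu'."""
    try:
        import torch  # only needed for the CUDA check
    except ImportError:
        return "cpu"
    return 0 if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Training would run on: {device}")
# The value can then be passed along when training, for example:
# model.train(data='path/to/your/data.yaml', epochs=100, device=device)
```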

Now you can go watch your favourite movie or do something fun with your friends, because training may take a while!

During training you will get intermediate results like the ones in the picture above. These results tell you how well the model performs on the validation set. Here you can read more about what each performance metric means.

I would often go back and retrain my model with different parameters or with an enhanced dataset to make it more robust, so be patient and do not stop after the first training if you want the best results!

Coding

For this project, coding took up most of my time. I'll give you a general overview of what my code does first, and then we will dive a little bit deeper into each function I wrote.

A quick look at the main parts of my code:

Audio Extraction
Video Preprocessing
Capturing and displaying a choreography video and webcam feed simultaneously
Scoreboard
Menu and Navigation

Audio Extraction

Before this project, I had never worked with audio in Python, so extracting and syncing audio and video was a new challenge for me. Thankfully, I found a lot of helpful answers on Stack Overflow from people with the same questions! :)

To sync audio and video, you first need to extract the audio. I used the VideoFileClip class from the moviepy library for this task.
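
A minimal sketch of this step could look like the following. The function names and the .mp3 output are assumptions for illustration, not my exact code. The import is wrapped in a try/except because moviepy 1.x and 2.x expose VideoFileClip from different modules.

```python
from pathlib import Path


def audio_path_for(video_path, suffix=".mp3"):
    """Hypothetical naming rule: dance.mp4 -> dance.mp3 in the same folder."""
    return str(Path(video_path).with_suffix(suffix))


def extract_audio(video_path):
    """Save the video's audio track to its own file and return that file's path."""
    try:
        from moviepy import VideoFileClip          # moviepy 2.x
    except ImportError:
        from moviepy.editor import VideoFileClip   # moviepy 1.x

    out_path = audio_path_for(video_path)
    clip = VideoFileClip(video_path)
    try:
        if clip.audio is None:
            raise ValueError(f"{video_path} has no audio track")
        clip.audio.write_audiofile(out_path)
    finally:
        clip.close()  # release the file handles moviepy keeps open
    return out_path


print(audio_path_for("dance.mp4"))  # dance.mp3
```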

Video Preprocessing

This part ensures the choreography video displays smoothly on the laptop screen. Since it's a dancing game, it's crucial that the dance video doesn't lag. That's where video preprocessing comes in!

Pose detection takes longer to process a frame compared to, for example, object detection models. To provide a smooth user experience, I preprocess the choreography video before the game starts instead of doing it in real time. This way, when the game begins, the video runs smoothly.

Here's how the function works:

Create a Video Writer Object: Using the cv2 library, I create a new video from the choreography video.
Open and Process the Video: I open the choreography video and iterate over each frame.
Apply the Model: For each frame, I apply my pose detection model and get the results. These results include key point skeletons, which I plot on the frame.
Save the Processed Frame: I write the processed frame to a new video.
Store Key Points: I create a list to store the key points objects from each frame. I save this list using the pickle module to maintain the structure of the object in a separate file.

By the end of the function, I have a new video with key point skeletons plotted on it and a pickle file with a list of key points objects from each frame.
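
The steps above can be sketched roughly as follows. This is a simplified version, not my exact code: the function names and the mp4v codec are assumptions, and it stores plain arrays of normalized coordinates instead of the full key points objects. The model argument is expected to be a loaded Ultralytics YOLO pose model.

```python
import pickle


def save_keypoints(per_frame_keypoints, path):
    """Pickle the list of per-frame key points so the game can load it later."""
    with open(path, "wb") as f:
        pickle.dump(per_frame_keypoints, f)


def load_keypoints(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def preprocess_video(model, src_path, dst_path, keypoints_path):
    """Run pose detection on every frame once, before the game starts."""
    import cv2  # imported here so the pickle helpers work without OpenCV

    capture = cv2.VideoCapture(src_path)
    if not capture.isOpened():
        raise IOError(f"Could not open {src_path}")

    # The new video keeps the frame rate and size of the original.
    fps = capture.get(cv2.CAP_PROP_FPS)
    size = (int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    writer = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)

    per_frame_keypoints = []
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break  # end of the video
            result = model(frame, verbose=False)[0]
            writer.write(result.plot())  # frame with the skeleton drawn on it
            # Normalised (x, y) per detected person, moved to a NumPy array.
            per_frame_keypoints.append(result.keypoints.xyn.cpu().numpy())
    finally:
        capture.release()
        writer.release()

    save_keypoints(per_frame_keypoints, keypoints_path)
    return len(per_frame_keypoints)
```

Saving arrays rather than library objects keeps the pickle file loadable even if the ultralytics version changes, at the cost of dropping the extra information the original objects carry.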

Capturing and displaying a choreography video and webcam feed simultaneously

This is the main function for my game. I definitely learned a ton about how to work with videos while creating it! This function is designed to handle both the video and audio playback for the game, as well as overlaying the webcam feed and calculating the player's performance score. Here is a breakdown of what happens there:

Loading Data: First, I load the key points data from a pickle file and initialize video capture objects for both the choreography video and the webcam.
Audio Setup: Then I use the Pygame library to initialize and play the audio that goes with the dance video. I sync it with the video in the main while loop.
Display Setup: The function sets the application to run in full-screen mode for a better user experience. I also get the frame rate of the main video to sync the video and audio correctly later.
Main While Loop: While the video and webcam are open, I keep doing the following steps:
Get the current playback time of the audio.
Calculate the corresponding frame number in the video.
Read frames from both the video and the webcam.
Mirror and resize the webcam feed.
Use the model to predict key points from the webcam feed.
Overlay the webcam frame on the main video frame.
Display the combined frame in a window.
Performance Scoring: Every 12 iterations, I calculate a similarity score between the key points from the video and the webcam by getting the normalized x,y coordinates from the key points objects and comparing them.
Cleanup: After the loop ends, I release all resources, close files, and calculate the final performance score. The final score is written to a separate CSV file, which the scoreboard reads later.
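
Two pieces of this loop are easy to show in isolation: turning the audio position into a frame number, and comparing two poses. The sketch below is only an illustration under assumptions. The helper names are hypothetical, audio_ms stands for a playback position in milliseconds (for example the value returned by pygame.mixer.music.get_pos()), and cosine similarity on centered coordinates is just one possible way to compare poses, not necessarily the formula used in my game.

```python
import math


def frame_index_for(audio_ms, fps, total_frames):
    """Map the audio playback position to the video frame that should be shown."""
    if audio_ms < 0:  # pygame reports -1 when no music is playing
        return 0
    index = int(audio_ms / 1000 * fps)
    return min(index, total_frames - 1)  # never run past the last frame


def _centred(pose):
    """Flatten a pose and subtract its mean, so position in the frame is ignored."""
    mean_x = sum(x for x, _ in pose) / len(pose)
    mean_y = sum(y for _, y in pose) / len(pose)
    return [c for x, y in pose for c in (x - mean_x, y - mean_y)]


def pose_similarity(reference, player):
    """Compare two poses (lists of normalised (x, y) pairs); returns 0..100."""
    if not reference or len(reference) != len(player):
        return 0.0
    a, b = _centred(reference), _centred(player)
    norm = math.sqrt(sum(v * v for v in a)) * math.sqrt(sum(v * v for v in b))
    if norm == 0:
        return 0.0
    cosine = sum(p * q for p, q in zip(a, b)) / norm
    return max(0.0, cosine) * 100  # negative similarity counts as 0


dancer = [(0.5, 0.2), (0.3, 0.5), (0.7, 0.5)]
shifted = [(x + 0.1, y) for x, y in dancer]  # same pose, standing further right
print(frame_index_for(2000, fps=30, total_frames=1000))  # 60
print(round(pose_similarity(dancer, shifted)))           # 100
```

Deriving the frame number from the audio clock, rather than counting loop iterations, means the video catches up with the music whenever a slow frame causes a delay.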

Scoreboard

I created two classes, Score and Scoreboard, to make it easier for a user to track their progress and compare results over time. Together they handle storing performance scores in the CSV file, reading them back, and displaying them.
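
Here is a stripped-down sketch of how two such classes could fit together. The fields (player, song, percent), the method names and the CSV layout are assumptions for this example; the real classes in my repository may differ.

```python
import csv
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Score:
    player: str
    song: str
    percent: float


class Scoreboard:
    """Stores scores in a CSV file and shows the best ones."""

    def __init__(self, csv_path):
        self.csv_path = Path(csv_path)

    def add(self, score):
        is_new = not self.csv_path.exists()
        with self.csv_path.open("a", newline="") as f:
            writer = csv.writer(f)
            if is_new:
                writer.writerow(["player", "song", "percent"])  # header row
            writer.writerow([score.player, score.song, f"{score.percent:.1f}"])

    def load(self):
        if not self.csv_path.exists():
            return []
        with self.csv_path.open(newline="") as f:
            return [Score(row["player"], row["song"], float(row["percent"]))
                    for row in csv.DictReader(f)]

    def top(self, n=5):
        return sorted(self.load(), key=lambda s: s.percent, reverse=True)[:n]

    def display(self, n=5):
        for rank, s in enumerate(self.top(n), start=1):
            print(f"{rank}. {s.player:<12} {s.song:<20} {s.percent:5.1f}%")
```

Usage would be along the lines of Scoreboard("scores.csv").add(Score("Viola", "My Song", 87.5)) after each dance, followed by display() from the menu.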

Menu and Navigation

As the name suggests, these functions display the menu of the application, and they help to get user input for further processing.

All this functionality is later combined in the main function, where all the magic happens! If you're curious about the details and want to explore the code line by line, I invite you to my GitHub, where you can check out the whole code.

Extra: LCD Display

Here's a fun challenge: try connecting a Raspberry Pi to your application! I did it as part of my project requirements, and it definitely added a cool factor to my game.

I used the Raspberry Pi to display current performance scores on an LCD screen. During my second semester, I wrote a whole class for displaying text on an LCD, so I just copied it into the project repository. Essentially, this class sends data with the right instructions to the display over the I2C bus. I learned all about it in the Sensors and Interfacing course, which is part of my first-year programme, so I legally can't share the learning materials here. However, you can use an existing library to operate an LCD display, as shown in this tutorial.

To display performance scores on LCD, however, I had to receive the data on my Raspberry Pi first. I needed fast communication between my laptop and the Raspberry Pi, which I achieved using sockets. If you're new to sockets, don't worry — there are plenty of guides out there to help you get started, like this one. The base code for this setup was provided by one of my teachers, but I customized it to fit my project. You can take a look at the modified version on my GitHub!
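
To show the idea, here is a minimal socket sketch using only Python's standard library. It is not the base code from my teacher: the function names, the port and the "one score per connection, sent as a line of text" protocol are all assumptions made for this example.

```python
import socket


def open_server(port, host="0.0.0.0"):
    """Raspberry Pi side: start listening for the laptop."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    return server


def receive_score(server):
    """Raspberry Pi side: wait for one connection and return the score it sent."""
    conn, _address = server.accept()
    with conn:
        data = b""
        while not data.endswith(b"\n"):
            chunk = conn.recv(1024)
            if not chunk:
                break  # the laptop closed the connection
            data += chunk
    return float(data.decode("utf-8").strip())


def send_score(host, port, score):
    """Laptop side: connect to the Pi and send one score as a line of text."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(f"{score:.1f}\n".encode("utf-8"))
```

On the Raspberry Pi you would call open_server once (with a hypothetical port such as 5000) and then receive_score in a loop, passing each value to the LCD class. On the laptop, send_score takes the Pi's IP address and the same port.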

Extra: Making a Fancy Webcam Box

One of my project requirements was to create a physical asset, like housing for one of the devices. I decided to make a fancy box for my webcam! Luckily, I had access to HOWEST's Industrial Design Center's materials and machinery, but feel free to adapt this project to your available resources.

Sketch and Materials

First, I made a quick sketch. Since my project is all about dancing, I thought a red curtain resembling a theater stage would be a fun touch. The main goal was to make a simple box to fit my camera and elevate it a bit. I measured my webcam and added these dimensions to the sketch. I also wanted to paint some decorative elements with gouache paints but didn't get around to it.

Initially, I planned to laser cut the box from a 3mm wooden plank, but the IDC shop was out of wood, so I opted for a transparent 3mm PMMA sheet instead. This ended up being a good choice, as the legs of the box were more stable than they would have been with wood.

Laser Cutting

To prepare for laser cutting, I used a website called MakerCase to create a base box model with finger edge joints. Then, I used Adobe Illustrator to customize the model by adding legs and a hole for the webcam wire.

The next day, I went to the IDC and laser cut all the parts. The whole process took less than 10 minutes!

Assembling and Adding Details

Finally, it was time to put everything together and add some decorations. Here's what I did:

Spray Painting: I didn't want my box to be transparent, so I bought dark gray spray paint in a local art supply store. Living in a student dorm, I followed an indoor spray-painting tutorial with all the necessary precautions. First, I lightly sanded the surfaces to help the paint stick. I did one layer of paint on each side, which gave a nice colour. For better and more durable results, you could use a primer before painting and a sealant afterward.
Sewing the Curtain: I cut two squares of fabric, sealed the edges with a lighter, and sewed the top to create a loop for the thread.
Gluing the Parts: Since the acrylic parts didn't hold together well with just the finger joints, I used super glue for plastic materials to assemble the box.
Adding the Curtain: I cut a piece of thread to the desired length, glued one end to the box, threaded the curtains, and then glued the other end to the opposite side. To make it look nicer, I glued burgundy-colored pearls on top of the thread edges.
Decorations: I initially tried painting with gouache, but it turned out patchy, so I washed it off. Instead, I used the leftover burgundy pearls to create a star, which looked much neater.

This fancy webcam box added a nice touch to my project and was a fun hands-on activity!

Dance!

Everything is set up and ready to go! Now comes the best part—enjoying the dancing game!

Remember, this is not just about the score—it's about having a blast and celebrating all the hard work that was put into making this game. Happy dancing! :)