Control Music Player App Using Hand Gesture or Eye Movement
by thomas9363 in Circuits > Computers
Individuals suffering from spinal cord injury (SCI) or amyotrophic lateral sclerosis (ALS) often face severe physical limitations, with many confined to wheelchairs and reliant on minimal movements, such as eye or hand gestures, for communication and control. As the disease progresses or depending on the severity of the injury, basic activities like playing an instrument or enjoying music may become increasingly difficult, limiting their ability to experience what once brought them joy.
In memory of a dear friend who loved playing guitar and music but sadly passed away from ALS, this project seeks to empower others in similar circumstances. The gesture-controlled music player app developed here lets users control music playback using either hand gestures or eye movements. It aims to restore a sense of autonomy and enjoyment for individuals with limited mobility, giving them a simple yet meaningful way to reconnect with music.
The project consists of several key components:
- Development of a music player
- Defining hand gestures and/or eye movements and assigning corresponding functions to control the player
- Conducting model training to recognize these gestures
- Inferencing the detected gestures to control the music player
- Creating a standalone program for distribution
I am working on this project on a laptop with 16 GB of RAM running Windows 11, with PyCharm as my development environment. All source code is available in my GitHub repository.
Music Player
The music player is a GUI-based app with buttons for loading, selecting, playing, pausing, and resuming songs. It uses the Tkinter, Pygame, and os modules to simplify development. Tkinter and os ship with the standard Python installer; Pygame must be installed separately (for example, with pip install pygame).
- Tkinter is used for setting up GUI components like buttons and labels, as well as for dialogs and displaying messages.
- Pygame handles audio playback and event management.
- os is used for interacting with the file system.
Key Features of the Player:
1. Load songs in mp3 format from a specified directory.
2. Update the appearance of buttons to reflect the current active one, resetting the previous button to its original state.
3. Play, stop, or toggle between pause and resume for the current song.
4. Select the next or previous song.
5. Control the volume (increase or decrease).
6. Automatically play the next song when the current one ends.
7. Create a default directory if it does not exist.
8. Automatically select and lock the first song from the list.
Functions for Addressing Features 1–6:
- load(): Loads all audio files from the specified directory into the Listbox.
- set_active_button(): Updates the appearance of the buttons to reflect the currently active one.
- play_song(): Handles song playback and updates the UI.
- stop_song(): Stops the currently playing song and updates the UI.
- toggle_pause(): Toggles between pause and resume for the currently playing song.
- next_song(): Plays the next song.
- previous_song(): Plays the previous song.
- volume_up(): Increases the volume.
- volume_down(): Decreases the volume.
- check_for_song_end(): Checks if the currently playing song has ended.
- auto_next_song(): Automatically proceeds to the next song when the current one ends.
As an example, here is the pause/resume code:
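A minimal sketch of such a toggle, assuming playback goes through pygame.mixer.music (whose pause() and unpause() calls are used for this purpose) and that a flag tracks the paused state. The class name PauseToggle and the FakeMusic stand-in are illustrative and not taken from the repository; the stand-in only exists so the sketch runs without audio hardware.

```python
class PauseToggle:
    """Flips between pause and resume on each call.

    `music` is any object exposing pause() and unpause(); in the player
    this would be pygame.mixer.music (an assumption based on the article
    naming Pygame for audio playback).
    """

    def __init__(self, music):
        self.music = music
        self.paused = False

    def toggle(self):
        if self.paused:
            self.music.unpause()  # resume from where playback was paused
        else:
            self.music.pause()
        self.paused = not self.paused
        # Label for the next available action, e.g. to show on the button.
        return "Resume" if self.paused else "Pause"


class FakeMusic:
    """Hypothetical stand-in for pygame.mixer.music; records calls only."""

    def __init__(self):
        self.calls = []

    def pause(self):
        self.calls.append("pause")

    def unpause(self):
        self.calls.append("unpause")


demo_music = FakeMusic()
demo_toggle = PauseToggle(demo_music)
first_label = demo_toggle.toggle()   # first press pauses
second_label = demo_toggle.toggle()  # second press resumes
```

In the real player, the returned label could be written to the button text or the status label so the user sees which action the next press will perform.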
Functions for Features 7 and 8:
This part of the code creates a default directory (C:\music) if it doesn’t exist. When the program starts, it automatically loads all mp3 songs from this directory, locks the first song in the list, and waits for your instructions. If you prefer a different directory, you can modify the code or use the “Load Directory” button to select it.
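The startup behavior described above might be sketched as follows. The helper names ensure_music_dir and list_mp3_files are hypothetical, and the demo runs in a temporary directory rather than C:\music so it leaves nothing behind.

```python
import os
import tempfile


def ensure_music_dir(path):
    """Create the music directory if it is missing and return its path."""
    os.makedirs(path, exist_ok=True)
    return path


def list_mp3_files(path):
    """Return the mp3 file names in `path`, sorted alphabetically."""
    return sorted(
        name for name in os.listdir(path) if name.lower().endswith(".mp3")
    )


# Demo in a temporary directory; the article's default is C:\music.
with tempfile.TemporaryDirectory() as tmp:
    music_dir = ensure_music_dir(os.path.join(tmp, "music"))
    for name in ("b.mp3", "a.MP3", "notes.txt"):
        open(os.path.join(music_dir, name), "w").close()
    songs = list_mp3_files(music_dir)
    # The first entry is the one the player would pre-select,
    # e.g. with listbox.selection_set(0) in Tkinter.
    first_song = songs[0] if songs else None
```

Using exist_ok=True keeps the call safe on every start, whether or not the directory is already present.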
Interface Layout:
To organize the layout, four frames are created:
- Song Frame: Displays the current song.
- Button Frame: Contains playback buttons.
- Volume Frame: For volume control.
- Listbox Frame: Displays the playlist.
Additionally, a label is positioned at the bottom of the root window to display the status of the song.
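Each button is linked to its corresponding function through the button's command option. For example, the play button is created with command=play_song, so that Tkinter calls play_song() whenever the button is pressed.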
After many iterations, the player works very well. The source code is available on my GitHub repository. You can also download the standalone executable version and use your mouse to test it.
Hand Gestures and Eye Movements Control
To avoid global installation issues and isolate dependencies, I created a virtual environment using Conda. For hand gesture and eye movement recognition and detection, I installed several modules, including TensorFlow 2.14, OpenCV 4.8.1, MediaPipe 0.10.8, NumPy 1.26.4, and Jupyter Notebook. I also configured my PyCharm project to use this environment.
I have written several scripts to control the music player:
- Eye Movement Control without Deep Learning
- Eye Movement Control with Deep Learning
- Hand Gesture Control with Deep Learning
The first script uses MediaPipe to track the pupil's position in relation to other points and detect mouth opening and closing, while the second and third scripts use TensorFlow and MediaPipe to train deep learning models.
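To illustrate the geometric idea behind the first script, here is a small sketch that works on plain (x, y) points such as the normalized landmark coordinates MediaPipe returns. The function names and the 0.3 threshold are assumptions for illustration; the landmark indices and the actual thresholds used in eyePlayer.py are deliberately left out.

```python
import math


def horizontal_ratio(iris_center, corner_a, corner_b):
    """Position of the iris between two eye corners.

    Returns 0.0 when the iris sits on corner_a and 1.0 on corner_b.
    """
    width = corner_b[0] - corner_a[0]
    if width == 0:
        return 0.5  # degenerate input: treat as centered
    return (iris_center[0] - corner_a[0]) / width


def opening_ratio(top, bottom, left, right):
    """Vertical opening divided by horizontal width (eye or mouth)."""
    horizontal = math.dist(left, right)
    if horizontal == 0:
        return 0.0
    return math.dist(top, bottom) / horizontal


def is_open(ratio, threshold=0.3):
    """Hypothetical threshold; it needs calibrating per user."""
    return ratio > threshold


centered = horizontal_ratio((0.5, 0.0), (0.0, 0.0), (1.0, 0.0))
mouth = opening_ratio((0.0, 0.0), (0.0, 1.0), (-1.0, 0.5), (1.0, 0.5))
```

Because both quantities are ratios, they stay roughly constant when the user moves closer to or farther from the webcam, which is why ratios are preferred over raw pixel distances.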
Design Gesture:
To control the music player, you need to perform gestures in front of a webcam. A total of eight gestures are used: play, stop, pause/resume, next song, previous song, volume up, volume down, and "nothing." The "nothing" gesture allows the player to continue performing its current task. In my code and the following explanations, I use "gesture," "class," "label," and "move" interchangeably. The table below outlines the function of each gesture:
You may notice from the table that I use different gestures for volume control in the "eye control without deep learning" and "eye control with deep learning" scripts. I explained the reason for this in my earlier post, Control Robotic Eyes With My Eyes Using AI and Deep Learning.
The two tables below show the hand gestures and eye movements I used for training. Since hand gestures have many more variations, I use keys 'a' to 'z' to input up to 26 classes (0 to 25), should you decide to add more. The bottom table lists the eye movements. For eye gestures, I use keys '0' to '9' to input up to 10 classes.
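The key-to-class mapping can be sketched as below. The function name key_code_to_class and the mode strings are hypothetical; the integer key code is assumed to come from something like cv2.waitKey(1) & 0xFF.

```python
def key_code_to_class(code, mode):
    """Map a key code to a class index, or None if the key is not a label.

    mode "hand": keys 'a'..'z' map to classes 0..25.
    mode "eye":  keys '0'..'9' map to classes 0..9.
    """
    if mode == "hand" and ord("a") <= code <= ord("z"):
        return code - ord("a")
    if mode == "eye" and ord("0") <= code <= ord("9"):
        return code - ord("0")
    return None


hand_first = key_code_to_class(ord("a"), "hand")
hand_last = key_code_to_class(ord("z"), "hand")
eye_seven = key_code_to_class(ord("7"), "eye")
ignored = key_code_to_class(ord("A"), "hand")
```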
Train Your Own TensorFlow Model:
I have previously written two articles: Control Robotic Eyes With My Eyes Using AI and Deep Learning, and How to Train Custom Hand Gestures Using MediaPipe. In these articles, I describe how to collect data, train models, and deploy the trained models for inference. You can follow these guides if you wish to create your own hand gesture or eye movement models.
The table below lists the files used and generated in each step:
The scripts iris_create_csv.py and hand_create_csv.py collect training data for eye movement control and hand gesture control. The data are stored in iris_gesture_data.csv and hand_gesture_data.csv respectively. I captured 100 samples for each hand gesture and eye movement. If you want to build upon my data, you'll need to use the same class and key input sequence shown above.
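As a rough illustration of the data collection step, the sketch below appends one labeled sample to a CSV stream. The row layout (class index first, then flattened x, y coordinates) is an assumption; the actual column layout of iris_gesture_data.csv and hand_gesture_data.csv may differ.

```python
import csv
import io


def append_sample(writer, class_id, landmarks):
    """Write one row: the class index followed by x1, y1, x2, y2, ..."""
    row = [class_id]
    for x, y in landmarks:
        row.extend([x, y])
    writer.writerow(row)


# Demo against an in-memory buffer; a real script would open the CSV
# file in append mode with newline="" instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
append_sample(writer, 3, [(0.1, 0.2), (0.3, 0.4)])
csv_text = buffer.getvalue()
```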
The Jupyter Notebook files iris_train.ipynb and hand_train.ipynb are used for neural network training. During training, the CSV files are fed into a neural network model built with TensorFlow and Keras. The training is fast and can be completed within a minute. After training, the model is saved in TFLite format as iris_gesture_model.tflite and hand_gesture_model.tflite, ready for inference. I have also written two small scripts, iris_detect.py and hand_detect.py, to test the models' validity.
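The training step could look roughly like the sketch below, assuming each CSV row holds a class index followed by numeric features. The layer sizes, epoch count, and function names are illustrative guesses and not the contents of the notebooks. TensorFlow and NumPy are imported inside train_and_export() so that the data-loading helper can be used on its own.

```python
import csv
import io


def load_samples(csv_file):
    """Split CSV rows into feature lists and integer labels."""
    features, labels = [], []
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip blank lines
        labels.append(int(row[0]))
        features.append([float(value) for value in row[1:]])
    return features, labels


def train_and_export(features, labels, num_classes, tflite_path):
    """Train a small dense classifier and save it in TFLite format."""
    # Imported lazily: these packages are only needed for training.
    import numpy as np
    import tensorflow as tf

    x = np.asarray(features, dtype="float32")
    y = np.asarray(labels, dtype="int32")
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(x.shape[1],)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x, y, epochs=50, batch_size=32, verbose=0)

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    with open(tflite_path, "wb") as handle:
        handle.write(converter.convert())


demo_features, demo_labels = load_samples(
    io.StringIO("0,0.1,0.2\n1,0.3,0.4\n")
)
```

With real data, one would call train_and_export(features, labels, 8, "hand_gesture_model.tflite"), where 8 matches the eight gestures described earlier.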
Connecting Gesture Detection With Music Player
Using a Thread:
I repackaged the earlier scripts (eye_detect.py, iris_detect.py, and hand_detect.py) into a function called process_frame(). The root.mainloop() call, which starts the Tkinter GUI, sits at the end of the script, after the call to process_frame(). Because process_frame() runs an infinite loop to process camera frames and detect gestures, it never returns, so root.mainloop() is never reached and the GUI does not appear.
To resolve this, I run the MediaPipe and OpenCV processing (the detection part) in a separate thread so that the Tkinter GUI can run concurrently without interference. Here is the code that achieves this:
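A minimal sketch of this arrangement, with a fake detector in place of the camera so it runs anywhere. The function names, the queue used to hand gestures over, and the stop event are assumptions; the article only states that detection runs in a separate thread while root.mainloop() runs in the main one.

```python
import queue
import threading


def detection_loop(read_gesture, gestures, stop_event):
    """Stand-in for process_frame(): push detected gestures to a queue."""
    while not stop_event.is_set():
        gesture = read_gesture()
        if gesture is None:  # the fake detector ran out of frames
            break
        gestures.put(gesture)


def start_detection(read_gesture, gestures, stop_event):
    """Start the detection loop in a daemon thread and return the thread."""
    worker = threading.Thread(
        target=detection_loop,
        args=(read_gesture, gestures, stop_event),
        daemon=True,  # do not keep the process alive after the GUI closes
    )
    worker.start()
    return worker


# Demo with three fake "frames" instead of webcam input.
fake_frames = iter(["play", "play", "next"])
gestures = queue.Queue()
stop_event = threading.Event()
worker = start_detection(lambda: next(fake_frames, None), gestures, stop_event)
worker.join(timeout=2)
received = [gestures.get_nowait() for _ in range(gestures.qsize())]
# In the player, root.mainloop() would follow start_detection(...).
```

Tkinter is generally not regarded as thread-safe, so one cautious design is to let the GUI poll such a queue with root.after() and call the player functions from the main thread, rather than touching widgets from the worker.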
Event Counting:
Gesture detection, whether hand or iris-based, operates at around 30 frames per second. When transitioning from one gesture to another, the camera may detect unintended gestures in between and trigger unwanted actions. For example, when switching from “right eye close” (play) to “left eye close” (pause), the camera might detect a “both eyes closed” (stop) gesture in between. This unintended detection would result in stopping the music instead of pausing it.
To prevent this, I introduced an event counting mechanism. The program only triggers an action after a specific number of consistent gestures are detected in succession. This works by maintaining a counter that increments with each event, and once the counter reaches a specified threshold, the intended action is triggered.
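The counting idea can be sketched as follows. The class name GestureCounter, the threshold of 3, and the choice to reset the counter whenever the gesture changes or an action fires are assumptions made for illustration.

```python
class GestureCounter:
    """Confirms a gesture only after `threshold` consecutive detections."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.current = None
        self.count = 0

    def update(self, gesture):
        """Feed one per-frame detection; return the gesture once confirmed."""
        if gesture == self.current:
            self.count += 1
        else:
            # A different gesture interrupts the streak and starts a new one.
            self.current = gesture
            self.count = 1
        if self.count >= self.threshold:
            self.count = 0  # start counting again after firing
            return gesture
        return None


counter = GestureCounter(threshold=3)
frames = ["play", "play", "stop", "pause", "pause", "pause", "pause"]
confirmed = [counter.update(g) for g in frames]
```

In this run the single stray "stop" frame never reaches the threshold, so only "pause" is confirmed, on its third consecutive frame.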
Debouncing Time:
Another issue arises when gestures are detected continuously across multiple frames. This can result in the same button being triggered repeatedly. For example, when detecting the “next” gesture, the song list may continue to scroll down until a different gesture is detected.
To solve this, I implemented a debouncing mechanism. Once a gesture is recognized and an action is performed, the same gesture cannot trigger another action until a short interval has passed. I avoided time.sleep() because it could block other critical parts of the program, such as camera capture or the event loop. Instead, I record the time of the last action, compare it with the current time, and only trigger the next event once the debounce period has elapsed. I set the debounce period to two seconds to prevent rapid consecutive triggering.
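A non-blocking debounce along these lines might look like the sketch below. The class name Debouncer is hypothetical, a single timer shared by all gestures is assumed, and time.monotonic() is used here in place of time.time() because it is unaffected by system clock changes; either would serve for this purpose. The clock is injectable so the demo can use fake timestamps.

```python
import time


class Debouncer:
    """Allows an action only if `period` seconds passed since the last one."""

    def __init__(self, period=2.0, clock=time.monotonic):
        self.period = period
        self.clock = clock
        self.last_action = None

    def ready(self):
        """Return True and record the time if an action may fire now."""
        now = self.clock()
        if self.last_action is None or now - self.last_action >= self.period:
            self.last_action = now
            return True
        return False


# Demo with fake timestamps (in seconds) instead of real waiting.
fake_times = iter([0.0, 0.5, 1.9, 2.0, 2.1])
debouncer = Debouncer(period=2.0, clock=lambda: next(fake_times))
results = [debouncer.ready() for _ in range(5)]
```

The detection loop keeps running at full frame rate; it simply skips the action whenever ready() returns False.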
Summary of Files:
The table below summarizes the files created after combining the detection scripts with the music player, along with the names of the executables that will be discussed in the next step. All components are working well in my PyCharm development environment.
Create Executables
Now it's time to create standalone executables using PyInstaller. First, you need to install PyInstaller via pip from your PyCharm terminal. PyInstaller packages your Python script, along with the Python interpreter and all necessary modules and libraries, into a single file or folder. This makes it easy to distribute Python programs to users who may not have Python or external libraries installed on their systems.
The executable created by PyInstaller includes:
- Your Python script(s)
- The Python interpreter and all libraries
- Any dependencies required by your script
Custom Icons
If you want a custom icon for your executable, you'll need to create one. I use free online tools for this. First, design and save your icon in PNG format. If a square icon is acceptable, you can simply convert it to .ico format. For a more polished look, you can remove the background with a free service such as Lunapic to get a transparent PNG, then convert it to .ico using a tool like Ico Convert. My designs are listed in the table above.
Icon Caching Issue in Windows
A common issue in Windows Explorer is that it caches the icons of executable files when they're first displayed. If you rebuild your .exe with a new icon, the icon may not change until you move the executable to another directory. This happens because Explorer shows the cached version of the original icon. To force Windows to refresh its icon cache, follow these steps:
- Navigate to C:\Users\<Your-User-Name>\AppData\Local\Microsoft\Windows\Explorer
- Delete the files that start with iconcache
- Restart your computer
Packaging a Standalone Music Player
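For a standalone music player that doesn't require external modules, you can use a PyInstaller command of the following form, where the icon, program name, and script name are placeholders:
pyinstaller --onefile --noconsole --icon=your_icon.ico --name your_program_name your_script.py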
Options explained:
- --onefile: Combines everything into a single executable (you can omit this if startup speed is more important).
- --noconsole: Hides the console window (omit this if you need console output for debugging).
- --icon=your_icon.ico: Adds a custom icon for the executable (optional).
- --name your_program_name: Specifies the name of the output .exe file.
Packaging with MediaPipe and TensorFlow
For the other executables that require MediaPipe and TensorFlow, you'll need to include custom data files (e.g., .tflite models and MediaPipe modules) using the --add-data option. These modules are usually found inside your virtual environment. For example, the face_landmark and face_detection modules can be located at: C:\Users\Your-User-Name\anaconda3\envs\mediapipe\Lib\site-packages\mediapipe\modules. If you're using a trained model, include the .tflite file during packaging. Here’s the command for irisPlayer:
For hand gesture detection, make sure to bundle hand_landmark, palm_detection, and your .tflite model.
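As a hedged sketch of how such a command could be assembled, the helper below builds the PyInstaller argument list in Python. The function names, the site_packages placeholder, and the destination folders are assumptions; the destinations mirror the relative paths inside the mediapipe package on the assumption that MediaPipe looks its resources up there. os.pathsep is used as the source/destination separator (";" on Windows). PyInstaller itself is only imported if run_pyinstaller() is called.

```python
import os


def build_pyinstaller_args(script, name, icon=None, data=(),
                           onefile=True, console=False):
    """Return the argument list for PyInstaller."""
    args = [script, "--name", name]
    if onefile:
        args.append("--onefile")
    if not console:
        args.append("--noconsole")
    if icon:
        args.append(f"--icon={icon}")
    for source, dest in data:
        # Each entry is "source<sep>destination inside the bundle".
        args += ["--add-data", f"{source}{os.pathsep}{dest}"]
    return args


def run_pyinstaller(args):
    """Invoke PyInstaller from Python; requires PyInstaller to be installed."""
    import PyInstaller.__main__
    PyInstaller.__main__.run(args)


# Placeholder path: replace with your environment's site-packages folder.
site_packages = os.path.join("path", "to", "site-packages")
modules = os.path.join(site_packages, "mediapipe", "modules")
iris_args = build_pyinstaller_args(
    "irisPlayer.py",
    "irisPlayer",
    data=[
        (os.path.join(modules, "face_landmark"),
         "mediapipe/modules/face_landmark"),
        (os.path.join(modules, "face_detection"),
         "mediapipe/modules/face_detection"),
        ("iris_gesture_model.tflite", "."),
    ],
)
```

Calling run_pyinstaller(iris_args) would then start the build; for the hand gesture player, the data list would name hand_landmark, palm_detection, and hand_gesture_model.tflite instead.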
Performance Considerations
When using --onefile, PyInstaller bundles everything into a single executable. During execution, it extracts all the files to a temporary directory, which may cause a delay in starting the program. For example, irisPlayer.exe with the --onefile option takes about 1 minute and 54 seconds to start.
To improve startup speed, you can omit the --onefile option when using PyInstaller. This will generate a folder containing all the necessary files instead of bundling them into a single executable. The folder contains the executable itself and a subfolder named "_internal" (the name used by PyInstaller 6 and later). If you are using a trained TFLite model, make sure to copy it into this folder as well.
This approach reduces load time since the program won't need to extract everything on each run. For example, without the --onefile option, my program now loads in about 33 seconds.
Note that these executable files tend to be large and often exceed GitHub's size limits. Therefore, you'll need to follow the steps above to package the program on your own system.
Conclusions
You can watch a demo video showcasing the use of eyes, hands, and mouth to control the music player. While using these programs may require some practice to master the gestures and timing, the “handPlayer” is designed to work reliably for most users. However, the eye movement control may require fine-tuning to ensure optimal performance. The data used for the iris_gesture_model.tflite in irisPlayer.py was collected using my own eyes, and the thresholds in eyePlayer.py are calibrated based on the specific aspect ratio of my eyes. If other users experience discrepancies, they can retrain the models or adjust the thresholds to improve performance and compatibility.
This project opens the door to a wide range of possibilities beyond just controlling a music player. For instance, one could develop an editor app where eye movements control the mouse pointer, enabling people with limited mobility to express themselves in writing. Such applications have the potential to greatly enhance accessibility for individuals with disabilities, such as those with spinal cord injuries or ALS.
This project shows that combining AI-based gesture recognition with accessible technology can create more inclusive and empowering user interfaces, offering new ways for individuals to interact with digital devices. This is only the beginning, and future development could expand gesture-controlled applications across various fields, such as communication tools, gaming, and creative software.