Build Your Own Pong Game With Q-Learning AI on ESP32!

by abidulhaqahnafahnaf in Circuits > Microcontrollers

In this project, we'll build a Pong game using an ESP32 microcontroller, an ST7735 TFT display, and an MPU6050 gyro sensor. What sets it apart is a Q-learning-based AI opponent, which makes the game more challenging and engaging. We'll cover everything from setting up the hardware to understanding and implementing the Q-learning algorithm. By the end, you'll have a fully functional Pong game with an AI opponent that learns from its mistakes.

Supplies

What You'll Need

  1. ESP32 microcontroller (the pin names in this guide are those of a Xiao ESP32 S3)
  2. 0.96-inch ST7735 TFT display (160x80 pixels)
  3. MPU6050 gyro sensor
  4. Jumper wires and breadboard
  5. Arduino IDE for programming the ESP32

Libraries:

  1. Adafruit ST7735 and ST7789 Library (the ST77XX driver family)
  2. Adafruit MPU6050 Library

Setting Up the Hardware

Start by connecting your ST7735 TFT display and MPU6050 gyro sensor to the ESP32. The connections are straightforward:

  1. TFT_CS (Chip Select) connects to ESP32 pin D6.
  2. TFT_RST (Reset) connects to ESP32 pin D8.
  3. TFT_DC (Data/Command) connects to ESP32 pin D7.
  4. TFT_MOSI (Master Out Slave In) connects to ESP32 pin D9.
  5. TFT_SCLK (Serial Clock) connects to ESP32 pin D10.
  6. The MPU6050 uses the standard I2C pins. On the Xiao ESP32 S3, connect the MPU6050's SDA to D4 and its SCL to D5.
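The wiring above could be captured in a sketch roughly as follows. This is a configuration fragment written under assumptions, not the project's actual code: it assumes the Xiao ESP32 S3 board package (which defines the D-numbered pin names), and the INITR_MINI160x80 init option and rotation value are guesses that may need changing for a particular display module.

```cpp
#include <Adafruit_GFX.h>
#include <Adafruit_ST7735.h>
#include <Adafruit_MPU6050.h>
#include <Wire.h>

// Pin assignments copied from the wiring list (Xiao ESP32 S3 pin labels).
#define TFT_CS   D6
#define TFT_RST  D8
#define TFT_DC   D7
#define TFT_MOSI D9
#define TFT_SCLK D10

// Five-argument constructor: the library drives SPI on exactly these pins.
Adafruit_ST7735 tft(TFT_CS, TFT_DC, TFT_MOSI, TFT_SCLK, TFT_RST);
Adafruit_MPU6050 mpu;  // uses the default I2C pins (SDA = D4, SCL = D5)

void setup() {
  Serial.begin(115200);

  // Assumption: init option for 0.96" 160x80 panels; some modules differ.
  tft.initR(INITR_MINI160x80);
  tft.setRotation(3);            // landscape; adjust to how the display is mounted
  tft.fillScreen(ST77XX_BLACK);

  if (!mpu.begin()) {            // returns false if the sensor does not respond
    Serial.println("MPU6050 not found, check the I2C wiring");
    while (true) { delay(10); }
  }
}

void loop() {
  // Game logic goes here.
}
```

If the screen shows shifted or inverted colors, the init option is the first thing to try changing.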

Initializing the Display and Sensor

Before diving into the game logic, ensure that your TFT display and MPU6050 sensor are initialized correctly. The TFT will display the game, while the MPU6050 will control the player's paddle movement.
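As an illustration of how a sensor reading might drive the player's paddle, here is a minimal sketch. It is not taken from the project's code: the function name updatePlayerPaddle, the paddle height, the dead zone, and the gain are all hypothetical, and it assumes the tilt is read as an acceleration in m/s² along one axis.

```cpp
// Assumed geometry: 160x80 display in landscape, hypothetical 20-pixel paddle.
constexpr int SCREEN_HEIGHT = 80;
constexpr int PADDLE_HEIGHT = 20;

// Hypothetical tuning constants, not taken from the original sketch.
constexpr float TILT_DEADZONE = 1.0f;  // m/s^2; ignore small tilts and sensor noise
constexpr float TILT_GAIN = 0.5f;      // pixels per frame per m/s^2 of tilt

// Returns the new top edge of the player's paddle given one acceleration
// reading (m/s^2) along the axis the player tilts.
int updatePlayerPaddle(int paddleY, float tiltAccel) {
  if (tiltAccel > -TILT_DEADZONE && tiltAccel < TILT_DEADZONE) {
    return paddleY;  // inside the dead zone: hold position
  }
  int newY = paddleY + static_cast<int>(tiltAccel * TILT_GAIN);
  // Keep the whole paddle on screen.
  if (newY < 0) newY = 0;
  if (newY > SCREEN_HEIGHT - PADDLE_HEIGHT) newY = SCREEN_HEIGHT - PADDLE_HEIGHT;
  return newY;
}
```

On the device, the reading could come from the Adafruit library's getEvent() call once per frame, for example the acceleration.y field of the accelerometer event. Which axis and sign feel natural depends on how the sensor is mounted.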

Understanding and Implementing Q-Learning

Q-learning is a type of reinforcement learning where the AI learns by interacting with the environment. It uses a Q-table to store the expected future rewards for different actions in different states. Over time, the AI learns to choose the action that maximizes the reward.

In our Pong game, the AI paddle's position and the ball's position form the state space, while moving up, down, or staying put are the possible actions.
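One way to turn those two positions into a single Q-table row is sketched below. It uses 8 bins per quantity, the figure used in this project. The names toBin and encodeState are hypothetical, and the sketch assumes that only the vertical positions matter and that they run from 0 to 79 on the 160x80 display in landscape.

```cpp
// Assumed geometry: vertical positions run from 0 to 79.
constexpr int SCREEN_HEIGHT = 80;
constexpr int NUM_BINS = 8;                      // bins per quantity
constexpr int NUM_STATES = NUM_BINS * NUM_BINS;  // 64 paddle/ball combinations

// The three actions available to the AI paddle.
enum Action { MOVE_UP = 0, MOVE_DOWN = 1, STAY = 2 };
constexpr int NUM_ACTIONS = 3;

// Maps a vertical pixel position to a bin index in [0, NUM_BINS - 1].
int toBin(int y) {
  if (y < 0) y = 0;
  if (y >= SCREEN_HEIGHT) y = SCREEN_HEIGHT - 1;
  return y * NUM_BINS / SCREEN_HEIGHT;
}

// Combines the two bins into a single row index for the Q-table.
int encodeState(int aiPaddleY, int ballY) {
  return toBin(aiPaddleY) * NUM_BINS + toBin(ballY);
}
```

With 64 states and 3 actions, a float Q-table takes 64 × 3 × 4 = 768 bytes, which fits comfortably in the ESP32's RAM.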

Here's how the Q-learning process works:

  1. State Representation: We discretize the positions of the AI paddle and the ball to form the state space. In this example, each of the two positions is divided into 8 bins, which gives 8 × 8 = 64 states.
  2. Action Choices: The AI can choose to move up, move down, or stay in its current position.
  3. Reward System: The AI receives a reward of +1 for intercepting the ball and -1 for missing it. This reward system guides the AI to learn how to block the ball more effectively.
  4. Q-Table Update: The AI updates its Q-table based on the reward received after taking an action. The update formula is:

Q(s, a) = Q(s, a) + α × [reward + γ × max(Q(s', a')) - Q(s, a)]

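Here, α is the learning rate, and γ is the discount factor, which balances immediate and future rewards. s' is the state reached after taking action a in state s, and the max is taken over the actions a' available in s'.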

  5. Exploration vs. Exploitation: The AI occasionally chooses random actions (exploration) instead of the action with the highest Q-value (exploitation) to discover better strategies.
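The update formula can be written as a small function. This is a sketch of the standard Q-learning update rather than the project's exact code: the names qTable, maxQ, and updateQ are assumptions, and the table size assumes 8 × 8 states and 3 actions.

```cpp
#include <algorithm>

constexpr int NUM_STATES = 64;   // 8 paddle bins x 8 ball bins
constexpr int NUM_ACTIONS = 3;   // up, down, stay

float learningRate = 0.1f;       // alpha
float discountFactor = 0.99f;    // gamma

// Q-table: one row per state, one column per action, all zeros at start.
float qTable[NUM_STATES][NUM_ACTIONS] = {};

// Largest Q-value available in a state, i.e. max over a' of Q(s', a').
float maxQ(int state) {
  return *std::max_element(qTable[state], qTable[state] + NUM_ACTIONS);
}

// Applies one Q-learning update for the transition (state, action) -> nextState.
void updateQ(int state, int action, float reward, int nextState) {
  float target = reward + discountFactor * maxQ(nextState);
  qTable[state][action] += learningRate * (target - qTable[state][action]);
}
```

For example, starting from an empty table, updateQ(s, a, 1.0f, next) after an interception raises Q(s, a) from 0 to 0.1, because α = 0.1 moves the stored value one tenth of the way toward the target of 1. Passing a reward of 0 on frames where the ball is neither hit nor missed is one possible choice. The text only specifies +1 and -1.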

Coding the Q-Learning AI

Implement the Q-learning logic in your code. The aiPaddleControl function handles the AI's decisions based on the Q-learning algorithm, and the chooseAction function chooses the action for the AI paddle.
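Since the original code is only available through the linked repository, the following is a hedged sketch of what these two functions might look like. Only the function names come from the text. The signatures, the paddle height and step size, and the use of std::mt19937 for random numbers are assumptions.

```cpp
#include <random>

constexpr int NUM_STATES = 64;
constexpr int NUM_ACTIONS = 3;
constexpr int SCREEN_HEIGHT = 80;
constexpr int PADDLE_HEIGHT = 20;  // hypothetical
constexpr int PADDLE_STEP = 2;     // hypothetical pixels per move

enum Action { MOVE_UP = 0, MOVE_DOWN = 1, STAY = 2 };

float explorationRate = 0.2f;      // probability of exploration
float qTable[NUM_STATES][NUM_ACTIONS] = {};
std::mt19937 rng(12345);           // fixed seed here; use a varying seed on hardware

// Epsilon-greedy choice: explore with probability explorationRate,
// otherwise exploit the action with the highest Q-value.
int chooseAction(int state) {
  std::uniform_real_distribution<float> coin(0.0f, 1.0f);
  if (coin(rng) < explorationRate) {
    std::uniform_int_distribution<int> pick(0, NUM_ACTIONS - 1);
    return pick(rng);              // exploration: random action
  }
  int best = 0;                    // ties resolve to the first action
  for (int a = 1; a < NUM_ACTIONS; ++a) {
    if (qTable[state][a] > qTable[state][best]) best = a;
  }
  return best;
}

// Applies the chosen action and returns the AI paddle's new top edge.
int aiPaddleControl(int aiPaddleY, int state) {
  int action = chooseAction(state);
  if (action == MOVE_UP) aiPaddleY -= PADDLE_STEP;         // smaller y is higher
  else if (action == MOVE_DOWN) aiPaddleY += PADDLE_STEP;
  // Keep the whole paddle on screen.
  if (aiPaddleY < 0) aiPaddleY = 0;
  if (aiPaddleY > SCREEN_HEIGHT - PADDLE_HEIGHT) aiPaddleY = SCREEN_HEIGHT - PADDLE_HEIGHT;
  return aiPaddleY;
}
```

In a full game loop, the action returned by chooseAction would also need to be remembered so that the Q-table entry for that state and action can be updated once the reward is known.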

Testing and Tuning

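Upload the code to your ESP32 and observe the AI in action. You can adjust three parameters to tune its performance: the learning rate sets how strongly each new reward overwrites what the Q-table already holds, the discount factor sets how much future rewards count relative to immediate ones, and the exploration rate sets how often the AI tries a random move.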

float learningRate = 0.1;

float discountFactor = 0.99;

float explorationRate = 0.2; // Probability of exploration

With each game played, the AI should improve its ability to block the ball.
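One optional refinement, which the text does not say the original code uses, is to lower the exploration rate gradually so the AI makes fewer random moves once it has learned something. The helper below is a hypothetical sketch. The name decayExploration and the example factors are assumptions.

```cpp
// Multiplies the exploration rate by `decay` (a factor below 1),
// but never lets it drop below `minRate`.
float decayExploration(float rate, float decay, float minRate) {
  float next = rate * decay;
  return next < minRate ? minRate : next;
}
```

Calling something like explorationRate = decayExploration(explorationRate, 0.99f, 0.01f) after each point would shrink the rate by 1% per point while keeping a small amount of exploration.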

Conclusion


Congratulations! You've successfully built a Pong game with a Q-learning AI opponent. This project not only demonstrates the basics of game development on an ESP32 but also provides a practical introduction to reinforcement learning techniques. Feel free to experiment further and enhance the AI or add more features to the game.

For full code: GitHub