ESP32 Voice Assistant With Gemini AI

by circuitsmiles in Circuits > Microcontrollers



ESP32 AI assistant - version 1

This project pairs an ESP32 microcontroller with a Python server (using Google's Gemini AI for smart responses and gTTS for speech) to create a device that talks to you without ever listening: instead of a microphone, you pick from predefined phrases with a button. It's a fantastic way to learn about microcontrollers, AI APIs, and text-to-speech, all while keeping your AI token usage very low!

Supplies

Hardware Components:

  1. Microcontroller: ESP32 Dev Kit C
  2. Display: 0.96" OLED Display (SSD1306, I2C interface)
  3. Audio Output: MAX98357A I2S Class-D Amplifier + Small 8-ohm Speaker
  4. User Input: 2x Tactile Buttons
  5. Visual Cues: 1x Red LED, 1x Green LED
  6. Miscellaneous: Breadboard, Jumper Wires (male-to-male), USB Power Supply (at least 1A)

Software & Accounts:

  1. Arduino IDE (for ESP32 firmware)
  2. Python 3 (for the server)
  3. A Google API Key (for Gemini API access)

The Wiring - Connecting Everything Up

pinout.jpg
fritz.png

This is where the physical build comes together. Take your time, double-check connections, and ensure your ESP32 is powered off while wiring. All GND pins from components should connect to a common ground rail on your breadboard.

Firmware Flash - Programming the ESP32


Now that the hardware is connected, let's load the brain into the ESP32. The firmware code is in the project's GitHub repo.

Important: update the Wi-Fi credentials (SSID and password) in the sketch before uploading.

  1. Install Arduino IDE: If you don't have it, download and install the Arduino IDE.
  2. Add ESP32 Board: Go to File > Preferences and add this URL to "Additional Boards Manager URLs": https://raw.githubusercontent.com/espressif/arduino-esp32/gh-pages/package_esp32_index.json
  3. Install Board: Navigate to Tools > Board > Boards Manager, search for "esp32", and install the package.
  4. Install Libraries: Go to Sketch > Include Library > Manage Libraries, then search for and install:
     - Adafruit GFX Library
     - Adafruit SSD1306 Library
  5. Open Code: Open the provided ESP32 firmware .ino file.
  6. Upload: Select your ESP32 board and port (Tools > Board and Tools > Port), then click the "Upload" arrow.

The AI Server - Python & Gemini

server running.jpg

This Python server runs on your computer (or a Raspberry Pi) and acts as the intelligence hub. The server code is in the project's GitHub repo.

  1. Install Python: Ensure you have Python 3 installed.
  2. Virtual Environment (Recommended):
     - python3 -m venv venv
     - source venv/bin/activate (macOS/Linux) or venv\Scripts\activate (Windows)
  3. Install Dependencies: Run pip install -r requirements.txt
  4. Get Gemini API Key: Visit Google AI Studio to generate your GEMINI_API_KEY.
  5. Create .env File: In the same directory as server.py, create a file named .env containing: GEMINI_API_KEY="YOUR_API_KEY_HERE"
  6. Run the Server: Open a terminal in the server's directory and run python server.py. The server is now running, waiting for requests from your ESP32!
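To make the request flow concrete, here is a minimal sketch of a server like this one, with the Gemini and gTTS calls stubbed out. The real server.py in the repo uses the google-generativeai and gTTS packages; the "/ask" route and the "q" query parameter are illustrative names I've chosen, not necessarily what the actual firmware sends.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs


def ask_gemini(prompt: str) -> str:
    """Stub for the Gemini call; the real server sends `prompt` to the API."""
    return f"(stubbed reply to: {prompt})"


class AskHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/ask":
            self.send_error(404)
            return
        prompt = parse_qs(url.query).get("q", [""])[0]
        reply = ask_gemini(prompt)
        # The real server would convert `reply` to speech with gTTS and
        # stream the audio back to the ESP32; here we return JSON instead.
        body = json.dumps({"reply": reply}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


def serve_in_background(port: int = 0) -> HTTPServer:
    """Start the server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), AskHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With the server running, visiting http://127.0.0.1:PORT/ask?q=hello in a browser returns the stubbed JSON reply; the real version answers with synthesized audio for the ESP32 to play.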


Putting It All Together & How to Use

cover_image.png

Operation: Your Voice Assistant Is Ready!

  1. Power Up: Connect power to your ESP32. It should connect to Wi-Fi, and the OLED will display "Ready" with the green LED solid.
  2. "Next" Button: Press this button to cycle through the predefined phrases on the OLED display.
  3. "Speak" Button: When you've found the phrase you want, press "Speak."
  4. The OLED will show "Thinking..." (red LED solid) as the ESP32 contacts the server.
  5. Once the server responds, it will switch to "Speaking..." (green LED solid, red LED blinks) as the audio plays.
  6. After playback, it returns to "Ready."
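The button/LED flow above can be sketched as a tiny state machine. This is an illustrative Python model, not the actual firmware (which is Arduino C++ in the repo); the state names mirror the OLED messages, and the example phrases are placeholders.

```python
from enum import Enum, auto


class State(Enum):
    READY = auto()     # green LED solid
    THINKING = auto()  # red LED solid, waiting on the server
    SPEAKING = auto()  # green solid, red blinking, audio playing


PHRASES = ["What's the weather like?", "Tell me a joke."]  # placeholder phrases


class Assistant:
    def __init__(self):
        self.state = State.READY
        self.index = 0  # which predefined phrase is shown on the OLED

    def press_next(self):
        # "Next" only cycles phrases while the device is idle.
        if self.state is State.READY:
            self.index = (self.index + 1) % len(PHRASES)

    def press_speak(self):
        if self.state is State.READY:
            self.state = State.THINKING   # contact the server

    def server_replied(self):
        if self.state is State.THINKING:
            self.state = State.SPEAKING   # play the returned audio

    def playback_done(self):
        if self.state is State.SPEAKING:
            self.state = State.READY      # back to idle
```

Modeling it this way makes the rule explicit that button presses are ignored while the device is thinking or speaking.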

The Token-Saving Trick: Remember, the Python server deliberately limits the length of the Gemini response to keep your API token usage (and potential costs!) down. It's an efficient little system!
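One way such a length cap can be implemented (a hedged sketch; see the repo's server.py for the actual approach) is to ask for brevity in the prompt itself and hard-trim the reply as a fallback before handing it to gTTS. The 40-word cap here is an illustrative number, not the project's real setting.

```python
MAX_WORDS = 40  # illustrative cap, not the project's actual value


def build_prompt(question: str) -> str:
    """Prepend a brevity instruction so Gemini keeps output tokens low."""
    return f"Answer in at most {MAX_WORDS} words: {question}"


def trim_reply(text: str, max_words: int = MAX_WORDS) -> str:
    """Hard fallback: truncate the reply to max_words words before TTS."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + "..."
```

Shorter replies mean fewer output tokens billed per request and less audio for gTTS to synthesize, which also keeps responses snappy on the ESP32 side.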

Conclusion & What's Next?

Congratulations! You've built a functional, privacy-conscious AI voice assistant. This project demonstrates how versatile the ESP32 is when combined with powerful APIs.

Ideas for improvement:

  1. Add a local web interface for custom prompt configuration.
  2. Integrate other sensors or actuators.
  3. Explore different Text-to-Speech engines or even local voice models.

I hope you enjoyed this build! If you have any questions or run into issues, leave a comment!