DIY ESP32-S3 Voice Assistant V0.2: Upgrade for Real Voice Input (INMP441 Mic + 16MB Memory)

by circuitsmiles



Welcome to the V0.2 Major Upgrade of our ESP32 AI Chat Bot!

In the previous version, our "assistant" could only respond to predefined text prompts selected by a button. This was great for efficiency and privacy, but let's face it: we want to talk to our devices!

This guide will walk you through the essential steps to transform the project into a true, conversational Voice Assistant capable of real-time speech processing. The key changes involve a critical hardware upgrade to manage audio memory and the addition of a high-quality digital microphone.
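To get a feel for why audio memory matters, here is a rough buffer-size estimate for a single recording. The sample rate, bit depth and channel count are assumptions for illustration (16 kHz, 16-bit, mono), not values taken from the project code; the 6-second length matches the recording timeout described later in this guide.

```python
# Rough buffer-size estimate for one recording.
# Assumed values (not taken from the project code): 16 kHz, 16-bit, mono.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
CHANNELS = 1
RECORD_SECONDS = 6  # recording timeout used by this build


def recording_bytes(seconds, rate=SAMPLE_RATE_HZ, width=BYTES_PER_SAMPLE, channels=CHANNELS):
    """Return the number of bytes needed to hold `seconds` of raw PCM audio."""
    return seconds * rate * width * channels


buffer_bytes = recording_bytes(RECORD_SECONDS)
psram_bytes = 8 * 1024 * 1024  # the "R8" in N16R8: 8MB of PSRAM

print(f"{buffer_bytes} bytes ({buffer_bytes / 1024:.1f} KiB) per recording")
print(f"{buffer_bytes / psram_bytes:.1%} of the 8MB PSRAM")
```

Under these assumptions one recording needs 192,000 bytes (187.5 KiB). That is a large share of the ESP32-S3's 512KB of internal SRAM, which Wi-Fi and the firmware also use, but only about 2.3% of the 8MB PSRAM.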

Supplies

Hardware Components:

  1. Microcontroller: ESP32-S3 N16R8 (16MB flash, 8MB PSRAM)
  2. Display: 0.96" OLED Display (SSD1306, I2C interface)
  3. Audio Output: MAX98357A I2S Class-D Amplifier + Small 8-ohm Speaker
  4. Audio Input: INMP441 I2S Microphone
  5. User Input: 2x Tactile Buttons
  6. Visual Cues: Onboard RGB LED
  7. Miscellaneous: Breadboard, Jumper Wires (male-to-male), USB Power Supply (at least 1A)

Software & Accounts:

  1. Visual Studio Code with PlatformIO
  2. Python 3 (for the server)
  3. A Google API Key (for Gemini API access)

The Wiring - Connecting Everything Up

[Figure: circuit diagram (Fritzing) and two wiring screenshots]

This is where the physical build comes together. Take your time, double-check connections, and ensure your ESP32 is powered off while wiring. All GND pins from components should connect to a common ground rail on your breadboard.

Firmware Flash - Programming the ESP32

We are using PlatformIO for easy management of the large firmware and the ESP32-S3's unique memory configuration.

  1. Download Project: Clone the V0.2 code base from GitHub.
  2. PlatformIO Setup: Open the project in VS Code with the PlatformIO extension installed.
  3. Configuration: Update the platformio.ini file so that it matches the ESP32-S3-N16R8 (16MB flash, 8MB PSRAM), including the memory and partition settings.
  4. Modify: Update the server address in the firmware. You can get the address after launching the server (see the next section).
  5. Compile & Upload: Use the PlatformIO Build and Upload buttons to flash the firmware onto your ESP32-S3 board.
  6. WiFi: If your WiFi credentials are not already set up, you will be prompted to enter them after the first upload.
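For the server address in step 4, the ESP32 needs the LAN address of the computer running the server, not localhost. If the server does not print it for you, the following sketch is one common way to look it up. The function name is hypothetical and the snippet is not part of the project code.

```python
import socket


def lan_ip_address():
    """Best-effort guess at this machine's LAN IP address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only makes the OS
        # choose the outgoing interface. 8.8.8.8 is an arbitrary public address.
        sock.connect(("8.8.8.8", 80))
        return sock.getsockname()[0]
    except OSError:
        # No route available (e.g. offline): fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()


print(f"Server address for the firmware: {lan_ip_address()}")
```

Run it on the machine that will host the server, and make sure the ESP32 joins the same network.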


The AI Server - Python & Gemini

The server now needs to handle raw audio data (or transcribed text) coming from the ESP32, which is captured by the INMP441. Get the server code from GitHub.
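As an illustration of what "handling raw audio" can involve, the sketch below wraps raw PCM samples in a WAV container using only the Python standard library, so that speech tools can read them. It is not the project's server code: the function name is hypothetical, and the format (16 kHz, 16-bit, mono, little-endian) is an assumption that must match whatever the firmware actually sends.

```python
import io
import wave


def pcm_to_wav(pcm_bytes, sample_rate=16_000, sample_width=2, channels=1):
    """Wrap raw little-endian PCM samples in a WAV container, in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()


# One second of silence stands in for audio received from the ESP32.
fake_pcm = b"\x00\x00" * 16_000
wav_bytes = pcm_to_wav(fake_pcm)
print(len(wav_bytes))
```

The result is the original sample data with a 44-byte WAV header in front of it.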

  1. Install Python: Ensure you have Python 3 installed.
  2. Virtual Environment (Recommended): Create one with python3 -m venv venv, then activate it with source venv/bin/activate (macOS/Linux) or venv\Scripts\activate (Windows).
  3. Install Dependencies: Run pip install -r requirements.txt
  4. Get Gemini API Key: Go to Google AI Studio to get your GEMINI_API_KEY.
  5. Create .env file: In the same directory as your server.py file, create a new file named .env and add: GEMINI_API_KEY="YOUR_API_KEY_HERE"
  6. Run the Server: Open a terminal in your server's directory and run python server.py to start it. The server will now be running, waiting for requests from your ESP32!
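If the server fails at startup, a missing or placeholder API key is a likely cause. The sketch below shows how a .env file like the one above can be read and checked using only the standard library. Both function names are hypothetical. The project's requirements.txt may already include a library such as python-dotenv for this job, so treat this as a way to understand or debug the step rather than as the server's actual code.

```python
import os


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from `path` into os.environ (existing values win)."""
    try:
        with open(path, encoding="utf-8") as env_file:
            lines = env_file.read().splitlines()
    except FileNotFoundError:
        return
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, as in GEMINI_API_KEY="..."
        os.environ.setdefault(key.strip(), value.strip().strip("\"'"))


def require_api_key():
    """Return the Gemini API key or fail with a readable message."""
    load_env_file()
    api_key = os.environ.get("GEMINI_API_KEY", "")
    if not api_key or api_key == "YOUR_API_KEY_HERE":
        raise RuntimeError("GEMINI_API_KEY is missing or still the placeholder in .env")
    return api_key
```

Calling require_api_key() from the directory that holds .env either returns the key or raises an error that says what is wrong.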


Testing the Voice Assistant

Operation: Your AI Assistant Is Ready!

  1. Status Check: The OLED should display "Ready" and the onboard RGB LED should show the "idle" color.
  2. Start Recording: Press Button 1. The onboard RGB LED should change color (e.g., to green) to indicate it is listening.
  3. Speak: Ask your question!
  4. Stop/Timeout: Recording will stop after 6 seconds or when you press Button 2.
  5. AI Response: The OLED will show "Thinking..." and then the response audio will play via the speaker.
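The steps above amount to a small state machine. The following is a Python model of that logic for illustration only: the real firmware is not written in Python, and the state and event names here are hypothetical. Only the two buttons and the 6-second limit come from the steps above.

```python
from enum import Enum, auto

RECORD_TIMEOUT_S = 6.0  # recording limit described above


class State(Enum):
    READY = auto()
    LISTENING = auto()
    THINKING = auto()
    SPEAKING = auto()


def next_state(state, event, seconds_listening=0.0):
    """Return the state that follows `state` when `event` occurs.

    Events are hypothetical names: "button1", "button2", "tick",
    "response_ready", "playback_done".
    """
    if state is State.READY and event == "button1":
        return State.LISTENING
    if state is State.LISTENING:
        if event == "button2" or seconds_listening >= RECORD_TIMEOUT_S:
            return State.THINKING
    if state is State.THINKING and event == "response_ready":
        return State.SPEAKING
    if state is State.SPEAKING and event == "playback_done":
        return State.READY
    return state  # all other events are ignored


state = State.READY
for event, elapsed in [("button1", 0.0), ("tick", 3.0), ("tick", 6.0),
                       ("response_ready", 0.0), ("playback_done", 0.0)]:
    state = next_state(state, event, elapsed)
    print(f"{event:>15} -> {state.name}")
```

Walking through it by hand is a quick way to check what the device should be doing when the OLED or LED shows something unexpected.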

Conclusion

Conclusion: The Power of the Upgrade


The ESP32 Voice Assistant V0.2 is a major step forward. By switching to the memory-rich ESP32-S3-N16R8 and integrating the INMP441 I2S Microphone, we overcame the memory hurdles of V0.1. We have transformed a button-driven prompt machine into a conversational, voice-input-enabled AI device, all while keeping the build clean by using the onboard RGB LED for status cues. This project shows that a conversational voice front end for a cloud AI model is well within reach of a powerful, modern microcontroller.

Future Enhancements (V0.3 Preview)

While the two-button system provides reliable, user-controlled recording, the ultimate goal for a voice assistant is completely hands-free interaction. For V0.3, our focus will be on removing the buttons entirely by implementing Wake Word Detection.

Planned V0.3 Enhancements:

  1. Hands-Free Activation: Implement a wake word model (e.g., using TinyML or a platform like Edge Impulse) to allow the ESP32-S3 to constantly monitor audio from the INMP441 microphone.
  2. Button Removal: The current Start and Stop buttons will be eliminated. Recording will automatically begin when the wake word is detected and end after a pause in speech (or a defined timeout).
  3. Optimized Power: Explore deep sleep modes or highly optimized wake word libraries to ensure the always-listening state doesn't drain the power source excessively.
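To make the "end after a pause in speech" idea in item 2 concrete, here is a minimal energy-based sketch. It covers only pause detection, not wake word detection, and it is not the planned V0.3 implementation. The frame size, threshold and pause length are assumed values that would need tuning for a real microphone and room.

```python
import array
import math

FRAME_SAMPLES = 320  # 20 ms at an assumed 16 kHz sample rate
SILENCE_RMS = 500.0  # assumed threshold; tune for your microphone and room
SILENT_FRAMES_TO_STOP = 40  # 40 x 20 ms = 0.8 s pause


def frame_rms(samples):
    """Root-mean-square level of a sequence of 16-bit samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def end_of_speech_index(samples):
    """Return the frame index where recording should stop, or None.

    Recording stops once speech has been heard and is then followed by
    SILENT_FRAMES_TO_STOP consecutive quiet frames.
    """
    heard_speech = False
    quiet_run = 0
    for index in range(len(samples) // FRAME_SAMPLES):
        frame = samples[index * FRAME_SAMPLES:(index + 1) * FRAME_SAMPLES]
        if frame_rms(frame) >= SILENCE_RMS:
            heard_speech = True
            quiet_run = 0
        elif heard_speech:
            quiet_run += 1
            if quiet_run >= SILENT_FRAMES_TO_STOP:
                return index
    return None


# Synthetic input: 0.5 s of a 440 Hz tone followed by 1 s of silence.
tone = [int(8000 * math.sin(2 * math.pi * 440 * n / 16_000)) for n in range(8_000)]
audio = array.array("h", tone + [0] * 16_000)
print(end_of_speech_index(audio))
```

A fixed threshold like this is easy to fool with background noise, so a real build would likely pair it with the defined timeout mentioned in item 2.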

Stay tuned for the next evolution of the project, where we finally achieve a fully seamless, voice-activated AI experience!