MaTouch Works With OpenAI & ChatGPT
by Lan_Makerfabs in Design > Software
The MaTouch AI ESP32S3 2.8" TFT ST7789V board integrates I2S voice input, an I2S speaker, a 3-megapixel OV3660 camera, and a 320*240 resolution display. Combined with the ESP32S3's strong processor and Wi-Fi capability, this makes the board a good platform for AI development with ESP32.
Recently, we successfully connected the MaTouch AI 2.8" board to OpenAI, enabling real-time voice interaction. With just your voice, you can talk directly to the device: it listens, understands, thinks, and responds with natural speech output.
Supplies
Hardware:
- MaTouch AI ESP32S3 2.8" TFT ST7789V*1
- Type-C USB Cable*1
Software:
- ESP-IDF Development Environment
- OpenAI API Key
What Is OpenAI?
OpenAI provides a suite of AI models that understand natural language and generate human-like responses. Its APIs cover speech-to-text (STT), text-to-speech (TTS), and chat models. By integrating OpenAI's API, developers can bring conversational abilities into embedded systems, turning traditional hardware into "smart" devices.
How to Implement It on MaTouch
The MaTouch AI board communicates with OpenAI in three major steps: Speech-to-Text (STT), AI model processing (GPT), and Text-to-Speech (TTS).
- Speech-to-Text (STT)
The user's voice is recorded through the microphone and sent to OpenAI's STT model, which converts the audio into text.
- AI Model Processing (GPT)
The recognized text is transmitted to OpenAI's AI model (such as GPT-3.5). The model interprets the context and generates a response.
- Text-to-Speech (TTS)
The AI-generated text is sent to OpenAI’s TTS model, which produces a voice response. The MaTouch AI board then plays this voice output through the I2S speaker.
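The three steps above can be sketched as a chain in which each stage consumes the previous stage's output and the chain stops at the first failure. This is only an illustration of the data flow, not the project's code: the `stage_fn` type, `run_pipeline`, and the `stub_*` functions are hypothetical stand-ins for the real cloud calls.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stage signature: read `in`, write a NUL-terminated
 * result into `out`, return 0 on success. */
typedef int (*stage_fn)(const char *in, char *out, size_t out_len);

/* Stub stages standing in for the real OpenAI requests. */
static int stub_stt(const char *audio, char *text, size_t n) {
    (void)audio;                       /* a real stage would upload the audio */
    snprintf(text, n, "hello");
    return 0;
}
static int stub_chat(const char *text, char *reply, size_t n) {
    snprintf(reply, n, "You said: %s", text);
    return 0;
}
static int stub_tts(const char *reply, char *speech, size_t n) {
    snprintf(speech, n, "<audio for '%s'>", reply);
    return 0;
}
/* Simulates a stage that fails, e.g. because Wi-Fi is down. */
static int stub_offline(const char *in, char *out, size_t n) {
    (void)in; (void)out; (void)n;
    return -1;
}

/* Run STT -> chat -> TTS. Returns 0 on success, or the 1-based
 * index of the first stage that failed. */
static int run_pipeline(stage_fn stt, stage_fn chat, stage_fn tts,
                        const char *audio, char *speech, size_t n) {
    char text[128];
    char reply[256];
    assert(stt && chat && tts && audio && speech);
    if (stt(audio, text, sizeof text) != 0) return 1;
    if (chat(text, reply, sizeof reply) != 0) return 2;
    if (tts(reply, speech, n) != 0) return 3;
    return 0;
}
```

Calling `run_pipeline(stub_stt, stub_chat, stub_tts, ...)` fills the output buffer with the final stage's result; swapping in `stub_offline` for any stage shows how a failure short-circuits the rest of the chain.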
Set Up the ESP-IDF Development Environment
Before you begin, please ensure that ESP-IDF is installed on your computer. If not, click Get Started with ESP-IDF to complete the installation.
Get OpenAI API Keys
- Sign in or register on the OpenAI platform.
- Click Start building and fill in the relevant information.
- Enter the project name and key name. You may also use the default.
- Copy your key and click “Continue”.
- Please ensure your account has sufficient funds; otherwise, the key will not function.
- You can return to the overview page, click the Settings button, and view the API information.
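For context on how the key is used: OpenAI's HTTP API expects it in a bearer `Authorization` header on every request. The sketch below shows only that formatting step. `build_auth_header` is a hypothetical helper, and the client library used by the project may well handle this internally.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format "Authorization: Bearer <key>" into `out`.
 * Returns 0 on success, -1 if the key is empty or `out` is too small. */
static int build_auth_header(const char *api_key, char *out, size_t out_len) {
    assert(api_key != NULL && out != NULL);
    if (api_key[0] == '\0') {
        return -1;                       /* no key configured */
    }
    int n = snprintf(out, out_len, "Authorization: Bearer %s", api_key);
    /* snprintf reports the length it wanted; reject truncated output. */
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Rejecting a truncated header matters here because a cut-off key would simply fail authentication with no obvious cause.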
How the Code Works
ai_task() is the core function that implements the entire AI dialogue system, completing:
- Speech-to-Text (STT): Sends audio recorded by the microphone to OpenAI to obtain text results.
- Language Understanding and Generation: Passes the recognized text to the GPT model to generate a brief response.
- Text-to-Speech (TTS): Converts the GPT response back into speech and plays it aloud.
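For the STT step, the recorded samples have to reach OpenAI in an audio file format the transcription endpoint accepts, and WAV is one of them. The sketch below writes a standard 44-byte PCM WAV header that could precede a raw sample buffer. It rests on assumptions not stated in this article: that the recording is uncompressed PCM and that it is wrapped as WAV before upload. `wav_write_header` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WAV_HEADER_SIZE 44

/* WAV fields are little-endian regardless of the host CPU. */
static void put_le16(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
}
static void put_le32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)((v >> 24) & 0xFF);
}

/* Fill `hdr` with a canonical PCM WAV header for `data_len` bytes of audio. */
static void wav_write_header(uint8_t hdr[WAV_HEADER_SIZE], uint32_t sample_rate,
                             uint16_t channels, uint16_t bits_per_sample,
                             uint32_t data_len) {
    assert(hdr != NULL);
    uint16_t block_align = (uint16_t)(channels * (bits_per_sample / 8));

    memcpy(hdr, "RIFF", 4);
    put_le32(hdr + 4, 36 + data_len);        /* file size minus 8 bytes */
    memcpy(hdr + 8, "WAVE", 4);
    memcpy(hdr + 12, "fmt ", 4);
    put_le32(hdr + 16, 16);                  /* fmt chunk size for PCM */
    put_le16(hdr + 20, 1);                   /* audio format 1 = PCM */
    put_le16(hdr + 22, channels);
    put_le32(hdr + 24, sample_rate);
    put_le32(hdr + 28, sample_rate * block_align);  /* bytes per second */
    put_le16(hdr + 32, block_align);
    put_le16(hdr + 34, bits_per_sample);
    memcpy(hdr + 36, "data", 4);
    put_le32(hdr + 40, data_len);
}
```

As an example, one second of 16 kHz, 16-bit mono audio is 32000 bytes, so the call would be `wav_write_header(hdr, 16000, 1, 16, 32000)`; the actual sample rate and width depend on how the board's I2S microphone is configured.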
Key Parts of the Code
- Create the OpenAI client.
Initializes the OpenAI client using your API key. This client handles all communication with the OpenAI cloud services.
- Create functional modules
Audio Transcription (STT) – Converts recorded speech into text.
Chat Completion (GPT) – Generates a response based on recognized text.
Audio Speech (TTS) – Converts the response text back into speech.
- Set module parameters
STT settings: Set the language to English, with an output-stability (sampling temperature) value of 0.2; lower values give more stable output.
AI settings: Set the chat model to gpt-3.5-turbo and define its role as an assistant.
TTS settings: Set the TTS model to tts-1-hd and the voice to alloy.
- Speech-to-Text (STT)
Sends the recorded audio buffer to OpenAI for transcription, receiving text output.
- Text-to-Response (ChatCompletion)
Passes the transcribed text to the GPT model, which generates a text response.
- Text-to-Speech (TTS)
Converts the GPT-generated response into speech and plays it through the speaker.
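To make the module parameters concrete, the sketch below builds the JSON bodies that OpenAI's chat completions and speech endpoints accept, using the model and voice names listed above. It is not the project's code: the client library normally assembles these requests itself, and `json_escape`, `build_chat_body`, `build_tts_body`, and the system prompt wording are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Escape `"` and `\` so `in` can sit inside a JSON string.
 * Sketch only: control characters are not handled.
 * Returns 0 on success, -1 if `out` is too small. */
static int json_escape(const char *in, char *out, size_t out_len) {
    assert(in != NULL && out != NULL);
    if (out_len == 0) return -1;
    size_t j = 0;
    for (size_t i = 0; in[i] != '\0'; i++) {
        int needs_escape = (in[i] == '"' || in[i] == '\\');
        /* Keep one byte free for the terminating NUL. */
        if (j + (needs_escape ? 2u : 1u) >= out_len) return -1;
        if (needs_escape) out[j++] = '\\';
        out[j++] = in[i];
    }
    out[j] = '\0';
    return 0;
}

/* Chat request: system message sets the assistant role, user message
 * carries the transcribed text. The prompt text is illustrative. */
static int build_chat_body(const char *user_text, char *out, size_t out_len) {
    char esc[256];
    if (json_escape(user_text, esc, sizeof esc) != 0) return -1;
    int n = snprintf(out, out_len,
        "{\"model\":\"gpt-3.5-turbo\",\"messages\":["
        "{\"role\":\"system\",\"content\":\"You are a helpful assistant. Answer briefly.\"},"
        "{\"role\":\"user\",\"content\":\"%s\"}]}", esc);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}

/* TTS request: model, voice, and the text to be spoken. */
static int build_tts_body(const char *reply, char *out, size_t out_len) {
    char esc[256];
    if (json_escape(reply, esc, sizeof esc) != 0) return -1;
    int n = snprintf(out, out_len,
        "{\"model\":\"tts-1-hd\",\"voice\":\"alloy\",\"input\":\"%s\"}", esc);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Escaping matters because both the transcribed speech and the model's reply are free text and may contain quotation marks, which would otherwise produce invalid JSON.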
Upload the Code
- Open stt_llm_tts in VS Code.
- Paste the key you copied earlier from OpenAI into the code.
- Set the target chip to ESP32S3.
- Change the Wi-Fi name and password in the code to match your network.
- Set Partition to “Custom partition table CSV”.
- Set Flash size to 16MB.
- Enable “Support for external, SPI-connected RAM”, set the mode to “Octal Mode PSRAM”, and then click “Save”.
- Use the Type-C USB cable to connect the board to the PC, select the corresponding port, and click Flash Device.
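The menu settings in the steps above correspond to Kconfig options, so they could also be recorded in an `sdkconfig.defaults` file in the project folder. The fragment below is a sketch that assumes ESP-IDF v5.x option names and a partition file named `partitions.csv`; the project may use a different file name, so check it before relying on this.

```
# Target chip
CONFIG_IDF_TARGET="esp32s3"

# Custom partition table CSV (file name is an assumption)
CONFIG_PARTITION_TABLE_CUSTOM=y
CONFIG_PARTITION_TABLE_CUSTOM_FILENAME="partitions.csv"

# 16MB flash
CONFIG_ESPTOOLPY_FLASHSIZE_16MB=y

# External SPI-connected RAM in Octal mode
CONFIG_SPIRAM=y
CONFIG_SPIRAM_MODE_OCT=y
```

The API key and Wi-Fi details are deliberately left out of this fragment, since the steps above set them in the code itself.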
Result
A demonstration of the OpenAI integration is available on YouTube.
- Tap “RECORD” to start a conversation.
- Tap “PLAY RECORD” to play back the most recent recording.