AI Voice Assistant

by Orange Digital Center in Circuits > Raspberry Pi


This project was developed within the Orange Digital Center Morocco, a space dedicated to fostering innovation, creativity, and rapid prototyping. At the FabLab, individuals and teams have access to state-of-the-art tools, including 3D printers, laser cutters, and a variety of electronic and mechanical resources. The center provides a collaborative environment where innovators, entrepreneurs, and students can transform their ideas into tangible products, with a focus on sustainable and impactful solutions.

Imagine walking into the Orange Digital Center’s FabLab and being greeted by an intelligent voice assistant ready to answer your questions, guide you to the right resources, and inform you about upcoming events and workshops. This project brings that idea to life by creating a custom AI-powered voice assistant tailored to the needs of FabLab visitors.

In this tutorial, you’ll learn how to build a voice assistant from scratch using a combination of cutting-edge AI technologies and hardware components.

What Does This AI Assistant Do?

The Orange Digital Center AI Voice Assistant is designed to:

  1. Answer Visitor Questions: Provide accurate and helpful answers about the FabLab, its facilities, and its resources.
  2. Inform About Events: Share details about upcoming training workshops, events, and schedules.
  3. Streamline Assistance: Reduce the workload of FabLab staff by automating visitor support.

Why We Built This Project

  1. Enhanced Visitor Experience: An interactive assistant makes navigating the FabLab easy and fun.
  2. Practical AI Application: Gain hands-on experience in integrating technologies like Retrieval-Augmented Generation (RAG), Text-to-Speech (TTS), and Speech-to-Text (STT).
  3. Expandability: This project serves as a foundation for future enhancements, such as adding a camera for visitor recognition or developing an advanced user interface.

Supplies


Hardware Components

To build the Orange Digital Center AI Voice Assistant, you’ll need the following hardware:

  1. Raspberry Pi 4 (8GB RAM): The core device to host and run the AI assistant.
  2. 3.5-Inch Touchscreen Display: A compact display for user interaction or future UI enhancements.
  3. Microphone: For capturing visitor questions and commands.
  4. Speaker: To deliver audio responses to visitors.
  5. MicroSD Card (16GB or Higher): To install the operating system and store the project files.
  6. Power Supply: A reliable 5V 3A power adapter to ensure smooth operation.
  7. Keyboard and Mouse: For initial setup and configuration.
  8. Fan: To prevent the Raspberry Pi from overheating during extended use, ensuring stable performance.
  9. HDMI Cable: For connecting the Raspberry Pi to external displays, useful for setup and debugging.

Setting Up the Raspberry Pi 4


In this section, we’ll prepare the Raspberry Pi, install the necessary software, and configure the hardware components to build the Orange Digital Center AI Voice Assistant.

Step 1: Set Up the Raspberry Pi

1.1 Flash Raspbian OS

  1. Download Raspberry Pi Imager: visit the official Raspberry Pi website and download Raspberry Pi Imager.
  2. Select the OS version: for general use, choose either Raspberry Pi OS (32-bit) or Raspberry Pi OS (64-bit), depending on your performance needs and hardware compatibility. The 64-bit version is recommended for boards with 4GB or more RAM and offers better performance.
  3. Write the OS to the microSD card: insert the microSD card into your computer, open Raspberry Pi Imager, select the OS version and the SD card, then click Write to begin.

1.2 Initial Configuration

  1. Insert the microSD card into the Raspberry Pi, connect a display using an HDMI cable, and power it on.
  2. Follow the setup wizard to connect to Wi-Fi, set your keyboard and regional settings, and update the system when prompted.

Note: Here is a complete video guide to installing Raspberry Pi OS:

https://www.youtube.com/watch?v=vxmO_a5WNI8

Step 2: Accessing the Raspberry Pi

There are several ways to connect to your Raspberry Pi for headless or remote operation:

2.1 Direct HDMI Connection

  1. Use an HDMI cable to connect the Raspberry Pi to a monitor or TV.
  2. Plug in a keyboard and mouse to interact directly with the desktop environment.

2.2 Remote Access via SSH

  1. Enable SSH: open the terminal and type:

sudo raspi-config

  2. Navigate to Interface Options > SSH and enable it.
  3. Find the Raspberry Pi’s IP address:

hostname -I

  4. Connect from another device using an SSH client such as PuTTY (Windows) or the terminal (Linux/macOS):

ssh pi@<IP_ADDRESS>

  5. The default credentials are pi (username) and raspberry (password). Change the password after logging in.

2.3 Remote Access via VNC

  1. Enable VNC: open the terminal and type:

sudo raspi-config

  2. Navigate to Interface Options > VNC and enable it.
  3. Install a VNC viewer on your computer (e.g., RealVNC Viewer).
  4. Connect to the Raspberry Pi using its IP address and the default credentials.

Note: Here is a complete video guide on how to use VNC Viewer:

https://www.youtube.com/watch?v=carRkTXv_8c

Installing Python and Required Dependencies


Step 1: Update the System

sudo apt update && sudo apt upgrade -y

Step 2: Install Python and Pip

sudo apt install python3 python3-pip -y

Step 3: Install Project Dependencies

Clone the project repository and install the required libraries:

git clone https://github.com/abdel2000-dply/ODC-AI-Assistant.git
cd ODC-AI-Assistant
pip install -r requirements.txt


Note: The complete code is available in the project's GitHub repository:

https://github.com/abdel2000-dply/ODC-AI-Assistant/tree/main/tests

Configuring the Touch Display


Install LCD Drivers:

  1. Most 3.5" LCDs come with a driver script. Download the driver from the manufacturer’s website or GitHub repository.
  2. Example for the Waveshare 3.5" LCD:

git clone https://github.com/waveshare/LCD-show.git
cd LCD-show/
sudo ./LCD35-show

  3. This script configures the Raspberry Pi to use the LCD as the primary display.

Reboot the Raspberry Pi:

sudo reboot

Verify the LCD:

  1. After rebooting, the Raspberry Pi desktop should appear on the 3.5" LCD.


Note: The complete code is available in the project's GitHub repository:

https://github.com/abdel2000-dply/ODC-AI-Assistant/tree/main/tests

Microphone and Speaker Setup


Set Up the Microphone:

  1. Connect the microphone to the Raspberry Pi.

Test if the microphone is detected:

arecord -l

Record and play back a test audio file:

arecord -D plughw:1,0 -f cd test.wav
aplay test.wav
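If you're not sure which index your microphone uses, the short snippet below (a minimal sketch using the speech_recognition library, which the project already depends on) lists the audio input devices and their indexes; the printed index is what you would later pass as device_index in the code.

import speech_recognition as sr

# Print every audio device known to the system along with its index
for index, name in enumerate(sr.Microphone.list_microphone_names()):
    print(f"Device {index}: {name}")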

Set Up the Speaker:

Connect the speaker and test the audio output:

speaker-test -t wav -c 2

Adjust volume using:

alsamixer

Setup Verification

Check Installed Libraries:

Run a simple script to verify the installation of required libraries:

import cohere
import langchain
import whisper
import speech_recognition as sr
from edge_tts import Communicate
print("All libraries installed successfully!")

Test Basic Speech-to-Text:

Use the microphone to transcribe a test input:

import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something...")
    # Capture audio from the default microphone
    audio = recognizer.listen(source)
    try:
        # Transcribe the audio using the Google Web Speech API
        print("You said:", recognizer.recognize_google(audio))
    except sr.UnknownValueError:
        print("Could not understand the audio")


Note: The complete code is available in the project's GitHub repository:

https://github.com/abdel2000-dply/ODC-AI-Assistant/tree/main/tests

Running the AI Voice Assistant

Step 1: Clone the Project

If you haven’t already, clone the project repository:

git clone https://github.com/abdel2000-dply/ODC-AI-Assistant.git

Step 2: Install Project Dependencies

Install the required libraries:

cd ODC-AI-Assistant
pip install -r requirements.txt

Step 3: Run the Project

Run the AI Voice Assistant:

python3 main.py


Note: The complete code is available in the project's GitHub repository:

https://github.com/abdel2000-dply/ODC-AI-Assistant/tree/main/tests

Breakdown of the Key Libraries and Techniques Used in This Project

Here’s a breakdown of the key libraries and their roles in the project:

1. Speech Recognition and Audio

  1. speech_recognition: For capturing and transcribing voice input.
  2. pyaudio: For recording audio from the microphone.
  3. edge_tts: For text-to-speech (TTS) functionality.
  4. mpv: For playing audio files.

2. AI and Language Processing

  1. langchain: For handling conversational AI and context management.
  2. cohere: For advanced language model integration.
  3. sentence-transformers: For generating embeddings for text similarity.

3. Web Scraping

  1. selenium: For scraping event data from the Orange Digital Center website.
  2. webdriver_manager: For managing the Chrome WebDriver.

4. Vector Storage and Search

  1. faiss-cpu: For efficient vector storage and similarity search.
  2. langchain_huggingface: For integrating Hugging Face embeddings.

5. Utilities

  1. dotenv: For managing environment variables.
  2. asyncio: For asynchronous programming.
  3. tkinter: For the graphical user interface (GUI).

Here are the key techniques used to build the project:

1. Asynchronous Programming

The project uses asyncio to handle asynchronous tasks like speech recognition and text-to-speech. This ensures smooth interaction without blocking the main thread.
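As an illustration, here is a minimal sketch of asynchronous text-to-speech with edge_tts; the voice name, output file name, and mpv flags are illustrative choices, not the project's exact configuration.

import asyncio
import subprocess
from edge_tts import Communicate

async def speak(text, voice="en-US-AriaNeural", out_file="reply.mp3"):
    # Generate the speech audio without blocking the rest of the program
    communicate = Communicate(text, voice)
    await communicate.save(out_file)
    # Play the result with mpv (the player listed among the project's audio tools)
    subprocess.run(["mpv", "--really-quiet", out_file])

asyncio.run(speak("Welcome to the Orange Digital Center FabLab!"))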

2. Vector Storage and Search

The FAISS library is used for efficient vector storage and similarity search. This allows the assistant to retrieve relevant information quickly.
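The sketch below shows the general idea, assuming the langchain_huggingface and langchain_community packages are installed; the embedding model name and sample texts are placeholders, not the project's actual data.

from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# A couple of example passages about the FabLab
docs = [
    "The FabLab offers 3D printers and laser cutters.",
    "Training workshops on electronics run every week.",
]

# Embed the passages and build a FAISS index for similarity search
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.from_texts(docs, embeddings)

# Retrieve the passage most similar to a visitor's question
results = vector_store.similarity_search("What machines are available?", k=1)
print(results[0].page_content)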

3. Context Management

The LangChain library is used to manage conversation context, ensuring the assistant can handle follow-up questions and maintain context across interactions.
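For intuition, here is a small sketch of how ConversationBufferMemory accumulates previous turns (the sample questions and answers are made up); the full chain that uses this memory is shown in the Conversational AI step below.

from langchain.memory import ConversationBufferMemory

# Memory buffer that stores every turn under the "chat_history" key
memory = ConversationBufferMemory(memory_key="chat_history")

memory.save_context({"question": "What is the FabLab?"},
                    {"answer": "A prototyping space with 3D printers and laser cutters."})
memory.save_context({"question": "When is the next workshop?"},
                    {"answer": "The next workshop is scheduled for Friday."})

# This accumulated history is what the conversational chain sees on each new turn
print(memory.load_memory_variables({})["chat_history"])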

4. Modular Design

The project is designed in a modular way, with separate handlers for speech recognition, conversational AI, and GUI. This makes the code easier to maintain and extend in the future.
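To illustrate the idea (this is a hypothetical sketch, not the repository's actual layout), the main loop only wires the handlers together, with each concern hidden behind its own function or class:

# Stand-in stubs; in the real project each of these lives in its own module.
def listen():
    # Speech-to-text handler (typed input used here as a stand-in for voice)
    return input("Visitor: ")

class Assistant:
    # Conversational AI handler (LangChain + Cohere in the real project)
    def get_response(self, question):
        return f"(answer to: {question})"

def speak(text):
    # Text-to-speech handler (edge_tts in the real project)
    print("Assistant:", text)

def main():
    assistant = Assistant()
    while True:
        question = listen()
        if question.lower() in {"quit", "exit"}:
            break
        speak(assistant.get_response(question))

if __name__ == "__main__":
    main()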

Speech Recognition

The project uses the speech_recognition library to capture voice input. Here’s the key code snippet with comments to explain it:

import speech_recognition as sr

def recognize_speech_from_mic(language='en-US', device_index=3):
  # Create a Recognizer instance to process speech
  recognizer = sr.Recognizer()

  # Use the specified microphone device for capturing audio
  # Ensure that the correct `device_index` is passed as an argument
  with sr.Microphone(device_index=device_index) as source:
    print("Please say something:")

    # Adjust for ambient noise to improve recognition accuracy
    recognizer.adjust_for_ambient_noise(source, duration=1)

    try:
      # Listen for speech input from the microphone with a timeout
      audio = recognizer.listen(source, timeout=10)

      # Convert the speech audio to text using Google Web Speech API
      text = recognizer.recognize_google(audio, language=language)
      print(f"You said: {text}")
      return text

    except sr.UnknownValueError:
      # Handle the case where the speech is unintelligible
      print("Unable to recognize speech")
      return None

    except Exception as e:
      # Handle other potential errors (e.g., device or network issues)
      print(f"An error occurred: {e}")
      return None

# Example usage (ensure the correct device_index for your microphone):
# recognize_speech_from_mic()


Note: The complete code is available in the project's GitHub repository:

https://github.com/abdel2000-dply/ODC-AI-Assistant/tree/main/tests

Web Scraping

The selenium library is used to automate a web browser and scrape event data from the Orange Digital Center website. In this snippet, the Chrome WebDriver is configured with headless options so it runs in the background; the script then navigates to the events page, extracts event details such as titles and dates using their class names, prints the extracted data, and closes the browser.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import time  # Allows pauses (e.g., while waiting for elements to load)

# Function to scrape event data from the Orange Digital Center website
def scrape_events():
    # Set up Chrome options for headless browsing
    chrome_options = Options()
    chrome_options.add_argument("--headless")  # Run the browser in the background (no UI)
    chrome_options.add_argument("--no-sandbox")  # Disable sandboxing; often needed in server environments
    chrome_options.add_argument("--disable-dev-shm-usage")  # Prevent memory-related crashes in containers
    chrome_options.add_argument("--ignore-certificate-errors")  # Ignore SSL certificate errors

    # Create a new instance of the Chrome WebDriver with the specified options
    driver = webdriver.Chrome(options=chrome_options)

    # Navigate to the Orange Digital Center events page
    driver.get("https://www.orangedigitalcenters.com/country/ma/events")

    # Wait for the events to load completely
    time.sleep(5)  # Simple pause; consider WebDriverWait for a more robust solution

    # Find all elements with the class name "event-detail"
    events = driver.find_elements(By.CLASS_NAME, "event-detail")
    for event in events:  # Iterate through each event element
        # Extract the event title and date
        title = event.find_element(By.CLASS_NAME, "event-title").text
        date = event.find_element(By.CLASS_NAME, "event-date").text
        # Print the extracted details to the console
        print(f"Event: {title}, Date: {date}")

    # Close the browser once the extraction is complete
    driver.quit()


Note: The complete code is available in the project's GitHub repository:

https://github.com/abdel2000-dply/ODC-AI-Assistant/tree/main/tests

Conversational AI

The project leverages LangChain and Cohere to create an advanced conversational AI system. In this snippet, LangChain handles memory and retrieval functionality, while Cohere powers the underlying language model. The system uses a ConversationalRetrievalChain for contextual responses, maintaining chat history with a ConversationBufferMemory and retrieving relevant information using a vector store. The get_response method processes user questions and provides intelligent, context-aware answers.

from langchain.chains import ConversationalRetrievalChain  # Conversation-based retrieval chain
from langchain.memory import ConversationBufferMemory  # Stores the chat history
from langchain_cohere import ChatCohere  # Integrates Cohere's language model

class LangChainHandler:
    def __init__(self):
        # Initialize the handler with Cohere's language model
        # (replace "your_cohere_api_key" with your actual API key)
        self.llm = ChatCohere(api_key="your_cohere_api_key")

        # Memory buffer that stores the conversation history under the "chat_history" key
        self.memory = ConversationBufferMemory(memory_key="chat_history")

        # Set up the conversational chain, combining the LLM (Cohere), the memory,
        # and a retriever built from the vector store
        # (vector_store is assumed to be created elsewhere in the project, e.g. the FAISS index)
        self.chain = ConversationalRetrievalChain.from_llm(
            llm=self.llm,  # The Cohere language model
            memory=self.memory,  # The memory buffer that tracks conversation context
            retriever=vector_store.as_retriever()  # Fetches relevant information from the vector store
        )

    def get_response(self, question):
        # Process the user's question through the conversational chain
        response = self.chain({"question": question})

        # Return the answer from the response object
        return response["answer"]


Note: The complete code is available in the project's GitHub repository:

https://github.com/abdel2000-dply/ODC-AI-Assistant/tree/main/tests

GUI With Tkinter


The AI Voice Assistant features a clean, user-friendly interface displayed on the Raspberry Pi's screen. It is designed for simplicity and ease of use, letting visitors interact with the assistant seamlessly.
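As a starting point, here is a minimal Tkinter sketch of such a window; the layout, labels, and 480x320 geometry (the resolution of a typical 3.5-inch display) are illustrative choices, and the project's actual GUI code is in the repository.

import tkinter as tk

def on_ask():
    # In the real assistant this would trigger listening and answering;
    # here it only updates the label to illustrate the flow.
    response_var.set("Listening...")

root = tk.Tk()
root.title("ODC AI Assistant")
root.geometry("480x320")  # sized for the small 3.5-inch touchscreen

response_var = tk.StringVar(value="Tap the button and ask a question")
tk.Label(root, textvariable=response_var, wraplength=440, font=("Arial", 14)).pack(pady=40)
tk.Button(root, text="Ask", command=on_ask, font=("Arial", 16)).pack()

root.mainloop()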

Note: The complete code is available in the project's GitHub repository:

https://github.com/abdel2000-dply/ODC-AI-Assistant/tree/main/tests

3d Model Case

How to Assemble Your AI Voice Assistant Case!

As part of our AI assistant project, we used 3D printing to create a custom case for the Raspberry Pi and the 3.5-inch display. The printed case securely houses both components, protecting them from dust and physical damage while keeping the necessary ports accessible. We also designed and printed a stand that holds the Raspberry Pi and display at a comfortable viewing angle. This custom housing and stand not only improves the look of the project but also provides a functional, protective enclosure for the components.

Adding a Fan to Prevent Overheating in the AI Assistant


To prevent the Raspberry Pi from overheating while the AI assistant is running, we added a fan to the project. The fan works alongside the 3.5-inch display, which is used to monitor the Raspberry Pi's temperature in real time.
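If you want to read the temperature yourself, one common approach (a sketch, not the project's exact code) is to read the value the kernel exposes and display or log it:

from pathlib import Path

def cpu_temperature_c():
    # The Raspberry Pi kernel reports the SoC temperature in millidegrees Celsius
    raw = Path("/sys/class/thermal/thermal_zone0/temp").read_text().strip()
    return int(raw) / 1000.0

temp = cpu_temperature_c()
print(f"CPU temperature: {temp:.1f} °C")
if temp > 70:  # arbitrary example threshold
    print("Warning: the Pi is running hot - check that the fan is spinning.")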

Executing the AI Voice Assistant Code

Build Your Own AI Voice Assistant!

This project delivers a custom AI voice assistant designed to answer user questions and provide event information. By integrating speech recognition, text-to-speech, and conversational AI, it creates a seamless and interactive experience. Built with modularity and practicality in mind, it is a functional tool that enhances visitor support and showcases the potential of AI in real-world applications.

While this tutorial covers the core functionality, the project is designed with scalability in mind. In future updates, we plan to:

  1. Develop the User Interface Further: Enhance the existing UI for a more intuitive and engaging user experience.
  2. Improve Language Support: Expand support for additional languages.
  3. Add a Camera: Incorporate a camera for facial recognition or visual interaction, adding new dimensions to the assistant’s capabilities.

These enhancements will make the assistant even more versatile and user-friendly, opening up new possibilities for its application.