Voice Assistant Head Gear

by hannu_hell

This is a project where I went out of my way to design a head gear to assist me with some daily tasks. The head gear is voice activated, and some basic functions have been programmed, such as translating text, reading out documents, and playing YouTube videos. It houses a Raspberry Pi 4B with 4 GB of RAM and an Arduino Nano. The 3D model for this project was designed in Fusion 360. The Raspberry Pi uses VOSK offline speech recognition, which implements an English language model to recognize speech. It's worth noting that running the VOSK speech recognition toolkit requires, at minimum, Raspberry Pi OS (Raspbian) on a Raspberry Pi 3.

Apart from the voice commands, the head gear has a headset and a 5 inch 800 x 480 screen with an HDMI interface. The screen can lower down and retract back up on voice command. The Vosk toolkit can be installed using pip ( pip3 install vosk ) and can be programmed in Python as well as many other languages. In this project I used Python, as it is easy to get up and running fast.
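
Before building anything, it is worth verifying that Vosk and your microphone work together. Below is a minimal test sketch, assuming you have downloaded and unpacked the small English model (vosk-model-small-en-us-0.15, the same one used later in this project) into the working directory:

import json
import pyaudio
from vosk import Model, KaldiRecognizer

# Path to the unpacked model folder (adjust to your location)
model = Model("vosk-model-small-en-us-0.15")
recognizer = KaldiRecognizer(model, 16000)

mic = pyaudio.PyAudio()
stream = mic.open(format=pyaudio.paInt16, channels=1, rate=16000,
                  input=True, frames_per_buffer=8192)
stream.start_stream()

print("Speak into the microphone (Ctrl+C to stop)...")
while True:
    data = stream.read(4096, exception_on_overflow=False)
    if recognizer.AcceptWaveform(data):
        result = json.loads(recognizer.Result())
        print(result.get("text", ""))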

Supplies

For this project you will need the following items, though you may swap an item for a similar one according to availability and price.

  • 1 x Raspberry Pi 4B - 4GB
  • 1 x 3D Printer and PLA filament
  • 1 x Small USB Microphone
  • 1 x Generic Headset ( Any headset will do; Bluetooth connectivity would be better, but since mine was unable to connect via Bluetooth I opted for the audio jack )
  • 1 x 5 inch HDMI interface screen
  • 1 x Arduino Nano
  • Several M2.5 Hex Screws and Nuts
  • Several Washers (5 mm outer diameter)
  • 1 x ON/OFF Switch
  • 1 x 5v cooling fan (40 x 40 mm)
  • 2 x LED strips with 3 LEDs each ( I used green )
  • 1 x 40cm HDMI cable (micro to normal)
  • Some 22 AWG wires
  • 1 x 7.4 V LIPO Battery
  • 1 x Power Bank and a cable with Type C and Type A
  • 1 x Micro USB Cable with open end wires ( You can strip the wires of a normal cable if you have one lying around)
  • 1 x 25kg/cm Digital Servo Motor
  • 1 x Radial Bearing (8x14x4 mm - Inner x Outer x Height)
  • 1 x 10 cm USB Type A to Type C cable ( Arduino Nano and Raspberry Pi connection)
  • 1 x Voltage Regulator
  • 1 x Nylon straps for keeping the head gear in place
  • 2 x Aluminum angle brackets (10 cm length)
  • Some flexible wire guides to house the HDMI and screen power cables (optional)
  • 1 x Aluminum plate ( 8 x 5 x 2 cm - Length x Width x Thickness )

Build Design - Part 1


The head gear was designed and printed in parts and finally assembled. The main arc-like structure that goes over the head is the base on which I started to build. This arc was dimensioned according to my head measurements, and the rest of the components were dimensioned thereafter. Once the arc structure was 3D printed, it was attached to the headset at the midpoint. I drilled four 3 mm holes in the middle section of the headset and then placed the aluminum plate in between, sandwiching the arc structure and the headset, to increase the structural strength. It's important, when designing the arc structure, to leave just enough room on the sides so that the headset can stretch outwards when worn.

Next up was the design of the moveable arms, where one is connected to the servo and the other is hinged to the arc structure with a radial bearing in place. An aluminum rod ( 8 mm diameter and 60 mm length ) was used to hinge the joint through the bearing. At this stage the two hinge arms are not connected; they are joined later by the structure that houses the 5 inch screen. Once the screen structure was designed and printed, all three parts were assembled and joined together with the hex screws, nuts, and washers.

Unlike the Arduino and the other peripherals (LEDs, cooling fan, screen, and servo motor), the Raspberry Pi is powered externally with a power bank. The peripherals are powered by the 7.4 V LiPo battery, which can be switched on and off with the switch.

Build Design - Part 2


The top part of the arc structure is flat, and a base plate is glued on top; all the electronics are housed in a casing on top of this base plate. It's worth noting that the placement of the components inside the electronics case depended a lot on achieving a good balance, both for the center of gravity and for ergonomics. The Raspberry Pi was shifted back a bit to compensate for the weight of the hinged arms and the screen, which, when lowered, creates a moment about the hinge that would topple the head gear forwards. The LiPo battery was also moved accordingly to achieve a good center of gravity.

After packing the electronics in the compartment it was time to seal it up. The top panel covering the electronics housed a small cooling fan to provide sufficient cooling for the Raspberry Pi.

Build Design - Part 3


The two LED strips (each with 3 LEDs) were fitted into slots on both sides of the head gear. The top panel where the electronics were placed needed added support, so I designed a small structure connecting the arc structure to the top plate on both sides; the LED strips were slotted into these support structures.

The top of the right hinge arm has a magnet, and when retracted to 90 degrees in the upright position it comes into contact with another magnet fixed to the top panel, which holds the screen upright. The magnets are encased in plastic to provide the right amount of gap between them, so that when the screen is in the upright position there is enough attraction to keep it in place, but not so much that the servo cannot separate them when the screen is lowered. You will need to find a balance by testing with different materials to weaken or strengthen the magnetic field, or by increasing or decreasing the distance between the magnets.

Once all the structural assemblies were complete, the HDMI and screen power cables were routed through a flexible guide so that they don't interfere with the motion of the hinged arms. Finally, two aluminum brackets were bolted to the ends of both sides of the arc structure to prevent the hinge arms from moving below the line of vision of the person wearing it. When the screen is in the viewing position, the hinged arms rest on the aluminum brackets, preventing the arms from going further down.

Electronics


The servo moves the hinged arms to the desired positions, and the signal to the servo is detached when the arms are in the upright or lowered position. This means that in both resting positions the hinged arms are supported without any force from the servo: at the retracted position the magnets hold the arms in place, and when the screen is lowered the hinged arms rest on the aluminum brackets. This was necessary because the servo only needs a few hundred milliamps to move the arms to the desired positions; the rest is taken care of without the need for a holding torque from the servo.

The Raspberry Pi communicates via UART (serial communication) with the Arduino Nano to control the LEDs, cooling fan, power to the screen, and the servo motor. Besides the servo motor, all the other peripherals are switched on and off electronically using 3 TIP31C transistors. The power to the Arduino and the components mentioned above is supplied by the 7.4 V LiPo battery, which also provides 5 V via a voltage regulator for the LEDs, screen, and cooling fan. The servo runs directly on 7.4 V.
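
For reference, here is a rough sketch of how one TIP31C channel is typically wired as a low-side switch. This is the usual arrangement for an NPN power transistor, so treat it as an assumption rather than the exact circuit used here:

// One TIP31C (NPN) channel per peripheral, wired as a low-side switch:
//
//   Arduino pin ----[ ~1k resistor ]---- TIP31C base
//   5V rail ---------------------------- peripheral (+)
//   peripheral (-) --------------------- TIP31C collector
//   TIP31C emitter --------------------- GND (shared with the Arduino)
//
// Writing the control pin HIGH switches the peripheral on.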

Arduino Programming

/*

Arduino code for serial communication with the Raspberry Pi and control of peripherals.

*/

// Include Servo Library
#include <Servo.h>

// Create Servo Object
Servo myservo;

// Pin assignments
const int led = 3;
const int fan = 6;
const int screen = 5;

// State variables
bool fan_on = false;
bool screen_down = false;
String command;
int pos = 70;
bool actuate = false;

void setup() {
  Serial.begin(9600);
  pinMode(led, OUTPUT);
  pinMode(screen, OUTPUT);
  pinMode(fan, OUTPUT);
  digitalWrite(led, HIGH);
}


void loop() {
// Read serial data from Raspberry Pi as strings
  if (Serial.available()){
    command = Serial.readStringUntil('\n');
    command.trim();
    if (command.equals("fanon")){
      fan_on = true;  
    }
    else if (command.equals("fanoff")){
      fan_on = false;
    }
    else if (command.equals("screendown")){
      screen_down = true;
      actuate = true;
    }
    else if (command.equals("screenup")){
      screen_down = false;
      actuate = true;
    }
  }
  // Drive the fan output from the latest state
  if (fan_on){
    digitalWrite(fan, HIGH);
  }
  else {
    digitalWrite(fan, LOW);
  }

// actuate variable is true when screen motion commands are issued
// after moving the screen the servo is detached
  if (actuate){
    if (screen_down){
      myservo.attach(9);
      delay(100);
      // Sweep the arms down slowly (end points from calibration)
      for (pos = 160; pos >= 50; pos -= 1){
        myservo.write(pos);
        delay(15);
      }
      delay(500);
      myservo.detach();
      digitalWrite(screen, HIGH);  // power up the screen once lowered
    }
    if (!screen_down){
      digitalWrite(screen, LOW);  // cut screen power before retracting
      myservo.attach(9);
      delay(100);
      // Sweep the arms back up until the magnets engage
      for (pos = 50; pos <= 169; pos += 1){
        myservo.write(pos);
        delay(15);
      }
      delay(500);
      myservo.detach();
    }
    actuate = false;
  }


}

Raspberry Pi Programming

# Raspberry Pi code issuing commands via serial to the Arduino

#Import serial and time modules
import serial
import time

if __name__ == '__main__':
# Setup serial connection and flush the buffer of any data before transmission
# Rest of the code just sends out data typed by the user in the console to the Arduino
# which are for controlling the peripherals.
    ser = serial.Serial('/dev/ttyUSB0', 9600, timeout=1)
    ser.flush()
    command = input('type: ')
    if command == "fanon":
        ser.write(b"fanon\n")
        print("fan on")
    elif command == "fanoff":
        ser.write(b"fanoff\n")
    elif command == "screenon":
        ser.write(b"screenon\n")   # note: not handled by the Arduino sketch above
    elif command == "screenoff":
        ser.write(b"screenoff\n")  # note: not handled by the Arduino sketch above
    elif command == "screenup":
        ser.write(b"screenup\n")
    elif command == "screendown":
        ser.write(b"screendown\n")

The Setup


Basically, I have set up a main Python program which runs on startup of the Raspberry Pi and autonomously runs several smaller Python scripts to perform certain tasks based on the voice commands from the person wearing the head gear. As of now it is able to traverse the file system on voice commands and perform tasks directed by the user. I have set up several smaller scripts which do tasks such as reading a PDF or a text document in the reading list folder and translating text from pictures or files. In addition to this, I have used the pyautogui Python library to navigate the browser and perform certain operations, such as searching YouTube for a video; this plays the first video that comes up matching the search criteria dictated by the user.
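
Since pyautogui needs a running desktop session, one common way to launch a script on startup of Raspberry Pi OS is an LXDE autostart entry. Note that the file name main.py below is illustrative, as the main program's file name isn't given:

# Append to /etc/xdg/lxsession/LXDE-pi/autostart (path can vary by OS version)
@lxterminal -e python3 /home/console/Documents/main.py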

The script below uses the pyautogui library to navigate the OS and run the console_control.py script shown above, which issues commands to the peripherals over the serial connection with the Arduino Nano.

import pyautogui
import sys
import time

command = sys.argv[1]
# print(command)

# Open the terminal
pyautogui.hotkey('win')
pyautogui.write('term')
pyautogui.hotkey('enter')
time.sleep(1)

# Type in the command to run the console_control.py script
pyautogui.write('python3 /home/console/Documents/console_control.py')
pyautogui.hotkey('enter')
time.sleep(1)

# Enter the command to send to the Arduino
pyautogui.write(command)
time.sleep(1)
pyautogui.hotkey('enter')
time.sleep(2)

# Clean exit
pyautogui.write('exit')
time.sleep(1)
pyautogui.hotkey('enter')


The next script translates text from an image; it is also run from the main program as a sub-script. In order for the voice responses to work you need espeak and pyttsx3: pyttsx3 can be installed with pip, while espeak itself usually comes from the system package manager. There are other options you can explore, but I would caution that some are OS specific; I found these two work well on Linux, although the voice may sound a bit weird. For translating text you can use the deep_translator library, which works with many translation engines such as Google, Bing, etc. Finally, you need to install pytesseract to extract text from images for this to work. It may sound daunting, but the installations are very straightforward and you will be up and running quickly.
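
For reference, a typical set of install commands on Raspberry Pi OS might look like this (the chi_sim language pack matches the Chinese OCR used in the script below):

pip3 install pyttsx3 deep-translator pytesseract
sudo apt install espeak tesseract-ocr tesseract-ocr-chi-sim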

from deep_translator import GoogleTranslator
from PIL import Image
import pytesseract
import pyttsx3

# Set up the espeak-backed text-to-speech voice
engine = pyttsx3.init("espeak")
engine.setProperty('voice', 'english_rp+f3')
engine.setProperty('rate', 160)

# OCR the image (simplified Chinese), translate to English, and read it out
text = pytesseract.image_to_string(Image.open('/home/console/Pictures/chinesetext.png'), lang="chi_sim")
translated = GoogleTranslator(source='zh-CN', target='en').translate(text)
print(translated)
engine.say(translated)
engine.runAndWait()


Another script is used to pull data from Wikipedia if a page exists for the user's search query. You need Python's built-in os module and the wikipediaapi library for this. Below you can see the calls to wikipediaapi to fetch the page summary, which is stored in a text file that can then be narrated back to the user.

import wikipediaapi
import os

wiki_wiki = wikipediaapi.Wikipedia('en')
page_py = wiki_wiki.page('Autodesk')

if page_py.exists():
    print("Page Exists")
    print(page_py.title)
    text = page_py.summary
    # Save the summary so it can be narrated back later
    with open("Autodesk.txt", "w") as f:
        f.write(text)
else:
    print("Page not found!")


The next script searches for videos on YouTube, which I often do, especially for music videos while I am multitasking. This script needs the webbrowser module to open and navigate the browser, pyautogui to mimic keypresses and mouse clicks, the sys module to get the argument passed when running the script (the YouTube search query), the time module, and the youtubesearchpython module for searching for the video.

import webbrowser
import pyautogui
import sys
from time import sleep
from youtubesearchpython import VideosSearch


# Join all arguments so multi-word queries passed unquoted still work
args = sys.argv
query = " ".join(args[1:])


# Search YouTube for the top result and open it in the browser
search = VideosSearch(query, limit=1)
url = search.result()["result"][0]["link"]
print(url)
webbrowser.open(url)
sleep(2)

# Screen-specific clicks (calibrated for the 800 x 480 display);
# 'f' toggles YouTube's fullscreen mode
pyautogui.moveTo(762, 153)
sleep(2)
pyautogui.click()
pyautogui.moveTo(800, 5)
sleep(8)
pyautogui.hotkey('f')
sleep(2)
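
With the multi-word fix above, the script can be invoked with the query as plain arguments; the query here is only an example:

python3 /home/console/Documents/youtubesearch.py relaxing piano music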


These are the scripts I have come up with so far, but this is only the beginning. You can expand the automation as much as you like, as long as you keep your main script clean and organized. Now for the fun stuff: let's jump into the main script, where you can customize the feedback and responses from your virtual assistant.

from vosk import Model, KaldiRecognizer
import pyaudio
import pyttsx3
import os
import json
from time import sleep
from PIL import Image
from deep_translator import GoogleTranslator
import pytesseract
from PyPDF2 import PdfReader

model = Model(r"/home/console/Downloads/vosk-model-small-en-us-0.15")
recognizer = KaldiRecognizer(model, 16000)

mic = pyaudio.PyAudio()

engine = pyttsx3.init("espeak")

engine.setProperty('voice', 'english_rp+f3')

engine.setProperty('rate', 160)


listening = False
active_mode = False
youtube_active = False
translate_text_image = False
read_file = False
wiki_search_active = False
navigate_file_system = False


def get_command():
    # Open the microphone and block until Vosk recognizes a full phrase
    listening = True
    stream = mic.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=8192)
    while listening:
        stream.start_stream()
        try:
            data = stream.read(4096)
            if recognizer.AcceptWaveform(data):
                # Parse the recognizer's JSON result instead of slicing the raw string
                result = json.loads(recognizer.Result())
                response = result.get("text", "")
                listening = False
                stream.close()
                return response
        except OSError:
            pass
    
# Main loop: wait for the wake word "console", then handle one command
while True:
    print("Waiting for command...")
    command = get_command()
    if command == "":
        pass
    elif command == "console":
        active_mode = True
        engine.say("Hello Hanoon.")
        engine.runAndWait()
    elif command == "sleep":
        active_mode = False
        engine.say("Alright, bye bye.")
        engine.runAndWait()
        listening = False
        break
    if active_mode:
        command = get_command()
        if command == "screen down":
            engine.say("Okay. I will lower the screen")
            engine.runAndWait()
            os.system('python3 /home/console/Documents/automate.py screendown')
            active_mode = False
        elif command == "screen up":
            engine.say("Okay. I will retract the screen")
            engine.runAndWait()
            os.system('python3 /home/console/Documents/automate.py screenup')
            active_mode = False
        elif command == "fan on":
            engine.say("Okay. I will enable cooling fan")
            engine.runAndWait()
            os.system('python3 /home/console/Documents/automate.py fanon')
            active_mode = False
        elif command == "fan off":
            engine.say("Okay. I will Disable cooling fan")
            engine.runAndWait()
            os.system('python3 /home/console/Documents/automate.py fanoff')
            active_mode = False
        elif command == "who are you":
            engine.say("My name is console, I was born on May 10 2023. I was created to assist Hanoon with his daily tasks")
            engine.runAndWait()
            active_mode = False
        elif command == "search you tube":
            engine.say("What would you like me to search on youtube?")
            engine.runAndWait()
            youtube_active = True
            active_mode = False
        elif command == "exit browser":
            os.system('killall -9 "chromium-browser"')
            active_mode = False
        elif command == "translate text":
            engine.say("Would you like to translate from an image or a file")
            engine.runAndWait()
            response = get_command()
            if response == "image" or response == "file":
                translate_text_image = True
                active_mode = False
        elif command == "read file":
            engine.say("I am ready to read from a file for you")
            engine.runAndWait()
            read_file = True
            active_mode = False
        elif command == "":
            active_mode = False
        else:
            engine.say("I don't understand that yet!")
            engine.runAndWait()
            active_mode = False
    if youtube_active:
        youtube_search_query = get_command()
        engine.say("do you want me to search.")
        engine.say(youtube_search_query)
        engine.runAndWait()
        command = get_command()
        if command == "yes":
            engine.say("Searching first video on youtube for.")
            engine.say(youtube_search_query)
            engine.runAndWait()
            os.system("python3 /home/console/Documents/youtubesearch.py {}".format(youtube_search_query))
            youtube_active = False
            active_mode = False
        elif command == "no":
            engine.say("Sorry, could you please repeat the search query")
            engine.runAndWait()
            active_mode = False
        elif command == "stop search":
            engine.say("Well, Alright. Aborting you tube search")
            engine.runAndWait()
            youtube_active = False
            active_mode = False
    if translate_text_image:
        engine.say("Shall I open the pictures folder for you to choose an image to translate?")
        engine.runAndWait()
        command = get_command()
        if command == "yes":
            os.system('pcmanfm /home/console/Pictures/')
            engine.say("Please tell me which image you want to select to translate")
            engine.runAndWait()
            newdir = []
            for i in os.listdir('/home/console/Pictures/'):
                newdir.append(i.split(".")[0])
            response = get_command()
            os.system('killall -9 pcmanfm')
            if response in newdir:
                ind = newdir.index(response)
                image_name = os.listdir('/home/console/Pictures/')[ind]
                os.system('gpicview /home/console/Pictures/{} &'.format(image_name))
                text = pytesseract.image_to_string(Image.open('/home/console/Pictures/{}'.format(image_name)), lang="chi_sim")
                translated = GoogleTranslator(source='zh-CN', target='en').translate(text)
                engine.say(translated)
                engine.runAndWait()
                os.system('killall -9 gpicview')
                translate_text_image = False
            elif response == "stop translate":
                translate_text_image = False
            else:
                engine.say("sorry {} is not in the folder. Please repeat the name of the image file".format(response))
                engine.runAndWait()
                translate_text_image = False
            
        elif command == "no":
            engine.say("Oh i see. my program is still in construction for this feature at the moment")
            engine.runAndWait()
            translate_text_image = False
    if read_file:
        engine.say("Would you like me to open the folder, reading list for you")
        engine.runAndWait()
        response = get_command()
        if response == "yes":
            os.system('pcmanfm /home/console/Documents/ReadingList/')
            engine.say("Which file would you like me to read from the reading list")
            engine.runAndWait()
            newdir = []
            for i in os.listdir('/home/console/Documents/ReadingList/'):
                newdir.append(i.split(".")[0])
            command = get_command()
            os.system('killall -9 pcmanfm')
            if command in newdir:
                ind = newdir.index(command)
                file_name = os.listdir('/home/console/Documents/ReadingList')[ind]
                reader = PdfReader('/home/console/Documents/ReadingList/{}'.format(file_name))
                engine.say('{} document has {} pages'.format(file_name, len(reader.pages)))
                page = reader.pages[0]
                text = page.extract_text()
                engine.say(text)
                engine.runAndWait()
                read_file = False
        elif response == "no":
            engine.say("Oh i see. my program is still in construction for this feature at the moment")
            engine.runAndWait()
            read_file = False
        elif response == "stop reading":
            engine.say("Very well, feel free to let me know if i need to change my reading speed")
            engine.runAndWait()
            read_file = False


As you might notice, the first lines import the Vosk recognizer and the espeak and pyttsx3 modules for the virtual assistant to respond. The other modules we have discussed above, except for PyPDF2, which extracts text from PDF documents that can then be fed to espeak for narration. At this point you also need to download the model you will be using and pass the absolute path to its location on the system. The next few lines make voice adjustments by tweaking properties such as the 'rate' at which the engine narrates.

Going into the code, we set some boolean variables for what we want to do and initially set them to False. Based on the user input we switch these to True, which enables a specific function for a specific task. Make sure to reset the state back to False after performing the operation, or the program will go haywire. The get_command() function runs repeatedly once a task is completed, so the user can move on to another task. Most of the task functions dispatch to those mini scripts, which keeps the main code clean and easy to manage. You might also notice that I have left responses to some commands as "I don't understand that yet!" and "Program is still in construction" so I can further expand those areas.
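
As an example of extending it, a new command can follow the same pattern as the existing branches. The snippet below is purely illustrative: "lights off" and the ledoff argument are hypothetical, and the Arduino sketch would need a matching branch to act on them.

        elif command == "lights off":
            engine.say("Okay. I will turn off the LEDs")
            engine.runAndWait()
            # 'ledoff' is hypothetical; add a matching branch to the Arduino sketch
            os.system('python3 /home/console/Documents/automate.py ledoff')
            active_mode = False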

Conclusion

Having completed the project, it's clear that there is much that can be done on both the design and the program to improve or further develop it. From a design perspective I would have preferred to keep the head gear more compact, but there were factors to consider: for example, the screen needs a certain minimum distance from the eyes to avoid discomfort when viewing. For this reason I had to position the screen a bit further out, which also increased the load on the head gear; this was somewhat balanced by placing the components further back.

Another factor considered was attaching a UPS to the Raspberry Pi to power it, but after looking at some modules available on the market that can provide a steady 3 A, the idea was dropped due to their weight. So an external power supply such as a power bank is needed to power the Pi.

On the programming front, it can be further developed with more functions. The Raspberry Pi is a very good choice for an application like this, as it has the capacity to do all the computing, as opposed to a microcontroller. This brings me to why I used an Arduino for controlling the peripherals instead of the Raspberry Pi's GPIOs: a dedicated piece of hardware is best for precisely timed operations, and that is where the Arduino excels.

Having documented this, I hope someone benefits from my post and finds it informative. I welcome your opinions and ideas. Thanks!


Get the 3D Model Files