Wearable Text to Audio Translator

I made a (currently unfinished) headband that takes photos/video of the user's surroundings, sends them to a laptop to grab any text from the image and convert the text to an audio recording, then receives the .mp3 audio recording and plays it out of a speaker to the user.

Supplies

Arduino Uno Rev 3
ESP32-Cam WiFi Bluetooth Camera Module
Speakers (any Arduino-compatible speakers; I used MakerHawk speakers)
Socket-to-jack jumper wires
Breadboard

Ideation and Brainstorming

Because we were asked to create a piece of tech for a community I don't belong to or know much about, I spent some time researching the struggles that low-vision and blind people face while navigating the world. Navigating unfamiliar places is of course challenging in general, and many people are already working on wearable tech for it, like the WeWalk Smart Cane, which integrates map data to help users navigate their surroundings. I wanted to push a little further and see whether the community faces challenges that are less obvious. This article mentioned how much public information is visual: signs for out-of-service bathrooms, directions to the nearest subway station. My idea for a headband that could pick up this visual text, translate it into audio, and provide it to visually impaired people stemmed from this challenge.

Install ESP32 in Arduino IDE

To use the ESP32-CAM, an additional board add-on must be installed in the Arduino IDE. I followed this tutorial: Using ESP32 Cam with Arduino

Connect the ESP32-CAM to the Arduino

Follow the wiring diagram above to connect your ESP32-CAM to the Arduino.

Photo to Text

The following code takes an image (presumably grabbed from the ESP32-CAM), reads any printed text from the photo as a string into the program using OpenCV and Pytesseract, and prints it to the terminal.

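To install OpenCV and Pytesseract, run the following commands in your terminal. Note that Pytesseract is only a Python wrapper, so the Tesseract OCR engine itself must also be installed on your machine: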

pip install opencv-python
pip install pytesseract

Code:

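# Reference: https://www.geeksforgeeks.org/text-detection-and-extraction-using-opencv-and-ocr/
# Import required packages
import cv2
import pytesseract

# Load the photo; cv2.imread returns None if the file can't be read
img = cv2.imread('arduino_text.jpg')
if img is None:
    raise FileNotFoundError('Could not read arduino_text.jpg')

# Run OCR on the image and print any text that was found
print(pytesseract.image_to_string(img))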

Note: this program does not work well with handwritten text! Handling it would be a great next step for improving this project!
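
Before the recognized text is passed on to be spoken, it probably needs a little tidying: OCR output generally keeps the line breaks from the image and often ends with extra whitespace, and a photo with no real text in it can come back as a few stray symbols. The sketch below is one possible way to handle that using only the Python standard library. The function names clean_ocr_text and is_worth_speaking and the two-character threshold are hypothetical choices for illustration, not part of Pytesseract.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Collapse raw OCR output into a single line that reads well aloud."""
    # \s matches spaces, tabs, newlines, and form feeds, so every run of
    # whitespace becomes one space; strip() removes what is left at the ends.
    return re.sub(r"\s+", " ", raw).strip()


def is_worth_speaking(text: str, min_chars: int = 2) -> bool:
    """Return True if the text has enough letters or digits to be worth reading out.

    min_chars is an assumed threshold, meant to skip results that are only
    punctuation or noise.
    """
    return sum(ch.isalnum() for ch in text) >= min_chars
```

With these, the output of pytesseract.image_to_string(img) could be passed through clean_ocr_text first, and only sent to the text-to-speech step if is_worth_speaking returns True.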

Reading this image into the program:

Produces the following in the terminal:

Text to Audio

The following code takes text (presumably read from the image sent by the ESP32-CAM), converts it into a voice recording using Google Text-to-Speech (gTTS), then plays the .mp3 file.

To install gTTS, run the following command in your terminal:

pip install gTTS

Code:

# Reference: https://www.geeksforgeeks.org/convert-text-speech-python/

# Import the required module for text
# to speech conversion
from gtts import gTTS

# This module is imported so that we can
# play the converted audio
import os

# The text that you want to convert to audio
mytext = 'Out of Order'

# Language in which you want to convert
language = 'en'

# Passing the text and language to the engine.
# slow=False tells the module to read the text
# at normal speed rather than slowly
myobj = gTTS(text=mytext, lang=language, slow=False)

# Saving the converted audio in an mp3 file named
# out_of_order.mp3
myobj.save("out_of_order.mp3")

# Open the file with the default audio player.
# Running the bare filename like this works on Windows;
# on macOS or Linux, put "open" or "xdg-open" before the filename
os.system("out_of_order.mp3")

This produces the attached .mp3 file!
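
The playback line in the program above hands the bare filename to the shell, which opens the default player on Windows but generally not on macOS or Linux. A minimal sketch of a more portable version is below, using only the standard library. The helper names open_command and play_audio are hypothetical, and it assumes the usual "open" (macOS) and "xdg-open" (Linux desktop) commands are available.

```python
import os
import platform
import subprocess


def open_command(path, system=None):
    """Return the command that opens `path` in the default app.

    Returns None on Windows, where os.startfile is used instead of a command.
    """
    system = system or platform.system()
    if system == "Windows":
        return None
    if system == "Darwin":  # macOS
        return ["open", path]
    # Assumption: anything else is a Linux desktop with xdg-open installed.
    return ["xdg-open", path]


def play_audio(path):
    """Open the audio file with whatever player the operating system prefers."""
    command = open_command(path)
    if command is None:
        os.startfile(path)  # only exists on Windows
    else:
        subprocess.run(command, check=True)
```

Calling play_audio("out_of_order.mp3") in place of the os.system line should then behave the same way on all three systems.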

Downloads

out_of_order.mp3

Connect to Arduino IDE (AKA Where It All Went Wrong)

Where this project falls off is the connections. I ran into issues getting programs to upload to my ESP32-CAM through my Arduino. I did quite a bit of troubleshooting, including switching the power source from 3.3V to 5V and holding down the Reset button on the ESP32-CAM during the connection process. A few sources recommended attaching a 10 µF electrolytic capacitor between the EN pin and GND. That is the last recommendation I have yet to try, short of swapping my module for another, and it is the next thing I will pursue to fix the project.

Here are some resources for troubleshooting errors that come up when connecting the ESP32-CAM:

https://randomnerdtutorials.com/esp32-cam-troubleshooting-guide/
https://rntlab.com/question/esp32-cam-a-fatal-error-occurred-failed-to-connect-to-esp32/

Next Steps

From here on out, most of my work is theoretical and yet to be tested. Ideally, once I'm able to upload sketches to the ESP32-CAM, I will be able to create a functional web server through which I can send images over Wi-Fi from the ESP32-CAM to my computer for processing. Those images will be processed by the text recognition and text-to-speech programs above, and an .mp3 of the speech will be sent back to the Arduino to be played out of the speakers. I've attached some low-detail sketches of the general system.
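
Since this part is untested, the following is only a rough sketch of what the laptop side of that loop might look like, not a working implementation. It assumes the camera would send each image as the body of an HTTP POST and expect the .mp3 bytes back in the response. The names image_to_audio, make_handler, and start_server are hypothetical, and the OCR and text-to-speech steps are passed in as plain functions so the sketch runs with only the standard library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading


def image_to_audio(image_bytes, ocr, tts):
    """Run OCR then text-to-speech; return audio bytes, or None if no text was found.

    `ocr` takes image bytes and returns a string; `tts` takes a string and
    returns audio bytes. Both are supplied by the caller.
    """
    text = ocr(image_bytes).strip()
    if not text:
        return None
    return tts(text)


def make_handler(ocr, tts):
    """Build a request handler class that answers each POSTed image with audio."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            # Read exactly as many bytes as the sender said it was sending.
            length = int(self.headers.get("Content-Length", 0))
            audio = image_to_audio(self.rfile.read(length), ocr, tts)
            if audio is None:
                self.send_response(204)  # no text in the image, nothing to play
                self.end_headers()
                return
            self.send_response(200)
            self.send_header("Content-Type", "audio/mpeg")
            self.send_header("Content-Length", str(len(audio)))
            self.end_headers()
            self.wfile.write(audio)

        def log_message(self, *args):
            pass  # keep the terminal quiet

    return Handler


def start_server(ocr, tts, host="0.0.0.0", port=8000):
    """Start the server in a background thread and return it (port 8000 is an assumption)."""
    server = HTTPServer((host, port), make_handler(ocr, tts))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In a real version, the ocr function would presumably decode the bytes into an image and call pytesseract.image_to_string, and the tts function would wrap gTTS and return the .mp3 contents. Keeping them as arguments means the server logic can be tried out with stand-in functions before the hardware is working.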

Sewing Headband

Though the electrical connections aren't... functional yet, I mocked up a headband for when this could actually be wearable! It consists of a piece of elastic threaded through cotton tubing, with a pocket at the front for the ESP32-CAM and a larger pocket in the back to house the Arduino. An additional pocket will be required to house the speaker, and I plan to thread the wires through a fabric casing when the project is finished to give it a clean look!