Vision Glasses V2 for the Blind to Transcribe Text to Audio

by akhilnagori in Circuits > Raspberry Pi




This project is the second, more advanced version of my Smart AI Glasses for the Blind, which transcribe text to audio in real time using a Raspberry Pi Zero. The glasses scan printed text and read it aloud, giving visually impaired individuals access to written material that isn't otherwise available in a form they can use (braille, audio, etc.).

The glasses use the same materials as the first iteration (the materials are listed below), but I also added a few new components that make the glasses easier to use, such as two pushbuttons that let the user control when the glasses turn on and when the camera takes a picture. I also improved the code for faster, more accurate results.

I was inspired to pursue this project when I went to India. In the building where I stayed, there was a blind child who enjoyed listening to stories read to him by his parents. However, he couldn't read any stories by himself. Although he had access to a small number of braille books, there were many pieces of text that he needed help accessing. His story inspired me to create something that would enable him to access more pieces of text.

Supplies


Materials used:


Raspberry Pi Zero 2 W

Zero Spy Camera for Raspberry Pi Zero

Two mini speakers

Any type of PLA Filament (I used Bambu Lab Filament)

PCB Circuit Board

3.7 volt lithium-ion battery - preferably one that can supply more than 1.2 amps

Jumper Wires - male to male

Power Booster to make power supply sufficient for Raspberry Pi



Tools Needed:


3D Printer

Soldering Iron

Software


We completely redesigned the software. Instead of running the local docTR OCR model for text extraction, we now call an OCR API, which is much faster, and we run a subprocess that triggers the camera capture from inside the code.
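The OCR API returns JSON, and the fields the project's script reads are `ParsedResults` and `ParsedText`. As a minimal, pure-Python sketch of just the parsing step (the helper name `parse_ocr_response` is my own, not part of the project code):

```python
def parse_ocr_response(result: dict) -> str:
    """Extract the recognized text from an OCR.space-style JSON response.

    Returns an empty string when no text was parsed.
    """
    parsed = result.get("ParsedResults") or []
    if not parsed:
        return ""
    return parsed[0].get("ParsedText", "").strip()

# A response shaped like the fields the full script reads:
sample = {"ParsedResults": [{"ParsedText": "Hello, world!\r\n"}]}
print(parse_ocr_response(sample))  # -> Hello, world!
```

Keeping the parsing in a small function like this makes it easy to handle the "no text found" case in one place.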


Deploying to the Raspberry Pi:


Unless you are modifying the code, you will most likely just need to upload the code to the Raspberry Pi. To start, enable SSH on the Raspberry Pi so that you can access its terminal from another computer.

Code:

🔹 Update Your System

Before installing dependencies, update your Raspberry Pi:

```bash
sudo apt update && sudo apt upgrade -y
```

🔹 Enable the Camera

If using Raspberry Pi OS:


Open the Raspberry Pi configuration tool:

```bash
sudo raspi-config
```

Go to Interfacing Options → Camera → Enable.

Reboot your Raspberry Pi:

```bash
sudo reboot
```

📦 Step 3: Install Required Software

🔹 Install Python Dependencies

```bash
pip install RPi.GPIO requests espeakng opencv-python
```

🔹 Test if RPi.GPIO is Installed

```bash
python3 -c "import RPi.GPIO; print('RPi.GPIO is installed!')"
```


πŸ“ Step 4: Write the Python Code

Create a new Python script:


bash

nano button_ocr.py

Paste the following complete script:


```python
import RPi.GPIO as GPIO
import requests
import espeakng
import cv2
import subprocess
import time

# Define GPIO pin for the button
BUTTON_PIN = 17

# Set up GPIO
GPIO.setmode(GPIO.BCM)  # Use BCM pin numbering
GPIO.setup(BUTTON_PIN, GPIO.IN, pull_up_down=GPIO.PUD_DOWN)  # Internal pull-down

def capture_and_process():
    """Captures an image, processes it with OCR, and converts text to speech."""
    image_path = "captured.jpg"

    # Capture image using libcamera-jpeg
    subprocess.run(["libcamera-jpeg", "-o", image_path, "--width", "640", "--height", "480"])

    # Verify that the image was saved
    image = cv2.imread(image_path)
    if image is None:
        print("❌ Failed to capture image!")
        return
    print("✅ Image captured successfully!")

    # OCR function
    def ocr_space_file(filename, api_key='helloworld', language='auto', ocr_engine=2):
        api_url = 'https://api.ocr.space/parse/image'
        payload = {
            'isOverlayRequired': False,
            'apikey': api_key,
            'language': language,
            'OCREngine': ocr_engine,
        }
        with open(filename, 'rb') as f:
            response = requests.post(api_url, files={'filename': f}, data=payload)

        if response.status_code == 200:
            result = response.json()
            if 'ParsedResults' in result and result['ParsedResults']:
                return result['ParsedResults'][0].get('ParsedText', '').strip()
            else:
                print("⚠️ No text found in the image.")
                return ""
        else:
            print(f"❌ OCR Error: {response.status_code}, {response.text}")
            return ""

    # Run OCR
    text = ocr_space_file(image_path)

    if text:
        print(f"📝 Extracted Text: {text}")
        # Convert text to speech
        tts = espeakng.Speaker()
        tts.wpm = 100
        tts.say(text.replace("\r\n", " "), wait4prev=True)
    else:
        print("⚠️ No text extracted from the image.")

# Main loop to wait for button press
print("🚀 Waiting for button press to capture an image...")

try:
    while True:
        if GPIO.input(BUTTON_PIN) == GPIO.HIGH:  # Button is pressed
            print("🔘 Button Pressed! Capturing image...")
            capture_and_process()
            time.sleep(1)  # Debounce delay
except KeyboardInterrupt:
    print("\n🛑 Program terminated.")
    GPIO.cleanup()  # Clean up GPIO settings
```

Save the file (CTRL + X, then Y, then ENTER).
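The script flattens Windows-style line breaks with `text.replace("\r\n", " ")` before speaking, but OCR output often also contains stray newlines and doubled spaces that make espeak-ng pause oddly. A small normalization helper you could swap in (the name `normalize_for_tts` is my own, not part of the project code):

```python
import re

def normalize_for_tts(text: str) -> str:
    """Collapse all runs of whitespace (including \r\n from OCR output)
    into single spaces so speech doesn't pause mid-sentence."""
    return re.sub(r"\s+", " ", text).strip()

print(normalize_for_tts("Hello,\r\nworld!   This  is\na test."))
# -> Hello, world! This is a test.
```

To use it, replace the `tts.say(...)` line with `tts.say(normalize_for_tts(text), wait4prev=True)`.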


🚀 Step 5: Run the Program

Run the script:

```bash
python3 button_ocr.py
```

  1. The program will wait for a button press.
  2. Press the button → it captures an image.
  3. The OCR extracts the text.
  4. The text is spoken using espeakng.
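The one-second sleep after each capture acts as a simple debounce: without it, a single long press would retrigger a capture on every pass through the main loop. A rough illustration on simulated pin reads (pure Python, no GPIO required; the function name is my own):

```python
def count_triggers(samples, use_debounce):
    """Count how many captures a stream of HIGH(1)/LOW(0) samples would
    trigger. With use_debounce=True we skip past the rest of the press
    after triggering, mimicking the effect of time.sleep(1) in the loop."""
    triggers = 0
    i = 0
    while i < len(samples):
        if samples[i] == 1:
            triggers += 1
            if use_debounce:
                # Skip the remainder of this press before polling again
                while i < len(samples) and samples[i] == 1:
                    i += 1
                continue
        i += 1
    return triggers

press = [0, 1, 1, 1, 1, 0, 0]  # one press held for four loop ticks
print(count_triggers(press, use_debounce=False))  # -> 4 (retriggers every tick)
print(count_triggers(press, use_debounce=True))   # -> 1
```

A fixed sleep is the simplest approach; interrupt-driven detection with `GPIO.add_event_detect` and a `bouncetime` argument is a common alternative, at the cost of slightly more setup.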



After following the steps above, you are ready to run the code. Upload the main.py and tts.py files to the Raspberry Pi, then copy the code from tts.py to the end of main.py so that everything runs in a single execution. You should now have a working pair of text-to-audio glasses, but you must replace the test image in main.py with <imagename>.jpg. This will be used later when setting up sound on the Raspberry Pi.

Hardware


These are the files you will need to print on a 3D printer. Use your printer's slicing software to slice the STL files.

Conclusion


In conclusion, we successfully developed a prototype system that uses a Raspberry Pi, a camera module, and a push-button interface to capture images, extract text using Optical Character Recognition (OCR), and read the text aloud using text-to-speech (TTS) technology. This project was designed to assist individuals with visual impairments or reading difficulties by providing an easy-to-use, real-time text-reading solution. We implemented OCR using the OCR.space API and integrated espeakng for speech output. The system was optimized to function effectively in various lighting conditions and text formats, ensuring accessibility and ease of use. Through this project, we demonstrated the feasibility of a low-cost assistive device that enhances independence and daily interactions with printed text.

Demo

Visionary: AI Glasses for Real-Time Text-to-Audio Transcription for the Visually Impaired

This is a demo of the project working.