InVision ~ V2.0.0: A Smart Translating Device for Seeing the World in Your Language

by DarkFLAME in Circuits > Assistive Tech





Hello everyone! InVision Part 2 is finally here, something I know many of my 3K viewers have been anxiously waiting for!


We're going to step it up in this next stage of the project by making a wearable summer cap that runs on a Raspberry Pi 4 with 2GB of RAM. This summer cap isn't just clever; it was made with practical applications in mind.


Project Objective


The goal is to support tourists and residents like me who live in areas where the language shifts every 100 kilometers. This wearable is meant to close communication gaps and improve daily interactions on the go.


Things You'll Need


You don't need deep programming skills or any fancy hardware... jk, you do need fancy hardware; even my Pi is struggling to run this. You could just overclock your Pi, or should you? No, please don't: if it runs, don't touch it. This project is built to be accessible and easy to replicate. BTW, you can use this on your phone too; it's built for 2 modes, one for heavy, demanding hardware and another that's minimalistic, such as for a Pi.


......MATERIALS......🪛🔧🧑‍💻


  1. OLED DISPLAY ............🔳
  2. CAMERA (USB CAMERA) ........📷
  3. SUMMER HAT...... 😎🤠
  4. AND OUR RASPBERRY PI.......
  5. AND ITS PERIPHERALS........................

Supplies


1) Get a Single Board Computer (small, and one that doesn't produce a lot of heat, or your head will hurt) .......

2) Configure the SBC...

3) A camera (USB/ribbon cable) ....

4) Display (in my case, a 128x64 0.96 inch SSD1306) ....

5) Microphone (preferably a pair of AirPods or something like that) ....

6) Code....

🧠PLANNING!!


|: _🎯Specify the Plan and Scope🏳️_:|


Project Goals: What issue are we resolving? Let's specify the goals.


The InVision Project is designed with a clear mission: to create an accessible, wearable technology that helps people overcome language barriers and communicate conveniently in diverse and mobile environments. Here are the goals:


1. Destroy Language Gaps

Enable real-time translation and visual assistance for people traveling through multilingual regions, where the local language changes frequently.


2. Promote Accessibility

Design a solution that's easy to build, even for beginners, requiring minimal programming knowledge and using affordable, widely available hardware like the Raspberry Pi.


3. Wearable and Practical

Integrate the system into a summer cap, making it lightweight, comfortable, and ideal for outdoor use, especially in hot climates and during travel.




Code


Now what do we want to do?

The aim is to build a cap that translates what you hear or see.

So we'll use VOSK for the speech part and Tesseract OCR for the image part. For translation, we'll use Helsinki-NLP (opus-mt) models or Google Translate.
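Translation isn't wired into the code below yet; the code handles the recognition half. As a taste of the Helsinki-NLP route, here's a minimal sketch, assuming the Hugging Face transformers package (with PyTorch) and the opus-mt-hi-en model; the Hindi-to-English pair is just an example, so swap in whatever pair you need:

# Minimal translation sketch (not yet wired into the main code).
# Assumes: pip install transformers torch, plus internet access to fetch the model.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-hi-en"  # example pair: Hindi -> English
tokenizer = MarianTokenizer.from_pretrained(model_name)
mt_model = MarianMTModel.from_pretrained(model_name)

def translate(text):
    # Tokenize, generate the translation, then decode back to a plain string.
    batch = tokenizer([text], return_tensors="pt", padding=True)
    generated = mt_model.generate(**batch)
    return tokenizer.decode(generated[0], skip_special_tokens=True)

print(translate("नमस्ते दुनिया"))

Fair warning: a 2GB Pi will struggle to hold a Marian model next to Vosk, which is exactly why an online service like Google Translate is the other option.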


Here's the code; if you want me to explain it, please comment, as I'm running low on time.

Please install the following libraries on your Pi, but first create a venv. (Via pip, the packages should be: sounddevice, vosk, sv-ttk, opencv-python, pytesseract, pillow, plus Adafruit-Blinka, adafruit-circuitpython-ssd1306, and RPi.GPIO for the cap mode. Tesseract itself comes from your system package manager, e.g. the tesseract-ocr package.)


import sounddevice as sd
import queue
import sys
import json
import threading
from vosk import Model, KaldiRecognizer
import tkinter as tk
from tkinter import ttk
import sv_ttk
import cv2
import pytesseract
from PIL import Image, ImageTk
import time



Connections:


Connect the switch to GPIO 17, or change the code to match whatever pin you attach the toggle switch to.

The display needs to be connected as shown in the image, or you can look up a wiring diagram on the internet and change the code accordingly.

Do not forget to connect the camera and the Bluetooth earphones, or you will be left wondering "why did that error come..."
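Before launching the full program, it's worth doing a quick wiring sanity check. Here's a minimal sketch, assuming the SSD1306 sits on the Pi's default I2C pins (SDA/SCL) and the toggle switch is on GPIO 17, using the same libraries as the main code:

# Wiring sanity check: print the switch state while blinking the OLED.
import time
import board
import busio
import adafruit_ssd1306
import RPi.GPIO as GPIO

GPIO.setmode(GPIO.BCM)
GPIO.setup(17, GPIO.IN, pull_up_down=GPIO.PUD_UP)

i2c = busio.I2C(board.SCL, board.SDA)
display = adafruit_ssd1306.SSD1306_I2C(128, 64, i2c)

try:
    while True:
        # A closed switch pulls the pin LOW (we enabled the internal pull-up).
        print("Switch:", "ON" if GPIO.input(17) == GPIO.LOW else "OFF")
        display.fill(1)  # all pixels on
        display.show()
        time.sleep(0.5)
        display.fill(0)  # all pixels off
        display.show()
        time.sleep(0.5)
except KeyboardInterrupt:
    GPIO.cleanup()

If the OLED blinks and the printed state follows the toggle, the wiring is good.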


Code

After hours of writing, researching, testing, and debugging, I have finally finished the code:

Note that this code is for Linux. If you're looking to run it on a Windows PC, changes have to be made; please let me know, and I'll upload Windows-compatible instructions and code.
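For reference, the most obviously Linux-specific line is the Tesseract path; on Windows you'd point pytesseract at tesseract.exe instead. The path below is only the installer's usual default (an assumption, so adjust it to your machine), and the Pi-only imports (board, busio, adafruit_ssd1306, RPi.GPIO) would also need to be removed or guarded, since they only work on the Pi.

# Windows sketch: assumed default install path, adjust if yours differs.
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'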

import sounddevice as sd
import queue
import sys
import json
import threading
from vosk import Model, KaldiRecognizer
import tkinter as tk
from tkinter import ttk
import sv_ttk
import cv2
import pytesseract
from PIL import Image, ImageTk, ImageDraw, ImageFont
import time
import board
import busio
import adafruit_ssd1306
import RPi.GPIO as GPIO

pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract'  # adjust if needed
model = Model("model")  # path to Vosk model folder

SWITCH_PIN = 17  # BCM pin the toggle switch is wired to
OLED_WIDTH = 128
OLED_HEIGHT = 64

GPIO.setmode(GPIO.BCM)
GPIO.setup(SWITCH_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP)


# Clever Cap mode: headless loop for the Raspberry Pi with the OLED.
def cc_script():
    GPIO.cleanup()  # reset pin state; re-initialised in the try block below

    # Set up the SSD1306 OLED over I2C and clear it.
    i2c = busio.I2C(board.SCL, board.SDA)
    display = adafruit_ssd1306.SSD1306_I2C(OLED_WIDTH, OLED_HEIGHT, i2c)
    display.fill(0)
    display.show()

    # 1-bit image buffer that gets pushed to the OLED.
    image = Image.new("1", (OLED_WIDTH, OLED_HEIGHT))
    draw = ImageDraw.Draw(image)
    font = ImageFont.load_default()

    def wrap_text(text, font, max_width):
        # Split text into lines that fit within max_width pixels.
        words = text.split()
        lines = []
        current_line = ""
        for word in words:
            test_line = current_line + word + " "
            bbox = draw.textbbox((0, 0), test_line, font=font)
            width = bbox[2] - bbox[0]
            if width <= max_width:
                current_line = test_line
            else:
                lines.append(current_line.strip())
                current_line = word + " "
        if current_line:
            lines.append(current_line.strip())
        return lines

    def update_display(text):
        # Clear the buffer, draw the wrapped text (one row every 10 px), show it.
        draw.rectangle((0, 0, OLED_WIDTH, OLED_HEIGHT), outline=0, fill=0)
        lines = wrap_text(text or "No text detected", font, OLED_WIDTH)
        for i, line in enumerate(lines):
            if i * 10 > OLED_HEIGHT - 10:  # stop once we run out of rows
                break
            draw.text((0, i * 10), line, font=font, fill=255)
        display.image(image)
        display.show()

    def text_recognition_once():
        # Grab one camera frame, OCR it, and show the result on the OLED.
        cap = cv2.VideoCapture(0)
        ret, frame = cap.read()
        text = ""
        if ret:
            # Downscale and grayscale before OCR to save CPU on the Pi.
            gray = cv2.cvtColor(cv2.resize(frame, None, fx=0.5, fy=0.5), cv2.COLOR_BGR2GRAY)
            text = pytesseract.image_to_string(gray).strip()
        cap.release()
        update_display(text)
        print("[Text] Detected:", text)

    def speech_recognition_once():
        # Stream microphone audio into Vosk until the switch is flipped off.
        samplerate = 16000
        q = queue.Queue()

        def callback(indata, frames, time, status):
            if status:
                print(status, file=sys.stderr)
            q.put(bytes(indata))

        rec = KaldiRecognizer(model, samplerate)

        with sd.RawInputStream(samplerate=samplerate, blocksize=8000, dtype='int16',
                               channels=1, callback=callback):
            print("🎤 Listening... Flip switch to stop.")
            try:
                while GPIO.input(SWITCH_PIN) == GPIO.LOW:  # loop until switch opens
                    data = q.get(timeout=5)
                    if rec.AcceptWaveform(data):
                        result = json.loads(rec.Result())
                        print("✔", result["text"])
                        update_display(result["text"])
                    else:
                        partial = json.loads(rec.PartialResult())
                        print("…", partial["partial"])
            except queue.Empty:
                print("⚠️ No speech detected.")
            print("\n🛑 Switch OFF — Stopping speech recognition.")

    try:
        # Re-initialise the switch pin after the cleanup above.
        GPIO.setmode(GPIO.BCM)
        GPIO.setup(SWITCH_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP)

        # Main loop: switch closed = speech mode, switch open = OCR mode.
        while True:
            if GPIO.input(SWITCH_PIN) == GPIO.LOW:
                print("Switch is ON (Closed)")
                speech_recognition_once()
            else:
                print("Switch is OFF (Open)")
                text_recognition_once()
            time.sleep(0.1)
    except KeyboardInterrupt:
        print("Interrupted. Exiting...")
    finally:
        GPIO.cleanup()


# Desktop mode: Tkinter window with live camera preview, OCR, and speech.
def desktop_mode():
    dp2 = tk.Toplevel(dp)
    dp2.title("Desktop Script")

    canvas = tk.Canvas(dp2, width=640, height=480)
    canvas.grid(row=0, column=0, padx=10, pady=10)

    def stop_dp2():
        cap.release()
        dp2.destroy()

    def speech_recognition():
        # Stream microphone audio into Vosk and show results in the GUI.
        samplerate = 16000
        q = queue.Queue()

        def callback(indata, frames, time, status):
            if status:
                print(status, file=sys.stderr)
            q.put(bytes(indata))

        rec = KaldiRecognizer(model, samplerate)

        with sd.RawInputStream(samplerate=samplerate, blocksize=8000, dtype='int16',
                               channels=1, callback=callback):
            print("🎤 Listening... Press Ctrl+C to stop.")
            try:
                while True:
                    data = q.get()
                    if rec.AcceptWaveform(data):
                        result = json.loads(rec.Result())
                        print("✔", result["text"])
                        label1.config(text=result["text"])
                    else:
                        partial = json.loads(rec.PartialResult())
                        print("…", partial["partial"])
            except KeyboardInterrupt:
                print("\n🛑 Done.")

    text_label = ttk.Label(dp2, text="Detected text will appear here", wraplength=220, justify="left")
    text_label.grid(row=0, column=1, padx=10, pady=10, sticky="nw")

    label = ttk.Label(dp2, text="Speech recognition is also available: click Start Speech Recognition. (Warning: will use more system resources)")
    label.grid(row=2, column=0, padx=10, pady=10, sticky="ne")

    button0 = ttk.Button(dp2, text="Stop Camera", command=stop_dp2)
    button0.grid(row=1, column=0, padx=10, pady=10, sticky="nw")

    # Run speech recognition on a daemon thread so the GUI stays responsive.
    button1 = ttk.Button(
        dp2,
        text="Start Speech Recognition",
        command=lambda: threading.Thread(target=speech_recognition, daemon=True).start(),
        style="Accent.TButton"
    )
    button1.grid(row=2, column=1, padx=10, pady=10, sticky="ne")

    label1 = ttk.Label(dp2, text="Recognized text will appear here:")
    label1.grid(row=3, column=0, padx=10, pady=10, sticky="nw")

    cap = cv2.VideoCapture(0)

    def update_frame():
        # Show the latest camera frame on the canvas and OCR a downscaled copy.
        ret, frame = cap.read()
        if ret:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            img = Image.fromarray(rgb)
            imgtk = ImageTk.PhotoImage(image=img)
            canvas.imgtk = imgtk  # keep a reference so it isn't garbage-collected
            canvas.create_image(0, 0, anchor="nw", image=imgtk)
            small = cv2.resize(frame, None, fx=0.5, fy=0.5)
            gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
            text = pytesseract.image_to_string(gray).strip()
            if not text:
                text = "No text detected:"
            text_label.config(text=text)
        dp2.after(100, update_frame)  # re-run every 100 ms

    def on_close():
        cap.release()
        dp2.destroy()

    dp2.protocol("WM_DELETE_WINDOW", on_close)
    update_frame()


# Main GUI: pick between Desktop mode and Clever Cap (headless) mode.
dp = tk.Tk()
sv_ttk.set_theme("dark")
dp.title("InVision 2.0")

label = ttk.Label(dp, text="Welcome to InVision 2.0")
label.grid(row=0, column=0, padx=10, pady=10, sticky="nw")

label2 = ttk.Label(dp, text="by- DarkFLAME")
label2.grid(row=0, column=1, padx=(10, 10), pady=10, sticky="ne")

label3 = ttk.Label(dp, text="This software features 2 modes: one for computers like laptops, and a headless mode for the Raspberry Pi.")
label3.grid(row=1, column=0, padx=(10, 10), pady=(50, 10), sticky="nw")

label4 = ttk.Label(dp, text="Mode 1) Desktop mode: for computers that have displays for the GUI.")
label4.grid(row=2, column=0, padx=(10, 10), pady=(50, 10), sticky="nw")

button = ttk.Button(dp, text="Desktop Mode", style="Accent.TButton", command=desktop_mode)
button.grid(row=3, column=0, padx=(10, 10), pady=(10, 10), sticky="nw")

label5 = ttk.Label(dp, text="Mode 2) Raspberry Pi headless mode: to be used on the cap itself. Press and enjoy your clever summer cap.")
label5.grid(row=4, column=0, padx=(10, 10), pady=(20, 10), sticky="nw")

button2 = ttk.Button(dp, text="Clever Cap Mode", style="Accent.TButton", command=cc_script)
button2.grid(row=5, column=0, padx=(10, 10), pady=(10, 10), sticky="nw")

dp.mainloop()


Making the Headwear (Summer Cap)


First, we gather the supplies, which include:

1) A Raspberry Pi

2) A summer cap

3) A USB camera

4) Bluetooth headphones

5) An SSD1306 0.96 inch OLED display

6) Jumper wires

7) A toggle switch

How it's to be modeled:

Take a cardboard piece and bend it like in the picture.

Next, wire it together so it completes the circuit; it should look like image 2.

Demonstration


This software has 2 modes: one for desktops or laptops with high compute power, and another for edge devices, which will be mounted onto the prototype.
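If you'd rather not pick the mode by hand each time, here's a small optional sketch for auto-detecting a Pi; it assumes the board exposes /proc/device-tree/model, which Raspberry Pi OS does:

# Optional: suggest a mode by checking whether we're running on a Raspberry Pi.
def on_raspberry_pi():
    try:
        with open("/proc/device-tree/model") as f:
            return "raspberry pi" in f.read().lower()
    except FileNotFoundError:
        return False

mode = "Clever Cap" if on_raspberry_pi() else "Desktop"
print("Suggested mode:", mode)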




Future Scalability

The InVision 2.0 project holds strong growth potential across multiple domains. Designed with dual operating modes—Desktop Mode for PCs and Clever Cap Mode for Raspberry Pi—it can evolve into a versatile platform for smart interaction. By integrating features such as cloud syncing, smart home interoperability, and AI-driven conversational interfaces, the project can transition from a standalone utility to a full-fledged personal assistant or educational tool. Its modularity allows for easy extension, enabling the addition of plugins, new input methods, or custom actions without major architectural changes. With further enhancements like centralized control, remote updates, and support for multi-device deployment, InVision 2.0 can scale to institutional settings such as classrooms, smart offices, or assistive environments, making it a compelling candidate for widespread adoption.

A version three with LLM support will be rolling out soon after I get better hardware and learn more AI; before that, feature updates and bug fixes will roll out frequently. Thanks!


Regards,

Harshit Nandan Vatsa, 14y/o