InVision ~ V2.0.0 a Smart Translating Device for Seeing the World in Your Language.
by DarkFLAME in Circuits > Assistive Tech

Hello everyone! InVision Part 2 is finally here, something I know many of my 3K viewers have been anxiously waiting for!
We're going to step it up in this next stage of the project by making a wearable summer cap that runs on a Raspberry Pi 4 with 2GB of RAM. This summer cap isn't just clever; it was made with practical applications in mind.
Project Objective
The goal is to support tourists, and residents like me, who live in areas where the language shifts every 100 kilometers. The purpose of this wearable is to improve daily interactions on the go and close communication gaps.
Things You'll Need
You don't need deep programming skills or any fancy hardware. Just kidding, you do need fairly capable hardware; even my Pi is struggling to run this. You could overclock your Pi, but should you? No, please don't: if it runs, don't touch it. This project is built to be accessible and easy to replicate. By the way, you can use this on your phone too; it's built for 2 modes, one for heavier, more demanding hardware, and another minimalistic one, such as for a Pi.
......MATERIALS......🪛🔧🧑💻
- OLED DISPLAY ............🔳
- CAMERA (USB CAMERA) ........📷
- SUMMER HAT...... 😎🤠
- AND OUR RASPBERRY PI.......
- AND ITS PERIPHERALS........................
Supplies





1) Get a Single Board Computer (small, and one that doesn't produce a lot of heat, or your head will hurt) .......
2) Configure the SBC (a short setup note follows this list)...
3) A camera (USB/ribbon cable) ....
4) Display (in my case a 128x64 0.96 inch SSD1306) ....
5) Microphone (preferably a pair of AirPods or something like that) ....
6) Code....
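Quick note on point 2, configuring the SBC: the exact menu and package names can differ between OS releases, so treat this as a rough checklist rather than exact steps. On a Raspberry Pi you'll want to enable the I2C interface (and the camera, if you use a ribbon-cable one) through sudo raspi-config under Interface Options, install the Tesseract engine itself with sudo apt install tesseract-ocr, and download a small Vosk model for your language from the Vosk site into a folder next to the script.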
🧠PLANNING!!



|: _🎯Specify the Plan and Scope🏳️_:|
The InVision Project is designed with a clear mission: to create an accessible, wearable technology that helps people overcome language barriers and communicate conveniently in diverse, mobile environments. Here are the goals:
1. Destroy Language Gaps
Enable real-time translation and visual assistance for people traveling through multilingual regions, where the local language changes frequently.
2. Promote Accessibility
Design a solution that's easy to build, even for beginners, requiring minimal programming knowledge and using affordable, widely available hardware like the Raspberry Pi.
3. Wearable and Practical
Integrate the system into a summer cap, making it lightweight, comfortable, and ideal for outdoor use, especially in hot climates and during travel.
Code



Now what do we want to do?
The aim is to build a cap which translates what you hear or see.
So, we'll use Vosk and Tesseract OCR to get the job done for speech and images respectively, and Helsinki-NLP (opus-mt) models or Google Translate for the translation step.
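The listing further down only covers the recognition side; translation isn't wired in yet. As a rough idea of how the Helsinki-NLP route could look, here is a minimal sketch assuming the transformers library (plus torch and sentencepiece) and an opus-mt model. The translate() helper, the model name, and the language pair are my placeholders, not part of the project code, and this is realistic for Desktop Mode on a laptop but likely too heavy for a 2GB Pi.

# Hypothetical translation helper (not part of the main script below).
# Assumes: pip install transformers torch sentencepiece
from transformers import MarianMTModel, MarianTokenizer

MT_NAME = "Helsinki-NLP/opus-mt-de-en"  # placeholder pair: German -> English
mt_tokenizer = MarianTokenizer.from_pretrained(MT_NAME)
mt_model = MarianMTModel.from_pretrained(MT_NAME)

def translate(text):
    # Return the input unchanged if there is nothing to translate
    if not text.strip():
        return text
    batch = mt_tokenizer([text], return_tensors="pt", padding=True)
    out = mt_model.generate(**batch)
    return mt_tokenizer.decode(out[0], skip_special_tokens=True)

print(translate("Wo ist der Bahnhof?"))  # should print something like "Where is the station?"

Something like this could be called on final_text (from Vosk) or on the OCR text before it is sent to the OLED or the Tkinter label.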
Here's the code; if you want me to explain it in detail, please comment, as I'm running low on time.
Please install the following libraries on your Pi, but first create a venv.
import sounddevice as sd
import queue
import sys
import json
import threading
from vosk import Model, KaldiRecognizer
import tkinter as tk
from tkinter import ttk
import sv_ttk
import cv2
import pytesseract
from PIL import Image, ImageTk
import time
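For reference, these imports roughly correspond to the pip packages sounddevice, vosk, sv-ttk, opencv-python, pytesseract, Pillow, adafruit-circuitpython-ssd1306, Adafruit-Blinka (which provides board and busio) and RPi.GPIO, plus the tesseract-ocr system package for the OCR engine itself. Double-check the names against each project's docs if pip complains.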
Connections:
Connect the toggle switch to GPIO 17, or change SWITCH_PIN in the code to whatever pin you attach it to.
The display needs to be connected as shown in the image, or you can look up an SSD1306 wiring guide online; either way, adjust the code accordingly.
Do not forget to connect the camera and the Bluetooth earphones, or you will be left wondering where that error came from...
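If you don't have the image handy: the usual SSD1306 I2C hookup on a Pi is VCC to 3.3V, GND to GND, SDA to GPIO 2 (physical pin 3) and SCL to GPIO 3 (physical pin 5), though your exact board may differ, so check its silkscreen. Since the code enables an internal pull-up on GPIO 17, the toggle switch simply sits between GPIO 17 and GND: closed reads LOW ("ON"), open reads HIGH ("OFF").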
Code
After hours of writing, researching, testing and debugging, I have finally finished the code:
Note that this code is for Linux. If you want to run it on a Windows PC, some changes have to be made; please let me know and I'll upload Windows-compatible instructions and code.
import sounddevice as sd
import queue
import sys
import json
import threading
from vosk import Model, KaldiRecognizer
import tkinter as tk
from tkinter import ttk
import sv_ttk
import cv2
import pytesseract
from PIL import Image, ImageTk
import time
import board
import busio
from PIL import Image, ImageDraw, ImageFont
import adafruit_ssd1306
import RPi.GPIO as GPIO
pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract' # adjust if needed
model = Model("model") # path to Vosk model folder
SWITCH_PIN = 17
OLED_WIDTH = 128
OLED_HEIGHT = 64
GPIO.setmode(GPIO.BCM)
GPIO.setup(SWITCH_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP)
#Functions:
#code for clever cap mode
def cc_script():
    # Headless "Clever Cap" mode: the toggle switch picks between speech and
    # text recognition, and results are drawn on the SSD1306 OLED.
    GPIO.cleanup()
    i2c = busio.I2C(board.SCL, board.SDA)
    display = adafruit_ssd1306.SSD1306_I2C(OLED_WIDTH, OLED_HEIGHT, i2c)
    display.fill(0)
    display.show()
    image = Image.new("1", (OLED_WIDTH, OLED_HEIGHT))
    draw = ImageDraw.Draw(image)
    font = ImageFont.load_default()

    def wrap_text(text, font, max_width):
        words = text.split()
        lines = []
        current_line = ""
        for word in words:
            test_line = current_line + word + " "
            bbox = draw.textbbox((0, 0), test_line, font=font)
            width = bbox[2] - bbox[0]
            if width <= max_width:
                current_line = test_line
            else:
                lines.append(current_line.strip())
                current_line = word + " "
        if current_line:
            lines.append(current_line.strip())
        return lines

    def update_display(text):
        draw.rectangle((0, 0, OLED_WIDTH, OLED_HEIGHT), outline=0, fill=0)
        lines = wrap_text(text or "No text detected", font, OLED_WIDTH)
        for i, line in enumerate(lines):
            if i * 10 > OLED_HEIGHT - 10:
                break
            draw.text((0, i * 10), line, font=font, fill=255)
        display.image(image)
        display.show()

    def text_recognition_once():
        cap = cv2.VideoCapture(0)
        ret, frame = cap.read()
        text = ""
        if ret:
            gray = cv2.cvtColor(cv2.resize(frame, None, fx=0.5, fy=0.5), cv2.COLOR_BGR2GRAY)
            text = pytesseract.image_to_string(gray).strip()
        cap.release()
        update_display(text)
        print("[Text] Detected:", text)

    def speech_recognition_once():
        samplerate = 16000
        q = queue.Queue()

        def callback(indata, frames, time, status):
            if status:
                print(status, file=sys.stderr)
            q.put(bytes(indata))

        rec = KaldiRecognizer(model, samplerate)
        with sd.RawInputStream(samplerate=samplerate, blocksize=8000, dtype='int16',
                               channels=1, callback=callback):
            print("🎤 Listening... Flip switch to stop.")
            try:
                while GPIO.input(SWITCH_PIN) == GPIO.LOW:  # Loop until switch opens
                    data = q.get(timeout=5)
                    if rec.AcceptWaveform(data):
                        result = json.loads(rec.Result())
                        print("✔", result["text"])
                        final_text = result["text"]
                        update_display(final_text)
                    else:
                        partial = json.loads(rec.PartialResult())
                        print("…", partial["partial"])
            except queue.Empty:
                print("⚠️ No speech detected.")
            print("\n🛑 Switch OFF — Stopping speech recognition.")

    last_state = None
    try:
        GPIO.setmode(GPIO.BCM)
        GPIO.setup(SWITCH_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP)
        # Main loop: switch closed -> speech mode, switch open -> OCR mode
        while True:
            if GPIO.input(SWITCH_PIN) == GPIO.LOW:
                print("Switch is ON (Closed)")
                speech_recognition_once()
            else:
                print("Switch is OFF (Open)")
                text_recognition_once()
            time.sleep(0.1)
    except KeyboardInterrupt:
        print("Interrupted. Exiting...")
    finally:
        GPIO.cleanup()
def desktop_mode():
    dp2 = tk.Toplevel(dp)
    dp2.title("Desktop Script")
    canvas = tk.Canvas(dp2, width=640, height=480)
    canvas.grid(row=0, column=0, padx=10, pady=10)

    def stop_dp2():
        cap.release()
        dp2.destroy()

    def speech_recognition():
        samplerate = 16000
        q = queue.Queue()

        def callback(indata, frames, time, status):
            if status:
                print(status, file=sys.stderr)
            q.put(bytes(indata))

        rec = KaldiRecognizer(model, samplerate)
        with sd.RawInputStream(samplerate=samplerate, blocksize=8000, dtype='int16',
                               channels=1, callback=callback):
            print("🎤 Listening... Press Ctrl+C to stop.")
            try:
                while True:
                    data = q.get()
                    if rec.AcceptWaveform(data):
                        result = json.loads(rec.Result())
                        print("✔", result["text"])
                        final_text = result["text"]
                        label1.config(text=final_text)
                    else:
                        partial = json.loads(rec.PartialResult())
                        print("…", partial["partial"])
            except KeyboardInterrupt:
                print("\n🛑 Done.")

    text_label = ttk.Label(dp2, text="Detected text will appear here", wraplength=220, justify="left")
    text_label.grid(row=0, column=1, padx=10, pady=10, sticky="nw")
    label = ttk.Label(dp2, text="Speech recognition System is also available, so to use click Start Speech recognition: (Warning: Will use more system resources)")
    label.grid(row=2, column=0, padx=10, pady=10, sticky="ne")
    button0 = ttk.Button(dp2, text="Stop Camera", command=stop_dp2)
    button0.grid(row=1, column=0, padx=10, pady=10, sticky="nw")
    button1 = ttk.Button(
        dp2,
        text="Start Speech Recognition",
        command=lambda: threading.Thread(target=speech_recognition, daemon=True).start(),
        style="Accent.TButton"
    )
    button1.grid(row=2, column=1, padx=10, pady=10, sticky="ne")
    label1 = ttk.Label(dp2, text="Recognized text will appear here:")
    label1.grid(row=3, column=0, padx=10, pady=10, sticky="nw")

    cap = cv2.VideoCapture(0)

    def update_frame():
        ret, frame = cap.read()
        if ret:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            img = Image.fromarray(rgb)
            imgtk = ImageTk.PhotoImage(image=img)
            canvas.imgtk = imgtk
            canvas.create_image(0, 0, anchor="nw", image=imgtk)
            small = cv2.resize(frame, None, fx=0.5, fy=0.5)
            gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
            text = pytesseract.image_to_string(gray).strip()
            if not text:
                text = "No text detected:"
            text_label.config(text=text)
        dp2.after(100, update_frame)

    def on_close():
        cap.release()
        dp2.destroy()

    dp2.protocol("WM_DELETE_WINDOW", on_close)
    update_frame()
# Main GUI
dp = tk.Tk()
sv_ttk.set_theme("dark")
dp.title("InVision 2.0")
label = ttk.Label(dp, text="Welcome to InVision 2.0")
label.grid(row=0, column=0, padx=10, pady=10, sticky="nw")
label2 = ttk.Label(dp, text="by- DarkFLAME")
label2.grid(row=0, column=1, padx=(10, 10), pady=10, sticky="ne")
label3 = ttk.Label(dp, text="This software features 2 modes: one for computers like laptops, and a headless mode for use on the Raspberry Pi.")
label3.grid(row=1, column=0, padx=(10, 10), pady=(50, 10), sticky="nw")
label4 = ttk.Label(dp, text="Mode : 1) Desktop mode: To be used on computers which have displays for GUI.")
label4.grid(row=2, column=0, padx=(10, 10), pady=(50, 10), sticky="nw")
button = ttk.Button(dp, text="Desktop Mode", style="Accent.TButton", command=desktop_mode)
button.grid(row=3, column=0, padx=(10, 10), pady=(10, 10), sticky="nw")
label5 = ttk.Label(dp, text="Mode : 2) Raspberry pi Headless mode: To be used on the project, press and enjoy your clever summer cap.")
label5.grid(row=4, column=0, padx=(10, 10), pady=(20, 10), sticky="nw")
button2 = ttk.Button(dp, text="Clever Cap Mode", style="Accent.TButton", command=cc_script)
button2.grid(row=5, column=0, padx=(10, 10), pady=(10, 10), sticky="nw")
dp.mainloop()
Here's the code:
Downloads
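To try it out, activate your venv and run the script with python3 (e.g. python3 invision.py, or whatever you named the file), making sure the Vosk model folder sits next to the script and is literally named "model", since the code loads Model("model"). Desktop Mode opens the camera-plus-OCR window; Clever Cap Mode expects the OLED, switch and camera to already be wired up on the Pi. On the cap itself you'll probably want the script to start at boot, for example via a systemd service or a crontab @reboot entry.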
Making the Headwear (Summer Cap)



First, we gather supplies, which include:
1) A Raspberry Pi
2) A summer cap
3) USB Camera
4) Bluetooth headphones
5) SSD1306 OLED Display 0.96 inch
6) Jumper Wires
7) Toggle Switch
How it's to be modelled:
Take a piece of cardboard and bend it as in the picture.
Next, wire everything together so it completes the circuit; it should look like Image 2.
Demonstration




This software has 2 modes: one for use on desktops or laptops with high compute power, and the other for edge devices, which will be mounted onto the prototype.
Desktop Mode:
Future Scalability
The InVision 2.0 project holds strong growth potential across multiple domains. Designed with dual operating modes—Desktop Mode for PCs and Clever Cap Mode for Raspberry Pi—it can evolve into a versatile platform for smart interaction. By integrating features such as cloud syncing, smart home interoperability, and AI-driven conversational interfaces, the project can transition from a standalone utility to a full-fledged personal assistant or educational tool. Its modularity allows for easy extension, enabling the addition of plugins, new input methods, or custom actions without major architectural changes. With further enhancements like centralized control, remote updates, and support for multi-device deployment, InVision 2.0 can scale to institutional settings such as classrooms, smart offices, or assistive environments, making it a compelling candidate for widespread adoption.
A version three with LLM support will roll out soon after I get better hardware and learn more AI; before that, feature updates and bug fixes will be released frequently. Thanks.
Regards,
Harshit Nandan Vatsa, 14y/o