InVision: a Smart Translating Device for Seeing the World in Your Language.
by Binary_Flame in Circuits > Raspberry Pi

*Note: the current prototype is not wearable and runs only as software.*
MADE BY @ElectraFlame and @DarkFLAME
InVision is an advanced pair of smart glasses designed to break language barriers effortlessly. Equipped with cutting-edge AI and real-time text recognition technology, InVision allows users to instantly translate foreign languages into their native tongue. Whether navigating unfamiliar streets, reading menus, or engaging in cross-cultural conversations, this device ensures a smooth and intuitive experience. By overlaying translated text directly onto the lens, users can seamlessly understand their surroundings without the need for external devices.
Built for travelers, students, business professionals, and language enthusiasts, InVision harnesses the power of augmented reality (AR), optical character recognition (OCR), and AI-driven translation engines to deliver fast, accurate, and context-aware translations. Its sleek, lightweight design ensures comfort, while its integration with cloud-based translation services enables continuous improvements in accuracy and language support.
Ways to Improve InVision
- AI-Enhanced Context Awareness – Implementing AI to detect contextual meanings, idioms, and tone to provide more accurate translations.
- Speech-to-Text Integration – Adding voice recognition for spoken language translation, making communication even smoother.
- Offline Translation Mode – Enabling translation without internet access for remote areas and travel convenience (a model-caching sketch follows this list).
- Customizable UI/UX – Allowing users to adjust text size, font, and display styles for readability.
- Multi-Language Support – Supporting rare and indigenous languages to cater to a broader audience.
- Real-Time Conversation Mode – Enabling interactive dialogues with on-screen subtitles for face-to-face communication.
- Gesture-Based Controls – Letting users control the device with simple hand gestures or eye-tracking.
- AI-Powered Learning Mode – Assisting language learners by providing grammar insights and pronunciation tips.
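On the offline idea above: the Helsinki-NLP translation models used in this project already run locally once downloaded, so a first step toward offline mode is simply caching the model on the SD card. A minimal sketch, assuming the same opus-mt-en-de model used later in this project and an example local folder name:

from transformers import pipeline

# First run (online): download the model, then save it to a local folder
# ("./opus-mt-en-de" is just an example path)
pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
pipe.model.save_pretrained("./opus-mt-en-de")
pipe.tokenizer.save_pretrained("./opus-mt-en-de")

# Later runs (offline): load everything from disk, no network needed
offline_pipe = pipeline("translation", model="./opus-mt-en-de")
print(offline_pipe("Hello!")[0]['translation_text'])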
Supplies
The Supplies:
- Raspberry Pi 4 (2 GB RAM) - suggestions for more capable hardware are welcome
- Webcam module - for computer vision
- Keyboard - peripheral
- Mouse - peripheral
- Power cable - power supply
- SD card - OS and data storage
Installing PyCharm

Install PyCharm Community Edition from the Ubuntu App Center, or from a terminal with sudo snap install pycharm-community --classic.
Writing Code
The full code is provided at the end.
Before writing code:
- be clear about which classes you will use and which functions you will call
- make a detailed sketch of what should be executed, in what order (see the outline below)
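For InVision, that execution sketch boils down to the following outline (a planning summary of the code in the next steps, not extra functionality):

# InVision processing pipeline, in execution order:
# 1. Open the webcam with OpenCV
# 2. Grab a frame and threshold it to black and white
# 3. Run Tesseract OCR on the thresholded frame to find and read text
# 4. Clean the recognized text with NLTK (lowercase, drop stopwords and punctuation)
# 5. Translate the cleaned text with a Hugging Face translation model
# 6. Overlay the recognized and translated text on the frame and display it
# 7. Repeat until the user presses 'q'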
Installing Libraries: Downloading and Setup
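Before the imports below will run, the libraries themselves need to be installed. On Raspberry Pi OS or Ubuntu, something like sudo apt install tesseract-ocr for the OCR engine, followed by pip install opencv-python pytesseract nltk transformers torch for the Python packages, should cover everything used here (exact package names can vary by distribution).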

import cv2
import pytesseract
import nltk
from nltk.tokenize import RegexpTokenizer
from nltk.corpus import stopwords
import string
from transformers import pipeline

# Path to your Tesseract executable
# (this is a Windows path; on Raspberry Pi it is typically /usr/bin/tesseract,
# as in the final code at the end)
pytesseract.pytesseract.tesseract_cmd = r'C:\Users\Students\AppData\Local\Programs\Tesseract-OCR\tesseract.exe'

# Set up the translation pipeline (translating from English to German as an example)
translation_pipeline = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")

# Open the webcam (camera 0 is usually the default camera)
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    print("Error: Could not access the camera.")
    exit()

# Ensure that the NLTK stopwords are downloaded
nltk.download('stopwords')
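As a quick sanity check that the model loaded correctly, you can run the pipeline once on a fixed string (a minimal sketch; the exact German wording may differ between model versions):

print(translation_pipeline("Hello, world!", max_length=40)[0]['translation_text'])
# Prints something along the lines of: Hallo, Welt!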
Translating Function

def translate_text(text, model_name='Helsinki-NLP/opus-mt-en-de'):
    """Translate text to another language using Hugging Face's translation pipeline."""
    # Initialize the translation pipeline for the requested model
    translation_pipeline = pipeline("translation", model=model_name)
    # Translate the text (max_length caps the output length)
    translated = translation_pipeline(text, max_length=400)
    # Return the translated string
    return translated[0]['translation_text']
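Note that this version re-creates the pipeline on every call, which reloads the model each time and is slow; the final code at the end builds the pipeline once at startup and reuses it. A usage sketch (the exact output wording may vary):

print(translate_text("Where is the train station?"))
# e.g. "Wo ist der Bahnhof?"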
Preprocessing Function
def preprocess_text(text):
    """Preprocess text by removing stopwords and punctuation."""
    # Tokenize with a regular-expression tokenizer (words only)
    tokenizer = RegexpTokenizer(r'\w+')
    stop_words = set(stopwords.words('english'))
    # Convert to lowercase and tokenize the input text
    tokens = tokenizer.tokenize(text.lower())
    # Remove stopwords and punctuation
    tokens = [word for word in tokens if word not in stop_words and word not in string.punctuation]
    # Join the remaining words back into a single string
    return " ".join(tokens)
Opening Frames and Using OpenCV

while True:
    # Capture each frame from the webcam
    ret, frame = cap.read()
    if not ret:
        print("Failed to grab frame!")
        break

    # Convert the frame to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Apply some preprocessing (simple binary thresholding)
    _, threshold_image = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY)

    # Use Tesseract to get the bounding boxes of recognized characters
    boxes = pytesseract.image_to_boxes(threshold_image)

    # Draw bounding boxes around the recognized text.
    # Tesseract's box coordinates use a bottom-left origin while OpenCV uses
    # top-left, so the y values are flipped with (h - y).
    h, w, _ = frame.shape  # Get the frame dimensions
    for b in boxes.splitlines():
        b = b.split()
        x, y, x2, y2 = int(b[1]), int(b[2]), int(b[3]), int(b[4])  # Box coordinates
        cv2.rectangle(frame, (x, h - y), (x2, h - y2), (0, 255, 0), 2)  # Green box

    # Use Tesseract to recognize text from the processed frame
    text = pytesseract.image_to_string(threshold_image)
    print(text)

    # Process the recognized text to remove stopwords
    cleaned_text = preprocess_text(text)

    # Translate the text to German (or another language)
    translated_text = translate_text(cleaned_text)

    # Display the cleaned and translated text on the console
    print(f"Recognized Text: {cleaned_text}")
    print(f"Translated Text: {translated_text}")

    # Optionally, save the cleaned and translated text to a file
    # ("w" overwrites the file on every pass; use "a" to append instead)
    with open("detectedtext_translated.txt", "w") as file:
        file.write(f"Recognized Text: {cleaned_text}\n")
        file.write(f"Translated Text: {translated_text}\n")

    # Optionally, overlay the cleaned and translated text on the video frame
    cv2.putText(frame, cleaned_text, (10, 20), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
    cv2.putText(frame, translated_text, (10, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

    # Show the processed video frame with bounding boxes
    cv2.imshow("Real-time Text Recognition and Translation", frame)

    # Break the loop if the user presses the 'q' key
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the webcam and close all OpenCV windows
cap.release()
cv2.destroyAllWindows()
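As written, this loop runs OCR and a full translation on every captured frame, which is heavy for a Raspberry Pi 4. The final version below adds frame skipping and a lower working resolution to lighten the load.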
Final Code
import cv2
import pytesseract
import nltk
from nltk.tokenize import RegexpTokenizer
from nltk.corpus import stopwords
import string
from transformers import pipeline

# Path to your Tesseract executable (on Raspberry Pi, typically /usr/bin/tesseract)
pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract'  # Update if needed

# Set up the translation pipeline once at startup
# (device=-1 runs on the CPU; the Raspberry Pi has no CUDA GPU)
translation_pipeline = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de", device=-1)

# Open the webcam (camera 0 is usually the default camera)
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    print("Error: Could not access the camera.")
    exit()

# Ensure that the NLTK stopwords are downloaded
nltk.download('stopwords')

def translate_text(text):
    """Translate text to another language using Hugging Face's translation pipeline."""
    translated = translation_pipeline(text, max_length=400)
    return translated[0]['translation_text']

def preprocess_text(text):
    """Preprocess text by removing stopwords and punctuation."""
    tokenizer = RegexpTokenizer(r'\w+')
    stop_words = set(stopwords.words('english'))
    # Convert to lowercase and tokenize the input text
    tokens = tokenizer.tokenize(text.lower())
    # Remove stopwords and punctuation
    tokens = [word for word in tokens if word not in stop_words and word not in string.punctuation]
    # Join the remaining words back into a single string
    return " ".join(tokens)

# Process only every Nth frame to reduce the processing load
frame_rate = 5  # Process every 5th frame, for example
frame_count = 0

while True:
    ret, frame = cap.read()
    if not ret:
        print("Failed to grab frame!")
        break

    frame_count += 1
    if frame_count % frame_rate != 0:
        continue  # Skip most frames to reduce load

    # Resize the frame for faster processing (lower resolution)
    frame = cv2.resize(frame, (640, 480))  # Adjust resolution as needed

    # Convert the frame to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Apply some preprocessing (simple binary thresholding)
    _, threshold_image = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY)

    # Use Tesseract to get the bounding boxes of recognized characters
    boxes = pytesseract.image_to_boxes(threshold_image)

    # Draw bounding boxes around the recognized text (Tesseract's y-axis is
    # flipped relative to OpenCV's, hence the h - y conversion)
    h, w, _ = frame.shape  # Get the frame dimensions
    for b in boxes.splitlines():
        b = b.split()
        x, y, x2, y2 = int(b[1]), int(b[2]), int(b[3]), int(b[4])  # Box coordinates
        cv2.rectangle(frame, (x, h - y), (x2, h - y2), (0, 255, 0), 2)  # Green box

    # Use Tesseract to recognize text from the processed frame
    text = pytesseract.image_to_string(threshold_image)
    print(f"Recognized Text: {text}")

    # Process the recognized text to remove stopwords
    cleaned_text = preprocess_text(text)
    if not cleaned_text.strip():
        continue  # Nothing recognized in this frame; skip translation

    # Translate the text to German (or another language)
    translated_text = translate_text(cleaned_text)

    # Display the cleaned and translated text on the console
    print(f"Cleaned Text: {cleaned_text}")
    print(f"Translated Text: {translated_text}")

    # Optionally, save the cleaned and translated text to a file
    with open("detectedtext_translated.txt", "w") as file:
        file.write(f"Recognized Text: {cleaned_text}\n")
        file.write(f"Translated Text: {translated_text}\n")

    # Optionally, overlay the cleaned and translated text on the video frame
    cv2.putText(frame, cleaned_text, (10, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
    cv2.putText(frame, translated_text, (10, 40), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)

    # Show the processed video frame with bounding boxes
    cv2.imshow("Real-time Text Recognition and Translation", frame)

    # Break the loop if the user presses the 'q' key
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the webcam and close all OpenCV windows
cap.release()
cv2.destroyAllWindows()
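To try it out, save the full script (for example as invision.py; any filename works) and launch it from a terminal with python3 invision.py. Point the camera at printed text, watch the recognized and translated text appear in the window, and press 'q' to quit.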