InVision: a Smart Translating Device for Seeing the World in Your Language.
by Binary_Flame in Circuits > Raspberry Pi

*Note: the current prototype is not wearable and runs only as software.*
MADE BY @ElectraFlame and @DarkFLAME
InVision is an advanced pair of smart glasses designed to break language barriers effortlessly. Equipped with cutting-edge AI and real-time text recognition technology, InVision allows users to instantly translate foreign languages into their native tongue. Whether navigating unfamiliar streets, reading menus, or engaging in cross-cultural conversations, this device ensures a smooth and intuitive experience. By overlaying translated text directly onto the lens, users can seamlessly understand their surroundings without the need for external devices.
Built for travelers, students, business professionals, and language enthusiasts, InVision harnesses the power of augmented reality (AR), optical character recognition (OCR), and AI-driven translation engines to deliver fast, accurate, and context-aware translations. Its sleek, lightweight design ensures comfort, while its integration with cloud-based translation services enables continuous improvements in accuracy and language support.
Ways to Improve InVision
- AI-Enhanced Context Awareness – Implementing AI to detect contextual meanings, idioms, and tone to provide more accurate translations.
- Speech-to-Text Integration – Adding voice recognition for spoken language translation, making communication even smoother.
- Offline Translation Mode – Enabling translation without internet access for remote areas and travel convenience (a model-caching sketch follows this list).
- Customizable UI/UX – Allowing users to adjust text size, font, and display styles for readability.
- Multi-Language Support – Supporting rare and indigenous languages to cater to a broader audience.
- Real-Time Conversation Mode – Enabling interactive dialogues with on-screen subtitles for face-to-face communication.
- Gesture-Based Controls – Letting users control the device with simple hand gestures or eye-tracking.
- AI-Powered Learning Mode – Assisting language learners by providing grammar insights and pronunciation tips.
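On the offline idea above: the Helsinki-NLP translation models used in this project already run locally once downloaded, so a first step toward offline mode is simply caching the model on the SD card. A minimal sketch, assuming the same opus-mt-en-de model used later in this project and an example local folder name:

from transformers import pipeline

# First run (online): download the model, then save it to a local folder
# ("./opus-mt-en-de" is just an example path)
pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
pipe.model.save_pretrained("./opus-mt-en-de")
pipe.tokenizer.save_pretrained("./opus-mt-en-de")

# Later runs (offline): load everything from disk, no network needed
offline_pipe = pipeline("translation", model="./opus-mt-en-de")
print(offline_pipe("Hello!")[0]['translation_text'])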
Supplies
The Supplies:
- Raspberry Pi 4 (2 GB RAM) - suggestions for more capable hardware are welcome
- Webcam module - for computer vision
- Keyboard - peripheral
- Mouse - peripheral
- Power cable - power supply
- SD card - OS and data storage
Installing PyCharm

Install PyCharm Community Edition from the Ubuntu App Center, or from a terminal with sudo snap install pycharm-community --classic.
Writing Code
The full code is provided at the end.
Before writing code:
- be clear about which classes you will use and which functions you will call
- make a detailed sketch of what should be executed, in what order (see the outline below)
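For InVision, that execution sketch boils down to the following outline (a planning summary of the code in the next steps, not extra functionality):

# InVision processing pipeline, in execution order:
# 1. Open the webcam with OpenCV
# 2. Grab a frame and threshold it to black and white
# 3. Run Tesseract OCR on the thresholded frame to find and read text
# 4. Clean the recognized text with NLTK (lowercase, drop stopwords and punctuation)
# 5. Translate the cleaned text with a Hugging Face translation model
# 6. Overlay the recognized and translated text on the frame and display it
# 7. Repeat until the user presses 'q'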
Installing Libraries: Downloading and Setup
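Before the imports below will run, the libraries themselves need to be installed. On Raspberry Pi OS or Ubuntu, something like sudo apt install tesseract-ocr for the OCR engine, followed by pip install opencv-python pytesseract nltk transformers torch for the Python packages, should cover everything used here (exact package names can vary by distribution).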

import cv2
import pytesseract
import nltk
from nltk.tokenize import RegexpTokenizer
from nltk.corpus import stopwords
import string
from transformers import pipeline

# Path to your Tesseract executable
# (this is a Windows path; on Raspberry Pi it is typically /usr/bin/tesseract,
# as in the final code at the end)
pytesseract.pytesseract.tesseract_cmd = r'C:\Users\Students\AppData\Local\Programs\Tesseract-OCR\tesseract.exe'

# Set up the translation pipeline (translating from English to German as an example)
translation_pipeline = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")

# Open the webcam (camera 0 is usually the default camera)
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    print("Error: Could not access the camera.")
    exit()

# Ensure that the NLTK stopwords are downloaded
nltk.download('stopwords')
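As a quick sanity check that the model loaded correctly, you can run the pipeline once on a fixed string (a minimal sketch; the exact German wording may differ between model versions):

print(translation_pipeline("Hello, world!", max_length=40)[0]['translation_text'])
# Prints something along the lines of: Hallo, Welt!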
Translating Function

def translate_text(text, model_name='Helsinki-NLP/opus-mt-en-de'):
    """Translate text to another language using Hugging Face's translation pipeline."""
    # Initialize the translation pipeline for the requested model
    translation_pipeline = pipeline("translation", model=model_name)
    # Translate the text (max_length caps the output length)
    translated = translation_pipeline(text, max_length=400)
    # Return the translated string
    return translated[0]['translation_text']
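Note that this version re-creates the pipeline on every call, which reloads the model each time and is slow; the final code at the end builds the pipeline once at startup and reuses it. A usage sketch (the exact output wording may vary):

print(translate_text("Where is the train station?"))
# e.g. "Wo ist der Bahnhof?"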
Preprocessing Function
def preprocess_text(text):
    """Preprocess text by removing stopwords and punctuation."""
    # Tokenize with a regular-expression tokenizer (words only)
    tokenizer = RegexpTokenizer(r'\w+')
    stop_words = set(stopwords.words('english'))
    # Convert to lowercase and tokenize the input text
    tokens = tokenizer.tokenize(text.lower())
    # Remove stopwords and punctuation
    tokens = [word for word in tokens if word not in stop_words and word not in string.punctuation]
    # Join the remaining words back into a single string
    return " ".join(tokens)
Opening Frames and Using OpenCV

while True:
    # Capture each frame from the webcam
    ret, frame = cap.read()
    if not ret:
        print("Failed to grab frame!")
        break

    # Convert the frame to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Apply some preprocessing (simple binary thresholding)
    _, threshold_image = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY)

    # Use Tesseract to get the bounding boxes of recognized characters
    boxes = pytesseract.image_to_boxes(threshold_image)

    # Draw bounding boxes around the recognized text.
    # Tesseract's box coordinates use a bottom-left origin while OpenCV uses
    # top-left, so the y values are flipped with (h - y).
    h, w, _ = frame.shape  # Get the frame dimensions
    for b in boxes.splitlines():
        b = b.split()
        x, y, x2, y2 = int(b[1]), int(b[2]), int(b[3]), int(b[4])  # Box coordinates
        cv2.rectangle(frame, (x, h - y), (x2, h - y2), (0, 255, 0), 2)  # Green box

    # Use Tesseract to recognize text from the processed frame
    text = pytesseract.image_to_string(threshold_image)
    print(text)

    # Process the recognized text to remove stopwords
    cleaned_text = preprocess_text(text)

    # Translate the text to German (or another language)
    translated_text = translate_text(cleaned_text)

    # Display the cleaned and translated text on the console
    print(f"Recognized Text: {cleaned_text}")
    print(f"Translated Text: {translated_text}")

    # Optionally, save the cleaned and translated text to a file
    # ("w" overwrites the file on every pass; use "a" to append instead)
    with open("detectedtext_translated.txt", "w") as file:
        file.write(f"Recognized Text: {cleaned_text}\n")
        file.write(f"Translated Text: {translated_text}\n")

    # Optionally, overlay the cleaned and translated text on the video frame
    cv2.putText(frame, cleaned_text, (10, 20), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
    cv2.putText(frame, translated_text, (10, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

    # Show the processed video frame with bounding boxes
    cv2.imshow("Real-time Text Recognition and Translation", frame)

    # Break the loop if the user presses the 'q' key
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the webcam and close all OpenCV windows
cap.release()
cv2.destroyAllWindows()
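As written, this loop runs OCR and a full translation on every captured frame, which is heavy for a Raspberry Pi 4. The final version below adds frame skipping and a lower working resolution to lighten the load.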
Final Code
import cv2
import pytesseract
import nltk
from nltk.tokenize import RegexpTokenizer
from nltk.corpus import stopwords
import string
from transformers import pipeline

# Path to your Tesseract executable (on Raspberry Pi, typically /usr/bin/tesseract)
pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract'  # Update if needed

# Set up the translation pipeline once at startup
# (device=-1 runs on the CPU; the Raspberry Pi has no CUDA GPU)
translation_pipeline = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de", device=-1)

# Open the webcam (camera 0 is usually the default camera)
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    print("Error: Could not access the camera.")
    exit()

# Ensure that the NLTK stopwords are downloaded
nltk.download('stopwords')

def translate_text(text):
    """Translate text to another language using Hugging Face's translation pipeline."""
    translated = translation_pipeline(text, max_length=400)
    return translated[0]['translation_text']

def preprocess_text(text):
    """Preprocess text by removing stopwords and punctuation."""
    tokenizer = RegexpTokenizer(r'\w+')
    stop_words = set(stopwords.words('english'))
    # Convert to lowercase and tokenize the input text
    tokens = tokenizer.tokenize(text.lower())
    # Remove stopwords and punctuation
    tokens = [word for word in tokens if word not in stop_words and word not in string.punctuation]
    # Join the remaining words back into a single string
    return " ".join(tokens)

# Process only every Nth frame to reduce the processing load
frame_rate = 5  # Process every 5th frame, for example
frame_count = 0

while True:
    ret, frame = cap.read()
    if not ret:
        print("Failed to grab frame!")
        break

    frame_count += 1
    if frame_count % frame_rate != 0:
        continue  # Skip most frames to reduce load

    # Resize the frame for faster processing (lower resolution)
    frame = cv2.resize(frame, (640, 480))  # Adjust resolution as needed

    # Convert the frame to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Apply some preprocessing (simple binary thresholding)
    _, threshold_image = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY)

    # Use Tesseract to get the bounding boxes of recognized characters
    boxes = pytesseract.image_to_boxes(threshold_image)

    # Draw bounding boxes around the recognized text (Tesseract's y-axis is
    # flipped relative to OpenCV's, hence the h - y conversion)
    h, w, _ = frame.shape  # Get the frame dimensions
    for b in boxes.splitlines():
        b = b.split()
        x, y, x2, y2 = int(b[1]), int(b[2]), int(b[3]), int(b[4])  # Box coordinates
        cv2.rectangle(frame, (x, h - y), (x2, h - y2), (0, 255, 0), 2)  # Green box

    # Use Tesseract to recognize text from the processed frame
    text = pytesseract.image_to_string(threshold_image)
    print(f"Recognized Text: {text}")

    # Process the recognized text to remove stopwords
    cleaned_text = preprocess_text(text)
    if not cleaned_text.strip():
        continue  # Nothing recognized in this frame; skip translation

    # Translate the text to German (or another language)
    translated_text = translate_text(cleaned_text)

    # Display the cleaned and translated text on the console
    print(f"Cleaned Text: {cleaned_text}")
    print(f"Translated Text: {translated_text}")

    # Optionally, save the cleaned and translated text to a file
    with open("detectedtext_translated.txt", "w") as file:
        file.write(f"Recognized Text: {cleaned_text}\n")
        file.write(f"Translated Text: {translated_text}\n")

    # Optionally, overlay the cleaned and translated text on the video frame
    cv2.putText(frame, cleaned_text, (10, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
    cv2.putText(frame, translated_text, (10, 40), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)

    # Show the processed video frame with bounding boxes
    cv2.imshow("Real-time Text Recognition and Translation", frame)

    # Break the loop if the user presses the 'q' key
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the webcam and close all OpenCV windows
cap.release()
cv2.destroyAllWindows()
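To try it out, save the full script (for example as invision.py; any filename works) and launch it from a terminal with python3 invision.py. Point the camera at printed text, watch the recognized and translated text appear in the window, and press 'q' to quit.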