Create Your Own Language: a DIY Glyph Generator
by MarkMakies in Circuits > Software
This project is a standalone system for glyph generation, combining random glyph creation, extracting glyphs from Unicode fonts, machine learning-based classification, and artistic transformations to produce visually captivating outputs.
Originally conceived as a learning experiment to enhance the glyph display on my Glow Cuboid (also published here), the project quickly evolved into an artistic exploration. Using both AI and machine learning, I created the "Glyphonomicon," a compendium containing 16,000 unique glyphs, as pictured above.
The project consists of two main parts:
Glyph Generation, Classification, and Artistic Transformation (first pic)
The first step involves generating a massive dataset—up to 1,000,000 glyphs, or more. From these, you’ll curate approximately 1,000 you like and another 1,000 you don’t. These "good" and "bad" samples are then used to train a machine learning model to classify glyphs into categories. To streamline this, you can base your "good" glyphs on existing fonts. For this project, I used Noto Sans, a free and open-source font family that includes nearly all symbols used worldwide.
After running the training and classification, I narrowed the dataset down to 16,000 glyphs from the original million. At this stage, the glyphs are 16x16 black-and-white JPEGs. With a modern computer, training takes only a few hours, and classification can run overnight.
Once classified, the glyphs undergo an artistic transformation phase, where effects like Gaussian blur and upscaling are applied. The process is flexible—you can incorporate tools and techniques available in programs like GIMP or Photoshop into your Python code. From here, you can turn the glyphs into tangible projects, such as a printed book or a digital art installation.
Dynamic Display (second pic)
The final (optional) phase of the project involves bringing these glyphs to life on a 16x16 LED matrix, creating a dynamic, ever-changing display. This is perfect for projects like the Glow Cuboid. The glyphs are compressed to just 32 bytes each, making them ideal for microcontrollers—you can store approximately 32,000 glyphs per MB.
A short video of the display in action: https://youtu.be/QcZYYweQajA
Supplies
For Part 1
- A computer with Python installed
- Basic knowledge of Python, the ability to install libraries, and a passion for creativity.
- No parts are required
Plus all of these for Part 2
- Microcontroller coding skills
- Some soldering experience
- Access to a 3D printer
And these bits
- 1x RP2040 Pico-like MCU Board or any microcontroller that supports Python
- 4x GlowBit LED Matrix - 8x8 or any other LED matrix that fits
- USB cable and power pack to suit MCU
- 3D-Printed Diffuser
- 3D-Printed Stand
I have no commercial affiliation with any suppliers. If I include a link to a supplier, it’s simply where I sourced my parts. As I’m based in Australia, I try to shop locally whenever possible.
Setup Environment
Refer to the flow chart above for the sequence of steps in the entire process. But first, you need to set up the Python environment.
- Ensure Python is installed on your computer
- Clone or download the project files from the GitHub repository. Or just use the files included in each step.
- Set up a Python virtual environment (optional but recommended) and install the necessary dependencies.
- The required libraries are listed in the requirements.txt file (install them with pip install -r requirements.txt). If you are not using the requirements file, install the libraries named in the Module Summary below individually with pip.
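Once the dependencies are installed, a quick check like the one below can confirm that everything is importable. This is a minimal sketch rather than part of the project files; the import names are the standard ones for each library, which in a few cases differ from the pip package name.

```python
import importlib.util

# Import names for the libraries in the Module Summary. Several differ from
# their pip package names (Pillow -> PIL, opencv-python -> cv2).
REQUIRED_MODULES = {
    "numpy": "NumPy",
    "tensorflow": "TensorFlow",
    "PIL": "Pillow",
    "imagehash": "ImageHash",
    "pygame": "Pygame",
    "cv2": "OpenCV-Python",
    "matplotlib": "Matplotlib",
}


def check_modules(modules):
    """Return {import_name: True/False} by looking each module up without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        status = "ok" if found else "MISSING"
        print(f"{REQUIRED_MODULES[name]:<15} ({name}): {status}")
```

Run it inside the virtual environment you created, so it reports on the same Python installation the project scripts will use.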
Module Summary
- NumPy: Efficient numerical operations.
  - Processes pixel data for glyph transformations.
  - Applies filters like Difference of Gaussians during artistic enhancements (create-artistic-sheets.py).
  - Manages matrix-based operations for glyph alignment and compression.
- TensorFlow: Machine learning framework for glyph classification.
  - Trains a binary classification model to identify "good" and "bad" glyphs (train.py).
  - Validates classification performance using accuracy and loss metrics.
  - Classifies glyphs automatically in bulk with a trained model (auto-classify-directory.py).
- Pillow (PIL): Image processing and manipulation.
  - Converts glyph images to black-and-white format for compression.
  - Resizes, crops, and centers glyphs for uniformity.
  - Creates and saves proof sheets and artistic glyph representations (create-basic-sheets.py, create-artistic-sheets.py).
- ImageHash: Detects duplicate glyphs.
  - Generates perceptual hashes for glyph images to identify and remove duplicates (dupe-finder.py).
  - Compares hashes with a threshold to ensure a unique dataset.
- Pygame: Simulates the LED matrix display.
  - Renders glyph animations on a virtual 16x16 grid (glow-simulation.py).
  - Simulates color transitions, rotations, and glyph animations before deploying to hardware.
- OpenCV-Python: Provides tools for manual glyph classification.
  - Displays glyphs in a window for interactive sorting (manual-classify.py).
  - Captures user input for categorizing glyphs into "good" and "bad" directories.
- Matplotlib: Visualizes data from machine learning and classification.
  - Plots training and validation accuracy/loss curves to monitor model performance (train.py).
  - Helps identify overfitting or underfitting in the ML model.
Generate Random Glyphs
The first step in this project is to create a large dataset of glyphs. This involves generating random glyphs in a 16x16 black-and-white format. The goal here is quantity—producing a massive pool of glyphs to sort and classify later. Most of them will be unusable—refer to the picture above.
To generate glyphs, you’ll use the Python script create-random-glyphs.py. This script creates random patterns using strokes, curves, and symmetry, ensuring each glyph has a unique design. The settings can be adjusted to experiment with different styles and densities of glyphs. For example, you can tweak the probability of symmetry or the number of strokes per glyph to refine the output.
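To give a feel for the idea, here is a minimal sketch of a stroke-and-symmetry generator. It is not the code from create-random-glyphs.py: the function names, the stroke rules, and the parameters num_strokes and symmetry_prob are illustrative assumptions, and curves are left out to keep it short.

```python
import random

SIZE = 16  # glyphs are 16x16, as in the article


def draw_stroke(grid, rng, max_len=8):
    """Walk a short straight or diagonal line from a random start pixel."""
    x, y = rng.randrange(SIZE), rng.randrange(SIZE)
    dx, dy = rng.choice([(1, 0), (0, 1), (1, 1), (1, -1)])
    for _ in range(rng.randint(2, max_len)):
        if 0 <= x < SIZE and 0 <= y < SIZE:
            grid[y][x] = 1
        x, y = x + dx, y + dy


def random_glyph(num_strokes=4, symmetry_prob=0.5, seed=None):
    """Return a 16x16 grid of 0/1 pixels built from random strokes."""
    rng = random.Random(seed)
    grid = [[0] * SIZE for _ in range(SIZE)]
    for _ in range(num_strokes):
        draw_stroke(grid, rng)
    if rng.random() < symmetry_prob:
        # Mirror the left half onto the right half (left-right symmetry).
        for row in grid:
            for x in range(SIZE // 2):
                row[SIZE - 1 - x] = row[x]
    return grid


if __name__ == "__main__":
    for row in random_glyph(seed=1):
        print("".join("#" if p else "." for p in row))
```

Raising symmetry_prob produces more mirrored glyphs, and raising num_strokes produces denser ones, which mirrors the kind of tuning described above.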
- Ensure Python is installed on your computer.
- Clone or download the project files from my GitHub repository, or just use the files here.
- Set up a Python virtual environment (optional but recommended).
- Install the required libraries listed in the requirements.txt file (pip install -r requirements.txt).
- Run the glyph generator script, create-random-glyphs.py.
This will generate a default of 1,000 glyphs, saved as 16x16 black-and-white JPEGs in the specified output folder.
Tips:
- If you want more glyphs, you can increase the NUM_GLYPHS variable in the script, but at this stage you can usually tell if you're getting the style you want.
- Experiment with parameters like symmetry probabilities or stroke lengths to influence the glyph designs.
- Review the output and regenerate if needed to refine your dataset.
By the end of this step, you’ll have a folder full of random glyphs ready for sorting and classification in the next phase.
Generate Unicode Glyphs
If you want to enhance your dataset with glyphs from existing fonts, this step involves extracting Unicode glyphs and converting them into a consistent 16x16 format (or any size). Using a well-rounded font like Noto Sans ensures you have a rich set of high-quality symbols to include in your "good" glyph category. You can also use these glyphs directly in your final collection.
- Choose Your Font: Locate or download the desired font file (e.g., NotoSans-Regular.ttf).
- Set Up the Script: Open the create-unicode-glyphs.py script and ensure the FONT_PATH variable points to the font file.
- Run the Script: Execute the script to extract glyphs from Unicode code points. By default, the script generates glyphs for a range of Unicode code points (e.g., U+0021 to U+02A7). The glyphs are saved as 16x16 black-and-white images, upscaled to 128x128 JPEGs, in the input/glyphs-unclassified/ folder.
- Customize the Range (Recommended): If you want to extract a different range of Unicode symbols, modify the UNICODE_START and UNICODE_END values in the script.
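The extraction can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of create-unicode-glyphs.py: UNICODE_START and UNICODE_END reuse the names and default range mentioned above, while printable_chars and render_glyph are hypothetical helpers. The rendering function needs Pillow and a real font file, so it is imported lazily.

```python
import unicodedata

UNICODE_START = 0x0021  # default range quoted in the article
UNICODE_END = 0x02A7


def printable_chars(start=UNICODE_START, end=UNICODE_END):
    """Yield characters in [start, end], skipping control, unassigned, and space characters."""
    for cp in range(start, end + 1):
        ch = chr(cp)
        # General categories starting with "C" are control/format/unassigned;
        # those starting with "Z" are separators such as spaces.
        if unicodedata.category(ch)[0] not in ("C", "Z"):
            yield ch


def render_glyph(ch, font_path, size=16):
    """Render one character centered on a size x size 1-bit canvas (requires Pillow)."""
    from PIL import Image, ImageDraw, ImageFont  # imported lazily

    font = ImageFont.truetype(font_path, size)
    left, top, right, bottom = font.getbbox(ch)
    canvas = Image.new("L", (size, size), 0)
    # Offset the drawing position so the glyph's bounding box is centered.
    x = (size - (right - left)) // 2 - left
    y = (size - (bottom - top)) // 2 - top
    ImageDraw.Draw(canvas).text((x, y), ch, fill=255, font=font)
    # Threshold to pure black and white.
    return canvas.point(lambda v: 255 if v > 127 else 0).convert("1")
```

A call such as render_glyph("Ω", "NotoSans-Regular.ttf") would return a Pillow image you can save or upscale. Some characters may be wider than the canvas at this font size, so a smaller point size may be needed in practice.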
Tips:
- Use a font manager or browser to help choose fonts.
- Glyphs are centered and padded in a 16x16 grid for consistency.
- Use Unicode glyphs to guide your machine learning model by adding them to the "good" category during classification (see Classify Glyphs).
- Experiment with different fonts to create unique datasets tailored to your project needs.
By the end of this step, you’ll have a fresh set of glyphs extracted from a high-quality font, ready to enrich your dataset and improve the training process, or to use as is.
Classify Glyphs
Once you have a large dataset of random glyphs, the next step is to sort them into "good" and "bad" categories. This classification is essential for training the machine learning model, allowing it to identify and filter glyphs that align with your preferences. Start with a general idea of what makes a glyph "good" or "bad" based on your project goals.
Use image sorting software like digiKam, or even a file explorer, to organize your glyphs. Move the glyphs you consider outstanding into one directory and those you dislike into another. Focus on selecting glyphs at the extremes—those that are exceptionally good or clearly bad. If needed, you can always generate additional glyphs to expand your choices.
Aim to collect around 1,000 glyphs in each category. If you’re using font-based glyphs (e.g., from Unicode), consider adding them to the "good" category to provide a clear benchmark for the model during training.
There is also a simple script, manual-classify.py, that you can use to classify glyphs manually.
The script will display each glyph and prompt you to classify it. Use the following keys:
- Press 1 to mark a glyph as "good."
- Press 2 to mark a glyph as "bad."
- Press N to skip the current glyph.
- Press Q to quit at any time.
Classified glyphs are automatically moved to the input/dataset/good or input/dataset/bad folders.
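The key handling and file moves described above could look something like this. It is a sketch, not the actual manual-classify.py: the helper names are made up for illustration, and the OpenCV loop is kept in its own function so the rest runs without a display.

```python
import shutil
from pathlib import Path

# Key bindings described above: 1 = good, 2 = bad, N = skip, Q = quit.
KEY_ACTIONS = {"1": "good", "2": "bad", "n": "skip", "q": "quit"}


def action_for_key(key_code):
    """Translate a cv2.waitKey() code into an action name (None if unmapped)."""
    if key_code < 0:
        return None
    return KEY_ACTIONS.get(chr(key_code & 0xFF).lower())


def file_glyph(path, action, dataset_dir):
    """Move a glyph into dataset_dir/good or dataset_dir/bad and return its new path."""
    target = Path(dataset_dir) / action
    target.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(path), str(target / Path(path).name)))


def classify_folder(src_dir, dataset_dir):
    """Show each JPEG in a window and file it according to the key pressed."""
    import cv2  # imported lazily; requires opencv-python

    for path in sorted(Path(src_dir).glob("*.jpg")):
        cv2.imshow("glyph", cv2.imread(str(path)))
        action = None
        while action is None:  # ignore keys that are not mapped
            action = action_for_key(cv2.waitKey(0))
        if action == "quit":
            break
        if action in ("good", "bad"):
            file_glyph(path, action, dataset_dir)
    cv2.destroyAllWindows()
```

Skipped glyphs simply stay where they are, so you can run the loop again later and pick up where you left off.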
Train the Machine Learning Model
With your glyphs sorted into "good" and "bad" categories, the next step is to train a machine learning model to classify glyphs automatically. This step involves feeding your dataset into the model, training it for a few epochs, and preparing it for batch classification.
- Prepare Your Dataset: Ensure the "good" glyphs are in data/input/dataset/good/ and the "bad" glyphs are in data/input/dataset/bad/. The script will automatically load images from these folders. Aim for around 1,000 glyphs in each category (more is better, but 1,000 is a good starting point).
- Ensure Python, TensorFlow, and the other dependencies are installed.
- Run the Training Script: Run train.py. This script:
  - Rescales the images to normalize pixel values.
  - Splits the dataset into 80% training and 20% validation.
  - Uses a convolutional neural network (CNN) to learn the features of "good" and "bad" glyphs.
- Monitor Training Progress: During training, you’ll see accuracy and loss metrics for both training and validation sets. Pay attention to the validation accuracy:
  - If validation accuracy improves steadily, the model is learning well.
  - If it starts to drop after a few epochs, reduce the number of epochs to prevent overfitting.
- Save the Trained Model: Once training completes, the script saves the model as ClassifierV1.keras in the data/models/ directory. This file contains the trained model, ready for use in classification.
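For orientation, the training steps can be sketched with Keras as below. This is not the project's train.py: the layer sizes, the seed, and the helper best_epoch are assumptions chosen for illustration, and the 128x128 input size is taken from the classification step later in the article. TensorFlow is imported lazily so the small helper can be used on its own.

```python
from pathlib import Path


def best_epoch(val_accuracy):
    """Return the 1-based epoch with the highest validation accuracy."""
    return max(range(len(val_accuracy)), key=lambda i: val_accuracy[i]) + 1


def train(dataset_dir="data/input/dataset",
          model_path="data/models/ClassifierV1.keras",
          epochs=5, image_size=(128, 128)):
    """Train a small binary CNN on dataset_dir/good and dataset_dir/bad."""
    import tensorflow as tf  # imported lazily; requires tensorflow

    def load(subset):
        # Class folders are sorted alphabetically, so bad = 0 and good = 1.
        return tf.keras.utils.image_dataset_from_directory(
            dataset_dir, validation_split=0.2, subset=subset, seed=42,
            image_size=image_size, color_mode="grayscale", label_mode="binary")

    train_ds, val_ds = load("training"), load("validation")  # 80% / 20% split
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=image_size + (1,)),
        tf.keras.layers.Rescaling(1.0 / 255),  # normalize pixels to 0..1
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # P(glyph is "good")
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    history = model.fit(train_ds, validation_data=val_ds, epochs=epochs)
    Path(model_path).parent.mkdir(parents=True, exist_ok=True)
    model.save(model_path)
    return history.history["val_accuracy"]
```

Passing the returned list to best_epoch gives a rough idea of where validation accuracy peaked. If the peak comes well before the final epoch, that is a hint to retrain with fewer epochs.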
Tips:
- Keep Epochs Low: Start with 3–5 epochs. Overtraining can lead to a model that performs well on the training data but poorly on new glyphs.
- Iterate and Improve: If results are unsatisfactory, add more "good" or "bad" examples to balance the dataset and retrain the model.
- Visualize Results: The script plots training and validation accuracy/loss graphs, helping you diagnose performance issues.
By the end of this step, you’ll have a trained machine learning model that can classify glyphs efficiently. Next, you’ll use it to process large batches of glyphs automatically.
Automatically Classify Glyphs Using the Trained Model
With your machine learning model trained, the next step is to use it to classify glyphs in bulk. This allows you to quickly sort a large number of glyphs into "good" and "bad" categories based on the model’s predictions.
- Prepare the Input Folder: Place all the glyphs you want to classify into the data/input/glyphs-unclassified/ directory. Ensure all files are in a compatible image format (e.g., .jpg, .png, .jpeg).
- Run the Classification Script: Run auto-classify-directory.py to classify the glyphs automatically. The script:
  - Loads the trained model (e.g., data/models/classifierV1.keras).
  - Processes each glyph, resizing it to the expected dimensions (128x128).
  - Classifies the glyph with a probability score.
  - Moves glyphs with high confidence (e.g., 95% or more) into data/output/glyphs-good/ and the rest into data/output/glyphs-bad/.
- Review the Results: After the script completes:
  - Check the data/output/glyphs-good/ folder for the glyphs classified as "good."
  - Check the data/output/glyphs-bad/ folder for the "bad" glyphs.
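The thresholding logic at the heart of this step is simple enough to sketch. The code below is an assumption-laden outline, not auto-classify-directory.py itself: it presumes a Keras model whose single sigmoid output is the probability of "good" and which does its own pixel rescaling, and the function names are hypothetical.

```python
import shutil
from pathlib import Path

CONFIDENCE_THRESHOLD = 0.95  # default quoted above


def destination(prob_good, threshold=CONFIDENCE_THRESHOLD):
    """Pick the output folder name for a glyph given the model's P(good)."""
    return "glyphs-good" if prob_good >= threshold else "glyphs-bad"


def classify_directory(model_path, src_dir, out_dir, image_size=(128, 128)):
    """Score every image in src_dir and move it into out_dir/glyphs-good or glyphs-bad."""
    import numpy as np       # imported lazily; requires numpy
    import tensorflow as tf  # imported lazily; requires tensorflow

    model = tf.keras.models.load_model(model_path)
    for path in sorted(Path(src_dir).iterdir()):
        if path.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
            continue
        img = tf.keras.utils.load_img(path, color_mode="grayscale", target_size=image_size)
        batch = np.expand_dims(tf.keras.utils.img_to_array(img), axis=0)
        prob = float(model.predict(batch, verbose=0)[0][0])
        target = Path(out_dir) / destination(prob)
        target.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target / path.name))
```

Scoring one image per predict call keeps the sketch readable but is slow. For very large folders, feeding the model batches of images would be a sensible change.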
Tips:
- Confidence Threshold: The script classifies glyphs as "good" by default if the probability is 95% or higher. You can adjust this threshold in the script based on your needs—I used 99% and got better results.
- Iterative Refinement: If the classifications don’t meet your expectations, retrain the model with additional examples or adjust the dataset balance. Simply move the misclassified glyphs back into the appropriate training folders ("good" or "bad") and retrain.
- Batch Processing: The script efficiently handles large datasets, allowing you to classify thousands or even millions of glyphs in a single run. However, when dealing with over 1 million files, GUI-based file managers and viewers (e.g., those that generate thumbnails) may struggle on systems like Ubuntu. Consider using command-line tools for better performance when managing such large datasets.
Remove Duplicates and Rename Files
Remove Duplicates
Use dupe-finder.py to detect and remove duplicate glyphs from your dataset.
This script:
- Compares each glyph in the dataset using perceptual hashing.
- Moves duplicates to a designated "dupes" folder for review.
- Ensures your dataset contains only unique glyphs.
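The comparison idea can be illustrated without any image libraries. The sketch below is a simplified stand-in for what dupe-finder.py does: instead of a perceptual hash from ImageHash, it uses the raw pixels of a black-and-white glyph as the fingerprint, and the HAMMING_THRESHOLD value shown is a made-up example.

```python
HAMMING_THRESHOLD = 4  # hypothetical value; tune it as described in the tips


def bits_of(grid):
    """Flatten a 2D grid of 0/1 pixels into a single integer fingerprint."""
    value = 0
    for row in grid:
        for pixel in row:
            value = (value << 1) | (1 if pixel else 0)
    return value


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def find_duplicates(glyphs, threshold=HAMMING_THRESHOLD):
    """Return names of glyphs within `threshold` bits of an earlier kept glyph."""
    kept, dupes = [], []
    for name, grid in glyphs.items():
        fingerprint = bits_of(grid)
        if any(hamming(fingerprint, other) <= threshold for other in kept):
            dupes.append(name)
        else:
            kept.append(fingerprint)
    return dupes
```

With ImageHash, the fingerprint would instead come from a call such as imagehash.phash(image), and subtracting two hashes gives their Hamming distance. Note that comparing every glyph against every kept glyph grows quadratically, so large datasets take a while.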
Randomize File Names
After cleaning up duplicates, run file-random-rename.py to give each file a unique and randomized name.
This script:
- Generates a random file name for each glyph based on a defined mask (e.g., G-XXXXX-F.jpg).
- Ensures all file names are unique, avoiding overwrites and ensuring randomness.
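A mask-based name generator might work along these lines. This is a sketch only: it assumes each X in the mask stands for a random digit, which the article does not spell out, and the function names are illustrative rather than taken from file-random-rename.py.

```python
import random
import string


def random_name(mask="G-XXXXX-F.jpg", rng=random):
    """Replace every 'X' in the mask with a random digit (assumed meaning of X)."""
    return "".join(rng.choice(string.digits) if c == "X" else c for c in mask)


def unique_names(count, mask="G-XXXXX-F.jpg", seed=None):
    """Generate `count` distinct names; raises if the mask cannot supply that many."""
    capacity = 10 ** mask.count("X")
    if count > capacity:
        raise ValueError(f"mask allows only {capacity} unique names")
    rng = random.Random(seed)
    seen, names = set(), []
    while len(names) < count:
        name = random_name(mask, rng)
        if name not in seen:  # retry on collision so no file gets overwritten
            seen.add(name)
            names.append(name)
    return names
```

Under this assumption, a mask with five X placeholders allows 100,000 distinct names, which leaves plenty of room for a 16,000-glyph collection.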
Tips:
- Adjust Hamming Threshold: In dupe-finder.py, you can modify the HAMMING_THRESHOLD to fine-tune the sensitivity of duplicate detection. A lower threshold makes the comparison stricter. The right value depends on how varied you want your glyphs to be.
- Review the "dupes" Folder: Before deleting duplicates, review the files in the duplicates folder to confirm they’re genuinely redundant.
- Choose a Consistent Naming Mask: Update the file naming mask in file-random-rename.py to match your project’s naming conventions.
- Editing: glyph-editor.py is a fast and efficient tool for making small adjustments or deletions across a large number of files, should you choose to do so. It’s particularly useful for refining your training dataset, though it was developed at a later stage in the project.
By the end of this step, you’ll have a clean, organized dataset of unique glyphs, ready for artistic enhancements or dynamic display integration.
Artistic Transformation
Now that your dataset is clean and organized, you can elevate the glyphs by applying artistic effects. These transformations bring a unique visual flair to your glyphs, allowing you to create captivating outputs such as "proof sheets" or even an entire "Book of Glyphs" (aka the "Glyphonomicon").
Note that the first image above is made entirely of glyphs generated from existing fonts, part of my version 2 efforts - no random glyphs were used. The second image consists of about 20% generated from Noto Sans and the rest random and classified using train.py from 1 million randoms.
- Run the Artistic Sheet Generator: Execute create-artistic-sheets.py to apply artistic effects and generate mosaic-style proof sheets. The script:
  - Applies effects like Gaussian blur, smoothing, and contrast enhancement.
  - Uses custom transformations such as the Difference of Gaussians filter to emphasize key features of the glyphs.
  - Organizes glyphs into a mosaic-style layout for easy viewing or printing.
- Adjust Artistic Parameters: Experiment with the settings in the script to customize the output:
  - Glyph Size: Modify GLYPH_CANVAS_SIZE to upscale or downscale the glyphs.
  - Artistic Effects: Adjust parameters for blurring, contrast, and brightness to achieve the desired look.
  - Mosaic Layout: Customize the number of rows, columns, and padding for the mosaic.
- Output Location: The script saves the final artistic sheets as JPEG files in the data/output/sheets/ directory.
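To show what a Difference of Gaussians filter does, here is a small dependency-free sketch. The project script uses NumPy and Pillow for this; the version below trades speed for being self-contained, and the two sigma values are illustrative assumptions rather than the script's settings.

```python
import math


def gaussian_kernel(sigma):
    """1D Gaussian kernel, normalized to sum to 1, covering about +/- 3 sigma."""
    radius = max(1, int(3 * sigma + 0.5))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(image, sigma):
    """Separable Gaussian blur of a 2D list of floats, clamping at the edges."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    h, w = len(image), len(image[0])

    def clamp(i, n):
        return min(max(i, 0), n - 1)

    # Horizontal pass, then vertical pass over the result.
    rows = [[sum(k * row[clamp(x + j - radius, w)] for j, k in enumerate(kernel))
             for x in range(w)] for row in image]
    return [[sum(k * rows[clamp(y + j - radius, h)][x] for j, k in enumerate(kernel))
             for x in range(w)] for y in range(h)]


def difference_of_gaussians(image, sigma_small=1.0, sigma_large=2.0):
    """Subtract a wide blur from a narrow one, which emphasizes edges and strokes."""
    a, b = blur(image, sigma_small), blur(image, sigma_large)
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]
```

The result is positive on strokes, negative in a halo around them, and close to zero in flat areas. Rescaling it back into the 0 to 255 range gives the soft, outlined look. With Pillow, the same idea can be built from two ImageFilter.GaussianBlur passes with different radii.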
Tips:
- Iterate for Perfection: Generate multiple variations by tweaking artistic parameters to find the most visually appealing results.
- Combine Effects: Use software like GIMP or Photoshop in conjunction with Python-generated outputs for advanced customization.
- Print Your Work: Export the artistic sheets to create physical projects, such as a printed "Book of Glyphs."
Originally, create-basic-sheets.py was designed to simulate how the glyphs would appear on an LED matrix, but it could easily be repurposed for other applications.
End of Part 1: What’s Next?
Congratulations! By now, you’ve successfully generated, refined, and transformed a unique set of glyphs. You’ve even simulated their display, setting the stage for dynamic presentations. This wraps up the software-focused portion of the project.
If you’re ready for a more hands-on challenge, the next part will guide you through bringing your glyphs to life on a physical 16x16 LED matrix. This part involves 3D printing, hardware assembly, soldering, and coding for microcontrollers, a rewarding step for those looking to merge software with hardware.
The video above demonstrates the transition between various glyphs, with both the glyphs and their colors being randomly generated in real time.
Part 2: Dynamic LED Display with Microcontrollers
- Print a 16x16 LED matrix diffuser and stand
- Assemble and program a 16x16 LED matrix.
- Compress glyphs for efficient storage on a microcontroller.
- Animate glyph transitions in real time for a mesmerizing display.
Whether or not you dive into Part 2, your glyph creations can now be used for a variety of creative projects, from digital art to printed compendiums. The possibilities are endless!
Compress Glyph Data
The compress_glyphs.py script takes each 16x16 glyph image and converts it into a compact binary format for efficient storage on the microcontroller. Each glyph is reduced to a 32-byte (black-and-white) representation, ready for display.
What the Compressor Does:
- Processes all images in the specified input folder.
- Converts each glyph to a binary black-and-white format.
- Packs each glyph into a compact 32-byte format (16x16 pixels, 1 bit per pixel).
- Appends the binary representations to a single output file (e.g., glyphV3a.bin).
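The packing itself is just bit arithmetic: 16 x 16 pixels at 1 bit each is 256 bits, or 32 bytes. The sketch below shows one way to do it. The bit order (row by row, most significant bit first) is an assumption for illustration and may not match compress_glyphs.py.

```python
def pack_glyph(grid):
    """Pack a 16x16 grid of 0/1 pixels into 32 bytes (row-major, MSB first)."""
    if len(grid) != 16 or any(len(row) != 16 for row in grid):
        raise ValueError("glyph must be 16x16")
    out = bytearray()
    for row in grid:
        for start in (0, 8):  # two bytes per row
            byte = 0
            for pixel in row[start:start + 8]:
                byte = (byte << 1) | (1 if pixel else 0)
            out.append(byte)
    return bytes(out)


def unpack_glyph(data):
    """Inverse of pack_glyph: 32 bytes back to a 16x16 grid of 0/1 pixels."""
    bits = [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]
    return [bits[r * 16:(r + 1) * 16] for r in range(16)]
```

Appending each 32-byte record to one file gives a simple fixed-width format: glyph number n starts at byte offset 32 * n, and 1 MiB (1,048,576 bytes) holds 32,768 glyphs.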
Prepare the Input Files
Ensure your 16x16 glyph images are saved in black-and-white (.jpg format) and placed in the input/glyphs-unclassified directory.
Run the Compression Script
Use the provided Python script, compress_glyphs.py, to compress the glyphs. It reads all images from the input directory, processes them, and outputs a single binary file (e.g., glyphV3a.bin) to the output directory.
Simulate the Dynamic Display
Before diving into hardware, it’s a good idea to simulate your dynamic glyph display on a computer. This lets you test animations and transformations without needing an LED matrix right away.
The goal is to display your glyphs dynamically, complete with transitions and artistic transformations, on a simulated 16x16 grid. The glow-simulation.py script helps you visualize how glyphs will appear when animated, ensuring your designs and transitions look polished before moving to hardware.
The glyphs are generated and stored in black and white, but when displayed on the LED matrix, the microcontroller / simulator dynamically adds colors using various methods.
The simulator and the microcontroller both use the same compressed input file, which is why the compression step must be completed first.
Run the Simulation
Use glow-simulation.py to test the animation.
The script:
- Simulates the glyphs on a virtual 16x16 grid.
- Applies color transitions and fade effects to enhance visual appeal.
- Randomly selects and displays glyphs from your dataset.
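Because every glyph occupies exactly 32 bytes, picking a random one from the compressed file only takes a seek and a read. The following sketch shows the idea; the function names are hypothetical, and the bit layout (two bytes per row, most significant bit first) is an assumption that must match whatever the compressor actually wrote.

```python
import io
import random

GLYPH_BYTES = 32  # 16x16 pixels at 1 bit per pixel


def glyph_count(stream):
    """Number of complete glyph records in a seekable binary stream."""
    size = stream.seek(0, io.SEEK_END)
    return size // GLYPH_BYTES


def read_glyph(stream, index):
    """Return glyph `index` as 16 rows of 16 ints (assumes row-major, MSB-first bits)."""
    stream.seek(index * GLYPH_BYTES)
    data = stream.read(GLYPH_BYTES)
    if len(data) != GLYPH_BYTES:
        raise IndexError(f"no glyph at index {index}")
    rows = []
    for r in range(16):
        word = (data[2 * r] << 8) | data[2 * r + 1]  # one 16-pixel row
        rows.append([(word >> (15 - x)) & 1 for x in range(16)])
    return rows


def pick_random_glyph(stream, rng=random):
    """Pick one glyph uniformly at random from the stream."""
    return read_glyph(stream, rng.randrange(glyph_count(stream)))
```

In use, the stream would be the compressed file opened in binary mode, for example with open("glyphV3a.bin", "rb"). Reading one record at a time keeps memory use tiny, which matters more on the microcontroller than in the simulator.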
Adjust and Iterate
Experiment with the parameters in the script to refine the look and feel:
- Fade Speed: Control how quickly colors transition between glyphs.
- Duration: Adjust how long each glyph is displayed.
- Colors: Customize the RGB values to fit your artistic vision.
Note: The simulation is not an exact match to the hardware.
For the simulator, these values work well:
- MIN_RGB_VALUE = 64
- MAX_RGB_VALUE = 255
However, on the hardware, you'll need to adjust for the lower brightness range. While higher values are possible, they would require a separate power supply; otherwise, you risk overloading and damaging the MCU onboard regulator.
- MIN_RGB_VALUE = 4
- MAX_RGB_VALUE = 32
Print the 3D Parts
I've included both STL and 3MF Prusa project files for convenience. I printed the parts on my MK3S+ in eSun PLA+HS with a 0.2mm layer height.
The diffuser front is 0.4mm thick (2 layers at 0.2mm). Start by printing four layers in white to prevent bleed-through, then switch to black for the rest of the print. Leaving the diffuser entirely white is an option, but it results in less-defined cells - an acceptable look, though not my preference.
I did the stand in black, nothing special.
Assemble and Solder the LED Matrices
Now it's time to assemble and solder your LED matrices to create a functional 16x16 display. Refer to the picture above for guidance on the wiring and assembly process.
What You’ll Need:
- Four 8x8 LED matrices (e.g., GlowBit 8x8 Matrix).
- Soldering iron and solder.
- Wires for connecting power, ground, and data lines.
- Microcontroller (e.g., RP2040).
- Your 3D-printed frame and base for mounting.
- Align the four 8x8 LED matrices in a 2x2 configuration to form a 16x16 grid. Place them in your 3D-printed frame to keep them steady during assembly. They should click in and stay in place; add weight if necessary.
- Connect the Matrices: Solder the data out (Dout) pin of one matrix to the data in (Din) pin of the next, following the data flow path. Looking from the front, the order is: top-left, top-right, bottom-right, bottom-left. This order, however, can be adjusted in the software.
- Power: Connect all the Vcc and GND pins, top-bottom or left-right.
- Wire to the Microcontroller: Solder wires from the first matrix in the data chain to your microcontroller:
  - Din to the microcontroller’s data output pin.
  - Vcc to the 3.3V or 5V pin (I used 5V to reduce load on the MCU regulator).
  - GND to the microcontroller’s ground pin.
- Attach the MCU: I printed an open-frame version and used hot glue to secure the MCU.
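Since the four panels form one long LED chain, the software has to translate a display coordinate into a position in that chain. A minimal sketch of that mapping, using the panel order given above, might look like this. It assumes each panel is wired row by row starting at its own top-left corner, which you should verify against your hardware; the names are illustrative and not taken from glow.py.

```python
PANEL = 8  # each panel is 8x8

# Chain order seen from the front, as described above:
# top-left, top-right, bottom-right, bottom-left.
# Keys are (panel column, panel row); values are positions in the chain.
PANEL_ORDER = {(0, 0): 0, (1, 0): 1, (1, 1): 2, (0, 1): 3}


def led_index(x, y):
    """Map display coordinates (0..15, origin top-left) to a position in the LED chain."""
    if not (0 <= x < 16 and 0 <= y < 16):
        raise ValueError("coordinates must be in 0..15")
    panel = PANEL_ORDER[(x // PANEL, y // PANEL)]
    # Assumes each panel is wired row by row from its own top-left corner.
    return panel * PANEL * PANEL + (y % PANEL) * PANEL + (x % PANEL)
```

If you wire the panels in a different order, only the PANEL_ORDER table needs to change, which is the kind of software adjustment mentioned in the wiring step.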
Upload Code and Compressed Glyph File
Open your preferred microcontroller interface tool (e.g., Thonny, rshell, or ampy). Upload the following files to the microcontroller's filesystem:
- Compressed Glyph File (glyphV3a.bin)
- glow.py
- main.py
Once the files are uploaded, reset the microcontroller. If everything is set up correctly, you should see the glyphs displayed on the LED matrix in varying colors.
The glow.py script is packed with configurable parameters, allowing you to customize the LED display to achieve a variety of visual effects. Here's a breakdown of the key parameters and how they influence the display:
1. RGB Value Range
- Parameters: MIN_RGB_VALUE, MAX_RGB_VALUE
- Effect: Controls the brightness of the colors.
- Lower values (e.g., MIN_RGB_VALUE = 4) result in dimmer colors, suitable for low-power environments.
- Higher values (e.g., MAX_RGB_VALUE = 32) create brighter and more vivid colors but may require additional power.
- Example Adjustment: If you want more pastel tones, lower MAX_RGB_VALUE and keep MIN_RGB_VALUE consistent.
2. Random Color Modifiers
- Parameters: DIM_FACTOR, BOOST_FACTOR, SATURATION_BLEND_MIN, SATURATION_BLEND_MAX
- Effect: Adds variability to color generation.
- DIM_FACTOR dims a color channel by a specified percentage (e.g., 50%).
- BOOST_FACTOR increases a color channel for brighter, more saturated colors.
- SATURATION_BLEND_MIN and SATURATION_BLEND_MAX adjust how much colors blend toward grayscale, creating softer or pastel-like effects.
- Example Adjustment: Increase BOOST_FACTOR to 2.0 for more vibrant colors or lower SATURATION_BLEND_MAX for subdued tones.
3. Rotation Probabilities
- Parameters: PROB_0, PROB_90, PROB_180, PROB_270
- Effect: Determines how likely a glyph will rotate by 0°, 90°, 180°, or 270°.
- These probabilities must sum to 1.
- Higher probability for PROB_0 will result in fewer rotations, keeping glyphs upright.
- Example Adjustment: Set PROB_90 = 0.4 to increase 90° rotations for a more dynamic display.
4. Glyph Hold Time
- Parameters: MIN_HOLD_TIME, MAX_HOLD_TIME (in milliseconds)
- Effect: Controls how long a glyph is displayed before transitioning to the next.
- Shorter times create faster animations, while longer times give each glyph more prominence.
- Example Adjustment: Increase MAX_HOLD_TIME to 10,000 for a slower, more relaxed display.
5. Fade Transition Speed
- Parameter: FADE_STEP_SIZE
- Effect: Controls the smoothness and speed of color transitions.
- A smaller value (e.g., FADE_STEP_SIZE = 1) results in gradual, smoother fades.
- A larger value (e.g., FADE_STEP_SIZE = 5) speeds up transitions, creating a snappier effect.
- Example Adjustment: Experiment with values between 1 and 5 to balance smoothness and responsiveness.
6. Glyph Data File
- Parameter: COMPRESSED_GLYPHS_FILE
- Effect: Specifies the source file containing compressed glyph data.
- You can swap this file to load a different set of glyphs or artwork.
- Example Adjustment: Replace glyphV3.bin with another binary file to instantly change the displayed content.
7. Initial Glow
- Code Line: NEO.fill((3, 3, 3))
- Effect: Sets an initial dim glow for all LEDs before the main display loop starts.
- Adjust these values to change the startup color or remove the glow entirely by setting (0, 0, 0).
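To make a couple of these parameters concrete, here is a rough sketch of how rotation probabilities and the fade step could be applied. It is not the code from glow.py: the parameter names follow the list above, but the values and the three helper functions are assumptions for illustration.

```python
import random

# Hypothetical values; the only stated requirement is that the four probabilities sum to 1.
PROB_0, PROB_90, PROB_180, PROB_270 = 0.4, 0.4, 0.1, 0.1
FADE_STEP_SIZE = 2


def pick_rotation(rng=random):
    """Choose 0, 90, 180, or 270 degrees according to the PROB_* weights."""
    r = rng.random()
    cumulative = 0.0
    for angle, prob in ((0, PROB_0), (90, PROB_90), (180, PROB_180), (270, PROB_270)):
        cumulative += prob
        if r < cumulative:
            return angle
    return 270  # guards against floating-point rounding


def rotate(grid, angle):
    """Rotate a square grid clockwise by a multiple of 90 degrees."""
    for _ in range((angle // 90) % 4):
        grid = [list(row) for row in zip(*grid[::-1])]
    return grid


def fade_step(current, target, step=FADE_STEP_SIZE):
    """Move each RGB channel at most `step` closer to the target color."""
    return tuple(c + max(-step, min(step, t - c)) for c, t in zip(current, target))
```

Calling fade_step repeatedly walks a pixel from its current color to the target, so a smaller FADE_STEP_SIZE means more steps and a smoother fade. The cumulative-sum loop in pick_rotation is used instead of random.choices because the latter may not be available on MicroPython builds.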
Bonus
I really hope you enjoy these! Here are all the compressed glyphs, in two files, from my experiments, training, and personal preferences. If you’d rather skip the machine learning step, you can use these directly.
Note: You can’t upload .bin files here, so I’ve provided them with a .txt extension. Simply remove the .txt before uploading or using them in the simulation or with the microcontroller.
You can find additional scripts and resources in my GitHub repository that may assist with this project. For instance, the pixel-count-classifier.py script is particularly useful for filtering out glyphs with too many or too few active pixels.
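The idea behind a pixel-count filter is easy to sketch. The version below works directly on the 32-byte compressed form; the bounds are hypothetical and the function names are not taken from pixel-count-classifier.py.

```python
MIN_PIXELS, MAX_PIXELS = 20, 120  # hypothetical bounds out of 256 pixels


def active_pixels(data):
    """Count set bits in a packed glyph (32 bytes, 1 bit per pixel)."""
    return sum(bin(byte).count("1") for byte in data)


def keep_glyph(data, low=MIN_PIXELS, high=MAX_PIXELS):
    """True if the glyph is neither too sparse nor too dense."""
    return low <= active_pixels(data) <= high
```

Running a filter like this before manual sorting removes the nearly empty and nearly solid glyphs cheaply, leaving fewer candidates to review.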
License:
- This work is licensed under a Creative Commons Attribution 4.0 International License.
- Any glyphs based on Noto Sans are shared under its license
- When you share everybody wins!