Browser Voice Recognition Control

Welcome to a new AMBURO tutorial. This is the second chapter of the voice control series, and this time we will try another approach: harnessing the power of the browser on our computer by creating a web page with the help of a JavaScript library called Artyom. This library runs on HTML5 and gives us a set of JS instructions to control the browser's microphone input, plus the resources to process the words recognized from our speech. So basically, we will be able to define a set of voice instructions and link each one to a related task.

Of course, these tasks will be JS code that runs in the browser, so we can control everything related to the page we are on, for example opening new tabs or windows at a desired destination, changing behaviors, or triggering new ones. But I will show you how we can do much more than that, and for me this is the most interesting part. With one of these instructions we can send a message to a server running on NodeJS on our local machine, and these messages can trigger more complex actions like the ones I am about to show you. So the idea is to take advantage of all the tools and power of the browser and this library to create a simple system for controlling everything with our voices.

Supplies

For this project, we won't need any special gear like an Arduino board, sensors, or switches... for now. We will integrate those in future steps.

This will essentially be a NodeJS project. As we mentioned in previous tutorials, we use NodeJS mainly for its simplicity. We will code a server that serves a main page. This page contains the HTML for the UI that controls everything and imports the JS libraries it needs. The same server is also in charge of receiving signals from the page in order to trigger configurable tasks. So let's get into the code.

The Project

As you can see, our project has a main server.js, a folder called views containing our main index.html page, and a public folder with a js folder containing all the JavaScript libraries.

browserVoiceController

Code

Going back to our server, we see that it imports all the NodeJS libraries required to create an HTTP server. This server first exposes our main HTML page and then listens for requests to execute actions.

First things first: we prepare and configure the server, setting things like the port and the paths for the different elements. Next, we set up the two main controllers, one with the GET verb and the other with the POST verb. The first one renders the main HTML page, and the other receives a simple JSON body with a string that contains the action identifier. This is the keyword that identifies our actions. Last but not least, we define the actions themselves. To do this we create a variable as an array, define keys in it, and store a function under each key. As you may guess, in these functions you can do whatever you please, and you can define as many keys as you want, one after another.
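To make the two controllers more concrete, here is a minimal sketch that uses only Node's built-in http module. It is not the server.js from this project: the port number, the /doSomething path, the createHandler name, and the placeholder page are all assumptions made for illustration. The real server may well be built on a framework instead.

```javascript
const http = require('http');

const PORT = 3000; // assumed value, pick whatever port you prefer

// Builds the request listener (hypothetical helper).
// pageHtml: the content you want to serve as the main page.
// onAction: called with the string found under the "do" key.
function createHandler(pageHtml, onAction) {
  return (req, res) => {
    if (req.method === 'GET' && req.url === '/') {
      // GET controller: return the main HTML page.
      res.writeHead(200, { 'Content-Type': 'text/html' });
      res.end(pageHtml);
    } else if (req.method === 'POST' && req.url === '/doSomething') {
      // POST controller: collect the body, then parse it as JSON.
      let raw = '';
      req.on('data', (chunk) => { raw += chunk; });
      req.on('end', () => {
        let body;
        try {
          body = JSON.parse(raw);
        } catch (err) {
          res.writeHead(400, { 'Content-Type': 'application/json' });
          res.end(JSON.stringify({ error: 'invalid JSON' }));
          return;
        }
        const name = body && body.do; // the action identifier
        onAction(name);
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify({ received: name }));
      });
    } else {
      res.writeHead(404);
      res.end();
    }
  };
}

const server = http.createServer(
  createHandler('<h1>Voice controller</h1>', (name) => console.log('action:', name))
);
// server.listen(PORT); // uncomment to actually start listening
```

In a real project you would probably read views/index.html from disk and pass its content as pageHtml instead of the placeholder string.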

So basically, when a request arrives at the doSomething controller, a string is extracted from the do key in the JSON. We check that the string exists as a key in our actions array and that the value stored under that key is a function (we will use this check in future work), and then we simply execute that function like this: actions[fn]();
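The lookup described above can be sketched in a few lines. The action names (lightsOn, version) and the runAction helper are made up for this example, and the table is written as a plain object here. Only the do key and the actions[fn]() call come from the tutorial.

```javascript
// Action table: keys are the identifiers sent by the page, values are functions.
const actions = {};
actions.lightsOn = () => 'lights on'; // hypothetical example action
actions.version = '1.0';              // not a function, so it must be rejected

// Hypothetical helper: takes the parsed JSON body and runs the matching action.
function runAction(body) {
  const fn = body && body.do;
  // The key must belong to the table itself, not to the object prototype
  // (otherwise names like "toString" would pass the check).
  const known = typeof fn === 'string' &&
    Object.prototype.hasOwnProperty.call(actions, fn);
  if (!known || typeof actions[fn] !== 'function') {
    return { ok: false, error: `unknown action: ${fn}` };
  }
  return { ok: true, result: actions[fn]() };
}
```

Calling runAction({ do: 'lightsOn' }) runs the stored function, while runAction({ do: 'version' }) is refused because the value under that key is not a function.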

As you can see, this simple strategy lets us execute our tasks dynamically. If we want to add a task, we just add a function without modifying much of the code. In fact, you can keep the functions in a separate file, or in several files separated by category. We will explore this way of working in future tutorials. Working this way makes your code scalable.
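One possible way to organize actions by category is sketched below. The two objects stand in for what would be separate files (for example something like actions/lights.js and actions/media.js, both hypothetical names), and every action name is invented for the example.

```javascript
// Stand-ins for separate files; in a real project each object would be
// exported from its own module and loaded with require().
const lightActions = {
  lightsOn: () => 'lights on',
  lightsOff: () => 'lights off',
};
const mediaActions = {
  playMusic: () => 'playing',
};

// Merge every category into the single table the server looks up.
const actions = Object.assign({}, lightActions, mediaActions);
```

Keep in mind that with Object.assign a key defined in a later category overwrites the same key from an earlier one, so action names should be unique across files.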

server.js

Code

Going back to our index.html, we see a head with all the required scripts and links. I am not going to go deep into this, since it is a basic HTML structure. I will only mention that we use jQuery, the stylesheets required by Bootstrap, of course the Artyom library, which helps us with the voice translation, and the main JavaScript controller that we will discuss in a second. Then comes the main body, with a table structure that shows the information rendered by this interface. As you can see, it is a simple display with the basic information and a log where we record, in a very simple fashion, what is happening in the browser engine. All the behavior of this page is handled by voiceHandler.js.

index.html

Code

If we go to this JS file, we first find the log configuration and the basic settings of the Artyom library, enclosed within a function. Here we start the engine and set all the instructions. After that come a fader function and a utility function that control how certain icons behave, the chronometer control functions, another utility structure that shows how we can control the page behavior, a function in charge of sending messages to our server with a dynamic text, and finally the functions that start and stop our recognition system.
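As a rough idea of how the command setup, the server messaging, and the start and stop functions could fit together, here is a sketch. It is not the actual voiceHandler.js: the helper names, the spoken phrases, the openDoor identifier, the /doSomething path, and the use of fetch are all assumptions. The Artyom calls used are addCommands, initialize, and fatality.

```javascript
// Hypothetical helper: POST the action identifier to the local server.
// fetchFn is injectable so the function can be tried outside a browser.
function sendAction(name, fetchFn = globalThis.fetch) {
  return fetchFn('/doSomething', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ do: name }),
  });
}

// Registers the voice instructions on an Artyom instance given by the caller.
function registerCommands(artyom, send = sendAction) {
  artyom.addCommands([
    {
      // Fixed phrases mapped to one action identifier.
      indexes: ['open the door', 'open door'],
      action: () => send('openDoor'),
    },
    {
      // Smart command: whatever is said after "run" is sent as dynamic text.
      smart: true,
      indexes: ['run *'],
      action: (i, wildcard) => send(wildcard.trim()),
    },
  ]);
}

// Starts continuous recognition (the language code is an assumption).
function startRecognition(artyom) {
  return artyom.initialize({
    lang: 'en-GB',
    continuous: true,
    listen: true,
    debug: true,
  });
}

// Stops the recognition engine.
function stopRecognition(artyom) {
  return artyom.fatality();
}
```

In the page you would create the instance with new Artyom(), pass it to registerCommands, and then call startRecognition and stopRecognition from your start and stop buttons.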

voiceHandler.js

Coming Next

In the next tutorials, we will see how many things we can control with this piece of code. See you in the next one.