Custom Keyword-controlled Illuminating Temple

by eisebooij in Circuits > Arduino


Custom Keyword-controlled Illuminating Temple

PXL_20220810_135856030.jpg

This project is all about controlling an Arduino with custom keywords. These keywords can be anything, even made-up words, which works surprisingly well! Here, the Arduino gradually fades a light in and out, illuminating the inside of a temple. I used two made-up keywords and machine learning to teach the Arduino to listen for them.

Supplies

Arduino-ABX00031-30150886-01.jpg
5mm-LED-700x700.jpg
ROYALBLUELEADED-40.jpg
  • Arduino Nano 33 BLE Sense
  • 1 (or more) LEDs (L-3WCN/1 used in this project)
  • Matching resistor (49 Ohm used in this project)
  • Housing for the Arduino
  • Power source (5V power bank used in this project)
  • Microphone

Gathering Keywords

Screenshot 2022-08-14 235632.png
Screenshot 2022-08-14 235646.png

The first part of this project consists of creating a dataset that contains your custom keywords. Creating a dataset entirely from scratch is way too complicated, so we're using some pre-made data provided by Google. We're also using a Python script to generate a large number of samples that we can later use to teach our Arduino to listen for them. This is the download link to the Google dataset:

shorturl.at/ikLZ5

Once you've downloaded the archive and extracted it with a program like 7-Zip, place the extracted folder inside a new folder on your computer named something like datasets. Then cut the background noise folder out of the data speech commands folder and place it directly in the datasets folder.

Create a new folder named custom_keywords and, inside it, create a folder named after each keyword you would like to use (a sketch of the resulting layout is shown below). I recommend not using more than two custom keywords for this Arduino.
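
To illustrate, the resulting layout could look roughly like this (every name except custom_keywords is just an example; use whatever names you chose):

datasets
  data speech commands   <- the extracted Google dataset
  background noise       <- moved here out of the Google dataset
  custom_keywords
    keyword1
    keyword2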

----

Now it is time to record your samples! The easiest way to do this is to record one long audio file while repeating the custom keyword at least 75 times. The more recordings, the more accurate the speech recognition will be. Try to vary your pronunciation a bit for more accuracy later on.

Using a program like Audacity, cut the voice samples to exactly 1 second. Before you do this, however, you need to change a few settings. First, set the sample rate (Project Rate (Hz) in the lower-left corner) to 16000 and resample the track. When you're exporting the samples, set the encoding type to 32-bit float and export as a WAV file. Once you're done, place the samples in the matching keyword folder inside custom_keywords.

--

Now it's time to generate a huge number of samples from the recordings you've just made. We do this by mixing our recordings with the background noises provided by Google. Doing this by hand would take ages, so we use a Python script to make things a little easier. To download the script, follow this link:

https://github.com/ShawnHymel/ei-keyword-spotting
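
If you have Git installed, you can also clone the repository directly from the command line:

git clone https://github.com/ShawnHymel/ei-keyword-spotting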

The download contains the two scripts we need, 'dataset_curation' and 'utils'. Place them in a folder together with the entire datasets folder.

Next, we need to install a few libraries. I recommend using Anaconda for this step.

Use the following command to install Librosa, Numpy and Soundfile:

python -m pip install librosa numpy soundfile

Now navigate to the folder where you placed the two python scripts and type the following command:

python dataset_curation.py -t "KEYWORD1, KEYWORD2" -n 1500 -w 1.0 -g 0.1 -s 1.0 -r 16000 -e PCM_16 -b "BACKGROUND NOISE FOLDER PATHWAY" -o "OUTPUT DIRECTORY" "INPUT FILE PATHWAY" "INPUT FILE PATHWAY"

You'll have to fill in the parts within the quotation marks yourself, depending on your folder paths. For the input file pathway entries, reference Google's 'data speech commands' folder and the custom_keywords folder; both of these should be in the datasets folder. That should be enough.
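
As an illustration, a filled-in command could look something like this (all paths and keyword names are placeholders; substitute your own):

python dataset_curation.py -t "keyword1, keyword2" -n 1500 -w 1.0 -g 0.1 -s 1.0 -r 16000 -e PCM_16 -b "C:\datasets\background noise" -o "C:\keywords_curated" "C:\datasets\data speech commands" "C:\datasets\custom_keywords"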

After the script has run, there should be a new folder containing the curated samples.


Keyword Training

Screenshot 2022-08-15 003123.png

In this section we start training a model to recognize our keywords. We use Edge Impulse to do this; you can make an account for free!

Create a new project and navigate to data acquisition. Click the upload existing data button and choose the samples from one keyword first. Leave the other settings at their defaults and begin the upload. Repeat this step for the "noise", "unknown", and other keyword samples.

Navigate to Impulse Design and add a processing block with the MFCC function. Also add a learning block with the recommended function (Keras). Save the impulse, navigate to the MFCC part of the Impulse Design, and click generate features. When it's done, go to the NN Classifier and click Start Training.

---

We're now going to test our trained model. Go to Model Testing, select all the samples, and click classify selected. The overall accuracy should be higher than 60% to give somewhat reliable results.

Now we're going to export the model so we can finally bring it onto our Arduino. Under the Deployment section you can select either the Arduino library or a ready-made project for the Nano 33 BLE Sense. We're going to be using the library. Export it and load it into the Arduino IDE as a library.


Coding the Arduino

With the library loaded in, we can start from an example project to make things a lot easier. Under File and Examples you should be able to find the nano 33 ble sense speech recognition continuous example; load this one up. You're now ready to start adding your own code! I wouldn't change any pre-set parameters unless you're running into problems. When you upload the sketch and open the serial monitor, you can test the accuracy: when you say a keyword, you should see the value listed for that keyword go up. Find a threshold value that your voice reliably exceeds and write it down. You can then check against it with the following code:

if (result.classification[INDEX].value > THRESHOLD){}

The index is an integer that identifies the keyword in the classifier's label list. You can find the right index by watching the serial monitor output.
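
For example, a minimal check could look like this (the index 2 and the 0.6 threshold are placeholders; substitute the index and threshold you found for your own keyword):

if (result.classification[2].value > 0.6) { // placeholder index and threshold
  // keyword recognized: react here, e.g. turn on the built-in LED
  // (set pinMode(LED_BUILTIN, OUTPUT) in setup() first)
  digitalWrite(LED_BUILTIN, HIGH);
}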

When you're ready, upload your program to your Arduino.

Assembling

PXL_20220814_205210612.MP.jpg
PXL_20220814_205152442.jpg
PXL_20220814_204849482.jpg

Now everything comes together. Since most of the interesting stuff happens inside the Arduino, it looks very simple from the outside, and it is! Simply solder the LED and resistor to the board and you're set! After that, place it inside your housing and connect it to the power source. When connected, the Arduino is constantly listening for keywords and will execute your command when it (thinks it) hears one.
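
If you use a different LED, the resistor value follows from the usual formula R = (Vpin - Vf) / I. Assuming the Nano's 3.3 V output pin, a blue LED with a forward voltage of roughly 3.0 V, and a target current of about 6 mA, that works out to R ≈ (3.3 - 3.0) / 0.006 ≈ 50 Ohm, close to the 49 Ohm used in this project.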

My Code

In this section I will post my code for you to use freely. The code slowly brightens an LED when one keyword is recognized and dims it again when the other is recognized.


#define EI_CLASSIFIER_SLICES_PER_MODEL_WINDOW 3


/*

 ** NOTE: If you run into a TFLite arena allocation issue,

 **

 ** this may be due to dynamic memory fragmentation.

 ** Try defining "-DEI_CLASSIFIER_ALLOCATION_STATIC" in boards.local.txt (create

 ** if it doesn't exist) and copy this file to

 ** `<ARDUINO_CORE_INSTALL_PATH>/arduino/hardware/<mbed_core>/<core_version>/`.

 **

 ** See

 ** (https://support.arduino.cc/hc/en-us/articles/360012076960-Where-are-the-installed-cores-located-)

 ** to find where Arduino installs cores on your machine.

 **

 ** If the problem persists then there's not enough memory for this model and application.

 */


/* Includes ---------------------------------------------------------------- */

#include <PDM.h>

#include <Speech_Recognition_inferencing.h>


/** Audio buffers, pointers and selectors */

typedef struct {

  signed short *buffers[2];

  unsigned char buf_select;

  unsigned char buf_ready;

  unsigned int buf_count;

  unsigned int n_samples;

} inference_t;


bool increasing;

bool decreasing;

bool listening;

int LightAmount;


const int WaitTime = 50;

const int StepSize = 3;


static inference_t inference;

static bool record_ready = false;

static signed short *sampleBuffer;

static bool debug_nn = false; // Set this to true to see e.g. features generated from the raw signal

static int print_results = -(EI_CLASSIFIER_SLICES_PER_MODEL_WINDOW);


/**

 * @brief   Arduino setup function

 */

void setup()

{

  // put your setup code here, to run once:

  Serial.begin(115200);



  Serial.println("Edge Impulse Inferencing Demo");


  // summary of inferencing settings (from model_metadata.h)

  ei_printf("Inferencing settings:\n");

  ei_printf("\tInterval: %.2f ms.\n", (float)EI_CLASSIFIER_INTERVAL_MS);

  ei_printf("\tFrame size: %d\n", EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE);

  ei_printf("\tSample length: %d ms.\n", EI_CLASSIFIER_RAW_SAMPLE_COUNT / 16);

  ei_printf("\tNo. of classes: %d\n", sizeof(ei_classifier_inferencing_categories) /

                      sizeof(ei_classifier_inferencing_categories[0]));


  run_classifier_init();

  if (microphone_inference_start(EI_CLASSIFIER_SLICE_SIZE) == false) {

    ei_printf("ERR: Failed to setup audio sampling\r\n");

    return;

  }


  increasing = false;

  decreasing = false;

  listening = true;

  LightAmount = 0;


  pinMode(A0,OUTPUT);

}


/**

 * @brief   Arduino main function. Runs the inferencing loop.

 */

void loop()

{

  bool m = microphone_inference_record();

  if (!m) {

    ei_printf("ERR: Failed to record audio...\n");

    return;

  }


  signal_t signal;

  signal.total_length = EI_CLASSIFIER_SLICE_SIZE;

  signal.get_data = &microphone_audio_signal_get_data;

  ei_impulse_result_t result = {0};


  EI_IMPULSE_ERROR r = run_classifier_continuous(&signal, &result, debug_nn);

  if (r != EI_IMPULSE_OK) {

    ei_printf("ERR: Failed to run classifier (%d)\n", r);

    return;

  }


  if (++print_results >= (EI_CLASSIFIER_SLICES_PER_MODEL_WINDOW)) {

    // print the predictions

    ei_printf("Predictions ");

    ei_printf("(DSP: %d ms., Classification: %d ms., Anomaly: %d ms.)",

      result.timing.dsp, result.timing.classification, result.timing.anomaly);

    ei_printf(": \n");

    for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {

      ei_printf("  %s: %.5f\n", result.classification[ix].label,

           result.classification[ix].value);

    }

#if EI_CLASSIFIER_HAS_ANOMALY == 1

    ei_printf("  anomaly score: %.3f\n", result.anomaly);

#endif


    print_results = 0;

  }


  /* the code below gradually brightens or dims the LED after hearing a keyword

   */

   

  if (listening == true)

  {

    if (result.classification[3].value > 0.55){

      increasing = true;

      listening = false;

    }


    if (result.classification[2].value > 0.3){

      decreasing = true;

      listening = false;

    }

  }

   

  if (increasing == true){

    while ( LightAmount <= 255 )

    {

     analogWrite( A0, LightAmount );

     delay( WaitTime );

     LightAmount = LightAmount + StepSize;

    }

      

    if (LightAmount >= 255)

    {

     LightAmount = 255;

     listening = true;

     increasing = false;

    }

  }


   if (decreasing == true){

    while ( LightAmount >= 0 )

    {

     analogWrite( A0, LightAmount );

     delay( WaitTime );

     LightAmount = LightAmount - StepSize;

    }

      

    if (LightAmount <= 0)

    {

     LightAmount = 0;

     listening = true;

     decreasing = false;

    }

  }


   

}


/**

 * @brief   PDM buffer full callback

 *       Get data and call audio thread callback

 */

static void pdm_data_ready_inference_callback(void)

{

  int bytesAvailable = PDM.available();


  // read into the sample buffer

  int bytesRead = PDM.read((char *)&sampleBuffer[0], bytesAvailable);


  if (record_ready == true) {

    for (int i = 0; i < bytesRead >> 1; i++) {

      inference.buffers[inference.buf_select][inference.buf_count++] = sampleBuffer[i];


      if (inference.buf_count >= inference.n_samples) {

        inference.buf_select ^= 1;

        inference.buf_count = 0;

        inference.buf_ready = 1;

      }

    }

  }

}


/**

 * @brief   Init inferencing struct and setup/start PDM

 *

 * @param[in] n_samples The n samples

 *

 * @return   { description_of_the_return_value }

 */

static bool microphone_inference_start(uint32_t n_samples)

{

  inference.buffers[0] = (signed short *)malloc(n_samples * sizeof(signed short));


  if (inference.buffers[0] == NULL) {

    return false;

  }


  inference.buffers[1] = (signed short *)malloc(n_samples * sizeof(signed short));


  if (inference.buffers[1] == NULL) {

    free(inference.buffers[0]);

    return false;

  }


  sampleBuffer = (signed short *)malloc((n_samples >> 1) * sizeof(signed short));


  if (sampleBuffer == NULL) {

    free(inference.buffers[0]);

    free(inference.buffers[1]);

    return false;

  }


  inference.buf_select = 0;

  inference.buf_count = 0;

  inference.n_samples = n_samples;

  inference.buf_ready = 0;


  // configure the data receive callback

  PDM.onReceive(&pdm_data_ready_inference_callback);


  PDM.setBufferSize((n_samples >> 1) * sizeof(int16_t));


  // initialize PDM with:

  // - one channel (mono mode)

  // - a 16 kHz sample rate

  if (!PDM.begin(1, EI_CLASSIFIER_FREQUENCY)) {

    ei_printf("Failed to start PDM!");

  }


  // set the gain, defaults to 20

  PDM.setGain(127);


  record_ready = true;


  return true;

}


/**

 * @brief   Wait on new data

 *

 * @return   True when finished

 */

static bool microphone_inference_record(void)

{

  bool ret = true;


  if (inference.buf_ready == 1) {

    ei_printf(

      "Error sample buffer overrun. Decrease the number of slices per model window "

      "(EI_CLASSIFIER_SLICES_PER_MODEL_WINDOW)\n");

    ret = false;

  }


  while (inference.buf_ready == 0) {

    delay(1);

  }


  inference.buf_ready = 0;


  return ret;

}


/**

 * Get raw audio signal data

 */

static int microphone_audio_signal_get_data(size_t offset, size_t length, float *out_ptr)

{

  numpy::int16_to_float(&inference.buffers[inference.buf_select ^ 1][offset], out_ptr, length);


  return 0;

}


/**

 * @brief   Stop PDM and release buffers

 */

static void microphone_inference_end(void)

{

  PDM.end();

  free(inference.buffers[0]);

  free(inference.buffers[1]);

  free(sampleBuffer);

}


#if !defined(EI_CLASSIFIER_SENSOR) || EI_CLASSIFIER_SENSOR != EI_CLASSIFIER_SENSOR_MICROPHONE

#error "Invalid model for current sensor."

#endif