ESP32-CAM Person Detection Experiment With TensorFlow Lite

by Trevor Lee in Circuits > Arduino



title_page.png

To demonstrate the capabilities of the TensorFlow Lite ESP32 Arduino library, a "person detection" example is bundled with it. This is really nice!

To be picky, however, the example is a bit complicated for many amateurs like me. Hence, I will try to simplify running that Deep Learning (DL) model for "person detection", with ESP32-CAM and DumbDisplay.

  • ESP32-CAM is used to run the DL model; and for capturing images for "person detection".
  • DumbDisplay acts as the UI for the experiment.

This post will only show the Arduino sketch for running the sample "person detection" DL model with TensorFlow Lite. If you are interested in training a simple TensorFlow DL model, you may want to refer to my previous post -- Trying Out TensorFlow Lite Hello World Model With ESP32 and DumbDisplay.

I would like to stress that the sketch here is heavily based on the "person detection" example that comes with the TensorFlow Lite ESP32 Arduino library. Not only are code segments copied from the example, but the exact same DL model is used as-is.

The image-capturing part is based on my previous experience with the ESP32-CAM. For the steps of setting up an ESP32-CAM, you may want to refer to my previous YouTube video ESP32-CAM Experiment -- Capture and Stream Pictures to Mobile Phone.

Preparation

tensorflowlite_esp32_lib.png

To compile and run the sketch shown in this post, you will need the following:

  • TensorFlow Lite ESP32 library. Open your Arduino IDE; go to the menu item Tools | Manage Libraries, type "tensorflow lite esp32" in the search box there, and install the library found.
  • DumbDisplay Arduino library. Open your Arduino IDE; go to the menu item Tools | Manage Libraries, type "dumbdisplay" in the search box there, and install the library found.
  • For your Android phone, you will need to install the DumbDisplay Android app.

The Sketch

person_detect_model.png
model_cpp_file.png

You can download the main sketch esp32cam_person.ino here. Place it in a directory called esp32cam_person.

Additionally, you will also need the DL model code person_detect_model_data.h. It will take a few steps to get it from the "person detection" example of the TensorFlow Lite ESP32 library.

Assuming you have installed the TensorFlow Lite ESP32 library, the DL model code file will be somewhere in your computer's storage, for example

C:\Users\you\Documents\Arduino\libraries\TensorFlowLite_ESP32\examples\person_detection\person_detect_model_data.cpp
  • Copy the file person_detect_model_data.cpp to the directory esp32cam_person.
  • Rename person_detect_model_data.cpp to person_detect_model_data.h.
  • In person_detect_model_data.h, comment out the #include line so that it reads
//#include "person_detect_model_data.h"

You need to do this because of the way the DL model code is included in the sketch here, as the variable g_person_detect_model_data.
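
In case it helps, here is roughly what the top of the renamed person_detect_model_data.h looks like. The exact bytes and length depend on the library version, so treat this only as an illustration of its structure:

//#include "person_detect_model_data.h"  // the line you commented out

// the "person detection" DL model, serialized into a C byte array
alignas(8) const unsigned char g_person_detect_model_data[] = {
  0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33,  // illustrative FlatBuffer header bytes; many more follow
};
const int g_person_detect_model_data_len = sizeof(g_person_detect_model_data);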

  • The DL model is for "person detection" on a grayscale image of size 96x96.
  • The output of the DL model inference is two metrics -- 1) the probability of "person present" in the image; and 2) the probability of "no person present" in the image. The sketch determines whether a person is present solely by the probability of "person present" in the image.

Here is a brief walk-through of the sketch.

First, a DumbDisplay object is created.

DumbDisplay dumbdisplay(new DDBluetoothSerialIO("ESP32CAM"));

Notice that connecting with your DumbDisplay Android app is via Bluetooth, with the name ESP32CAM.

Then, the DL model code is pulled in by including the file person_detect_model_data.h you prepared previously.

#include "person_detect_model_data.h"

Then, an "error reporter" object is created, which is needed by TensorFlow Lite library.

tflite::ErrorReporter* error_reporter = new DDTFLErrorReporter();

Note that DDTFLErrorReporter is specifically for reporting errors to DumbDisplay app (as comments).
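
Conceptually, such a reporter overrides the Report method of tflite::ErrorReporter and forwards the formatted message to the DumbDisplay app. The following is only a minimal sketch of the idea, not the actual DDTFLErrorReporter code (which ships with the DumbDisplay library):

// conceptual sketch only; illustrates what a DumbDisplay-backed error reporter does
class MyCommentErrorReporter : public tflite::ErrorReporter {
  public:
    int Report(const char* format, va_list args) override {
      char buffer[128];
      vsnprintf(buffer, sizeof(buffer), format, args);  // format the error message
      dumbdisplay.writeComment(buffer);                 // surface it as a DumbDisplay comment
      return 0;
    }
};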

Then, a tflite::Model object is created from the DL model g_person_detect_model_data.

const tflite::Model* model = ::tflite::GetModel(g_person_detect_model_data);

In the setup block, DumbDisplay is configured first.

  // create and setup [top] graphical layer for showing candidate image for person detection;
  // clicking it will invoke person detection
  detectImageLayer = dumbdisplay.createGraphicalLayer(imageWidth, imageHeight);
  detectImageLayer->padding(3);
  detectImageLayer->border(3, "blue", "round");
  detectImageLayer->backgroundColor("blue");
  detectImageLayer->enableFeedback("fl");

  // create and setup [middle] LCD layer for showing person detection status
  statusLayer = dumbdisplay.createLcdLayer(16, 4);
  statusLayer->padding(5);

  // create and setup [bottom] graphical layer for showing the image on which person detection was run
  personImageLayer = dumbdisplay.createGraphicalLayer(imageWidth, imageHeight);
  personImageLayer->padding(3);
  personImageLayer->border(3, "blue", "round");
  personImageLayer->backgroundColor("blue");

  // auto pin the layers vertically
  dumbdisplay.configAutoPin(DD_AP_VERT);
  • At the top, a graphical layer detectImageLayer with size 96x96 is used for drawing the candidate image captured by ESP32-CAM for "person detection".
  • In the middle, an LCD layer statusLayer with 4 rows of 16 characters each is used for displaying the "person detection" status.
  • At the bottom, another graphical layer personImageLayer with size 96x96 is used for drawing the "person detected" image.

Next in the setup block, the TensorFlow Lite library is prepared.

First, the model's schema version is checked to make sure it is the version the library supports.

  // check version to make sure supported
  if (model->version() != TFLITE_SCHEMA_VERSION) {
    error_reporter->Report("Model provided is schema version %d not equal to supported version %d.",
        model->version(), TFLITE_SCHEMA_VERSION);
  }

Then, the needed memory (81K) is allocated from the heap.

  // allocate memory for tensor_arena
  tensor_arena = (uint8_t *) heap_caps_malloc(tensor_arena_size, MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT);
  if (tensor_arena == NULL) {
    error_reporter->Report("heap_caps_malloc() failed");
    return;
  }
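
For reference, tensor_arena and tensor_arena_size are defined near the top of the sketch; presumably something along these lines (assumed here, to match the 81K mentioned above):

// working memory for TensorFlow Lite, as in the library's "person detection" example
constexpr int tensor_arena_size = 81 * 1024;
uint8_t* tensor_arena = nullptr;  // allocated from the heap in the setup block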

Then, the needed TensorFlow operation implementations are declared.

  // pull in only the operation implementations needed
  tflite::MicroMutableOpResolver<5>* micro_op_resolver = new tflite::MicroMutableOpResolver<5>();
  micro_op_resolver->AddAveragePool2D();
  micro_op_resolver->AddConv2D();
  micro_op_resolver->AddDepthwiseConv2D();
  micro_op_resolver->AddReshape();
  micro_op_resolver->AddSoftmax();

Note that which TensorFlow operations are needed depends on how the DL model is constructed, as exemplified below.
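
For example, if you trained your own model that also ended with a fully-connected layer, you would register that operation too, and bump the resolver's template count accordingly. A hypothetical sketch:

  // hypothetical: a model that additionally needs the fully-connected op
  tflite::MicroMutableOpResolver<6>* my_op_resolver = new tflite::MicroMutableOpResolver<6>();
  my_op_resolver->AddAveragePool2D();
  my_op_resolver->AddConv2D();
  my_op_resolver->AddDepthwiseConv2D();
  my_op_resolver->AddReshape();
  my_op_resolver->AddSoftmax();
  my_op_resolver->AddFullyConnected();  // the extra operation for the hypothetical model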

Then, a tflite::MicroInterpreter object is created.

  // build an interpreter to run the model with
  interpreter = new tflite::MicroInterpreter(model, *micro_op_resolver, tensor_arena, tensor_arena_size, error_reporter);
  • The previously prepared TensorFlow Lite objects are passed to the tflite::MicroInterpreter constructor as arguments.
  • The created object is assigned to the variable interpreter for "person detection" in the loop block.

Then, AllocateTensors is called to allocate resources from the previously allocated memory tensor_arena.

  // allocate memory from the tensor_arena for the model's tensors
  TfLiteStatus allocate_status = interpreter->AllocateTensors();
  if (allocate_status != kTfLiteOk) {
    error_reporter->Report("AllocateTensors() failed");
    return;
  }

Lastly for preparing the TensorFlow Lite library, the model's "input" tensor is acquired and assigned to the variable input.

  // obtain a pointer to the model's input tensor
  input = interpreter->input(0);
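
If you want to be defensive, you can additionally verify that the input tensor really has the shape and type the sketch assumes -- 96x96 grayscale pixels, quantized to int8. This check is not in the sketch; it is just an optional sketch of the idea:

  // optional sanity check: the input tensor should be 1x96x96x1, of type int8
  if (input->dims->size != 4 ||
      input->dims->data[1] != 96 ||
      input->dims->data[2] != 96 ||
      input->dims->data[3] != 1 ||
      input->type != kTfLiteInt8) {
    error_reporter->Report("Unexpected input tensor shape or type");
    return;
  }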

In the loop block, "person detection" on the captured image goes like this

...
  // capture candidate image for person detection
  camera_fb_t* capturedImage = captureImage(false);
  if (capturedImage == NULL) {
    error_reporter->Report("Error: Camera capture failed");
    return;
  }
...
  detectImageLayer->cachePixelImageGS(imageName, capturedImage->buf, imageWidth, imageHeight);  // cache image for drawing
  detectImageLayer->drawImageFileFit(imageName);
...
  // check if detectImageLayer (top; candidate image) clicked
  // if clicked, invoke person detection
  if (detectImageLayer->getFeedback() != NULL) {
...
    // copy the captured image into the memory area used for the model input
    const uint8_t* person_data = capturedImage->buf;
    for (int i = 0; i < input->bytes; ++i) {
      input->data.int8[i] = person_data[i] ^ 0x80;
    }
...
    TfLiteStatus invoke_status = interpreter->Invoke();
    if (invoke_status != kTfLiteOk) {
      error_reporter->Report("Invoke failed");
    }
...
}
...
  • Image data is captured by ESP32-CAM.
  • The captured image is drawn to the top layer detectImageLayer. (Effectively, images are continuously streamed to the top layer.)
  • Only after you click on the top layer showing the image will "person detection" be performed.
  • The image data is set as the "input" of TensorFlow Lite. Note that since the DL model is quantized, the image data (the pixel grayscale values) must be converted from unsigned to signed format -- basically, subtracting 128 from each pixel value, as illustrated right after this list.
  • The Invoke method of the interpreter object is called to run inference.
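
Why does the XOR with 0x80 work? Flipping the top bit of an unsigned byte gives the same bit pattern as subtracting 128, once the result is reinterpreted as a signed byte. A tiny worked example:

  uint8_t pixel = 200;              // a fairly bright grayscale pixel (0xC8)
  int8_t quantized = pixel ^ 0x80;  // 0xC8 ^ 0x80 = 0x48 = 72, which is 200 - 128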

The inference result is interpreted like this

    // process the inference (person detection) results
    TfLiteTensor* output = interpreter->output(0);
    int8_t _person_score = output->data.int8[kPersonIndex];
    int8_t _no_person_score = output->data.int8[kNotAPersonIndex];
    float person_score = (_person_score - output->params.zero_point) * output->params.scale;  // person_score should be a probability from 0 to 1
    float no_person_score = (_no_person_score - output->params.zero_point) * output->params.scale;
    bool detected_person = person_score > PersonScoreThreshold;
  • The variable _person_score is read from the "output", and it is dequantized to the probability person_score, as worked out below.
  • Whether a person is detected in the image is based solely on person_score; no_person_score is basically not used.
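
As a concrete illustration of the dequantization, suppose the output tensor uses a scale of 1/256 and a zero point of -128 (hypothetical values here, though typical for an int8 softmax output). A raw score of 72 would then become:

  float scale = 1.0f / 256.0f;  // hypothetical output->params.scale
  int zero_point = -128;        // hypothetical output->params.zero_point
  int8_t raw_score = 72;        // as if read from output->data.int8[kPersonIndex]
  float person_score = (raw_score - zero_point) * scale;  // (72 + 128) / 256 = 0.78125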

And the result is presented on the DumbDisplay UI layers like this

    personImageLayer->unloadImageFile(imageName);  // remove any previous caching
    if (detected_person) {
      // save image to phone
      dumbdisplay.savePixelImageGS(imageName, capturedImage->buf, imageWidth, imageHeight);
      dumbdisplay.writeComment("detected ... save image to phone");
    } else {
      // only cache image for drawing
      personImageLayer->cachePixelImageGS(imageName, capturedImage->buf, imageWidth, imageHeight);
    }
    personImageLayer->drawImageFileFit(imageName);
...
    statusLayer->clear();
    if (detected_person) {
      personImageLayer->backgroundColor("green");
      statusLayer->pixelColor("darkgreen");
      statusLayer->writeCenteredLine("Detected!", 0);
    } else {
      personImageLayer->backgroundColor("gray");
      statusLayer->pixelColor("darkgray");
      statusLayer->writeCenteredLine("NO person!", 0);
    }
    statusLayer->writeLine(String("  SCORE : ") + String((int8_t) (100 * person_score)) + "%", 2);
    statusLayer->writeLine(String("  IN    : ") + String((float) detect_taken_millis / 1000.0) + "s", 3);
  • If a person is detected, the image is saved to your phone.
  • Regardless, the image is drawn to the bottom layer personImageLayer.
  • And, text is written to the middle layer statusLayer, showing the status of "person detection".

Run the Sketch

pd_ss_01.png
pd_ss_00.png
esp32cam_person.gif

After uploading the sketch to your ESP32-CAM, you will need to connect it to your DumbDisplay Android app via Bluetooth, with the name ESP32CAM. You may want to refer to my previous post -- Setup HC-05 and HC-06, for Wireless 'Number Invaders' -- which shows how to make a Bluetooth connection with the DumbDisplay app.

As hinted previously, the UI is composed of three DumbDisplay layers, laid out vertically.

The top layer is a graphical LCD layer for showing candidate images for "person detection". The candidate images are captured by ESP32-CAM and shown on that top layer continuously. Once you have positioned the camera properly, you click on the image -- the top layer -- to trigger "person detection" on that image.

After "person detection", the image is shown to the bottom layer, which is also a graphical LCD layer. Notice that if person is detected, a green border surrounds the image; otherwise, a gray border is used.

In the middle is a text LCD layer for showing "person detection" status.

  • Whether "person detection" is in progress.
  • After "person detection", the result is presented, showing 1) whether a person is detected, 2) the score of the "person detection" in percentage, and 3) the time it took for the "personal detection" in seconds.

Enjoy!

Maybe I am a bit biased. However, I still hope that you will find the sketch shown in this post easy to understand and run. Enjoy!

Peace be with you. Jesus loves you. May God bless you!