Camera and Mic Arduino Experiment With TTGO T-Camera Plus and ESP32-Cam With INMP441, Featuring ESP Face Detection

In this post, I am going to describe an experiment that combines a camera and a mic, using a TTGO T-Camera Plus as well as an ESP32-CAM board with an INMP441 attached.

In the TTGO T-Camera Plus case, I managed to stream pictures and sound simultaneously; however, picture streaming is limited to a very low frame rate.

In the ESP32-CAM case, unfortunately, I failed to stream pictures and sound simultaneously with reasonable results. Nevertheless, it can switch between the two -- an optional button can be attached for activating the mic, like a walkie-talkie.

This experiment is heavily based on two of my previous experiments -- ESP32-CAM Experiment -- Capture and Stream Pictures to Mobile Phone and ESP32 Mic Testing With INMP441 and DumbDisplay. Moreover, just for fun, the experiment also makes use of ESP's support for face detection (no external libraries are needed; no DL model; just the ESP standard libraries).

The targeted boards of the experiment are

AI-Thinker ESP32-CAM; INMP441 mic attachment is optional
LilyGo T-Camera Plus
LilyGo T-SimCam
LilyGo T-Camera (V1.7), even though it doesn't have a mic

I guess other boards, like the ESP-EYE, can very likely be coded as targets as well; however, I didn't have a chance to play with them.

It is apparent that different targets will have different capabilities. Moreover, according to my experimentation, they may require different settings (e.g. sound sample rate) as well to achieve reasonable results.

Like the two previous experiments I mentioned, this experiment also uses DumbDisplay for the UI, as well as for the picture / sound streaming player -- the UI is driven by the ESP32 board; stream data are acquired by the ESP32 with the attached camera and mic.

Originally, I planned to make use of ESP's deep-sleep feature; however, I was unable to build the sketch when ESP's deep-sleep functions were linked. Nevertheless, as long as you keep the ESP board powered up, you can connect to it with the DumbDisplay app at any time.

The UI

As shown in the screen captures of the DumbDisplay UI for this experiment, several options that you can adjust are placed around the area where the streamed pictures are drawn:

You can turn ON/OFF the "FLASH"
You can turn ON/OFF "FACE" detection
You can turn ON/OFF "MIC"
At the top right is a charting area, which plots the shape of the sound wave.
You can use the slider below the chart area to adjust the software-based sound volume amplification factor.
You can switch between the different camera picture sizes with the "SIZE" button -- "QVGA", "VGA", "SVGA"
You can switch between the different software-based "frame rate" throttling with the "RATE" button -- "---" (none), "1", "2", "4", "8"
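
The throttling code itself is not shown in this post, so the following is only a minimal sketch of how a software-based "frame rate" limit like the "RATE" option might be done. The class FrameThrottle and its members are hypothetical names, not taken from the sketch; on the board, nowMs would come from millis(), and a rate of 0 stands for the "---" (none) option.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the sketch): decides whether enough time has
// passed to send the next picture, given a target rate in frames per second.
class FrameThrottle {
public:
    explicit FrameThrottle(uint32_t framesPerSecond) { setRate(framesPerSecond); }

    // A rate of 0 means "no throttling": every frame is sent.
    void setRate(uint32_t framesPerSecond) {
        intervalMs_ = framesPerSecond == 0 ? 0 : 1000 / framesPerSecond;
    }

    // Returns true if a frame should be sent at time nowMs (milliseconds).
    // Unsigned subtraction keeps the comparison valid when the counter wraps.
    bool shouldSend(uint32_t nowMs) {
        if (!sentAny_ || intervalMs_ == 0 || nowMs - lastSentMs_ >= intervalMs_) {
            lastSentMs_ = nowMs;
            sentAny_ = true;
            return true;
        }
        return false;
    }

private:
    uint32_t intervalMs_ = 0;
    uint32_t lastSentMs_ = 0;
    bool sentAny_ = false;
};
```

With a rate of 4, for example, a frame captured 100 ms after the previous one would simply be skipped, and the next one sent would be at least 250 ms after the last sent frame.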

ESP32-CAM Connection

According to the post ESP32-CAM Pinout Reference:

The VCC pin normally outputs 3.3V from the onboard voltage regulator. It can, however, be configured to output 5V by using the Zero-ohm link near the VCC pin.

Hence

connect ESP32-CAM VCC to VDD of INMP441
connect ESP32-CAM GND to GND and L/R of INMP441 (connecting L/R to GND means using a single I2S channel for capturing mono sound)
connect ESP32-CAM GPIO12 to WS of INMP441
connect ESP32-CAM GPIO13 to SD of INMP441
connect ESP32-CAM GPIO14 to SCK of INMP441

To power the ESP32-CAM board up

connect ESP32-CAM 5V (input) to VCC of a 5V power source (like via a breadboard)
connect ESP32-CAM GND to the GND of the 5V power source

As for the optional button

connect ESP32-CAM GND to one leg of the button
connect ESP32-CAM GPIO15 to the other leg of the button
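
Since the button connects GPIO15 to GND, I would expect the pin to read LOW while the button is pressed (assuming the pin is pulled up, e.g. configured with INPUT_PULLUP). The following is a rough, hypothetical sketch of walkie-talkie style switching with simple debouncing; PushToTalk, StreamMode and the 30 ms debounce time are my own illustrative choices, not names or values from the sketch.

```cpp
#include <cassert>
#include <cstdint>

enum class StreamMode { Camera, Mic };

// Hypothetical push-to-talk helper: Mic while the button is held, Camera otherwise.
class PushToTalk {
public:
    explicit PushToTalk(uint32_t debounceMs = 30) : debounceMs_(debounceMs) {}

    // pinIsLow: on the board, digitalRead(pin) == LOW (button pressed).
    // nowMs:    on the board, millis().
    StreamMode update(bool pinIsLow, uint32_t nowMs) {
        if (pinIsLow != lastRaw_) {
            // Raw level changed: restart the debounce timer.
            lastRaw_ = pinIsLow;
            lastChangeMs_ = nowMs;
        } else if (nowMs - lastChangeMs_ >= debounceMs_) {
            // Level has been stable long enough: accept it.
            pressed_ = pinIsLow;
        }
        return pressed_ ? StreamMode::Mic : StreamMode::Camera;
    }

private:
    uint32_t debounceMs_;
    uint32_t lastChangeMs_ = 0;
    bool lastRaw_ = false;
    bool pressed_ = false;
};
```

In loop(), the returned mode would decide whether to stream sound or pictures for that iteration.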

For the steps to upload the sketch to the AI-Thinker ESP32-CAM board, you may want to refer to my previous YouTube video ESP32-CAM Experiment -- Capture and Stream Pictures to Mobile Phone

The Sketch

You can download the sketch here🔗

As mentioned previously, this experiment is heavily based on two of my previous experiments -- ESP32-CAM Experiment -- Capture and Stream Pictures to Mobile Phone and ESP32 Mic Testing With INMP441 and DumbDisplay. Hence, the core of the code is already described in those references and will not be repeated here. Nevertheless, there are specific areas that I think are worth mentioning.

First of all, the sketch will require a little bit of configuration from you.

To tell the sketch which target board the sketch is tailored for, you uncomment the appropriate macro, say for ESP32-CAM board, like

// *** For board selection, uncomment one of following ; note that ESP32CAM is AI Thinker board and TCAMERA is v1.7
#define FOR_ESP32CAM
//#define FOR_LILYGO_TCAMERAPLUS
//#define FOR_LILYGO_TSIMCAM
//#define FOR_LILYGO_TCAMERA

As mentioned previously, in the case of using ESP32-CAM, you have options to install INMP441 (mic input) and to install a button to activate the mic. By default, the sketch assumes you have both installed as described above. To disable INMP441, comment out the macros I2S_WS, I2S_SD, and I2S_SCK. To disable the button, comment out MIC_BUTTON_PIN, like

#if defined(FOR_ESP32CAM)
  // *** For ESP_CAM, if button not attached, comment out the following MIC_BUTTON_PIN macro 
  //#define MIC_BUTTON_PIN            15
  #define FLASH_PIN                 4
  // ***  For ESP_CAM, if INMP441 not attached, comment out the following I2S_WS macro
  //#define I2S_WS                    12
  //#define I2S_SD                    13
  //#define I2S_SCK                   14
...
#elif defined(FOR_LILYGO_TCAMERAPLUS)
...

If the target is an ESP32 board that supports Bluetooth, I suggest connecting to your Android phone (DumbDisplay app) via Bluetooth. In such a case, you will need to define the macro BLUETOOTH as the Bluetooth device name of your chosen target.

// *** Strongly suggest to use Bluetooth (if board supported), in such a case uncomment the following BLUETOOTH macro, which defines the name of the Bluetooth device (the board)
// *** Otherwise will assume WIFI connectivity; need WIFI_SSID and WIFI_PASSWORD macros
#define BLUETOOTH     "CamMicBT"

In case you need to / want to use WIFI connectivity, you will need to define the macros WIFI_SSID and WIFI_PASSWORD

#define WIFI_SSID     "<wifi-ssid>"
#define WIFI_PASSWORD "<wifi-password>"

The mono sound sampling rate is coded at 16000. Nevertheless, different targets may be using different sound sample bit-count. For ESP32-CAM, the sound sample bit count is 16; for other targets, the sound sample bit count is 32. Note that even though some targets will be using sound sample bit count of 32, sound samples will always be shipped to DumbDisplay app with a sample bit count of 16. The software-based downscaling is a very simple one.

...
#if I2S_SAMPLE_BIT_COUNT == 32
        val = val / 0x0000ffff;  // 32 bit to 16 bit
#endif
...
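
As a side note, dividing by 0x0000ffff (65535) is very close to, but not exactly, taking the upper 16 bits; for a full-scale positive input like INT32_MAX, the quotient is 32768, which is one more than a signed 16-bit value can hold. The following is a small alternative sketch, not the sketch's own code, that divides by 65536 and clamps; it also shows how the software-based volume amplification might be applied with clamping. The function names are hypothetical, and factor is assumed to be a small integer like the slider value.

```cpp
#include <cassert>
#include <cstdint>

// Clamp a wide value into the signed 16-bit range.
inline int16_t clampToInt16(int32_t value) {
    if (value > INT16_MAX) return INT16_MAX;
    if (value < INT16_MIN) return INT16_MIN;
    return static_cast<int16_t>(value);
}

// Keep (roughly) the upper 16 bits of a 32-bit sample.
// Division is used instead of a shift so negative samples truncate toward zero.
inline int16_t downscale32To16(int32_t sample32) {
    return clampToInt16(sample32 / 65536);
}

// Software amplification: multiply in 32 bits, then clamp so that loud
// samples saturate instead of wrapping around.
inline int16_t amplify(int16_t sample, int factor) {
    return clampToInt16(static_cast<int32_t>(sample) * factor);
}
```

Clamping matters mostly for the amplification step, since a wrapped-around sample would be heard as a loud click rather than as mild distortion.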

Face detection with ESP's support is easy

...
#if defined(ENABLE_FACE_DETECTION)
  #include "human_face_detect_msr01.hpp"
  #include "human_face_detect_mnp01.hpp"
  HumanFaceDetectMSR01 detector(0.3F, 0.3F, 10, 0.75F);  // 0.75F is adjusted for 96x96; original 0.3F is for 240x240
  HumanFaceDetectMNP01 detector2(0.4F, 0.3F, 10);
#endif
...
void loop() {
...
#if defined(ENABLE_FACE_DETECTION)
  if (cameraFormat == PIXFORMAT_RGB565) {
    long startMs = millis();
    std::list<dl::detect::result_t> &candidates = detector.infer((uint16_t *)fb->buf, {(int)fb->height, (int)fb->width, 3});
    std::list<dl::detect::result_t> &results = detector2.infer((uint16_t *)fb->buf, {(int)fb->height, (int)fb->width, 3}, candidates);
    long takenMs = millis() - startMs;
    fd_ms = takenMs;
    if (results.size()) {
      Serial.print("* FD:[");
      int i = 0;
      for (std::list<dl::detect::result_t>::iterator prediction = results.begin(); prediction != results.end(); prediction++, i++) {
        // left-eye
        fd_le_x = prediction->keypoint[0];
        fd_le_y = prediction->keypoint[1];
        // right-eye
        fd_re_x = prediction->keypoint[6];
        fd_re_y = prediction->keypoint[7];
        // nose
        fd_n_x = prediction->keypoint[4];
        fd_n_y = prediction->keypoint[5];
        // mouth-left
        fd_ml_x = prediction->keypoint[2];
        fd_ml_y = prediction->keypoint[3];
        // mouth-right
        fd_mr_x = prediction->keypoint[8];
        fd_mr_y = prediction->keypoint[9];
        // face rectangle
        fd_x1 = prediction->box[0];
        fd_y1 = prediction->box[1];
        fd_x2 = prediction->box[2];
        fd_y2 = prediction->box[3];
        Serial.print(i);
        Serial.print(':');
        Serial.print(fd_x1);
        Serial.print(',');
        Serial.print(fd_y1);
        Serial.print('-');
        Serial.print(fd_x2);
        Serial.print(',');
        Serial.print(fd_y2);
      }
      Serial.println("]");
    } else {
      fd_x1 = -1;
    } 
  }
#endif
...
}

A few things to notice:

In order for face detection to work, the camera's format should be set to PIXFORMAT_RGB565. Moreover, in this experiment, the picture size is set to 96x96 as well (unlike the original setting for TTGO T-Camera Plus; see LilyGo-Camera-Series).
If face detection is not enabled, the picture format is JPEG, which produces much less data than formats like RGB565. This is why, in this experiment, the picture size is set to the lowest, 96x96, when face detection is enabled.
In this experiment, two-phase detection is employed; as a result, in addition to the face position, the locations of the eyes, the nose, and the mouth are highlighted as well. Notice that only the last one of the detected faces is tracked.
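
To put rough numbers on the RGB565 data size mentioned above: RGB565 takes 2 bytes per pixel, uncompressed. The helper below is just illustrative arithmetic (rgb565FrameBytes is a hypothetical name, not part of the sketch); the size of a JPEG frame depends on the picture content and quality setting, so it is not computed here.

```cpp
#include <cassert>
#include <cstddef>

// RGB565 packs each pixel into 16 bits (5 red, 6 green, 5 blue), i.e. 2 bytes.
constexpr std::size_t rgb565FrameBytes(std::size_t width, std::size_t height) {
    return width * height * 2;
}

// 96x96, as used when face detection is enabled
static_assert(rgb565FrameBytes(96, 96) == 18432, "96x96 RGB565 frame size");
// QVGA (320x240), the smallest "SIZE" option, would be over 8 times larger
static_assert(rgb565FrameBytes(320, 240) == 153600, "QVGA RGB565 frame size");
```

So even the smallest regular size, QVGA, would mean about 150 KB per uncompressed frame, versus about 18 KB for 96x96.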

Building the Sketch

For TTGO T-Camera Plus, the internally attached 240x240 TFT screen is used in certain cases. As a result, the TFT_eSPI library is needed, and you will need to set up the TFT_eSPI library for TTGO T-Camera Plus as well -- modify User_Setup_Selection.h, changing the #include

...
//#include <User_Setup.h>           // Default setup is root library folder
...
#include <User_Setups/Setup44_TTGO_CameraPlus.h>   // Setup file for ESP32 and TTGO T-CameraPlus ST7789 SPI bus TFT    240x240
...

For a more detailed description of making changes to User_Setup_Selection.h in the PlatformIO environment, as well as a sample case of using PlatformIO for building a sketch that uses DumbDisplay, you may want to refer to my previous post -- Extending a TFT_eSPI Example With TTGO T-Display Using PlatformIO, With DumbDisplay

I used VSCode with PlatformIO for the development of the sketch of this experiment. And here are the sections of platformio.ini where the mentioned target boards are configured

[env:ESP32CAM]
platform = espressif32
board = esp32cam
framework = arduino
lib_deps =
    https://github.com/trevorwslee/Arduino-DumbDisplay#develop
build_flags = -DFOR_ESP32CAM


[env:LILYGO_TCAMERA]  ; v1.7
platform = espressif32
board = esp-wrover-kit
framework = arduino
lib_deps =
    https://github.com/trevorwslee/Arduino-DumbDisplay#develop
build_flags =
    -DBOARD_HAS_PSRAM -mfix-esp32-psram-cache-issue
    -DFOR_LILYGO_TCAMERA


[env:LILYGO_TCAMERAPLUS]
platform = espressif32
board = esp32dev
framework = arduino
upload_speed = 921600
board_build.partitions = huge_app.csv
lib_deps =
    bodmer/TFT_eSPI @ ^2.5.30
    https://github.com/trevorwslee/Arduino-DumbDisplay#develop
build_flags =
    -mfix-esp32-psram-cache-issue
    -DBOARD_HAS_PSRAM
    -DCONFIG_MFN_V1=1
    -DCONFIG_S8=1
    -DFOR_LILYGO_TCAMERAPLUS


[env:LILYGO_TSIMCAM]
platform = espressif32
board = esp32s3box
framework = arduino
board_build.partitions = default_8MB.csv
lib_deps =
    https://github.com/trevorwslee/Arduino-DumbDisplay#develop
build_flags = 
    -DBOARD_HAS_PSRAM
    -DFOR_LILYGO_TSIMCAM

Notice that with PlatformIO config like the above, the target board selection macro required by the sketch is already defined when you choose the correct PlatformIO config.

Build and upload the sketch, and try it yourself!

Enjoy!

Demo of Camera and Mic Arduino Experiment With TTGO T-Camera Plus and DumbDisplay

I believe it would really be fun to add a face recognition feature to the experiment. Until then, have fun and enjoy!

Peace be with you! May God bless you! Jesus loves you!