espressif/esp-dl
uploaded 5 months ago

readme

# ESP-DL [[中文]](./README_cn.md)

ESP-DL is a library for high-performance deep learning resources dedicated to [ESP32](https://www.espressif.com/en/products/socs/esp32), [ESP32-S2](https://www.espressif.com/en/products/socs/esp32-s2), [ESP32-S3](https://www.espressif.com/en/products/socs/esp32-s3) and [ESP32-C3](https://www.espressif.com/en/products/socs/esp32-c3).



## Overview

ESP-DL provides APIs for **Neural Network (NN) Inference**, **Image Processing**, **Math Operations** and some **Deep Learning Models**. With ESP-DL, you can use Espressif's SoCs for AI applications easily and fast.

As ESP-DL does not need any peripherals, it can be used as a component of some projects. For example, you can use it as a component of **[ESP-WHO](https://github.com/espressif/esp-who)**, which contains several project-level examples of image application. The figure below shows what ESP-DL consists of and how ESP-DL is implemented as a component in a project.

<p align="center">
    <img width="%" src="./img/architecture_en.drawio.svg"> 
</p>


## Get Started with ESP-DL

For setup instructions to get started with ESP-DL, please read [Get Started](./docs/en/get_started.md).

> Please use the [latest](https://github.com/espressif/esp-idf/tree/master) ESP-IDF on master branch.



## Try Models in the Model Zoo

ESP-DL provides some model APIs in the [Model Zoo](./include/model_zoo), such as Human Face Detection, Human Face Recognition, Cat Face Detection, etc. You can use these models in the table below out of box.

| Name                 | API Example                                                  |
| :-------------------- | :------------------------------------------------------------ |
| Human Face Detection | [ESP-DL/examples/human_face_detect](examples/human_face_detect) |
| Human Face Recognition | [ESP-DL/examples/face_recognition](examples/face_recognition)  |
| Cat Face Detection | [ESP-DL/examples/cat_face_detect](examples/cat_face_detect)  |


## Customize a Model

To customize a model, please proceed to [How to Customize a Model Step by Step](./tutorial), where the instructions with a runnable example will quickly help you design your model.

When you read the instructions, the following materials might be helpful:

- DL API
    * [About Variables and Constants](./docs/en/about_type_define.md): information about
        - variable: tensors
        - constants: filters, biases, and activations
    * [Customize a Layer Step by Step](./docs/en/implement_custom_layer.md): instructions on how to customize a layer.
    * [API Documentation](./include): guides to provided API about Layer, Neural Network (NN), Math and tools.

        > For API documentation, please refer to annotations in header files for the moment.

- Platform Conversion
    - Quantization Toolkit: a tool for quantizing floating-point models and evaluating quantized models on ESP SoCs
        * Toolkit: see [Quantization Toolkit](./tools/quantization_tool/README.md)
        * Toolkit API: see [Quantization Toolkit API](./tools/quantization_tool/quantization_tool_api.md)
    - Convert Tool: the tool and configuration file for floating-point quantization on coefficient.npy
        * config.json: see [Specification of config.json](./tools/convert_tool/specification_of_config_json.md)
        * convert.py: see [Usage of convert.py](./tools/convert_tool/README.md)

        > convert.py requires Python 3.7 or versions higher.

- Software and Hardware Boost
  
    * [Quantization Specification](./docs/en/quantization_specification.md): rules of floating-point quantization



## Feedback

[Q&A](./docs/en/Q&A.md) lists answers to frequently asked questions.

For feature requests or bug reports, please submit an [issue](https://github.com/espressif/esp-dl/issues). We will prioritize the most anticipated features.

readme of cat_face_detect example

                                        
                                        # Cat Face Detection [[中文]](./README_cn.md)

This project is an example of cat face detection interface. The input to this interface is a static image. The detection results are confidence scores and coordinate values shown in Terminal, which can be converted by a tool into an image shown on your PC screen.

Below is the structure of this project:

```shell
cat_face_detect/
├── CMakeLists.txt
├── image.jpg
├── main
│   ├── app_main.cpp
│   ├── CMakeLists.txt
│   └── image.hpp
├── README.md
├── README_cn.md
└── result.png
```



## Run the Example

1. Open Terminal and go to esp-dl/examples/cat_face_detect, the directory where this project is stored:

    ```shell
    cd ~/esp-dl/examples/cat_face_detect
    ```

2. Set SoC target:

    ```shell
    idf.py set-target [SoC]
    ```
    Replace [SoC] with your target, such as esp32, esp32s2, and esp32s3.

3. Flash the program and launch IDF monitor to obtain the fractional and coordinate values of detection results:

   ```shell
   idf.py flash monitor
   
   ... ...
   
   [0] score: 1.709961, box: [122, 2, 256, 117]
   ```

4. The tool `display_image.py` stored in [examples/tool/](../tool/) allows you to directly view the image of detection results. According to instructions on [Tools](../tool/README.md), run the following command:

   ```shell
   python display_image.py -i ../cat_face_detect/image.jpg -b "(122, 2, 256, 117)"
   ```
   The image of detection results will show on your PC screen as follows:
   
   <p align="center">
    <img width="%" src="./result.png"> 
   </p>


## Customize Input Image

In this example project, [./main/image.hpp](./main/image.hpp) is the default input image. You can use the script `convert_to_u8.py` following instructions on [Tools](../tool/README.md), to convert your own image into C/C++ code in replace of the default input image.

1. Save your image to directory ./examples/cat_face_detect, and use [examples/tool/convert_to_u8.py](../tool/convert_to_u8.py) to convert the image into an hpp file:

   ```shell
   # Assume you are in cat_face_detect directory

   python ../tool/convert_to_u8.py -i ./image.jpg -o ./main/image.hpp
   ```

2. According to steps in Section [Run the Example](#run-the-example), flash the firmware, print the confidence scores and coordinate values of detection results, and view the image of detection results.



## Latency

|   SoC    |    Latency |
| :------: | ---------: |
|  ESP32   | 149,765 us |
| ESP32-S2 | 416,590 us |
| ESP32-S3 |  18,909 us |

> Results above are based on the default configuration of this example.


                                    

readme of color_detect example

                                        
                                        # Color Detection[[中文]](./README_cn.md)

This project is an example of color detection interface. The input to this interface is a static image of different color blocks. The output is results of color enrollment, color detection, color segmentation, color deletion, and other functions provided by this interface, shown in Terminal.

Below is the structure of this project:

```shell
color_detect/
├── CMakeLists.txt
├── rgby.jpg
├── main
│   ├── app_main.cpp
│   ├── CMakeLists.txt
│   └── image.hpp
├── partitions.csv
└── README.md
└── README_cn.md
```



## Run the Example

1. Open Terminal and go to esp-dl/examples/color_detect, the directory where this project is stored:

    ```shell
    cd ~/esp-dl/examples/color_detect
    ```

2. Set SoC target:

    ```shell
    idf.py set-target [SoC]
    ```
    Replace [SoC] with your target, such as esp32, esp32s2, and esp32s3.

    We recommend using the ESP32-S3 chip, which runs much faster than other chips for AI applications.

3. Flash the program and launch IDF monitor to obtain the results of functions:

   ```shell
   idf.py flash monitor
   
   ... ...
   
   the information of registered colors: 
   name: red, 	thresh: 0, 10, 203, 255, 197, 255
   name: green, 	thresh: 54, 62, 221, 255, 197, 255
   name: blue, 	thresh: 96, 114, 179, 255, 230, 255
   name: yellow, 	thresh: 19, 32, 214, 255, 247, 255
   
   RGB888 | color detection result:
   color 0: detected box :2
   center: (46, 14)
   box: (0, 0), (94, 30)
   area: 768
   center: (14, 110)
   box: (0, 96), (30, 126)
   area: 256
   
   color 1: detected box :2
   center: (110, 30)
   box: (96, 0), (126, 62)
   area: 512
   center: (30, 46)
   box: (0, 32), (62, 62)
   area: 512
   
   color 2: detected box :2
   center: (88, 68)
   box: (64, 32), (126, 94)
   area: 768
   center: (14, 78)
   box: (0, 64), (30, 94)
   area: 256
   
   color 3: detected box :1
   center: (70, 102)
   box: (32, 64), (126, 126)
   area: 1024
   
   
   RGB565 | color detection result:
   color 0: detected box :2
   center: (46, 14)
   box: (0, 0), (94, 30)
   area: 768
   center: (14, 110)
   box: (0, 96), (30, 126)
   area: 256
   
   color 1: detected box :2
   center: (110, 30)
   box: (96, 0), (126, 62)
   area: 512
   center: (30, 46)
   box: (0, 32), (62, 62)
   area: 512
   
   color 2: detected box :2
   center: (88, 68)
   box: (64, 32), (126, 94)
   area: 768
   center: (14, 78)
   box: (0, 64), (30, 94)
   area: 256
   
   color 3: detected box :1
   center: (70, 102)
   box: (32, 64), (126, 126)
   area: 1024
   
   remained colors num: 3
   
   Blue, Yellow | color detection result:
   color 0: detected box :2
   center: (88, 68)
   box: (64, 32), (126, 94)
   area: 768
   center: (14, 78)
   box: (0, 64), (30, 94)
   area: 256
   
   color 1: detected box :1
   center: (70, 102)
   box: (32, 64), (126, 126)
   area: 1024
   
   
   Blue, Yellow | color segmentation result:
   color 0: detected box :2
   box_index: 0, start col: 32, end col: 47, row: 16, area: 768
   box_index: 0, start col: 32, end col: 47, row: 17, area: 768
   box_index: 0, start col: 32, end col: 47, row: 18, area: 768
   box_index: 0, start col: 32, end col: 47, row: 19, area: 768
   box_index: 0, start col: 32, end col: 47, row: 20, area: 768
   box_index: 0, start col: 32, end col: 47, row: 21, area: 768
   box_index: 0, start col: 32, end col: 47, row: 22, area: 768
   box_index: 0, start col: 32, end col: 47, row: 23, area: 768
   box_index: 0, start col: 32, end col: 47, row: 24, area: 768
   box_index: 0, start col: 32, end col: 47, row: 25, area: 768
   box_index: 0, start col: 32, end col: 47, row: 26, area: 768
   box_index: 0, start col: 32, end col: 47, row: 27, area: 768
   box_index: 0, start col: 32, end col: 47, row: 28, area: 768
   box_index: 0, start col: 32, end col: 47, row: 29, area: 768
   box_index: 0, start col: 32, end col: 47, row: 30, area: 768
   box_index: 0, start col: 32, end col: 47, row: 31, area: 768
   box_index: 1, start col: 0, end col: 15, row: 32, area: 256
   box_index: 0, start col: 32, end col: 63, row: 32, area: 768
   box_index: 1, start col: 0, end col: 15, row: 33, area: 256
   box_index: 0, start col: 32, end col: 63, row: 33, area: 768
   box_index: 1, start col: 0, end col: 15, row: 34, area: 256
   box_index: 0, start col: 32, end col: 63, row: 34, area: 768
   box_index: 1, start col: 0, end col: 15, row: 35, area: 256
   box_index: 0, start col: 32, end col: 63, row: 35, area: 768
   box_index: 1, start col: 0, end col: 15, row: 36, area: 256
   box_index: 0, start col: 32, end col: 63, row: 36, area: 768
   box_index: 1, start col: 0, end col: 15, row: 37, area: 256
   box_index: 0, start col: 32, end col: 63, row: 37, area: 768
   box_index: 1, start col: 0, end col: 15, row: 38, area: 256
   box_index: 0, start col: 32, end col: 63, row: 38, area: 768
   box_index: 1, start col: 0, end col: 15, row: 39, area: 256
   box_index: 0, start col: 32, end col: 63, row: 39, area: 768
   box_index: 1, start col: 0, end col: 15, row: 40, area: 256
   box_index: 0, start col: 32, end col: 63, row: 40, area: 768
   box_index: 1, start col: 0, end col: 15, row: 41, area: 256
   box_index: 0, start col: 32, end col: 63, row: 41, area: 768
   box_index: 1, start col: 0, end col: 15, row: 42, area: 256
   box_index: 0, start col: 32, end col: 63, row: 42, area: 768
   box_index: 1, start col: 0, end col: 15, row: 43, area: 256
   box_index: 0, start col: 32, end col: 63, row: 43, area: 768
   box_index: 1, start col: 0, end col: 15, row: 44, area: 256
   box_index: 0, start col: 32, end col: 63, row: 44, area: 768
   box_index: 1, start col: 0, end col: 15, row: 45, area: 256
   box_index: 0, start col: 32, end col: 63, row: 45, area: 768
   box_index: 1, start col: 0, end col: 15, row: 46, area: 256
   box_index: 0, start col: 32, end col: 63, row: 46, area: 768
   box_index: 1, start col: 0, end col: 15, row: 47, area: 256
   box_index: 0, start col: 32, end col: 63, row: 47, area: 768
   
   color 1: detected box :1
   box_index: 0, start col: 16, end col: 31, row: 32, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 33, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 34, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 35, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 36, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 37, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 38, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 39, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 40, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 41, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 42, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 43, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 44, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 45, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 46, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 47, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 48, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 49, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 50, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 51, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 52, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 53, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 54, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 55, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 56, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 57, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 58, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 59, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 60, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 61, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 62, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 63, area: 1024
   
   ```

  
                                    

readme of face_recognition example

                                        
                                        # Face Recognition [[中文]](./README_cn.md)

This project is an example of human face recognition interface. The input to this interface is a static image of a human face. The output is results of face ID enrollment, face recognition, face ID deletion, and other functions provided by this interface, shown in Terminal.

The interface provides two model versions: 16-bit quantization model and 8-bit quantization model. Compared with the 8-bit quantization model, the 16-bit quantization model has higher accuracy but larger memory footprint and longer latency. You can select the appropriate model according to your scenario.

Below is the structure of this project:

```shell
face_recognition/
├── CMakeLists.txt
├── image.jpg
├── main
│   ├── app_main.cpp
│   ├── CMakeLists.txt
│   └── image.hpp
├── partitions.csv
└── README.md
└── README_cn.md
```



## Run the Example

1. Open Terminal and go to esp-dl/examples/face_recognition, the directory where this project is stored:

    ```shell
    cd ~/esp-dl/examples/face_recognition
    ```

2. Set SoC target:

    ```shell
    idf.py set-target [SoC]
    ```
    Replace [SoC] with your target, such as esp32, esp32s2, and esp32s3.

    We recommend using the ESP32-S3 chip, which runs much faster than other chips for AI applications.

3. Flash the program and launch IDF monitor to obtain the results of functions:

   ```shell
   idf.py flash monitor
   
   ... ...
   
   E (1907) MFN: Flash is empty
   
   enroll id ...
   name: Sandra, id: 1
   name: Jiong, id: 2
   
   recognize face ...
   [recognition result] id: 1, name: Sandra, similarity: 0.728666
   [recognition result] id: 2, name: Jiong, similarity: 0.827225
   
   recognizer information ...
   recognizer threshold: 0.55
   input shape: 112, 112, 3
   
   face id information ...
   number of enrolled ids: 2
   id: 1, name: Sandra
   id: 2, name: Jiong
   
   delete id ...
   number of remaining ids: 1
   [recognition result] id: -1, name: unknown, similarity: 0.124767
   
   enroll id ...
   name: Jiong, id: 2
   write 2 ids to flash.
   
   recognize face ...
   [recognition result] id: 1, name: Sandra, similarity: 0.758815
   [recognition result] id: 2, name: Jiong, similarity: 0.722041
   
   ```

## Other Configuration

1. At the beginning of [./main/app_main.cpp](./main/app_main.cpp), there is a macro definition called `QUANT_TYPE` that defines the version of quantization models.

    - `QUANT_TYPE` = 0: Use the 8-bit quantization model, which has lower accuracy but smaller memory footprint and shorter latency.
    - `QUANT_TYPE` = 1: Use the 16-bit quantization model, which has the same recognition accuracy as the floating-point model. 

    You can select the appropriate model according to your scenario.


2. At the beginning of [./main/app_main.cpp](./main/app_main.cpp), there is another macro definition called `USE_FACE_DETECTOR` that defines the way to obtain face landmark coordinates.

    - `USE_FACE_DETECTOR` = 0: Use the face landmark coordinates stored in ./image.hpp.
    - `USE_FACE_DETECTOR` = 1: Obtain face landmark coordinates using our face detection model.

    Note that the order of face landmark coordinates is:
  
   ```
      left_eye_x, left_eye_y, 
      mouth_left_x, mouth_left_y,
      nose_x, nose_y,
      right_eye_x, right_eye_y, 
      mouth_right_x, mouth_right_y
   ```

## Latency

| SoC | 8-bit | 16-bit |
|:---:| ----:| ----:|
| ESP32 | 13,301 ms | 5,041 ms |
| ESP32-S3 | 287 ms | 554 ms |

  

  
                                    

readme of human_face_detect example

                                        
                                        # Human Face Detection [[中文]](./README_cn.md)

This project is an example of human face detection interface. The input to this interface is a static image. The detection results are confidence scores and coordinate values shown in Terminal, which can be converted by a tool into an image shown on your PC screen.

Below is the structure of this project:

```shell
human_face_detect/
├── CMakeLists.txt
├── image.jpg
├── main
│   ├── app_main.cpp
│   ├── CMakeLists.txt
│   └── image.hpp
├── partitions.csv
├── README.md
├── README_cn.md
└── result.png
```



## Run the Example

1. Open Terminal and go to esp-dl/examples/human_face_detect, the directory where this project is stored:

    ```shell
    cd ~/esp-dl/examples/human_face_detect
    ```

2. Set SoC target:

    ```shell
    idf.py set-target [SoC]
    ```
    Replace [SoC] with your target, such as esp32, esp32s2, and esp32s3.

3. Flash the program and launch IDF monitor to obtain the fractional and coordinate values of detection results:

   ```shell
   idf.py flash monitor
   
   ... ...
   
   [0] score: 0.987580, box: [137, 75, 246, 215]
       left eye: (157, 131), right eye: (158, 177)
       nose: (170, 163)
       mouth left: (199, 133), mouth right: (193, 180)
   ```

4. The tool `display_image.py` stored in [examples/tool/](../tool/) allows you to directly view the image of detection results. According to instructions on [Tools](../tool/README.md), run the following command:

   ```shell
   python display_image.py -i ../human_face_detect/image.jpg -b "(137, 75, 246, 215)" -k "(157, 131, 158, 177, 170, 163, 199, 133, 193, 180)"
   ```
    The image of detection results will show on your PC screen as follows:
   

   <p align="center">
    <img width="%" src="./result.png"> 
   </p>


## Other Configuration

At the beginning of [./main/app_main.cpp](./main/app_main.cpp), there is a macro definition called `TWO_STAGE` that defines target detection algorithms. As annotations suggest:

- `TWO_STAGE` = 1: two-stage detectors with higher accuracy (support for facial landmarks) but lower speed.
- `TWO_STAGE` = 0: one-stage detectors with relatively lower accuracy (no support for facial landmarks) but higher speed.

You can experience the differences of the two detectors.



## Customize Input Image

In this example project, [./main/image.hpp](./main/image.hpp) is the default input image. You can use the script `convert_to_u8.py` following instructions on [Tools](../tool/README.md), to convert your own image into C/C++ code in replace of the default input image.

1. Save your image to directory ./examples/human_face_detect , and use [examples/tool/convert_to_u8.py](../tool/convert_to_u8.py) to convert the image into an hpp file:

   ```shell
   # Assume you are in human_face_detect 

   python ../tool/convert_to_u8.py -i ./image.jpg -o ./main/image.hpp
   ```

2. According to steps in Section [Run the Example](#run-the-example), flash the firmware, print the confidence scores and coordinate values of detection results, and view the image of detection results.



## Latency

|   SoC    | `TWO_STAGE` = 1 | `TWO_STAGE` = 0 |
| :------: | --------------: | --------------: |
|  ESP32   |      415,246 us |      154,687 us |
| ESP32-S2 |    1,052,363 us |      309,159 us |
| ESP32-S3 |       56,303 us |       16,614 us |

> Results above are based on the default configuration of this example.

                                    

Supports all targets

License: MIT

To add this component to your project, run:

idf.py add-dependency "espressif/esp-dl^1.1.0"

or download archive

Examples:

cat_face_detect

more details

color_detect

more details

face_recognition

more details

human_face_detect

more details

Stats

  • Downloaded in total
    Downloaded in total 60 times
  • Downloaded this version
    This version: 44 times

Badge

espressif/esp-dl version: 1.1.0 |