espressif/esp-dl

readme (zh)

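# ESP-DL [[English]](./README.md)

ESP-DL is a high-performance deep learning library provided by Espressif for its [ESP32](https://www.espressif.com/en/products/socs/esp32), [ESP32-S2](https://www.espressif.com/en/products/socs/esp32-s2), [ESP32-S3](https://www.espressif.com/en/products/socs/esp32-s3), and [ESP32-C3](https://www.espressif.com/en/products/socs/esp32-c3) chips. For more documentation, see the [ESP-DL User Guide](https://docs.espressif.com/projects/esp-dl/zh_CN/latest/esp32/index.html).

## Overview

ESP-DL provides APIs for **neural network inference**, **image processing**, **math operations**, and some **deep learning models**. With ESP-DL, you can quickly and easily use Espressif chips for AI applications.

ESP-DL does not require any peripherals, so it can be used as a component of other projects. For example, it can serve as a component of **[ESP-WHO](https://github.com/espressif/esp-who)**, which contains several project-level image application examples. The figure below shows what ESP-DL consists of and where it sits in a project when used as a component.

<p align="center">
    <img width="%" src="./docs/_static/architecture_cn.drawio.svg">
</p>

## Get Started

To install and get started with ESP-DL, see [Get Started](./docs/en/get_started.md).

> Please use the [latest version](https://github.com/espressif/esp-idf/tree/release/v5.0) of ESP-IDF on the release/v5.0 branch.

## Try Models in the Model Zoo

ESP-DL provides APIs for some models in the [model zoo](./include/model_zoo), such as human face detection, face recognition, and cat face detection. The table below lists the models you can use out of the box.

| Project              | API example                                                  |
| -------------------- | ------------------------------------------------------------ |
| Human face detection | [ESP-DL/examples/human_face_detect](examples/human_face_detect) |
| Face recognition     | [ESP-DL/examples/face_recognition](examples/face_recognition) |
| Cat face detection   | [ESP-DL/examples/cat_face_detect](examples/cat_face_detect)  |

## Deploy Your Model

We recommend using TVM to deploy your model. For details, see [ESP-DL/tutorial/tvm_example](tutorial/tvm_example).

## Feedback

If you find a bug or need a new feature, please submit an [issue](https://github.com/espressif/esp-dl/issues). We will prioritize the most anticipated features.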

readme (zh) of cat_face_detect example

                                        
# Cat Face Detection [[English]](./README.md)

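This project is an example of the cat face detection interface. The interface takes a static image as input. The confidence scores and coordinates of the detection results are printed in the terminal, and the result image can be displayed on a PC screen with a tool.

The project folder is structured as follows: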

```shell
cat_face_detect/
├── CMakeLists.txt
├── image.jpg
├── main
│   ├── app_main.cpp
│   ├── CMakeLists.txt
│   └── image.hpp
├── README.md
├── README_cn.md
└── result.png
```



## Run the Example

1. Open a terminal and go to the folder containing the cat face detection example, esp-dl/examples/cat_face_detect:

    ```shell
    cd ~/esp-dl/examples/cat_face_detect
    ```

2. Set the target chip:

    ```shell
    idf.py set-target [SoC]
    ```
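    Replace [SoC] with your target chip, such as esp32, esp32s2, or esp32s3.

3. Flash the program and run the IDF monitor to get the scores and coordinates of the detection results: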

   ```shell
   idf.py flash monitor
   
   ... ...
   
   [0] score: 1.709961, box: [122, 2, 256, 117]
   ```

4. The display tool `display_image.py`, stored in [examples/tool/](../tool/), lets you view the detection result image more intuitively. Follow the [tool](../tool/README_cn.md) instructions and run the following command:

   ```shell
   python display_image.py -i ../cat_face_detect/image.jpg -b "(122, 2, 256, 117)"
   ```
   The PC screen shows the detection result image for this example, as in the figure below:

   <p align="center">
    <img width="%" src="./result.png"> 
   </p>
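
Copying the box from the monitor into the `-b` argument by hand is easy to get wrong. The following is a minimal sketch of automating that step, assuming the monitor line keeps the exact format shown in step 3. The helper names `parse_detection` and `box_argument` are hypothetical and are not part of ESP-DL or its tools.

```python
import re

# One line copied from the monitor output shown in step 3.
LOG_LINE = "[0] score: 1.709961, box: [122, 2, 256, 117]"

# Hypothetical pattern for the line format above; adjust it if your output differs.
PATTERN = re.compile(
    r"\[(?P<index>\d+)\]\s+score:\s+(?P<score>[-\d.]+),\s+box:\s+\[(?P<box>[^\]]+)\]"
)


def parse_detection(line):
    """Return (index, score, box) parsed from one monitor line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    box = tuple(int(value) for value in match.group("box").split(","))
    return int(match.group("index")), float(match.group("score")), box


def box_argument(box):
    """Format a box as the tuple string that the command above passes to -b."""
    return "(" + ", ".join(str(value) for value in box) + ")"


index, score, box = parse_detection(LOG_LINE)
print(f'python display_image.py -i ../cat_face_detect/image.jpg -b "{box_argument(box)}"')
```

The printed command matches the one in step 4, so it can be pasted into a terminal opened in the tool directory.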
   

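## Customize the Input Image

In this example, [./main/image.hpp](./main/image.hpp) is the default input image. Following the [tool](../tool/README_cn.md) instructions, you can use the conversion tool `convert_to_u8.py`, stored in [examples/tool/](../tool/), to convert your own image into C/C++ form and replace the default image.

1. Save your image in the ./examples/cat_face_detect directory and use [examples/tool/convert_to_u8.py](../tool/convert_to_u8.py) to convert it to an hpp file:

   ```shell
   # Assuming the current directory is still cat_face_detect

   python ../tool/convert_to_u8.py -i ./image.jpg -o ./main/image.hpp
   ```

2. Follow the steps in [Run the Example](#run-the-example) to flash the firmware, print the confidence scores and coordinates of the detection results, and display the result image.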
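
To make "converting an image into C/C++ form" concrete, here is a minimal sketch of the general idea: writing raw 8-bit pixel values into a header as an array. It is not the code of `convert_to_u8.py`, and every identifier it emits (`EXAMPLE_IMAGE_HEIGHT`, `EXAMPLE_IMAGE_ELEMENT`, and so on) is hypothetical. The header produced by the real tool may be laid out differently.

```python
def to_cpp_array(pixels, height, width, channels, name="EXAMPLE_IMAGE"):
    """Render a flat list of 0..255 values as a C/C++ array definition.

    All emitted identifiers are hypothetical and chosen for illustration only.
    """
    if len(pixels) != height * width * channels:
        raise ValueError("pixel count does not match height * width * channels")
    if any(not 0 <= value <= 255 for value in pixels):
        raise ValueError("every value must fit in uint8_t")

    lines = [
        "#pragma once",
        "#include <stdint.h>",
        "",
        f"#define {name}_HEIGHT {height}",
        f"#define {name}_WIDTH {width}",
        f"#define {name}_CHANNEL {channels}",
        "",
        f"const static uint8_t {name}_ELEMENT[] = {{",
    ]
    # Emit one image row per source line so the header stays readable.
    per_row = width * channels
    for start in range(0, len(pixels), per_row):
        row = pixels[start:start + per_row]
        lines.append("    " + ", ".join(str(value) for value in row) + ",")
    lines.append("};")
    return "\n".join(lines) + "\n"


# A 2x2 RGB image: red, green on the first row; blue, white on the second.
demo = [255, 0, 0, 0, 255, 0,
        0, 0, 255, 255, 255, 255]
print(to_cpp_array(demo, height=2, width=2, channels=3))
```

For real images, use the tool from the repository as described in step 1, since the example code expects the format that tool produces.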



## Latency

|   SoC    |    Latency |
| :------: | ---------: |
|  ESP32   | 149,765 us |
| ESP32-S2 | 416,590 us |
| ESP32-S3 |  18,909 us |

> The results above are based on the default configuration of the example.
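
Microsecond figures are easier to compare as milliseconds or as a frame rate. The sketch below converts the table values, assuming that inference is the only cost per frame, so the frame rate is an upper bound rather than a measured result.

```python
# Latency per inference in microseconds, copied from the table above.
LATENCY_US = {"ESP32": 149_765, "ESP32-S2": 416_590, "ESP32-S3": 18_909}


def to_milliseconds(latency_us):
    """Convert microseconds to milliseconds."""
    return latency_us / 1_000


def frames_per_second(latency_us):
    """Upper bound on throughput if inference were the only per-frame cost."""
    return 1_000_000 / latency_us


for chip, latency in LATENCY_US.items():
    print(f"{chip:8s} {to_milliseconds(latency):8.1f} ms  "
          f"{frames_per_second(latency):5.1f} fps")
```

Camera capture, image pre-processing, and display time are not included, so a real pipeline runs slower than these bounds.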

readme (zh) of color_detect example

                                        
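# Color Detection [[English]](./README.md)

This project is an example of the color detection interface. The interface takes a static image containing blocks of different colors as input.
The output, shown in the terminal, is the result of running interface functions such as registering, detecting, segmenting, and deleting colors.

The project folder is structured as follows: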

```shell
color_detect/
├── CMakeLists.txt
├── rgby.jpg
├── main
│   ├── app_main.cpp
│   ├── CMakeLists.txt
│   └── image.hpp
├── partitions.csv
├── README.md
└── README_cn.md
```


## Run the Example

1. Open a terminal and go to the folder containing the color detection example, esp-dl/examples/color_detect:

    ```shell
    cd ~/esp-dl/examples/color_detect
    ```

2. Set the target chip:

    ```shell
    idf.py set-target [SoC]
    ```
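    Replace [SoC] with your target chip, such as esp32, esp32s2, or esp32s3.

    Because the ESP32-S3 runs AI applications much faster than the other chips, we recommend using the ESP32-S3.

3. Flash the program and run the IDF monitor to get the results of each function: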

   ```shell
   idf.py flash monitor
   
   ... ...
   
   the information of registered colors: 
   name: red, 	thresh: 0, 10, 203, 255, 197, 255
   name: green, 	thresh: 54, 62, 221, 255, 197, 255
   name: blue, 	thresh: 96, 114, 179, 255, 230, 255
   name: yellow, 	thresh: 19, 32, 214, 255, 247, 255
   
   RGB888 | color detection result:
   color 0: detected box :2
   center: (46, 14)
   box: (0, 0), (94, 30)
   area: 768
   center: (14, 110)
   box: (0, 96), (30, 126)
   area: 256
   
   color 1: detected box :2
   center: (110, 30)
   box: (96, 0), (126, 62)
   area: 512
   center: (30, 46)
   box: (0, 32), (62, 62)
   area: 512
   
   color 2: detected box :2
   center: (88, 68)
   box: (64, 32), (126, 94)
   area: 768
   center: (14, 78)
   box: (0, 64), (30, 94)
   area: 256
   
   color 3: detected box :1
   center: (70, 102)
   box: (32, 64), (126, 126)
   area: 1024
   
   
   RGB565 | color detection result:
   color 0: detected box :2
   center: (46, 14)
   box: (0, 0), (94, 30)
   area: 768
   center: (14, 110)
   box: (0, 96), (30, 126)
   area: 256
   
   color 1: detected box :2
   center: (110, 30)
   box: (96, 0), (126, 62)
   area: 512
   center: (30, 46)
   box: (0, 32), (62, 62)
   area: 512
   
   color 2: detected box :2
   center: (88, 68)
   box: (64, 32), (126, 94)
   area: 768
   center: (14, 78)
   box: (0, 64), (30, 94)
   area: 256
   
   color 3: detected box :1
   center: (70, 102)
   box: (32, 64), (126, 126)
   area: 1024
   
   remained colors num: 3
   
   Blue, Yellow | color detection result:
   color 0: detected box :2
   center: (88, 68)
   box: (64, 32), (126, 94)
   area: 768
   center: (14, 78)
   box: (0, 64), (30, 94)
   area: 256
   
   color 1: detected box :1
   center: (70, 102)
   box: (32, 64), (126, 126)
   area: 1024
   
   
   Blue, Yellow | color segmentation result:
   color 0: detected box :2
   box_index: 0, start col: 32, end col: 47, row: 16, area: 768
   box_index: 0, start col: 32, end col: 47, row: 17, area: 768
   box_index: 0, start col: 32, end col: 47, row: 18, area: 768
   box_index: 0, start col: 32, end col: 47, row: 19, area: 768
   box_index: 0, start col: 32, end col: 47, row: 20, area: 768
   box_index: 0, start col: 32, end col: 47, row: 21, area: 768
   box_index: 0, start col: 32, end col: 47, row: 22, area: 768
   box_index: 0, start col: 32, end col: 47, row: 23, area: 768
   box_index: 0, start col: 32, end col: 47, row: 24, area: 768
   box_index: 0, start col: 32, end col: 47, row: 25, area: 768
   box_index: 0, start col: 32, end col: 47, row: 26, area: 768
   box_index: 0, start col: 32, end col: 47, row: 27, area: 768
   box_index: 0, start col: 32, end col: 47, row: 28, area: 768
   box_index: 0, start col: 32, end col: 47, row: 29, area: 768
   box_index: 0, start col: 32, end col: 47, row: 30, area: 768
   box_index: 0, start col: 32, end col: 47, row: 31, area: 768
   box_index: 1, start col: 0, end col: 15, row: 32, area: 256
   box_index: 0, start col: 32, end col: 63, row: 32, area: 768
   box_index: 1, start col: 0, end col: 15, row: 33, area: 256
   box_index: 0, start col: 32, end col: 63, row: 33, area: 768
   box_index: 1, start col: 0, end col: 15, row: 34, area: 256
   box_index: 0, start col: 32, end col: 63, row: 34, area: 768
   box_index: 1, start col: 0, end col: 15, row: 35, area: 256
   box_index: 0, start col: 32, end col: 63, row: 35, area: 768
   box_index: 1, start col: 0, end col: 15, row: 36, area: 256
   box_index: 0, start col: 32, end col: 63, row: 36, area: 768
   box_index: 1, start col: 0, end col: 15, row: 37, area: 256
   box_index: 0, start col: 32, end col: 63, row: 37, area: 768
   box_index: 1, start col: 0, end col: 15, row: 38, area: 256
   box_index: 0, start col: 32, end col: 63, row: 38, area: 768
   box_index: 1, start col: 0, end col: 15, row: 39, area: 256
   box_index: 0, start col: 32, end col: 63, row: 39, area: 768
   box_index: 1, start col: 0, end col: 15, row: 40, area: 256
   box_index: 0, start col: 32, end col: 63, row: 40, area: 768
   box_index: 1, start col: 0, end col: 15, row: 41, area: 256
   box_index: 0, start col: 32, end col: 63, row: 41, area: 768
   box_index: 1, start col: 0, end col: 15, row: 42, area: 256
   box_index: 0, start col: 32, end col: 63, row: 42, area: 768
   box_index: 1, start col: 0, end col: 15, row: 43, area: 256
   box_index: 0, start col: 32, end col: 63, row: 43, area: 768
   box_index: 1, start col: 0, end col: 15, row: 44, area: 256
   box_index: 0, start col: 32, end col: 63, row: 44, area: 768
   box_index: 1, start col: 0, end col: 15, row: 45, area: 256
   box_index: 0, start col: 32, end col: 63, row: 45, area: 768
   box_index: 1, start col: 0, end col: 15, row: 46, area: 256
   box_index: 0, start col: 32, end col: 63, row: 46, area: 768
   box_index: 1, start col: 0, end col: 15, row: 47, area: 256
   box_index: 0, start col: 32, end col: 63, row: 47, area: 768
   
   color 1: detected box :1
   box_index: 0, start col: 16, end col: 31, row: 32, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 33, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 34, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 35, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 36, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 37, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 38, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 39, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 40, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 41, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 42, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 43, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 44, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 45, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 46, area: 1024
   box_index: 0, start col: 16, end col: 31, row: 47, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 48, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 49, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 50, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 51, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 52, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 53, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 54, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 55, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 56, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 57, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 58, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 59, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 60, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 61, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 62, area: 1024
   box_index: 0, start col: 16, end col: 63, row: 63, area: 1024
   
   ```
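
Each segmentation line in the log describes one horizontal run of pixels that belongs to a box. The sketch below parses such lines and sums the run lengths per box. It treats `end col` as inclusive, which is an assumption, but with it the totals for color 0 reproduce the `area` values printed in the log (768 and 256). The names `parse_runs` and `pixels_per_box` are hypothetical helpers, not ESP-DL functions.

```python
import re
from collections import defaultdict

# Three run lines copied from the segmentation log above (color 0).
SAMPLE = """\
box_index: 0, start col: 32, end col: 47, row: 31, area: 768
box_index: 1, start col: 0, end col: 15, row: 32, area: 256
box_index: 0, start col: 32, end col: 63, row: 32, area: 768
"""

RUN = re.compile(
    r"box_index:\s*(\d+),\s*start col:\s*(\d+),\s*end col:\s*(\d+),"
    r"\s*row:\s*(\d+),\s*area:\s*(\d+)"
)


def parse_runs(text):
    """Yield (box_index, start_col, end_col, row, area) tuples from log text."""
    for match in RUN.finditer(text):
        yield tuple(int(group) for group in match.groups())


def pixels_per_box(runs):
    """Sum run lengths per box, treating end_col as inclusive (an assumption)."""
    totals = defaultdict(int)
    for box_index, start_col, end_col, _row, _area in runs:
        totals[box_index] += end_col - start_col + 1
    return dict(totals)


# All runs of color 0 as they appear in the log, written out compactly.
COLOR_0_RUNS = (
    [(0, 32, 47, row, 768) for row in range(16, 32)]
    + [(0, 32, 63, row, 768) for row in range(32, 48)]
    + [(1, 0, 15, row, 256) for row in range(32, 48)]
)

print(list(parse_runs(SAMPLE)))
print(pixels_per_box(COLOR_0_RUNS))
```

The same check on the runs of color 1 gives 16 × 16 + 48 × 16 = 1024, which also matches the logged area.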

readme (zh) of face_recognition example

                                        
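# Face Recognition [[English]](./README.md)

This project is an example of the face recognition interface. The interface takes a static image containing a human face as input. The output, shown in the terminal, is the result of running interface functions such as enrolling, recognizing, and deleting faces.

The interface provides two versions of the model: 16-bit quantized and 8-bit quantized. Compared with the 8-bit model, the 16-bit model is more accurate but uses more memory and runs more slowly. Choose the model that fits your use case.

The project folder is structured as follows: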

```shell
face_recognition/
├── CMakeLists.txt
├── image.jpg
├── main
│   ├── app_main.cpp
│   ├── CMakeLists.txt
│   └── image.hpp
├── partitions.csv
├── README.md
└── README_cn.md
```



## Run the Example

1. Open a terminal and go to the folder containing the face recognition example, esp-dl/examples/face_recognition:

    ```shell
    cd ~/esp-dl/examples/face_recognition
    ```

2. Set the target chip:

    ```shell
    idf.py set-target [SoC]
    ```
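    Replace [SoC] with your target chip, such as esp32, esp32s2, or esp32s3.

    Because the ESP32-S3 runs AI applications much faster than the other chips, we recommend using the ESP32-S3.

3. Flash the program and run the IDF monitor to get the results of each function: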

   ```shell
   idf.py flash monitor
   
   ... ...
   
   E (1907) MFN: Flash is empty
   
   enroll id ...
   name: Sandra, id: 1
   name: Jiong, id: 2
   
   recognize face ...
   [recognition result] id: 1, name: Sandra, similarity: 0.728666
   [recognition result] id: 2, name: Jiong, similarity: 0.827225
   
   recognizer information ...
   recognizer threshold: 0.55
   input shape: 112, 112, 3
   
   face id information ...
   number of enrolled ids: 2
   id: 1, name: Sandra
   id: 2, name: Jiong
   
   delete id ...
   number of remaining ids: 1
   [recognition result] id: -1, name: unknown, similarity: 0.124767
   
   enroll id ...
   name: Jiong, id: 2
   write 2 ids to flash.
   
   recognize face ...
   [recognition result] id: 1, name: Sandra, similarity: 0.758815
   [recognition result] id: 2, name: Jiong, similarity: 0.722041
   
   ```
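
The log reports a recognizer threshold of 0.55 and returns id -1 with the name "unknown" when the best similarity is 0.124767. The sketch below illustrates that thresholding step only. How ESP-DL computes the similarity score, and whether a score exactly equal to the threshold counts as a match, are not shown in the log, so `pick_identity` is a hypothetical helper built on those assumptions.

```python
THRESHOLD = 0.55  # "recognizer threshold" printed in the log above


def pick_identity(similarities, threshold=THRESHOLD):
    """Return (id, name, similarity) of the best match, or id -1 and "unknown".

    `similarities` maps (id, name) to a similarity score. Treating a score
    equal to the threshold as a match is an assumption.
    """
    if not similarities:
        return -1, "unknown", 0.0
    (best_id, best_name), best = max(similarities.items(), key=lambda item: item[1])
    if best < threshold:
        return -1, "unknown", best
    return best_id, best_name, best


# 0.728666 and 0.124767 come from the log; 0.31 is a made-up score for illustration.
print(pick_identity({(1, "Sandra"): 0.728666, (2, "Jiong"): 0.31}))
print(pick_identity({(1, "Sandra"): 0.124767}))
```

The second call mirrors the "delete id" part of the log: with only one id left and a similarity below the threshold, the result is reported as unknown.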

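## Other Settings

1. The macro `QUANT_TYPE` at the top of [./main/app_main.cpp](./main/app_main.cpp) selects the quantization type of the model.

    - `QUANT_TYPE` = 0: use the 8-bit quantized model, which is less accurate than the 16-bit model but faster and uses less memory.
    - `QUANT_TYPE` = 1: use the 16-bit quantized model, whose recognition accuracy matches that of the floating-point model.

    Choose the model that fits your use case.


2. The macro `USE_FACE_DETECTOR` at the top of [./main/app_main.cpp](./main/app_main.cpp) determines how the face keypoint (landmark) coordinates are obtained.

    - `USE_FACE_DETECTOR` = 0: use the keypoint coordinates stored in ./main/image.hpp.
    - `USE_FACE_DETECTOR` = 1: use the face detection model to obtain the keypoint coordinates.

   Note that the keypoint coordinates must be in the following order: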
   
   ```
    left_eye_x, left_eye_y, 
    mouth_left_x, mouth_left_y,
    nose_x, nose_y,
    right_eye_x, right_eye_y, 
    mouth_right_x, mouth_right_y
   ```
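
Because this order interleaves eye and mouth points, it is easy to mix up when coordinates are entered by hand. The sketch below builds the flat coordinate list in the order listed above from named points. The helper `flatten_landmarks` and the dictionary keys are hypothetical, and the coordinates are illustrative values only.

```python
# Order listed above: left eye, mouth left, nose, right eye, mouth right.
LANDMARK_ORDER = ("left_eye", "mouth_left", "nose", "right_eye", "mouth_right")


def flatten_landmarks(points):
    """Return [x0, y0, x1, y1, ...] in LANDMARK_ORDER from a name -> (x, y) dict."""
    missing = [name for name in LANDMARK_ORDER if name not in points]
    if missing:
        raise KeyError(f"missing landmarks: {missing}")
    flat = []
    for name in LANDMARK_ORDER:
        x, y = points[name]
        flat.extend((x, y))
    return flat


# Illustrative coordinates only.
example = {
    "left_eye": (157, 131),
    "right_eye": (199, 133),
    "nose": (170, 163),
    "mouth_left": (158, 177),
    "mouth_right": (193, 180),
}
print(flatten_landmarks(example))
# prints [157, 131, 158, 177, 170, 163, 199, 133, 193, 180]
```

Keeping the points in a named mapping and flattening them in one place avoids depending on the order in which they were typed.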

## Latency

|   SoC    |     8-bit |   16-bit |
| :------: | --------: | -------: |
|  ESP32   | 13,301 ms | 5,041 ms |
| ESP32-S3 |    287 ms |   554 ms |

readme (zh) of human_face_detect example

                                        
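# Human Face Detection [[English]](./README.md)

This project is an example of the human face detection interface. The interface takes a static image as input. The confidence scores and coordinates of the detection results are printed in the terminal, and the result image can be displayed on a PC screen with a tool.

The project folder is structured as follows: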

```shell
human_face_detect/
├── CMakeLists.txt
├── image.jpg
├── main
│   ├── app_main.cpp
│   ├── CMakeLists.txt
│   └── image.hpp
├── partitions.csv
├── README.md
├── README_cn.md
└── result.png
```



## Run the Example

1. Open a terminal and go to the folder containing the human face detection example, esp-dl/examples/human_face_detect:

    ```shell
    cd ~/esp-dl/examples/human_face_detect
    ```

2. Set the target chip:

    ```shell
    idf.py set-target [SoC]
    ```
    Replace [SoC] with your target chip, such as esp32, esp32s2, or esp32s3.

3. Flash the firmware and print the scores and coordinates of the detection results:

   ```shell
   idf.py flash monitor
   
   ... ...
   
   [0] score: 0.987580, box: [137, 75, 246, 215]
       left eye: (157, 131), right eye: (199, 133)
        nose: (170, 163)
        mouth left: (158, 177), mouth right: (193, 180)
   ```

4. The display tool `display_image.py`, stored in [examples/tool/](../tool/), lets you view the detection result image more intuitively. Follow the [tool](../tool/README_cn.md) instructions and run the following command:

   ```shell
   python display_image.py -i ../human_face_detect/image.jpg -b "(137, 75, 246, 215)" -k "(157, 131, 199, 133, 170, 163, 158, 177, 193, 180)"
   ```
   The PC screen shows the detection result image for this example, as in the figure below:
   
    <p align="center">
    <img width="%" src="./result.png"> 
    </p>
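
The `-k` argument in step 4 lists the keypoints as left eye, right eye, nose, mouth left, mouth right. The following sketch extracts them from the monitor output of step 3 and formats them in that order, assuming the output keeps the exact format shown. The helper names are hypothetical and are not part of ESP-DL or its tools.

```python
import re

# Monitor output copied from step 3.
LOG = """\
[0] score: 0.987580, box: [137, 75, 246, 215]
    left eye: (157, 131), right eye: (199, 133)
    nose: (170, 163)
    mouth left: (158, 177), mouth right: (193, 180)
"""

# Order used by the -k argument in the command of step 4.
KEYPOINT_ORDER = ("left eye", "right eye", "nose", "mouth left", "mouth right")


def parse_keypoints(log):
    """Map each keypoint name found in the log to its (x, y) coordinates."""
    found = re.findall(r"([a-z]+(?: [a-z]+)?):\s*\((\d+),\s*(\d+)\)", log)
    return {name: (int(x), int(y)) for name, x, y in found}


def keypoint_argument(points):
    """Format the keypoints as the tuple string passed to -k."""
    values = [str(value) for name in KEYPOINT_ORDER for value in points[name]]
    return "(" + ", ".join(values) + ")"


print(keypoint_argument(parse_keypoints(LOG)))
# prints (157, 131, 199, 133, 170, 163, 158, 177, 193, 180)
```

The printed string is identical to the `-k` value used in step 4.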



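## Other Settings

The macro `TWO_STAGE` at the top of [./main/app_main.cpp](./main/app_main.cpp) selects the detection algorithm. As the comments there note:

- `TWO_STAGE` = 1: the detector is two-stage, which gives more accurate results (with face keypoints) but is slower.
- `TWO_STAGE` = 0: the detector is one-stage, which gives slightly less accurate results (without face keypoints) but is faster.

You can try both settings to compare them.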



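## Customize the Input Image

In this example, [./main/image.hpp](./main/image.hpp) is the default input image. Following the [tool](../tool/README_cn.md) instructions, you can use the conversion tool `convert_to_u8.py`, stored in [examples/tool/](../tool/), to convert your own image into C/C++ form and replace the default image.

1. Save your image in the ./examples/human_face_detect directory and use [examples/tool/convert_to_u8.py](../tool/convert_to_u8.py) to convert it to an hpp file:

   ```shell
   # Assuming the current directory is still human_face_detect

   python ../tool/convert_to_u8.py -i ./image.jpg -o ./main/image.hpp
   ```

2. Follow the steps in [Run the Example](#run-the-example) to flash the firmware, print the confidence scores and coordinates of the detection results, and display the result image.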



## Latency

|   SoC    | `TWO_STAGE` = 1 | `TWO_STAGE` = 0 |
| :------: | --------------: | --------------: |
|  ESP32   |      415,246 us |      154,687 us |
| ESP32-S2 |    1,052,363 us |      309,159 us |
| ESP32-S3 |       56,303 us |       16,614 us |

> The results above are based on the default configuration of the example.
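
To weigh accuracy against speed, it helps to see how much the two-stage detector costs relative to the one-stage detector on each chip. The sketch below is simple arithmetic on the table values. It makes no claim about accuracy, which the table does not measure.

```python
# Latency in microseconds from the table above: (TWO_STAGE = 1, TWO_STAGE = 0).
LATENCY_US = {
    "ESP32": (415_246, 154_687),
    "ESP32-S2": (1_052_363, 309_159),
    "ESP32-S3": (56_303, 16_614),
}


def two_stage_cost(two_stage_us, one_stage_us):
    """Return (slowdown factor, extra microseconds) of two-stage vs one-stage."""
    return two_stage_us / one_stage_us, two_stage_us - one_stage_us


for chip, (two_stage, one_stage) in LATENCY_US.items():
    factor, extra = two_stage_cost(two_stage, one_stage)
    print(f"{chip:8s} two-stage takes {factor:.2f}x as long (+{extra / 1000:.1f} ms)")
```

On all three chips the two-stage detector takes roughly 2.7 to 3.4 times as long as the one-stage detector.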

Examples:

cat_face_detect

To create a project from this example, run:

idf.py create-project-from-example "espressif/esp-dl^2.0.0:cat_face_detect"

color_detect

To create a project from this example, run:

idf.py create-project-from-example "espressif/esp-dl^2.0.0:color_detect"

face_recognition

To create a project from this example, run:

idf.py create-project-from-example "espressif/esp-dl^2.0.0:face_recognition"

human_face_detect

To create a project from this example, run:

idf.py create-project-from-example "espressif/esp-dl^2.0.0:human_face_detect"

Supports all targets

License: MIT
