espressif/esp-nn

# ESP-NN The library contains optimised NN (Neural Network) functions for various Espressif chips. * Supported platforms: * TensorFlow Lite Micro (TFLite Micro). Repo can be found [here](https://github.com/espressif/tflite-micro-esp-examples) * Supported ESP chips include: * ESP32-S3 (Assembly versions optimised to benefit from vector instructions of ESP32-S3) * ESP32-P4 (Optimised using PIE/QACC SIMD instructions) * ESP32 (Generic optimisations) * ESP32-C3 (Generic optimisations) ## Performance ### Kernelwise performance for s8 versions: * Kernelwise performance on ESP32-P4 chip * Numbers are ticks taken for kernel to execute * Chip config: 360MHz, SPI-RAM: HEX 200MHz, L2-Cache: 128KB | Function | ANSI C | Optimized | Opt Ratio | Data info | Memory | | ----------------| --------|---------|---------|-------------|-----------| | elementwise_add | 187971 | 173104 | -- | size = 1615 | External | | elementwise_mul | 79898 | 71245 | -- | size = 1615 | External | | convolution | 4005512 | 572459 | 7.00 | input(10,10), filter(64x1x1x64), pad(0,0), stride(1,1) | External | | convolution | 249389 | 98319 | 2.54 | input(8,8), filter(16x1x1x16), pad(0,0), stride(1,1) | External | | convolution | 816975 | 533318 | 1.53 | input(10,10), filter(64x3x3x3), pad(0,0), stride(1,1) | External | | depthwise conv | 962834 | 482389 | 2.00 | input (16, 16), pad(0,0), stride(1,1) filter: 1x3x3x16 | External | | depthwise conv | 1365066 | 703989 | 1.94 | input (12, 12), pad(1,1), stride(1,1) filter: 8x5x5x4 | External | | max pool | 601843 | 592189 | -- | input(16,16), filter (1x3x3x16) | Internal | | avg pool | 392947 | 380527 | -- | input(16,16), filter (1x3x3x16) | Internal | | fully connected | 7692 | 7616 | -- | len: 271, ch = 3 | Internal | | prelu (relu6) | 22487 | 18963 | -- | size, 1615 | Internal | * Kernelwise performance on ESP32-S3 chip * Numbers are ticks taken for kernel to execute * Chip config: 240MHz, SPI: QPI 80MHz, Data cache: 64KB | Function | ANSI C | Optimized | Opt Ratio | Data info | Memory | | ----------------| ---------|-----------|-----------|-------------|-----------| | elementwise_add | 281337 | 74440 | 3.78 | size = 1615 | External | | elementwise_mul | 122703 | 35002 | 3.51 | size = 1615 | External | | convolution | 4712500 | 331008 | 14.24 | input(10,10), filter(64x1x1x64), pad(0,0), stride(1,1) | External | | convolution | 312754 | 39022 | 8.01 | input(8,8), filter(16x1x1x16), pad(0,0), stride(1,1) | External | | convolution | 2193289 | 394842 | 5.55 | input(8,8), filter(64x3x3x3), pad(0,0), stride(1,1) | External | | depthwise conv | 1159831 | 184176 | 6.30 | input(18,18), pad(0,0), stride(1,1), filter: 1x3x3x16 | External | | depthwise conv | 1671363 | 372435 | 4.49 | input(12,12), pad(1,1), stride(1,1), filter: 8x5x5x4 | External | | max pool | 376294 | 48069 | 7.83 | input(16,16), filter(1x3x3x16) | Internal | | avg pool | 427293 | 118052 | 3.62 | input(16,16), filter(1x3x3x16) | Internal | | fully connected | 8443 | 1078 | 7.83 | len: 271, ch = 3 | Internal | | softmax | 15209 | 11107 | 1.37 | h: 8, w: 32 | Internal | | prelu (relu6) | 1125 | 98 | 11.48 | size: 1615 | Internal | ### Model-level performance: * **Person Detection** (Visual Wake Words, INT8 quantized — from [esp-tflite-micro](https://github.com/espressif/esp-tflite-micro)) * Numbers are time (ms) for `invoke()` call, using internal memory | Chip | CPU Freq | without ESP-NN | with ESP-NN | | -------- | -------- | -------------- | ----------- | | ESP32-P4 | 360MHz | 1395ms | 73ms | | ESP32-S3 | 240MHz | 2300ms | 54ms | | ESP32 | 240MHz | 4084ms | 380ms | | ESP32-C3 | 160MHz | 3355ms | 426ms | * **MobileNetV3 Small** (INT8 quantized, 224x224x3, 1000 classes) | Chip | CPU Freq | without ESP-NN | with ESP-NN | | -------- | -------- | -------------- | ----------- | | ESP32-S3 | 240MHz | 26000ms | 1434ms | | ESP32-P4 | 360MHz | 11600ms | 1305ms | > **Note**: - The above is time taken for execution of the `invoke()` call - SPIRAM used for TensorArena. - Person detection on ESP32-S3 with internal RAM: 47ms - ESP32-P4 optimisation is work in progress - `Without ESP-NN` case is when `esp-nn` is completely disabled by removing below flag from [CMakeLists.txt](CMakeLists.txt): ```cmake # enable ESP-NN optimizations by Espressif target_compile_options(${COMPONENT_LIB} PRIVATE -DESP_NN) ``` ## Configuration * To configure, please use `idf.py menuconfig` and under `ESP-NN` select `NN_OPTIMIZATIONS` * There are two options presented: * Optimized versions * ANSI C * Default selection is for `Optimized versions`. For ESP32-S3 and ESP32-P4, assembly versions are automatically selected, whereas for other chips (viz., ESP32, ESP32-C3), generic optimisations are selected. * For debugging purposes, you may want to select `ANSI C` reference versions. ## Contributing If you encounter an issue with ESP-NN, or wish to submit a feature request, please use the Issues section on the Github. For general questions related to this library, please use the esp32.com forum. Please check [CONTRIBUTING.md](CONTRIBUTING.md) for further information if you'd like to contribute to ESP-NN. ## Copyrights and License All original source code in this repository is Copyright (C) 2020-2021 Espressif Systems. This source code is licensed under the Apache License 2.0 as described in the file LICENSE.

readme

Links

Supports all targets

License: Apache-2.0

Stats

Badge