| Supported Targets | ESP32-S3 | ESP32-P4 |
|-------------------|----------|----------|
# YOLO26 Models
## Model List
[supported]: https://img.shields.io/badge/-supported-green "supported"
| Chip | YOLO26n (Int8) |
|----------|------------------------|
| ESP32-S3 | ![alt text][supported] |
| ESP32-P4 | ![alt text][supported] |
## Model Benchmarks
| name | input(hwc) | Flash(MB) | PSRAM(MB) | preprocess(ms) | model(ms) | postprocess(ms) | mAP50-95 on COCO val2017 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| yolo26n_512_s8_p4 | 512×512×3 | 16 | 32 | 12.0 | 2067.0 | 13.0 | 0.365 |
| yolo26n_640_s8_p4 | 640×640×3 | 16 | 32 | 17.0 | 3474.0 | 21.0 | 0.387 |
| yolo26n_512_s8_s3 | 512×512×3 | 16 | 16 | 34.0 | 7822.0 | 23.0 | 0.363 |
| yolo26n_640_s8_s3 | 640×640×3 | 16 | 16 | 51.0 | 13107.0 | 36.0 | 0.384 |
*Models generated by the [YOLO26 Quantization Tutorial](../../examples/tutorial/how_to_quantize_model/quantize_yolo26/README.md).*
*Performance depends on memory configuration (Flash vs PSRAM).*
---
## Module Features
| Feature | Description |
|---|---|
| **NMS-Free Postprocessing** | One2One head: top-K selection by confidence score only; no IoU suppression needed. |
| **Hardware Letterbox Preprocessing** | Uses ESP-DL `ImagePreprocessor` with gray padding (value=114), a pixel-exact match to the Python emulation. |
| **SIMD LUT Quantization** | 256-entry INT8 LUT (hardware-accelerated) quantizes pixels directly into the model input RAM, zero-copy. |
| **Templated INT8 / INT16 Decode** | `decode_grid<T>` dispatches on tensor dtype at runtime; supports INT8, INT16, or mixed models transparently. |
| **Integer Threshold Optimization** | The confidence threshold is pre-converted to integer space to skip sigmoid on low-score cells; no float math in the hot loop. |
| **Auto Class Count Detection** | `num_classes` is read from the output tensor shape at runtime; no hardcoding needed. |
| **Generic Dataset Support** | `class_names` is a user-supplied `const char**`; works for COCO (80 classes), custom datasets (any count), or Roboflow exports. |
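
The NMS-free top-K selection and the integer-threshold trick described above can be sketched as follows. This is a standalone illustration, not the module's internals: `Cell`, `logit_threshold`, and the quantization scale are assumed names for the purpose of the sketch.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// One candidate cell from a quantized class head (illustrative type).
struct Cell {
    int16_t raw_logit;  // quantized class logit, before sigmoid
    int     class_id;
    float   x1, y1, x2, y2;
};

// Convert the float confidence threshold into raw-logit space once:
//   sigmoid(x) >= t  <=>  x >= log(t / (1 - t))
// so the hot loop compares integers and never evaluates sigmoid.
inline int16_t logit_threshold(float conf_thresh, float scale) {
    float logit = std::log(conf_thresh / (1.0f - conf_thresh));
    return static_cast<int16_t>(std::ceil(logit / scale));  // dequant: x = raw * scale
}

// NMS-free postprocess: keep the top-K cells by score; no IoU suppression.
std::vector<Cell> topk_by_score(std::vector<Cell> cells, int k, int16_t min_logit) {
    // Cheap integer rejection first.
    cells.erase(std::remove_if(cells.begin(), cells.end(),
                               [&](const Cell& c) { return c.raw_logit < min_logit; }),
                cells.end());
    auto by_score = [](const Cell& a, const Cell& b) { return a.raw_logit > b.raw_logit; };
    if (static_cast<int>(cells.size()) > k) {
        std::partial_sort(cells.begin(), cells.begin() + k, cells.end(), by_score);
        cells.resize(k);
    } else {
        std::sort(cells.begin(), cells.end(), by_score);
    }
    return cells;
}
```

Because the one-to-one head emits at most one candidate per object, dropping IoU suppression does not create duplicate boxes; a single partial sort replaces the whole NMS stage.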
---
## Model Usage
### 1. Initialize
```cpp
#include "yolo26.hpp"
// Load model from Flash (symbol name generated by CMakeLists.txt)
extern const uint8_t model_espdl[] asm("_binary_yolo26n_512_s8_p4_espdl_start");
dl::Model* model = new dl::Model((const char *)model_espdl,
                                 fbs::MODEL_LOCATION_IN_FLASH_RODATA,
                                 0,                         // max_internal_size
                                 dl::MEMORY_MANAGER_GREEDY, // mm_type
                                 nullptr,                   // key
                                 false);                    // param_copy (keep false to save RAM)
// Option 1: COCO classes (default)
YOLO26 processor(model, YOLO_TARGET_K, YOLO_CONF_THRESH, coco_classes);

// Option 2: custom classes — use instead of Option 1 (must match your training labels)
// const char* my_classes[] = { "brick_2x4", "brick_1x2", /* ... */ };
// YOLO26 processor(model, YOLO_TARGET_K, YOLO_CONF_THRESH, my_classes);
```
> `YOLO_TARGET_K = 32` and `YOLO_CONF_THRESH = 0.25f` are defined in `yolo26.hpp`.
### 2. Run
```cpp
// A. Decode JPEG → RGB888
auto img = processor.decode_jpeg(jpg_data, jpg_len);
// B. Preprocess: letterbox + SIMD LUT quantization → written directly into model input RAM
processor.preprocess(img);
heap_caps_free(img.data); // free JPEG decode buffer
// C. Hardware Inference
model->run();
// D. Postprocess (NMS-Free, top-K by score)
auto results = processor.postprocess(model->get_outputs());
```
### 3. Result Structure
```cpp
struct Detection {
float x1, y1, x2, y2; // Bounding box (pixels, in model input space)
float score; // Confidence (0.0 – 1.0)
int class_id; // Index into class_names
};
```
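
Boxes are reported in model input space (e.g. 640×640 after letterboxing), so displaying them on the original frame requires undoing the letterbox. A minimal sketch, assuming the standard letterbox convention the module describes (uniform scale, centered gray padding); `Box` and `unletterbox` are illustrative names, not part of the module's API:

```cpp
#include <algorithm>

struct Box { float x1, y1, x2, y2; };

// Map a box from model-input space back to original-image pixels.
Box unletterbox(Box b, int in_w, int in_h, int img_w, int img_h) {
    // Uniform scale chosen so the image fits inside the model input.
    float scale = std::min((float)in_w / img_w, (float)in_h / img_h);
    // Centered padding (the gray value-114 border) on each axis.
    float pad_x = (in_w - img_w * scale) * 0.5f;
    float pad_y = (in_h - img_h * scale) * 0.5f;
    auto mapx = [&](float x) { return std::min(std::max((x - pad_x) / scale, 0.0f), (float)img_w); };
    auto mapy = [&](float y) { return std::min(std::max((y - pad_y) / scale, 0.0f), (float)img_h); };
    return { mapx(b.x1), mapy(b.y1), mapx(b.x2), mapy(b.y2) };
}
```

The clamp keeps boxes that bleed into the padding region inside the original image bounds.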
#### Example Output Loop:
```cpp
for (const auto& res : results) {
ESP_LOGI("YOLO26", "[category: %s, score: %.2f, x1: %d, y1: %d, x2: %d, y2: %d]",
coco_classes[res.class_id], res.score,
(int)res.x1, (int)res.y1, (int)res.x2, (int)res.y2);
}
// Example:
// I (4350) YOLO26: [category: person, score: 0.86, x1: 87, y1: 187, x2: 176, y2: 428]
```
---
## Quantization Constraints
* **INT8 / INT16 Output Support**: The `decode_grid` function is templated and dispatches on tensor dtype at runtime. The exported model uses INT16 for the class/box head outputs and INT8 for the backbone; all of this is handled automatically.
* **INT8 Input**: The model input must be INT8 (the `ImagePreprocessor` always produces INT8 via the hardware LUT).
* *Internal* layers can use any precision supported by ESP-DL (INT8, INT16, Mixed).
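
The runtime dtype dispatch behind `decode_grid<T>` can be sketched in isolation. The real module works on ESP-DL tensors; here `DType`, `dequant_max`, and `decode_any` are assumed names that only demonstrate the pattern of branching once on the dtype and letting the template handle both precisions:

```cpp
#include <cstddef>
#include <cstdint>

enum class DType { INT8, INT16 };

// One templated kernel covers both quantized precisions.
template <typename T>
float dequant_max(const T* data, size_t n, float scale) {
    T best = data[0];
    for (size_t i = 1; i < n; ++i)
        if (data[i] > best) best = data[i];
    return best * scale;  // dequantize: value = raw * scale
}

// Callers never need to know the head's precision: a single runtime
// branch on the tensor dtype selects the right instantiation.
float decode_any(const void* data, size_t n, float scale, DType dt) {
    return dt == DType::INT8
               ? dequant_max(static_cast<const int8_t*>(data), n, scale)
               : dequant_max(static_cast<const int16_t*>(data), n, scale);
}
```

This is why a model mixing an INT8 backbone with INT16 heads needs no special handling in user code.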
To add this module to your project as a managed component:

```shell
idf.py add-dependency "espressif/yolo26^0.1.0"
```