| Supported Targets | ESP32-S3 | ESP32-P4 |
|-------------------|----------|----------|
# YOLO26 Models
## Model List
[supported]: https://img.shields.io/badge/-supported-green "supported"
| Chip | YOLO26n (Int8) |
|----------|------------------------|
| ESP32-S3 | ![alt text][supported] |
| ESP32-P4 | ![alt text][supported] |
## Model Benchmarks
| name | input(hwc) | Flash(MB) | PSRAM(MB) | preprocess(ms) | model(ms) | postprocess(ms) | mAP50-95 on COCO val2017 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| yolo26n_512_s8_p4 | 512×512×3 | 16 | 32 | 12.0 | 2067.0 | 13.0 | 0.365 |
| yolo26n_640_s8_p4 | 640×640×3 | 16 | 32 | 17.0 | 3474.0 | 21.0 | 0.387 |
| yolo26n_512_s8_s3 | 512×512×3 | 16 | 16 | 34.0 | 7822.0 | 23.0 | 0.363 |
| yolo26n_640_s8_s3 | 640×640×3 | 16 | 16 | 51.0 | 13107.0 | 36.0 | 0.384 |
*Models generated by the [YOLO26 Quantization Tutorial](../../examples/tutorial/how_to_quantize_model/quantize_yolo26/README.md).*
*Performance depends on memory configuration (Flash vs PSRAM).*
---
## Module Features
| Feature | Description |
|---|---|
| **NMS-Free Postprocessing** | One2One head: top-K selection by confidence score only; no IoU suppression needed. |
| **Hardware Letterbox Preprocessing** | Uses ESP-DL `ImagePreprocessor` with gray padding (value=114), a pixel-exact match to the Python emulation. |
| **SIMD LUT Quantization** | 256-entry INT8 LUT (hardware-accelerated) quantizes pixels directly into the model input RAM, zero-copy. |
| **Templated INT8 / INT16 Decode** | `decode_grid<T>` dispatches on tensor dtype at runtime; supports INT8, INT16, or mixed models transparently. |
| **Integer Threshold Optimization** | The confidence threshold is pre-converted to integer space to skip sigmoid on low-score cells; no float math in the hot loop. |
| **Auto Class Count Detection** | `num_classes` is read from the output tensor shape at runtime; no hardcoding needed. |
| **Generic Dataset Support** | `class_names` is a user-supplied `const char**`; works for COCO (80 classes), custom datasets (any count), or Roboflow exports. |
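
The NMS-free top-K selection and the integer-threshold trick described above can be sketched as follows. This is a standalone illustration, not the module's internals: `Cell`, `logit_threshold`, and the quantization scale are assumed names for the purpose of the sketch.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// One candidate cell from a quantized class head (illustrative type).
struct Cell {
    int16_t raw_logit;  // quantized class logit, before sigmoid
    int     class_id;
    float   x1, y1, x2, y2;
};

// Convert the float confidence threshold into raw-logit space once:
//   sigmoid(x) >= t  <=>  x >= log(t / (1 - t))
// so the hot loop compares integers and never evaluates sigmoid.
inline int16_t logit_threshold(float conf_thresh, float scale) {
    float logit = std::log(conf_thresh / (1.0f - conf_thresh));
    return static_cast<int16_t>(std::ceil(logit / scale));  // dequant: x = raw * scale
}

// NMS-free postprocess: keep the top-K cells by score; no IoU suppression.
std::vector<Cell> topk_by_score(std::vector<Cell> cells, int k, int16_t min_logit) {
    // Cheap integer rejection first.
    cells.erase(std::remove_if(cells.begin(), cells.end(),
                               [&](const Cell& c) { return c.raw_logit < min_logit; }),
                cells.end());
    auto by_score = [](const Cell& a, const Cell& b) { return a.raw_logit > b.raw_logit; };
    if (static_cast<int>(cells.size()) > k) {
        std::partial_sort(cells.begin(), cells.begin() + k, cells.end(), by_score);
        cells.resize(k);
    } else {
        std::sort(cells.begin(), cells.end(), by_score);
    }
    return cells;
}
```

Because the one-to-one head emits at most one candidate per object, dropping IoU suppression does not create duplicate boxes; a single partial sort replaces the whole NMS stage.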
---
## Model Usage
### 1. Initialize
```cpp
#include "yolo26.hpp"
// Load model from Flash (symbol name generated by CMakeLists.txt)
extern const uint8_t model_espdl[] asm("_binary_yolo26n_512_s8_p4_espdl_start");
dl::Model* model = new dl::Model((const char *)model_espdl,
                                 fbs::MODEL_LOCATION_IN_FLASH_RODATA,
                                 0,                         // max_internal_size
                                 dl::MEMORY_MANAGER_GREEDY, // mm_type
                                 nullptr,                   // key
                                 false);                    // param_copy (keep false to save RAM)
// Option 1: COCO classes (default)
YOLO26 processor(model, YOLO_TARGET_K, YOLO_CONF_THRESH, coco_classes);

// Option 2: custom classes — use instead of Option 1 (must match your training labels)
// const char* my_classes[] = { "brick_2x4", "brick_1x2", /* ... */ };
// YOLO26 processor(model, YOLO_TARGET_K, YOLO_CONF_THRESH, my_classes);
```
> `YOLO_TARGET_K = 32` and `YOLO_CONF_THRESH = 0.25f` are defined in `yolo26.hpp`.
### 2. Run
```cpp
// A. Decode JPEG → RGB888
auto img = processor.decode_jpeg(jpg_data, jpg_len);
// B. Preprocess: letterbox + SIMD LUT quantization → written directly into model input RAM
processor.preprocess(img);
heap_caps_free(img.data); // free JPEG decode buffer
// C. Hardware Inference
model->run();
// D. Postprocess (NMS-Free, top-K by score)
auto results = processor.postprocess(model->get_outputs());
```
### 3. Result Structure
```cpp
struct Detection {
float x1, y1, x2, y2; // Bounding box (pixels, in model input space)
float score; // Confidence (0.0 – 1.0)
int class_id; // Index into class_names
};
```
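
Boxes are reported in model input space (e.g. 640×640 after letterboxing), so displaying them on the original frame requires undoing the letterbox. A minimal sketch, assuming the standard letterbox convention the module describes (uniform scale, centered gray padding); `Box` and `unletterbox` are illustrative names, not part of the module's API:

```cpp
#include <algorithm>

struct Box { float x1, y1, x2, y2; };

// Map a box from model-input space back to original-image pixels.
Box unletterbox(Box b, int in_w, int in_h, int img_w, int img_h) {
    // Uniform scale chosen so the image fits inside the model input.
    float scale = std::min((float)in_w / img_w, (float)in_h / img_h);
    // Centered padding (the gray value-114 border) on each axis.
    float pad_x = (in_w - img_w * scale) * 0.5f;
    float pad_y = (in_h - img_h * scale) * 0.5f;
    auto mapx = [&](float x) { return std::min(std::max((x - pad_x) / scale, 0.0f), (float)img_w); };
    auto mapy = [&](float y) { return std::min(std::max((y - pad_y) / scale, 0.0f), (float)img_h); };
    return { mapx(b.x1), mapy(b.y1), mapx(b.x2), mapy(b.y2) };
}
```

The clamp keeps boxes that bleed into the padding region inside the original image bounds.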
#### Example Output Loop:
```cpp
for (const auto& res : results) {
ESP_LOGI("YOLO26", "[category: %s, score: %.2f, x1: %d, y1: %d, x2: %d, y2: %d]",
coco_classes[res.class_id], res.score,
(int)res.x1, (int)res.y1, (int)res.x2, (int)res.y2);
}
// Example:
// I (4350) YOLO26: [category: person, score: 0.86, x1: 87, y1: 187, x2: 176, y2: 428]
```
---
## Quantization Constraints
* **INT8 / INT16 Output Support**: The `decode_grid` function is templated and dispatches on tensor dtype at runtime. The exported model uses INT16 for the class/box head outputs and INT8 for the backbone; all of this is handled automatically.
* **INT8 Input**: The model input must be INT8 (the `ImagePreprocessor` always produces INT8 via the hardware LUT).
* *Internal* layers can use any precision supported by ESP-DL (INT8, INT16, Mixed).
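
The runtime dtype dispatch behind `decode_grid<T>` can be sketched in isolation. The real module works on ESP-DL tensors; here `DType`, `dequant_max`, and `decode_any` are assumed names that only demonstrate the pattern of branching once on the dtype and letting the template handle both precisions:

```cpp
#include <cstddef>
#include <cstdint>

enum class DType { INT8, INT16 };

// One templated kernel covers both quantized precisions.
template <typename T>
float dequant_max(const T* data, size_t n, float scale) {
    T best = data[0];
    for (size_t i = 1; i < n; ++i)
        if (data[i] > best) best = data[i];
    return best * scale;  // dequantize: value = raw * scale
}

// Callers never need to know the head's precision: a single runtime
// branch on the tensor dtype selects the right instantiation.
float decode_any(const void* data, size_t n, float scale, DType dt) {
    return dt == DType::INT8
               ? dequant_max(static_cast<const int8_t*>(data), n, scale)
               : dequant_max(static_cast<const int16_t*>(data), n, scale);
}
```

This is why a model mixing an INT8 backbone with INT16 heads needs no special handling in user code.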
To add this module to your project as a managed component:

```shell
idf.py add-dependency "espressif/yolo26^0.1.0"
```