# esp-idf-zstd

[![build](https://github.com/rderr/esp-idf-zstd/actions/workflows/build.yml/badge.svg)](https://github.com/rderr/esp-idf-zstd/actions/workflows/build.yml)
[![License: BSD-3-Clause](https://img.shields.io/badge/License-BSD--3--Clause-blue.svg)](LICENSE)

[Zstandard](https://github.com/facebook/zstd) (zstd) compression library packaged as an ESP-IDF component. Wraps upstream zstd v1.5.7 as a git submodule.

## Memory usage

zstd was designed for desktop and server workloads. On ESP32 it works but uses substantially more RAM than zlib or heatshrink. Measured on an ESP32-S3 compressing a 73-byte JSON payload with a 10.7 KB trained dictionary:

- Default level 3 + CDict: ~220 KB working heap, ~2.5–3.0x ratio
- Level 1 + CDict: ~180 KB working heap, ~2.1x ratio
- Aggressively tuned (level -3, windowLog=14, hashLog=12, pledged srcSize): ~65 KB working heap, ~1.8x ratio
- Below ~65 KB the match finder runs out of slots to use the dictionary and the compressor effectively stops working

For comparison, zlib at minimum tuning uses ~10–15 KB of working heap and achieves comparable ratios on small dictionary-aware payloads; heatshrink runs in 1–4 KB. The `ZSTD_DCtx` struct alone is 50+ KB at default settings, so even decompression-only deployments are memory-heavy.

Plan for a 150–250 KB heap window at default settings, or ~65–100 KB if you tune aggressively.

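
One way to act on these figures at runtime is to compare the free heap against them before creating a compression context. The following is a minimal sketch and not part of this component: `zstd_profile_t`, `k_profiles` and `pick_profile` are hypothetical names, the heap values are simply the measurements listed above (ESP32-S3, 73-byte payload, 10.7 KB dictionary) and will differ for other payloads and dictionaries, and the free-heap input is assumed to come from something like ESP-IDF's `esp_get_free_heap_size()` converted to KB.

```c
#include <stddef.h>

/* One measured configuration. Values are copied from the list above and
 * are specific to that test setup, not general limits. */
typedef struct {
    const char *name;
    int level;        /* value for ZSTD_c_compressionLevel */
    size_t heap_kb;   /* approximate working heap in KB */
} zstd_profile_t;

/* Ordered from best ratio to smallest footprint. */
static const zstd_profile_t k_profiles[] = {
    { "default", 3, 220 },
    { "level1",  1, 180 },
    { "tuned",  -3,  65 },
};

/* Return the first profile whose working heap plus the caller's headroom
 * fits into free_heap_kb, or NULL when none of the measured rows fits. */
const zstd_profile_t *pick_profile(size_t free_heap_kb, size_t headroom_kb)
{
    for (size_t i = 0; i < sizeof k_profiles / sizeof k_profiles[0]; i++) {
        if (free_heap_kb >= k_profiles[i].heap_kb + headroom_kb) {
            return &k_profiles[i];
        }
    }
    return NULL;
}
```

For example, `pick_profile(100, 16)` selects the tuned row, while a `NULL` result suggests falling back to zlib or heatshrink.
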
### When to pick which codec

| Situation | Recommended |
|---|---|
| Payloads >1 KB, ≥150 KB free heap | zstd: best ratio + speed |
| Speed-critical, frequent compression, RAM available | zstd at level 1 (~3–5× faster than zlib) |
| Small payloads (<200 B), tuning effort acceptable | zstd with aggressive tuning (usable but not clearly better than zlib) |
| Small payloads, <150 KB free heap | zlib + primed dictionary (miniz is already in ESP-IDF) |
| Very tight RAM (<32 KB free) | heatshrink |

## Features

- Full zstd v1.5.7 compression and decompression
- Kconfig toggles to disable either direction (saves code size)
- Minify mode for smallest possible code-size footprint
- Trained-dictionary support for small structured payloads (within the RAM budget above)
- Host-side Python dictionary trainer included

## Installation

### ESP-IDF Component Manager (recommended)

Add to your project's `idf_component.yml` (the `rderr/` namespace is required; without it the Component Manager looks in the default `espressif/` namespace):

```yaml
dependencies:
  rderr/esp-idf-zstd:
    version: "*"
```

Then run:

```bash
idf.py reconfigure
```

### Manual

Clone into your project's `components/` directory:

```bash
cd your_project/components
git clone --recursive https://github.com/rderr/esp-idf-zstd.git
```

## Quick Start

**Do not call zstd from the main task.** `ZSTD_compress` and `ZSTD_decompress` use 6–12 KB of stack; the default ESP-IDF main task stack (3,584 bytes) will overflow silently and crash later at a context switch. Always run zstd on a dedicated worker task with a 16 KB stack (trim after measuring with `uxTaskGetStackHighWaterMark`).

```c
#include <stdlib.h>
#include "zstd.h"
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

static void zstd_task(void *arg)
{
    // Placeholder input; substitute your own buffer and length.
    static const char src[] = "{\"temp\":21.5,\"hum\":40}";
    const size_t src_size = sizeof(src) - 1;

    size_t bound = ZSTD_compressBound(src_size);  // worst-case compressed size
    void *dst = malloc(bound);
    if (dst != NULL) {
        size_t compressed_size = ZSTD_compress(dst, bound, src, src_size, 1);
        if (!ZSTD_isError(compressed_size)) {
            // ... decompress similarly ...
        }
        free(dst);
    }
    vTaskDelete(NULL);
}

void app_main(void)
{
    xTaskCreate(zstd_task, "zstd", 16384, NULL, 5, NULL);
}
```

See [`examples/basic`](examples/basic/) for a complete working example with stack measurement, and [`examples/dictionary`](examples/dictionary/) for the dictionary-based pattern.

## Configuration

Run `idf.py menuconfig` and navigate to **Component config > Zstandard (zstd)**:

| Option | Default | Description |
|---|---|---|
| `ZSTD_COMPRESSION` | y | Include compression support |
| `ZSTD_DECOMPRESSION` | y | Include decompression support |
| `ZSTD_MINIFY` | n | Enable all size optimizations (smaller code, slower) |
| `ZSTD_STRIP_ERROR_STRINGS` | n | Remove error message strings to save flash |
| `ZSTD_NO_INLINE` | n | Disable inlining to reduce code size |

### Memory Considerations

See the [Memory usage](#memory-usage) section at the top for the high-level numbers. This section covers the runtime tuning knobs in detail.

#### Tuning zstd memory usage

The component exposes the full zstd advanced API.
Key knobs:

```c
ZSTD_CCtx *cctx = ZSTD_createCCtx();
ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 1);  // 1, or negative
ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, 14);        // 16 KB window
ZSTD_CCtx_setParameter(cctx, ZSTD_c_hashLog, 12);          // 16 KB hash table
ZSTD_CCtx_setPledgedSrcSize(cctx, expected_payload_size);  // single-shot only
```

| Parameter | Recommendation | Effect |
|---|---|---|
| `ZSTD_c_compressionLevel` | 1 to 3, or negative for tighter RAM | Lower = less working memory |
| `ZSTD_c_windowLog` | `ceil(log2(dictSize))` minimum, default ~17 | Sets the history window; must accommodate dict |
| `ZSTD_c_hashLog` | 12+ when using a dict, 6 minimum | Match-finder table size; biggest single RAM knob |
| `setPledgedSrcSize` | Set to actual payload size | Lets zstd size internal buffers to expected input |

#### Measured tuning sweep

Compressing a 73-byte JSON payload with a 10.7 KB trained dictionary on an ESP32-S3:

| Configuration | Working heap | Compression ratio |
|---|---|---|
| Default level 3 + CDict | ~220 KB | ~2.5–3.0x |
| Level 1 + CDict + windowLog=14 | ~180 KB | ~2.1x |
| Level 1 + CCtx-loaded dict + windowLog=14 | ~180 KB | ~2.1x |
| Level -3 + windowLog=14 + hashLog=12 + pledgedSrcSize | **~65 KB** | **~1.8x** |
| Level -3 + hashLog=6 (too aggressive) | ~48 KB | ~1.1x (defeats the dict) |

The level -3 / hashLog=12 row is the embedded "knee": below that, the match finder is too small to use the dictionary effectively. **Above that, the ratio gains are modest while RAM grows quickly.** Pick the row that fits your application's RAM budget and ratio needs.

### Stack Requirements

**`ZSTD_compress` and `ZSTD_decompress` use several KB of stack**, typically 8–12 KB at compression level 1 and more at higher levels. The default ESP-IDF main task stack (3584 bytes) is too small and will overflow silently. The crash is reported later, often as a "stack overflow in task main" at the next context switch.

The right fix is to call zstd from a dedicated worker task with a measured stack budget, not to bloat the main task. The included examples follow this pattern:

```c
#define ZSTD_TASK_STACK_BYTES 16384

static void zstd_task(void *arg)
{
    // ... ZSTD_compress / ZSTD_decompress here ...

    UBaseType_t hw = uxTaskGetStackHighWaterMark(NULL);
    ESP_LOGI(TAG, "Stack free: %u bytes", (unsigned)(hw * sizeof(StackType_t)));
    vTaskDelete(NULL);
}

void app_main(void)
{
    xTaskCreate(zstd_task, "zstd", ZSTD_TASK_STACK_BYTES, NULL, 5, NULL);
}
```

After running once, check the logged high-water mark and trim `ZSTD_TASK_STACK_BYTES` to the peak you actually observed plus a safety margin (1–2 KB).

To reduce stack usage further:

- Use a **negative compression level** (`-1` to `-7`): uses less stack, less heap, less CPU, at the cost of compression ratio
- Enable `CONFIG_ZSTD_NO_INLINE=y`: less aggressive inlining keeps frame sizes smaller
- Enable `CONFIG_ZSTD_MINIFY=y`: also reduces code size at a moderate speed cost

## Dictionary Compression

Standard zstd struggles with small payloads (< 1 KB) because there isn't enough data to build effective compression tables. A **trained dictionary** solves this by front-loading the compressor with patterns learned from representative sample data.

This is especially useful for:

- **Small JSON messages**: Sensor readings, API responses, device config. A 200-byte JSON payload that barely compresses normally can achieve 3–5x compression with a dictionary
- **MQTT / telemetry**: IoT devices send structurally identical messages where only values change
- **Log entries**: Repetitive formats with fixed prefixes and templates
- **Any small, structured data** where messages share common patterns

### Building a Dictionary

A Python tool is included in `tools/dictbuilder/`:

```bash
# Install dependency
pip install zstandard

# Collect 100+ representative samples into a directory, then:
python tools/dictbuilder/build_dictionary.py samples/ -o main/zstd_dictionary.h

# Or as binary for EMBED_FILES:
python tools/dictbuilder/build_dictionary.py samples/ -o main/dict.bin --format binary
```

The tool prints evaluation stats showing compression with and without the dictionary so you can verify the benefit. See [tools/dictbuilder/README.md](tools/dictbuilder/README.md) for full usage and tips.

### Using a Dictionary in Firmware

**Option 1: C header (simple)**

```c
#include "zstd_dictionary.h"  // Generated by build_dictionary.py

ZSTD_CDict *cdict = ZSTD_createCDict(zstd_dictionary, zstd_dictionary_size, 3);
ZSTD_CCtx *cctx = ZSTD_createCCtx();
size_t result = ZSTD_compress_usingCDict(cctx, dst, dst_cap, src, src_size, cdict);
```

**Option 2: Binary embedded in flash**

In `CMakeLists.txt`:

```cmake
target_add_binary_data(${COMPONENT_LIB} "dict.bin" BINARY)
```

In code:

```c
extern const uint8_t dict_start[] asm("_binary_dict_bin_start");
extern const uint8_t dict_end[] asm("_binary_dict_bin_end");

ZSTD_CDict *cdict = ZSTD_createCDict(dict_start, dict_end - dict_start, 3);
```

**Important**: The decompressor needs the same dictionary. If a server decompresses the data, deploy the same dictionary file to your backend.

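
A mismatch between the firmware's dictionary and the backend's copy can be caught before decompression by comparing dictionary IDs. `zstd.h` provides `ZSTD_getDictID_fromFrame` and `ZSTD_getDictID_fromDict` for this. The sketch below instead reads the same two IDs by hand, following the frame and dictionary layouts in RFC 8878, so it can be built on a backend without linking zstd. The function names are hypothetical. It assumes the dictionary came from a trainer (and therefore starts with the dictionary magic number) and that the frame records a dictionary ID, which is zstd's default.

```c
#include <stddef.h>
#include <stdint.h>

/* Read an n-byte little-endian unsigned value (n <= 4). */
static uint32_t read_le(const uint8_t *p, size_t n)
{
    uint32_t v = 0;
    for (size_t i = 0; i < n; i++) {
        v |= (uint32_t)p[i] << (8 * i);
    }
    return v;
}

/* ID stored in a trained dictionary, or 0 if the buffer lacks the
 * dictionary magic number (for example a raw-content dictionary). */
uint32_t dict_id_of_dictionary(const uint8_t *dict, size_t size)
{
    if (size < 8 || read_le(dict, 4) != 0xEC30A437u) {
        return 0;
    }
    return read_le(dict + 4, 4);
}

/* ID recorded in a zstd frame header, or 0 if none is present. */
uint32_t dict_id_of_frame(const uint8_t *frame, size_t size)
{
    static const size_t id_len[4] = { 0, 1, 2, 4 };
    if (size < 5 || read_le(frame, 4) != 0xFD2FB528u) {
        return 0;
    }
    uint8_t desc = frame[4];                      /* Frame_Header_Descriptor */
    /* Window_Descriptor byte is absent when the single-segment bit is set. */
    size_t pos = 5 + ((desc & 0x20u) ? 0u : 1u);
    size_t n = id_len[desc & 0x03u];              /* Dictionary_ID_Flag */
    if (size < pos + n) {
        return 0;
    }
    return read_le(frame + pos, n);
}

/* Nonzero when the frame names exactly this dictionary. */
int frame_uses_dictionary(const uint8_t *frame, size_t frame_size,
                          const uint8_t *dict, size_t dict_size)
{
    uint32_t want = dict_id_of_frame(frame, frame_size);
    return want != 0 && want == dict_id_of_dictionary(dict, dict_size);
}
```

A backend could call `frame_uses_dictionary` on each incoming payload and reject or log frames whose ID does not match the deployed dictionary, rather than letting decompression fail later.
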
## Examples

- **[basic](examples/basic/)**: Simple compress/decompress round-trip
- **[dictionary](examples/dictionary/)**: Dictionary-based compression of small JSON payloads

Build an example:

```bash
cd examples/basic
idf.py set-target esp32
idf.py build flash monitor
```

## License

BSD-3-Clause (same as upstream zstd). See [LICENSE](LICENSE).
