# Dictionary Example

Example from the `rderr/esp-idf-zstd` component, v1.5.7.

Compress a small JSON sensor payload using a pre-trained dictionary embedded in flash. Demonstrates the canonical zstd-with-dictionary pattern.

## What it demonstrates

- Embedding a binary dictionary in flash via `EMBED_FILES`
- The `_byReference` CDict/DDict constructors (`ZSTD_createCDict_byReference` / `ZSTD_createDDict_byReference`): zstd keeps a pointer into flash instead of copying the dictionary into RAM
- The `ZSTD_compress_usingCDict` / `ZSTD_decompress_usingDDict` pattern with reusable contexts
- Reporting free heap before/after compression so the memory cost is visible

## Why a dictionary at all?

Without a dictionary, zstd can only learn from the input itself: it finds repeats within the message and builds its entropy statistics from that same message. A 73-byte JSON payload gives it almost nothing to work with, so compressing a message that small without a dictionary often makes it *larger*, not smaller, because the frame and block headers cost more than the compression saves.

A trained dictionary front-loads the compressor with patterns learned from representative samples (typical key names, common value patterns, JSON structure). With a 10.7 KB dictionary trained on similar messages, expect 2–5x compression on small structured payloads.

See [`../../tools/dictbuilder/README.md`](../../tools/dictbuilder/README.md) for how to train one from your own data.

## Build and run

```bash
idf.py set-target esp32        # or esp32s3, esp32c3, etc.
idf.py build
idf.py -p COMx flash monitor
```

The example ships with a `dict.bin` trained on synthetic sample data. **For production, retrain on real messages from your device** — the dictionary needs to match your actual payload distribution to be effective.
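
Since the shipped and retrained dictionaries differ, it can help to check at startup that the embedded blob really is a trained zstd dictionary and to log its ID. Per the zstd format specification (RFC 8878), a trained dictionary begins with the little-endian magic number `0xEC30A437` followed by a 4-byte little-endian dictionary ID; zstd treats a buffer without that magic as raw content. The helper below is a hypothetical, dependency-free sketch of that check; libzstd's `ZSTD_getDictID_fromDict()` returns the same value.

```c
#include <stddef.h>
#include <stdint.h>

#define DICT_MAGIC 0xEC30A437u  /* trained-dictionary magic, RFC 8878 */

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns the ID of a trained zstd dictionary, or 0 if
 * the buffer is too short or lacks the magic (zstd would use it as raw content). */
uint32_t dict_header_id(const void *dict, size_t size)
{
    const uint8_t *p = (const uint8_t *)dict;
    if (size < 8 || read_le32(p) != DICT_MAGIC)
        return 0;
    return read_le32(p + 4);  /* Dictionary_ID follows the magic */
}
```

Logging this ID at boot makes it easy to confirm which training run a given device is carrying.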

## Expected output

```
I (xxx) zstd_dict: Dictionary in flash: 10742 bytes
I (xxx) zstd_dict: Free heap at start: 372000 bytes
I (xxx) zstd_dict: JSON compressed: 73 -> 28 bytes (~2.6x)
I (xxx) zstd_dict: Free heap after compress: ~150000 bytes
I (xxx) zstd_dict: Round-trip match: YES
I (xxx) zstd_dict: Decompressed: {"device":"esp32-sensor-01",...}
```

Numbers vary with target and compression level. At the default level 3, expect roughly 200 KB or more of working heap during compression; see [the main README's Memory Considerations](../../README.md#memory-considerations) for the trade-off curve and for how to tune down if your RAM budget is tighter.

## Retraining the dictionary

```bash
# Collect ~100+ representative messages your device will actually send
# Save each to its own file under samples/

python ../../tools/dictbuilder/build_dictionary.py samples/ \
    -o main/dict.bin --format binary --dict-size 16384

idf.py build flash monitor
```

Every decoder needs the same `dict.bin`: if a server decompresses your telemetry, ship the identical dictionary file to the server, and redeploy it to both sides whenever you retrain.

To create a project from this example, run:

`idf.py create-project-from-example "rderr/esp-idf-zstd=1.5.7:dictionary"`
