Example of the component esphome/esp-audio-libs v3.2.1
# PCM Convert / Mixer / Gain Benchmark

Benchmarks `pcm_convert::copy_frames`, `mixer::mix_frames`, and `gain::apply` across a matrix of formats: different input and output bit depths (1, 2, 3, 4 bytes per sample) and, for convert and mix, different channel counts (mono and stereo, including up-mix and down-mix). The gain pass adds in-place and out-of-place variants per bit depth. The input is synthetic PCM generated at runtime, so this is a pure CPU throughput test of the kernels with no embedded audio asset.

## Building and Flashing

### Prerequisites

- **PlatformIO** (recommended) OR ESP-IDF v5.0 or later
- Any ESP32 development board (the benchmark keeps its buffers in internal RAM, so PSRAM is not required)

### Option 1: PlatformIO (Recommended)

```bash
cd examples/pcm_benchmark

# Build, upload, and monitor (ESP32-S3)
pio run -e esp32s3 -t upload -t monitor

# Or target a plain ESP32
pio run -e esp32 -t upload -t monitor
```

The PlatformIO configuration pulls in the parent esp-audio-libs repository as a component, so no additional setup is required.

### Option 2: Native ESP-IDF

```bash
cd examples/pcm_benchmark
idf.py set-target esp32s3
idf.py build
idf.py flash monitor
```

## Configuration Options

The benchmark dimensions are compile-time defines (override them with `build_flags` in `platformio.ini` or `-D` in ESP-IDF):

| Define | Default | Meaning |
| ------ | ------- | ------- |
| `PCM_BENCH_FRAMES` | 1024 | Frames processed per library call |
| `PCM_BENCH_BATCH` | 32 | Calls timed together as one measurement sample |
| `PCM_BENCH_SAMPLES` | 50 | Measurement samples collected per scenario |
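
A compile-time default that a `-D` flag can override is usually written with an `#ifndef` guard. The sketch below assumes the example follows that pattern; the two helper functions are hypothetical and only show how the three defines combine.

```cpp
#include <cstdint>

// Fallback values mirror the defaults in the table above. A -D flag on the
// compiler command line wins because the #ifndef guard skips the fallback.
#ifndef PCM_BENCH_FRAMES
#define PCM_BENCH_FRAMES 1024
#endif
#ifndef PCM_BENCH_BATCH
#define PCM_BENCH_BATCH 32
#endif
#ifndef PCM_BENCH_SAMPLES
#define PCM_BENCH_SAMPLES 50
#endif

// Hypothetical helper (not from the example source): frames covered by one
// timed measurement, i.e. PCM_BENCH_BATCH back-to-back calls.
constexpr uint64_t frames_per_timed_batch() {
    return static_cast<uint64_t>(PCM_BENCH_FRAMES) * PCM_BENCH_BATCH;
}

// Hypothetical helper: frames processed per scenario and alignment case.
constexpr uint64_t frames_per_scenario() {
    return frames_per_timed_batch() * PCM_BENCH_SAMPLES;
}
```

With the defaults, one timed batch covers 32768 frames. In PlatformIO an override would look like `build_flags = -DPCM_BENCH_FRAMES=256` in the relevant environment section.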

To benchmark different formats, edit the `CONVERT_SCENARIOS`, `MIX_SCENARIOS`, and `GAIN_SCENARIOS` tables near the top of `src/pcm_benchmark.cpp`. Convert and mix entries are `{bps, channels}` sets; gain entries are `{bps, in_place}`. Here `bps` means bytes per sample (1, 2, 3, or 4), not bits.
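
As a rough illustration of what such tables can look like, here is a minimal sketch. All type and variable names are hypothetical; the real tables in `src/pcm_benchmark.cpp` may be laid out differently.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical types for illustration only.
struct PcmFormat {
    uint8_t bps;       // bytes per sample: 1, 2, 3, or 4
    uint8_t channels;  // 1 = mono, 2 = stereo
};

struct ConvertCase {
    PcmFormat in;
    PcmFormat out;
};

struct GainCase {
    uint8_t bps;
    bool in_place;
};

constexpr ConvertCase kConvertCases[] = {
    {{2, 2}, {2, 2}},  // 2b2 -> 2b2: identity copy
    {{2, 2}, {4, 2}},  // 2b2 -> 4b2: widen 16-bit to 32-bit
    {{2, 1}, {2, 2}},  // 2b1 -> 2b2: mono to stereo up-mix
    {{3, 2}, {2, 1}},  // 3b2 -> 2b1: narrow and down-mix
};

constexpr GainCase kGainCases[] = {
    {2, false},  // 16-bit, out-of-place
    {2, true},   // 16-bit, in-place
};

// Size of one frame (one sample per channel) in bytes.
constexpr std::size_t frame_bytes(PcmFormat f) {
    return static_cast<std::size_t>(f.bps) * f.channels;
}
```

Adding a row to such a table adds one scenario, which is then run in both alignment cases.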

## Expected Output

Each iteration prints one line per scenario and alignment case:

```text
I (310) PCM_BENCH: --- pcm_convert::copy_frames ---
I (330) PCM_BENCH: CONV 2b2 -> 2b2 (copy)      aligned     10.86 ns/frame     736.7 MB/s      1918x RT  (min 10.83 max 11.05 sd 0.04)
I (350) PCM_BENCH: CONV 2b2 -> 2b2 (copy)      misaligned  10.95 ns/frame     730.5 MB/s      1902x RT  (min 10.93 max 11.14 sd 0.04)
I (400) PCM_BENCH: CONV 2b2 -> 4b2 (widen)     aligned     28.72 ns/frame     417.9 MB/s       725x RT  (min 28.69 max 28.90 sd 0.06)
I (580) PCM_BENCH: CONV 2b2 -> 4b2 (widen)     misaligned 108.82 ns/frame     110.3 MB/s       191x RT  (min 108.73 max 108.95 sd 0.09)
...
I (2310) PCM_BENCH: --- mixer::mix_frames ---
I (2480) PCM_BENCH: MIX  2b2 + 2b2 -> 2b2       aligned    100.72 ns/frame     119.1 MB/s       207x RT  (min 100.65 max 100.86 sd 0.09)
...
I (6760) PCM_BENCH: --- gain::apply ---
I (6810) PCM_BENCH: GAIN 2b out-of-place        aligned     27.32 ns/sample    146.4 MB/s       763x RT  (min 27.28 max 27.50 sd 0.06)
I (6930) PCM_BENCH: GAIN 2b out-of-place        misaligned  75.22 ns/sample     53.2 MB/s       277x RT  (min 75.16 max 75.38 sd 0.08)
```

Columns, left to right:

- kind: `CONV`, `MIX`, or `GAIN`
- format: `<bps>b<channels>` for the input(s) → output; e.g., `2b2` is 16-bit stereo
- alignment: `aligned` or `misaligned`
- ns/frame (ns/sample for gain, which has no channel concept); lower is better
- MB/s of read+write traffic
- xRT@48k (printed as `x RT`): how many 48 kHz streams one core could sustain on this op alone
- the min/max/sd spread across the measurement samples

## How It Works

Three 16-byte-aligned buffers (two inputs, one output) are filled with a deterministic pseudo-random pattern; the sample values do not affect timing. Each scenario calls the kernel `PCM_BENCH_BATCH` times back-to-back, timed as a unit with `esp_timer_get_time()` to stay well above the 1 µs timer resolution, and `PCM_BENCH_SAMPLES` such batches give the min/max/avg/sd spread. The misaligned variant offsets every pointer by one byte to force the byte-wise path.
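
The measurement loop described above can be sketched as follows. This is a host-portable approximation, not the example's code: `now_us()` stands in for `esp_timer_get_time()` using `std::chrono`, every name is hypothetical, and the standard deviation is the population form (the README does not say which form the benchmark uses).

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Stats {
    double min, max, avg, sd;
};

// Summarize per-batch results. Precondition: v is not empty.
inline Stats summarize(const std::vector<double>& v) {
    Stats s{v.front(), v.front(), 0.0, 0.0};
    for (double x : v) {
        s.min = std::min(s.min, x);
        s.max = std::max(s.max, x);
        s.avg += x;
    }
    s.avg /= static_cast<double>(v.size());
    for (double x : v) s.sd += (x - s.avg) * (x - s.avg);
    s.sd = std::sqrt(s.sd / static_cast<double>(v.size()));  // population sd
    return s;
}

// Stand-in for esp_timer_get_time(): microseconds from a monotonic clock.
inline int64_t now_us() {
    using namespace std::chrono;
    return duration_cast<microseconds>(steady_clock::now().time_since_epoch()).count();
}

// Time `batch` back-to-back kernel calls as one unit and repeat `samples`
// times. Returns statistics in nanoseconds per unit (frame or sample).
template <typename Kernel>
Stats measure_ns_per_unit(Kernel&& kernel, int batch, int samples, double units_per_call) {
    std::vector<double> ns;
    ns.reserve(static_cast<std::size_t>(samples));
    for (int s = 0; s < samples; ++s) {
        const int64_t t0 = now_us();
        for (int b = 0; b < batch; ++b) kernel();
        const int64_t t1 = now_us();
        ns.push_back(static_cast<double>(t1 - t0) * 1000.0 / (batch * units_per_call));
    }
    return summarize(ns);
}

// True when p is a multiple of `width` bytes. Offsetting an aligned pointer
// by one byte makes this false for every width greater than 1.
inline bool is_aligned(const void* p, std::size_t width) {
    return reinterpret_cast<std::uintptr_t>(p) % width == 0;
}
```

Batching matters because a single call can finish in a few microseconds, close to the timer resolution. At about 10.86 ns/frame, 32 calls of 1024 frames take roughly 356 µs, so a 1 µs quantization error is well under 1% of the measurement.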

## Interpreting the Results

- The `aligned` rows reflect the wide-load fast path that the library takes when each pointer is aligned to its sample width (2 bytes for 16-bit, 4 bytes for 32-bit). The `misaligned` rows show the byte-wise fallback. The gap is the value of keeping your audio buffers aligned.
- Bit-depth conversion goes through a Q31 intermediate, so widening (e.g. 16→32) is exact while narrowing rounds the low bits. The identity case (matching bit depth and channel count) degenerates to a `memcpy` and is the fastest convert scenario.
- Mixing sums two streams in Q29, so `mix_frames` does roughly twice the input traffic of a convert and lands slower per frame.
- Gain multiplies each sample by a Q31 scale factor (1, 2, and 3 byte widths round; the 4 byte width truncates). It touches a single sample per unit, so its ns/sample numbers are not directly comparable to the per-frame convert/mix numbers; e.g., a stereo frame is two samples. In-place and out-of-place perform the same arithmetic; any difference is purely memory traffic (one buffer touched versus two).
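
To make the fixed-point remarks concrete, here is a minimal sketch of a 16-bit round trip through Q31 and a Q31 gain multiply. It is illustrative only and not the library's code: the function names are hypothetical, and round-half-up with saturation is an assumed rounding policy.

```cpp
#include <cstdint>

// 16-bit -> Q31: move the sample into the top 16 bits. Nothing is lost,
// which is why widening is exact.
constexpr int32_t s16_to_q31(int16_t x) {
    return static_cast<int32_t>(x) * 65536;
}

// Q31 -> 16-bit: add half an output LSB, drop the low 16 bits, saturate.
// Relies on >> being an arithmetic shift for negative values (guaranteed
// since C++20, and the behavior of mainstream compilers before that).
constexpr int16_t q31_to_s16(int32_t q) {
    int64_t r = (static_cast<int64_t>(q) + 32768) >> 16;
    if (r > INT16_MAX) r = INT16_MAX;
    return static_cast<int16_t>(r);
}

// Multiply a 16-bit sample by a Q31 gain (1 << 30 is 0.5), rounding the
// 64-bit product back to 16 bits and saturating.
constexpr int16_t gain_s16(int16_t x, int32_t gain_q31) {
    int64_t p = static_cast<int64_t>(x) * gain_q31;
    int64_t r = (p + (int64_t{1} << 30)) >> 31;
    if (r > INT16_MAX) r = INT16_MAX;
    if (r < INT16_MIN) r = INT16_MIN;
    return static_cast<int16_t>(r);
}
```

For example, `q31_to_s16(s16_to_q31(x))` returns `x` for every 16-bit input, while `gain_s16(3, 1 << 30)` rounds 1.5 up to 2.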

To create a project from this example, run:

idf.py create-project-from-example "esphome/esp-audio-libs=3.2.1:pcm_benchmark"