jason-mao/av_processor

0.5.1

Latest
uploaded 2 hours ago
Audio/Video Processor module for ESP-GMF

readme

# AV Processor Component (Audio/Video Processing)

This component provides a unified encapsulation for audio recording/playback/data feeding and video capture/decoding/rendering on ESP platforms, implemented based on middleware such as GMF (General Media Framework), esp-audio-render, esp-video-codec, and esp-capture.

## Feature Overview 🎛️

- **Audio 🎧**:
  - Recording (AEC/VAD/AFE optional)
  - Playback (supports URL and local files, decoding depends on configuration)
  - Data feeding (feeder, suitable for scenarios like RTC/streaming where receiving and playing happen simultaneously)
  - Optional mixer with volume fade and focus control between multiple streams
- **Video 🎥**:
  - Multi-channel video capture
  - Video decoding and rendering callbacks

---

## Quick Start

### 1. Audio Usage

1. Initialize audio manager:

```c
// Method 1: Use default configuration macro (recommended)
audio_manager_config_t cfg = DEFAULT_AUDIO_MANAGER_CONFIG();
cfg.play_dev = your_play_dev;
cfg.rec_dev = your_rec_dev;
strcpy(cfg.mic_layout, "RMNM"); // Microphone layout (optional, for AFE configuration)
cfg.board_sample_rate = 16000;
cfg.board_bits = 32;
cfg.board_channels = 2;
cfg.play_volume = 80;
cfg.rec_volume = 60;
cfg.rec_ref_volume = 60;
cfg.enable_mixer = true; // Enable when mixing feeder and playback is needed

audio_manager_init(&cfg);
```

2. Recording (polling read):

```c
// Recording event callback (optional)
void recorder_event_cb(void *event, void *ctx) {
    // Handle recording events (e.g., AFE events)
}

// Configure recorder (using default configuration macro)
audio_recorder_config_t recorder_cfg = DEFAULT_AUDIO_RECORDER_CONFIG();
// Optional: Configure AFE (runtime configuration, takes priority over Kconfig)
recorder_cfg.afe_config.vad_enable = true;
recorder_cfg.afe_config.vad_mode = 4;
recorder_cfg.afe_config.agc_enable = true;
recorder_cfg.recorder_event_cb = recorder_event_cb;

// Open recorder
audio_recorder_open(&recorder_cfg);
uint8_t buf[2048];
audio_recorder_read_data(buf, sizeof(buf));

audio_recorder_close();
```

3. Feeder playback (suitable for external stream → local playback):

```c
// Configure feeder (using default configuration macro)
audio_feeder_config_t feeder_cfg = DEFAULT_AUDIO_FEEDER_CONFIG();
// Optional: When using OPUS decoder, task_stack is recommended to be >= 4096 * 10
// feeder_cfg.feeder_task_config.task_stack = 4096 * 10;

audio_feeder_open(&feeder_cfg);
audio_feeder_run();
// Feed external data blocks multiple times as needed
audio_feeder_feed_data(pkt, pkt_len);

audio_feeder_stop();
audio_feeder_close();
```

4. Normal playback (URL or local):

```c
// Configure player (using default configuration macro)
audio_playback_config_t playback_cfg = DEFAULT_AUDIO_PLAYBACK_CONFIG();

audio_playback_open(&playback_cfg);
audio_playback_play("http://<ip>:<port>/audio.mp3");

audio_playback_pause();
audio_playback_resume();
audio_playback_stop();
audio_playback_close();
```

5. Prompt playback:

```c
// Configure player (using default configuration macro)
audio_prompt_config_t prompt_cfg = DEFAULT_AUDIO_PROMPT_CONFIG();

audio_prompt_open(&prompt_cfg);
audio_prompt_play("spiffs://audio.mp3");
audio_prompt_close();
```
> Prompt playback is similar to normal playback, but `prompt playback` is blocking playback, suitable for short audio playback. The purpose is to have good responsiveness when you need to play prompt sounds while using normal playback.

6. Focus/fade control (when mixer is enabled):

```c
// Note: Mixer must be opened after both playback and feeder are opened
audio_processor_mixer_open();
audio_processor_ramp_control(AUDIO_MIXER_FOCUS_FEEDER);     // Focus on feeder audio
audio_processor_ramp_control(AUDIO_MIXER_FOCUS_PLAYBACK);   // Focus on playback audio
audio_processor_mixer_close();
```

### 2. Video Usage

1. Render (decode) callback:

```c
void decoded_cb(void *ctx, const uint8_t *data, size_t size) {
    // Handle decoded frame data
}

video_render_config_t rcfg = {
    .decode_cfg = your_vdec_cfg,
    .resolution = {.width = 640, .height = 480},
    .decode_cb = decoded_cb,
};
video_render_handle_t r = video_render_open(&rcfg);
video_render_start(r);

video_frame_t f = {.data = enc_frame, .size = enc_size};
video_render_frame_feed(r, &f);

video_render_stop(r);
video_render_close(r);
```

2. Capture (multi-channel):

```c
void capture_cb(void *ctx, int index, video_frame_t *frame) {
    // Handle captured video frame
}

video_capture_config_t ccfg = {0};
ccfg.camera_config = &your_cam_cfg;
ccfg.sink_num = 2;
ccfg.sink_cfg[0] = your_sink0_cfg;
ccfg.sink_cfg[1] = your_sink1_cfg;
ccfg.capture_frame_cb = capture_cb;

video_capture_handle_t c = video_capture_open(&ccfg);
video_capture_start(c);
video_capture_stop(c);
video_capture_close(c);
```

---

## Media Dump 💾

- Enable condition ✅: Takes effect after enabling `MEDIA_DUMP_ENABLE` in `menuconfig`.
- Save methods 📤: Supports SD card and UDP outputs (choose one).
  - SD card 💾: Enable `CONFIG_MEDIA_DUMP_SINK_SDCARD`, output to file `CONFIG_MEDIA_DUMP_SDCARD_FILENAME`, duration controlled by `CONFIG_MEDIA_DUMP_DURATION_SEC`.
  - UDP 📡: Enable `CONFIG_MEDIA_DUMP_SINK_UDP`, send raw media data via `CONFIG_MEDIA_DUMP_UDP_IP` and `CONFIG_MEDIA_DUMP_UDP_PORT` (can use `script/udp_reciver.py` script).
- Save point 🎚️ (before/after AEC processing): Select in "Audio Save Mode (AEC point)" in `menuconfig`
  - `MEDIA_DUMP_AUDIO_BEFORE_AEC` (Save Before AEC): Save raw microphone audio before AEC (for analyzing noise/feedback)
  - `MEDIA_DUMP_AUDIO_AFTER_AEC` (Save After AEC): Save audio after AEC processing (for evaluating AEC effectiveness)
- Typical usage 🔍: Export raw audio/video data when reproducing issues, check saturation, feedback, and format correctness offline using tools like Audacity.

## Configuration ⚙️

### Runtime Configuration (Recommended)

AFE configuration can be set at runtime via the `afe_config` field in the `audio_recorder_config_t` structure, which takes priority over Kconfig configuration:

```c
audio_recorder_config_t recorder_cfg = DEFAULT_AUDIO_RECORDER_CONFIG();
// Configure AFE
recorder_cfg.afe_config.ai_mode_wakeup = false;        // AI mode: true=wakeup mode, false=direct mode
recorder_cfg.afe_config.vad_enable = true;              // Enable VAD
recorder_cfg.afe_config.vad_mode = 4;                   // VAD mode (1-4), larger values are more sensitive
recorder_cfg.afe_config.vad_min_speech_ms = 64;        // Minimum speech duration (ms)
recorder_cfg.afe_config.vad_min_noise_ms = 1000;       // Minimum noise duration (ms)
recorder_cfg.afe_config.agc_enable = true;              // Enable AGC
recorder_cfg.afe_config.enable_vcmd_detect = false;     // Enable VCMD
recorder_cfg.afe_config.vcmd_timeout_ms = 5000;         // VCMD timeout (ms)
recorder_cfg.afe_config.mn_language = "cn";             // Model language: "cn" or "en"
recorder_cfg.afe_config.wakeup_time_ms = 10000;         // Wakeup time (ms)
recorder_cfg.afe_config.wakeup_end_time_ms = 2000;      // Wakeup end time (ms)

// Pass configuration when opening recorder
audio_recorder_open(&recorder_cfg);
```

### Kconfig Configuration (Alternative)

If fields in `audio_recorder_config_t.afe_config` use default values or are not set, Kconfig configuration will be used. Configure via `idf.py menuconfig`:

#### 💾 Media Dump Configuration

Configure under **"Component config" -> "Audio/Video Processor Configuration" -> "Media Dump"** in `menuconfig`:

- **`MEDIA_DUMP_ENABLE`**: Enable media data dump functionality, disabled by default
  - When enabled, raw audio/video data can be saved for debugging and analysis

- **`MEDIA_DUMP_AUDIO_POINT`**: Audio save point selection (choice type)
  - **`MEDIA_DUMP_AUDIO_BEFORE_AEC`**: Save raw audio before AEC processing
    - Useful for analyzing noise and feedback issues
  - **`MEDIA_DUMP_AUDIO_AFTER_AEC`**: Save audio after AEC processing
    - Useful for evaluating AEC processing effectiveness
  - **`MEDIA_DUMP_AUDIO_NONE`**: Do not save audio (default)

- **`MEDIA_DUMP_DURATION_SEC`**: Save duration in seconds, default 20
  - Type: integer (int)
  - Range: 1-3600 seconds

- **`MEDIA_DUMP_SINK`**: Save method selection (choice type)
  - **`MEDIA_DUMP_SINK_SDCARD`**: Save to SD card file (default)
    - File path configuration: `CONFIG_MEDIA_DUMP_SDCARD_FILENAME` (default: `/sdcard/media_dump.bin`)
    - Ensure SD card is properly mounted
  - **`MEDIA_DUMP_SINK_UDP`**: Send via UDP
    - Target IP configuration: `CONFIG_MEDIA_DUMP_UDP_IP` (default: `192.168.1.100`)
    - Target port configuration: `CONFIG_MEDIA_DUMP_UDP_PORT` (default: 5000)
    - Can use `script/udp_reciver.py` script to receive data

### Task Configuration

The audio processing component uses multiple FreeRTOS tasks internally, which can be customized via task configuration fields in each module's configuration structure:

#### Task Configuration for Each Module

- **Recorder** (`audio_recorder_config_t`):
  - `afe_feed_task_config`: AFE feed task configuration (default stack size 3KB)
  - `afe_fetch_task_config`: AFE fetch task configuration (default stack size 3KB)
  - `recorder_task_config`: Recorder task configuration (default stack size 5KB, recommend >= 40KB when using OPUS encoder)

- **Player** (`audio_playback_config_t`):
  - `playback_task_config`: Playback task configuration (default stack size 4KB)

- **Feeder** (`audio_feeder_config_t`):
  - `feeder_task_config`: Feeder task configuration (default stack size 5KB, recommend >= 40KB when using OPUS decoder)

**Important Notes**:
- When using OPUS encoder or decoder, `task_stack` of `recorder_task_config` and `feeder_task_config` should be set to at least `4096 * 10` bytes (40KB).
- When `task_stack` in task configuration is set to 0, default values will be used.
- When `task_stack_in_ext` is set to `true`, task stack will be allocated in external memory, helping to save internal RAM.
- Task configuration should be set in each module's configuration structure (e.g., `audio_recorder_config_t`, `audio_playback_config_t`, `audio_feeder_config_t`), not in `audio_manager_config_t`.

---

## API Reference 📚

Header files:

- `include/audio_processor.h`
- `include/video_processor.h`
- `include/av_processor_type.h`

Common functions (selection):

- Audio management:
  - `audio_manager_init`/`audio_manager_deinit`: Initialize/deinitialize audio manager
- Recording:
  - `audio_recorder_open`: Open recorder (supports encoder configuration and event callback)
  - `audio_recorder_read_data`: Read recording data
  - `audio_recorder_get_afe_manager_handle`: Get AFE manager handle
  - `audio_recorder_close`: Close recorder
- Feeder:
  - `audio_feeder_open`: Open feeder (supports decoder configuration)
  - `audio_feeder_run`: Start feeder
  - `audio_feeder_feed_data`: Feed audio data
  - `audio_feeder_stop`: Stop feeder
  - `audio_feeder_close`: Close feeder
- Playback:
  - `audio_playback_open`/`audio_playback_close`: Open/close player
  - `audio_playback_play`: Play audio (supports URL or local file, e.g., `http://...` or `file:///sdcard/...`)
  - `audio_playback_stop`: Stop playback
  - `audio_playback_pause`/`audio_playback_resume`: Pause/resume playback
  - `audio_playback_get_state`: Get playback state
- Mixer:
  - `audio_processor_mixer_open`: Open mixer (must be called after both playback and feeder are opened)
  - `audio_processor_mixer_close`: Close mixer
  - `audio_processor_ramp_control`: Control audio focus and volume fade
- Video rendering:
  - `video_render_open`/`video_render_close`: Open/close video renderer
  - `video_render_start`/`video_render_stop`: Start/stop rendering
  - `video_render_frame_feed`: Feed video frame
- Video capture:
  - `video_capture_open`/`video_capture_close`: Open/close video capturer
  - `video_capture_start`/`video_capture_stop`: Start/stop capture

---

## FAQ ❓

### Audio Issues

- **"Self-questioning" phenomenon occurs (device playback sound is repeatedly captured by microphone)**:
  - Enable `MEDIA_DUMP_ENABLE` in `menuconfig`, reproduce the issue and export saved audio data for further analysis
  - Open the exported audio file using [Audacity](https://www.audacityteam.org/download/), observe waveform/spectrum to determine if saturation or feedback exists
  - If saturation clipping occurs, reduce microphone gain (`esp_codec_dev_set_in_gain`); or appropriately reduce speaker volume (`esp_codec_dev_set_out_vol`) to avoid excessive feedback

- **Stack overflow when using OPUS encoder/decoder**:
  - Ensure `task_stack` of `recorder_task_config` and `feeder_task_config` is set to at least `4096 * 10` bytes (40KB)
  - Consider allocating task stack in external memory (set `task_stack_in_ext = true`)

- **Mixer not working properly**:
  - Ensure that both player (`audio_playback_open`) and feeder (`audio_feeder_open`) are opened before calling `audio_processor_mixer_open()`
  - Ensure `enable_mixer = true` is set when calling `audio_manager_init`

### Development and Debugging

- **Media data dump (Media Dump)**:
  - UDP method: Can use `script/media_dump_server.py` script to receive data
  - SD card method: Ensure SD card is properly mounted and file path is accessible

---

## Version Information

Current version: v0.1.0

## License

This component follows the MIT license. See the `LICENSE` file included in the repository for details.

## Related Resources

- Project repository: https://github.com/espressif/esp-gmf/tree/main/components/av_processor
- Issue reporting: https://github.com/espressif/esp-gmf/issues
- Documentation: https://github.com/espressif/esp-gmf/blob/main/components/av_processor/README.md

Links

Supports all targets

License: Custom

To add this component to your project, run:

idf.py add-dependency "jason-mao/av_processor^0.5.1"

download archive

Stats

  • Archive size
    Archive size ~ 70.20 KB
  • Downloaded in total
    Downloaded in total 0 times
  • Downloaded this version
    This version: 0 times

Badge

jason-mao/av_processor version: 0.5.1
|