jason-mao/av_processor

# AV Processor V2 Component (Audio/Video Processing)

`av_processor_v2` is an application-facing audio/video processing component that organizes common media capabilities on ESP platforms into a more unified and easier-to-integrate set of modules. It is built on top of GMF, `esp_audio_simple_player`, `esp_capture`, `esp_video_codec`, and related middleware, and is intended for projects that need audio recording, playback, video capture, and debugging-oriented data export in the same system.

## Documentation

This `README.md` is kept as a high-level overview only. If you need full API details, configuration structures, lifecycle information, mode differences, usage flows, or implementation notes, go directly to the following documents:

- Detailed guide (includes full API index): [docs/COMPONENT_GUIDE.md](/home/xutao/workspace-20/av_processor_v2/docs/COMPONENT_GUIDE.md)
- Audio API header: [include/audio_processor.h](/home/xutao/workspace-20/av_processor_v2/include/audio_processor.h)
- Video API header: [include/video_processor.h](/home/xutao/workspace-20/av_processor_v2/include/video_processor.h)
- Shared type definitions: [include/av_processor_type.h](/home/xutao/workspace-20/av_processor_v2/include/av_processor_type.h)

## Capability Overview

### Audio

- Audio manager initialization and board-level input/output abstraction
- Audio recording
- Frontend processing on the recording path
- Encoded recorder output, including PCM, G711, OPUS, AAC, MP3, and related formats
- Unified audio playback through `audio_play_handle_t`, for URL or local file sources
- Feeder playback for RTC, network stream, or custom transport receive paths
- Prompt playback through the same `audio_play_*` API, for short, blocking, high-priority sounds
- Optional mixer support with focus and fade control between playback and feeder streams

### Frontend and Algorithm Support

- Built-in `ai_afe` path, using the complete AFE stack from `esp-sr`
- Custom frontend path, intended for composing
  independent `esp-sr` modules
- Optional AEC, NS, VAD, AGC, wakeup, and voice command detection capabilities
- Current custom-path support for standalone WakeNet use cases

## Audio Instance Model

The current audio API uses explicit handles:

- `audio_recorder_open(..., &handle)` creates a recorder instance
- `audio_feeder_open(..., &handle)` creates a feeder instance
- `audio_play_open(..., &handle)` creates a play instance
- `audio_play_config_t.type` distinguishes normal playback from prompt playback

Follow-up operations such as `pause`, `resume`, `read`, `feed`, `play`, `stop`, and `close` all take the corresponding handle explicitly. That means one application can hold multiple instances at the same time, for example multiple recorders or both playback-type and prompt-type play instances.

### Video

- Video capture
- Multi-sink output
- Stream-mode callback handling
- Fetch-mode frame acquire/release flow
- Video decoding
- Render callbacks for decoded frames

### Debugging and Analysis

- Media Dump support for debugging
- Export of raw audio data from the recorder path to SD card or UDP
- Capture points before or after AEC for echo, feedback, clipping, noise floor, and processing-effect analysis

## Typical Use Cases

- Device-side audio recording for upload or local storage
- Receiving network audio and playing it locally while decoding on the fly
- Inserting high-priority prompt sounds while a main playback stream is active
- Capturing camera frames for local preview, encoded output, or multi-path distribution
- Outputting both compressed and raw video from the same source path
- Reproducing field issues and exporting raw audio for offline debugging

## Example Applications

This repository already includes several representative examples. In most cases, the fastest way to understand how the component is intended to be used is to start from one of them.
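
Before opening an example, it may help to picture the explicit-handle model described under Audio Instance Model. The following is a minimal, host-side sketch only: every `demo_*` name, type, and return convention is a hypothetical stand-in invented for illustration and is not the component's API. The real types and signatures are defined in `include/audio_processor.h`. The sketch shows two independent play instances, one playback-type and one prompt-type, each addressed through its own handle.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins: NOT the real av_processor_v2 types or signatures. */
typedef enum {
    DEMO_PLAY_TYPE_PLAYBACK,  /* long-running main stream */
    DEMO_PLAY_TYPE_PROMPT     /* short, high-priority sound */
} demo_play_type_t;

typedef struct {
    demo_play_type_t type;    /* plays the role of audio_play_config_t.type */
} demo_play_config_t;

struct demo_play {
    demo_play_type_t type;
    int playing;
};
typedef struct demo_play *demo_play_handle_t;

/* Create an instance and return it through the out-parameter. */
static int demo_play_open(const demo_play_config_t *cfg, demo_play_handle_t *out)
{
    if (cfg == NULL || out == NULL) {
        return -1;
    }
    demo_play_handle_t h = calloc(1, sizeof(*h));
    if (h == NULL) {
        return -1;
    }
    h->type = cfg->type;
    *out = h;
    return 0;
}

/* Follow-up operations take the handle explicitly. */
static int demo_play_start(demo_play_handle_t h)
{
    if (h == NULL) {
        return -1;
    }
    h->playing = 1;
    return 0;
}

static int demo_play_stop(demo_play_handle_t h)
{
    if (h == NULL) {
        return -1;
    }
    h->playing = 0;
    return 0;
}

static int demo_play_close(demo_play_handle_t h)
{
    if (h == NULL) {
        return -1;
    }
    free(h);
    return 0;
}

/* Open a playback-type and a prompt-type instance side by side.
 * Returns 0 if stopping the prompt left the main stream untouched. */
int demo_two_play_instances(void)
{
    demo_play_handle_t music = NULL;
    demo_play_handle_t prompt = NULL;
    demo_play_config_t music_cfg = { .type = DEMO_PLAY_TYPE_PLAYBACK };
    demo_play_config_t prompt_cfg = { .type = DEMO_PLAY_TYPE_PROMPT };

    if (demo_play_open(&music_cfg, &music) != 0) {
        return -1;
    }
    if (demo_play_open(&prompt_cfg, &prompt) != 0) {
        demo_play_close(music);
        return -1;
    }
    assert(music != prompt);  /* each open yields a distinct instance */

    demo_play_start(music);
    demo_play_start(prompt);
    demo_play_stop(prompt);   /* affects only the prompt instance */

    int ok = (music->playing == 1) && (prompt->playing == 0);

    demo_play_close(prompt);
    demo_play_close(music);
    return ok ? 0 : -1;
}
```

The point of the sketch is the shape of the calls, not their names: state lives behind the handle, so operations on one instance do not implicitly affect another. For the actual open, play, stop, and close calls and their configuration fields, consult the public headers.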

### Audio Examples

- [examples/audio_afe](/home/xutao/workspace-20/av_processor_v2/examples/audio_afe)
  Demonstrates the built-in `ai_afe` path and is the best starting point for AEC, NS, VAD, and AGC integration.
- [examples/audio_sr_vc_switch](/home/xutao/workspace-20/av_processor_v2/examples/audio_sr_vc_switch)
  Demonstrates switching between `AFE_TYPE_SR` and `AFE_TYPE_VC` recorder frontends, and is also useful for understanding multiple recorder instances and console-driven switching flow.
- [examples/audio_wn](/home/xutao/workspace-20/av_processor_v2/examples/audio_wn)
  Demonstrates the custom frontend path and standalone WakeNet, and is also useful for understanding recorder event callbacks.
- [examples/audio_echo](/home/xutao/workspace-20/av_processor_v2/examples/audio_echo)
  Demonstrates playback- and feeder-related flows, including receive-then-play style audio pipelines.
- [examples/audio_play_test](/home/xutao/workspace-20/av_processor_v2/examples/audio_play_test)
  Demonstrates the unified `audio_play_*` API, normal playback and prompt playback, feeder input, mixer focus switching, and console interaction commands.
- [examples/audio_mem_leak](/home/xutao/workspace-20/av_processor_v2/examples/audio_mem_leak)
  Used to repeatedly validate open/close cycles for recorder, play, feeder, and related modules, and is helpful for memory-footprint and leak investigation.

### Video Examples

- [examples/video_preview](/home/xutao/workspace-20/av_processor_v2/examples/video_preview)
  Demonstrates basic video capture and preview flow and is the easiest entry point for capture usage.
- [examples/video_test](/home/xutao/workspace-20/av_processor_v2/examples/video_test)
  Used for validating and testing the video pipeline, and is useful when checking sink configuration, format handling, and processing behavior.

## Media Dump

Media Dump is a debugging-oriented helper capability rather than a normal application-facing API.
At the moment it is primarily used to export audio data from the recorder path.

Typical reasons to enable it include:

- Investigating playback-to-microphone feedback or echo issues
- Comparing audio before and after AEC processing
- Checking clipping, saturation, elevated noise floor, or format-related problems

Current support includes:

- Capture before AEC
- Capture after AEC
- SD card file output
- UDP output

For detailed configuration steps and behavior notes, see:

- Media Dump details: [docs/COMPONENT_GUIDE.md](/home/xutao/workspace-20/av_processor_v2/docs/COMPONENT_GUIDE.md)

## Reading Path

- If you only want to know what this component does and which example to start from, this README is enough.
- If you are integrating the component and need structure definitions, macros, call order, the full API index, or mode differences, go to [docs/COMPONENT_GUIDE.md](/home/xutao/workspace-20/av_processor_v2/docs/COMPONENT_GUIDE.md).
- If you need exact function signatures, enums, or configuration fields, go straight to the public headers:
  - [include/audio_processor.h](/home/xutao/workspace-20/av_processor_v2/include/audio_processor.h)
  - [include/video_processor.h](/home/xutao/workspace-20/av_processor_v2/include/video_processor.h)
  - [include/av_processor_type.h](/home/xutao/workspace-20/av_processor_v2/include/av_processor_type.h)

## Version

Current version: `v0.5.8`

## License

This component follows the MIT license. See the `LICENSE` file in the repository for details.

## Related Resources

- Issue reporting: https://github.com/espressif/esp-gmf/issues
