# Espressif Multimedia Capture

- [Component Registry](https://components.espressif.com/components/espressif/esp_capture)
- [Chinese version (中文版)](./README_CN.md)

Espressif Multimedia Capture (**esp_capture**) is a lightweight multimedia capture component developed by Espressif, based on the [ESP-GMF](https://github.com/espressif/esp-gmf/blob/main/README.md) architecture. It features low memory footprint, high flexibility, and a modular design. The component integrates functions such as audio/video encoding, image rotation and scaling, echo cancellation, and text overlay. It is widely applicable to scenarios including audio/video recording, AI large model input, WebRTC, RTMP/RTSP streaming, local storage, and remote monitoring.

## 🔑 Key Features

- 📦 **Low memory overhead** with modular pipeline structure
- 🎚️ **Tight integration with ESP-GMF** for advanced audio/video processing
- 🎥 **Support for multiple input devices**: V4L2, DVP cameras, audio codecs
- 🔁 **Parallel streaming and storage** options
- ⚙️ **Automatic source-sink negotiation** for simplified configuration
- ✨ **Customizable processing pipelines** for professional use cases

## ⚙️ Architecture Overview

A capture system connects sources (input devices) to sinks (output targets) through an intermediate processing path.

```mermaid
graph LR
    Capture_Source --> Capture_Path --> Capture_Sink
```

| Component | Description |
|-----------|-------------|
| **Capture Source** | Interfaces for physical input devices (camera, mic, etc.) |
| **Capture Path** | Processing pipeline (audio/video filters, encoders, overlays) |
| **Capture Sink** | Output targets (e.g., streaming, storage, muxers) |

### 🧠 AV Synchronization and Muxing

To enable synchronized audio-video muxing, a dedicated sync module aligns timestamps across streams.

```mermaid
graph LR
    capture_audio_src --> capture_audio_path --> capture_audio_sink
    capture_audio_src --> capture_sync
    capture_video_src --> capture_sync
    capture_video_src --> capture_video_path --> capture_video_sink
    capture_audio_sink --> capture_muxer
    capture_video_sink --> capture_muxer
    capture_muxer --> capture_muxer_sink
```

## 🔊 Audio Sources

Audio sources are used to acquire audio data from audio input devices connected via various buses (like I2S, USB, etc.).

**Interface**: `esp_capture_audio_src_if_t`

Built-in sources:

- `esp_capture_new_audio_dev_src`: Codec-based audio capture
- `esp_capture_new_audio_aec_src`: Codec-based audio capture with Acoustic Echo Cancellation (AEC)

## 🎥 Video Sources

Video sources are used to capture video data from video input devices connected via various buses (like SPI, MIPI, USB, etc.).

**Interface**: `esp_capture_video_src_if_t`

Built-in sources:

- `esp_capture_new_video_v4l2_src`: V4L2 camera input (via `esp_video`)
- `esp_capture_new_video_dvp_src`: DVP camera input

## 🕓 Stream Synchronization

Stream synchronization is achieved by the `capture_sync` module. `capture_sync` aligns audio and video frame timestamps for synchronized playback or muxing. It is automatically configured through `esp_capture_open`.

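
To make the idea of timestamp alignment concrete, here is a minimal sketch in C. It is not the `capture_sync` implementation: the `toy_sync_*` names and the approach (audio timestamps derived from the running sample count, video timestamps taken from a clock relative to the same start time) are assumptions chosen purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shared time base (not part of esp_capture):
 * both streams stamp their frames relative to one start time. */
typedef struct {
    int64_t  start_us;       /* clock value when capture started, in microseconds */
    uint64_t audio_samples;  /* per-channel samples delivered so far */
    uint32_t sample_rate;    /* audio sample rate in Hz */
} toy_sync_t;

/* Reset the time base at capture start. */
void toy_sync_start(toy_sync_t *s, int64_t now_us, uint32_t sample_rate)
{
    assert(sample_rate > 0);
    s->start_us = now_us;
    s->audio_samples = 0;
    s->sample_rate = sample_rate;
}

/* Audio PTS in milliseconds, derived from the sample count so that it is
 * unaffected by scheduling jitter. Returns the PTS of the frame being
 * delivered, then advances the counter by that frame's length. */
uint32_t toy_sync_audio_pts(toy_sync_t *s, uint32_t frame_samples)
{
    uint32_t pts = (uint32_t)(s->audio_samples * 1000 / s->sample_rate);
    s->audio_samples += frame_samples;
    return pts;
}

/* Video PTS in milliseconds, measured on the same time base. */
uint32_t toy_sync_video_pts(const toy_sync_t *s, int64_t now_us)
{
    return (uint32_t)((now_us - s->start_us) / 1000);
}
```

With a 16 kHz source delivering 320-sample frames, successive audio frames are stamped 0 ms, 20 ms, and 40 ms, and a video frame grabbed 40 ms after start is also stamped 40 ms, so a muxer can interleave both streams on one timeline.
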
## 🔧 Audio/Video Processing Paths

**Interface**: `esp_capture_path_mngr_if_t`

### 🎚️ Audio Path

Built-in:

- `esp_capture_new_gmf_audio_mngr`: Creates an audio processing path using `ESP-GMF` with elements like:
  - `aud_rate_cvt` – Sample rate conversion
  - `aud_ch_cvt` – Channel conversion (mono ↔ stereo)
  - `aud_bit_cvt` – Bit depth conversion
  - `aud_enc` – Audio encoder

**Pipeline Builders** (`esp_capture_pipeline_builder_if_t`):

- `esp_capture_create_auto_audio_pipeline`: Auto-generated audio pipeline based on negotiation
- `esp_capture_create_audio_pipeline`: Prebuilt audio template pipeline

### 🎛️ Video Path

Built-in:

- `esp_capture_new_gmf_video_mngr`: Creates a video processing path using `ESP-GMF` with elements like:
  - `vid_ppa` – Resize, crop, color conversion
  - `vid_overlay` – Text/graphic overlays
  - `vid_fps_cvt` – Framerate conversion
  - `vid_enc` – Video encoder

**Pipeline Builders**:

- `esp_capture_create_auto_video_pipeline`: Auto-generated video pipeline based on negotiation
- `esp_capture_create_video_pipeline`: Prebuilt video template pipeline

## 🎞️ Muxing

Mux audio/video into containers for storage or streaming:

- MP4: File-based only
- TS: Supports streaming and file-based

### Data Flow Control for Muxers

The module provides flexible data flow control options for muxers:

- **Muxer-only mode**: All data is consumed by the muxer, preventing access to raw audio/video streams
- **Streaming while storing**: Simultaneous storage and streaming when supported by the muxer
- **Unified API**: Use `esp_capture_sink_acquire_frame` for both muxer output and direct stream access

## 🖋️ Overlays

Overlays are used to mix text or images into original video frames. Typical use cases include adding real-time timestamps or statistical data onto video frames.

**Interface**: `esp_capture_overlay_if_t`

- Built-in: `esp_capture_new_text_overlay`
- Automatically handled if an overlay is present in the video path

## ⚡ Auto Capture Mode

Auto capture mode simplifies configuration by automatically connecting sources, paths, and sinks.

The typical call sequence for auto capture is shown below (using audio capture as an example):

```mermaid
sequenceDiagram
    participant App as Application
    participant AudioSrc as Audio Source
    participant Capture as ESP Capture
    participant Sink as Capture Sink

    App->>AudioSrc: esp_capture_new_audio_dev_src(...)
    AudioSrc-->>App: audio_src handle

    App->>Capture: esp_capture_open(&cfg, &capture)
    Note over App,Capture: cfg.audio_src = audio_src

    App->>Capture: esp_capture_sink_setup(capture, 0, &sink_cfg, &sink)
    App->>Sink: esp_capture_sink_enable(sink, ESP_CAPTURE_RUN_MODE_ALWAYS)
    App->>Capture: esp_capture_start(capture)

    loop Frame Processing
        App->>Sink: esp_capture_sink_acquire_frame(sink, &frame, false)
        App->>Sink: esp_capture_sink_release_frame(sink, &frame)
    end

    App->>Capture: esp_capture_stop(capture)
```

For detailed examples, see [audio_capture](examples/audio_capture/README.md) and [video_capture](examples/video_capture/README.md).

## 🧩 Customizing Auto Pipelines

1. Register Custom Elements

   ```c
   esp_capture_register_element(capture, ESP_CAPTURE_STREAM_TYPE_AUDIO, proc_element);
   ```

2. Customize Pipeline Before Start

   ```c
   const char *elems[] = { "aud_ch_cvt", "aud_rate_cvt", "aud_enc" };
   esp_capture_sink_build_pipeline(sink, ESP_CAPTURE_STREAM_TYPE_AUDIO, elems, 3);
   ```

## 🤝 Auto-Negotiation

### Audio

- Automatically inserts elements like `aud_rate_cvt`, `aud_ch_cvt` on demand
- Negotiates format based on encoder requirements
- Elements are configured based on negotiation results

Built-in:

- `esp_capture_audio_pipeline_auto_negotiate` – Auto negotiate from audio source to multiple audio sinks

### Video

- Automatically inserts `vid_ppa`, `vid_fps_cvt` on demand
- Prioritizes high-quality format
- Negotiates source format based on encoder capabilities

Built-in:

- `esp_capture_video_pipeline_auto_negotiate` – Auto negotiate from video source to multiple video sinks

### Fixed Negotiation for Sources

In some cases, auto-negotiation of the source format and information may not meet requirements. Audio and video sources support `set_fixed_caps` to fix the source format settings and avoid negotiation failures.

## ❌ When Auto-Negotiation Fails

In complex pipelines, auto-negotiation may fail (for example, when one pipeline contains a redundant sample rate converter). In such cases, manual configuration is recommended.

## 📦 Binary Size Optimization

Unused elements are excluded unless registered.

### Menuconfig Options

Enable features only when needed:

- `CONFIG_ESP_CAPTURE_ENABLE_AUDIO`: Enable audio support
- `CONFIG_ESP_CAPTURE_ENABLE_VIDEO`: Enable video support

### Optional Registrations

- `mp4_muxer_register()` / `ts_muxer_register()` – on-demand muxers
- `esp_audio_enc_register_default()` / `esp_video_enc_register_default()` – customize encoder usage via menuconfig

## 🔧 Extending esp_capture

You can extend `esp_capture` by:

1. Adding a custom capture source
2. Implementing a new muxer using `esp_muxer`
3. Creating new encoders via `esp_audio_codec` / `esp_video_codec`
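
As a rough illustration of the first option, a custom source can be modeled as a table of callbacks that the capture core invokes. The sketch below is self-contained and hypothetical: `toy_audio_src_t`, `toy_ramp_src_t`, and `toy_new_ramp_src` are invented names, and the callback set is an assumption. The actual members of `esp_capture_audio_src_if_t` are defined in the component headers and may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical source interface: a table of callbacks the capture core
 * would call. This is NOT the real esp_capture_audio_src_if_t layout. */
typedef struct toy_audio_src toy_audio_src_t;
struct toy_audio_src {
    int (*open)(toy_audio_src_t *src);
    int (*read_frame)(toy_audio_src_t *src, int16_t *buf, size_t samples);
    int (*close)(toy_audio_src_t *src);
};

/* Concrete source that emits a rising ramp instead of reading hardware. */
typedef struct {
    toy_audio_src_t base;   /* must stay first so the base pointer can be cast back */
    int             opened;
    int16_t         next;   /* next sample value to emit */
} toy_ramp_src_t;

static int ramp_open(toy_audio_src_t *src)
{
    toy_ramp_src_t *r = (toy_ramp_src_t *)src;
    r->opened = 1;
    r->next = 0;
    return 0;
}

static int ramp_read_frame(toy_audio_src_t *src, int16_t *buf, size_t samples)
{
    toy_ramp_src_t *r = (toy_ramp_src_t *)src;
    if (!r->opened || buf == NULL) {
        return -1;          /* reading before open() is an error */
    }
    for (size_t i = 0; i < samples; i++) {
        buf[i] = r->next;
        r->next = (int16_t)(r->next + 1);
    }
    return 0;
}

static int ramp_close(toy_audio_src_t *src)
{
    ((toy_ramp_src_t *)src)->opened = 0;
    return 0;
}

/* Fill caller-provided storage and return the generic interface pointer. */
toy_audio_src_t *toy_new_ramp_src(toy_ramp_src_t *storage)
{
    memset(storage, 0, sizeof(*storage));
    storage->base.open = ramp_open;
    storage->base.read_frame = ramp_read_frame;
    storage->base.close = ramp_close;
    return &storage->base;
}
```

The caller only ever holds the `toy_audio_src_t *` pointer, which is what lets a pipeline swap one source for another without changing the code that consumes frames.
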

To add this component to a project, run `idf.py add-dependency "espressif/esp_capture^0.7.0"`.