uploaded 2 weeks ago
Espressif audio encoder and decoder

readme

# ESP_AUDIO_CODEC

Espressif Audio Codec (ESP_AUDIO_CODEC) is the official audio encoding and decoding processing module developed by Espressif Systems for SoCs. 

The ESP Audio Encoder provides a common encoder interface that allows you to register multiple encoders, such as AAC, AMR-NB, AMR-WB, ADPCM, G711A, G711U, PCM, OPUS, ALAC. User can create one or multiple encoder instances based on the encoder interfaces, these instances can run simultaneous encoding. Meanwhile user can also call specified encoder API directly to have less call depth. 

The ESP Audio Decoder provides a common decoder interface that allows you to register multiple decoders, such as AAC, MP3, AMR-NB, AMR-WB, ADPCM, G711A, G711U, VORBIS, OPUS, ALAC. You can create one or multiple decoder instances using the provided interfaces, enabling simultaneous decoding. Meanwhile user can also call specified decoder API directly to have less call depth. ESP Audio Decoder can only process audio frame data (which means input data is frame boundary).

To simplify the decoding process for parsing and locating audio frames, we utilize the ESP Audio Simple Decoder. This decoder employs a parser to easily and conveniently aggregate and structure audio frames, which are subsequently decoded using the ESP Audio Decoder. Users can input data of varying lengths. Supported audio containers include AAC, MP3, WAV, FLAC, AMRNB, AMRWB, M4A. 

The licenses of the third-party copyrights are recorded in [Copyrights and Licenses](http://docs.espressif.com/projects/esp-adf/en/latest/COPYRIGHT.html).

# Highlights

- **User-Friendly Interface**  
  The ESP Audio Codec module features a user-friendly interface designed for simple usability.
  
- **Lightweight with High Performance**  
  The module is optimized for high performance while maintaining a lightweight footprint and minimal memory usage.
  
- **Dual-Level Decoder API**  
  User can use ESP Audio Decoder when input data already at frame boundary, or use ESP Audio Simple Decoder to process any input length of data. Both decoder provide similar APIs make it easy to transition between them.
  
- **High Customization through Registration**  
  Through registration API, user can add customized decoder, encoder or simple decoder easily, meanwhile user can overwrite the default decoder or encoder without change for application code.
  
# Features

The ESP Audio Codec supports the following features:   

## Encoder   

* Following encoders are supported:
  - AAC
  - AMR-NB and AMR-WB
  - ADPCM
  - G711A and G711U
  - PCM
  - ALAC
  - OPUS
* Supports operate all encoder through common API see [esp_audio_enc.h](include/encoder/esp_audio_enc.h)
* Supports customized encoder through `esp_audio_enc_register` or overwrite default encoder
* Supports register all supported encoder through `esp_audio_enc_register_default` and manager it by menuconfig

Details for the supported encoders are as follow:  
**AAC**     
- AAC low complexity profile encode (AAC-LC)
- Encoding sample rates (Hz): 96000, 88200, 64000, 48000, 44100, 32000, 24000, 22050, 16000, 12000, 11025, 8000    
- Encoding channel num: mono, dual     
- Encoding bits per sample: 16 bits    
- Constant bitrate encoding from 12 Kbps to 160 Kbps    
- Choosing whether to write ADTS header or not   

**AMR**       
- Encoding narrow band (NB) and wide band (WB)   
- AMRNB encoding at the sampling rate of 8 kHz       
- AMRWB encoding at the sampling rate of 16 kHz     
- Encoding channel num: mono    
- Encoding bits per sample: 16 bits    
- AMRNB encoding bitrate (Kbps): 4.75, 5.15, 5.9, 6.7, 7.4, 7.95, 10.2, 12.2    
- AMRWB encoding bitrate (Kbps): 6.6, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85      
- Discontinuous transmission (DTX)     

**ADPCM**   
- Encoding sample rates (Hz): all    
- Encoding channel num: mono, dual    
- Encoding bits per sample: 16 bits    

**G711**    
- Encoding A-LAW and U-LAW      
- Encoding sample rates (Hz): all    
- Encoding channel num: all    
- Encoding bits per sample: 16 bits    

**OPUS**    
- Encoding sample rates (Hz): 8000, 12000, 16000, 24000, 48000    
- Encoding channel num: mono, dual    
- Encoding bits per sample: 16 bits    
- Constant bitrate encoding from 20Kbps to 510Kbps      
- Encoding frame duration (ms): 2.5, 5, 10, 20, 40, 60       
- Application mode for VoIP and music       
- Encoding complexity adjustment, from 0 to 10      
- Inband forward error correction (FEC)     
- Discontinuous transmission (DTX)

**ALAC**    
- Encoding sample rates (Hz): 8000, 12000, 16000, 24000, 48000    
- Encoding channel num: mono, dual    
- Encoding bits per sample: 16 bits  
  
## Decoder   

* Following decoders are supported:
  - AAC
  - MP3
  - AMR-NB and AMR-WB
  - ADPCM
  - G711A and G711U
  - OPUS
  - VORBIS
  - FLAC
  - ALAC
* Supports operate all decoder through common API see [esp_audio_dec.h](include/encoder/esp_audio_dec.h)
* Supports customized decoder through `esp_audio_dec_register` or overwrite default decoder
* Supports register all supported decoder through `esp_audio_dec_register_default` and manager it by menuconfig

Details for the supported decoders are as follow:  
**AAC**     
- Supports AAC-LC, AAC-Plus
- Use can control AAC-Plus decoding behavior, to decrease cpu and memory usage if no need AAC-Plus decoding
- Supports sample rates (Hz): 96000, 88200, 64000, 48000, 44100, 32000, 24000, 22050, 16000, 12000, 11025, 8000
- Supports channel num: mono, dual only
- Supports bits per sample: 16 bits
- Supports decoding both with and without ADTS header
 
**AMR**       
- AMRNB decoding at the sampling rate of 8 kHz       
- AMRWB decoding at the sampling rate of 16 kHz     
- Decoding channel num: mono    
- Supports bits per sample: 16 bits  

**ADPCM**   
- Decoding sample rates (Hz): all    
- Decoding channel num: mono, dual    
- Decoding bits per sample: 16 bits  
- Supports IMA-ADPCM only

**G711**    
- Decoding A-LAW and U-LAW      
- Decoding sample rates (Hz): all    
- Decoding channel num: all    
- Decoding bits per sample: 16 bits    

**OPUS**    
- Decoding sample rates (Hz): 8000, 12000, 16000, 24000, 48000    
- Decoding channel num: mono, dual    
- Decoding bits per sample: 16 bits      
- Supports decoding self delimited packet also

**ALAC**    
- Decoding sample rates (Hz): 8000, 12000, 16000, 24000, 48000    
- Decoding channel num: mono, dual    
- Decoding bits per sample: 16 bits 
  
**FLAC**  
- Supports sample rates (Hz): 96000, 48000, 44100, 32000, 24000, 22050, 16000, 12000, 11025, 8000 
- Decoding channel num: mono, dual    
- Decoding bits per sample: 16 bits 

**VORBIS**  
- Supports sample rates (Hz): 48000, 44100, 32000, 24000, 22050, 16000, 12000, 11025, 8000 
- Decoding channel num: mono, dual    
- Decoding bits per sample: 16 bits
- Supports decode VORBIS frame only, need remove OGG header
- User need provide common header information firstly
  
## Simple Decoder   

* Supports audio frame finding and decoding
* Supports common parser, user can add customized parser according parser rules
* Supports customized simple decoder to handle new file format
* Supports customized parser and decoder pair: Use default parser but with customized decoder
* Supports streaming decode only not support seek

Details for the supported audio containers are as follow:
| Audio Container| Notes                                                       |
|       --       |  --                                                         |
|       AAC      | Supports AAC-Plus controlled by configuration               |
|       MP3      | Supports layer 3 only                                       |
|       AMRNB    | Supports files with AMRNB file header only                  |
|       AMRWB    | Supports files with AMRWB file header only                  |
|       FLAC     | Supports files with FLAC file header only                   |
|       ADPCM    | Supports IMA-ADPCM only                                     |
|       WAV      | Supports G711A, G711U, PCM, ADPCM                           |
|       M4A      | Supports MP3, AAC, ALAC <br> Supports MDAT after MOOV only  |
|       TS       | Supports MP3, AAC                                           |

# Performance

The following results are tested and gathered on ESP32-S3R8 platform. The data can be obtained by running `Encode to Decode chain test` in the test code, sample log output is as follow:
```
I (12599) CODEC_TEST: Start to do chain test for AAC sample_rate 48000 channel 2
I (12879) DEC_TEST: Encoder for AAC cpu: 12.91%
I (12880) DEC_TEST: Encoder AAC compress ratio: 5.86% heap usage: 51400
I (12914) DEC_TEST: Decode for AAC cpu: 6.75% heap usage: 51168
```
To get performance under other sample rate, channel or complexity, please change the parameters in the test codes ( mainly located in function `chain_test_thread` and `get_encoder_config` ).

## Encoder
| Encoder Codec | Sample Rate (Hz) | Channel | Memory (KB) | CPU loading (%) |
| --            |  --              | --      | --          | --              |
| AAC           | 48000            | 2       | 51.4        | 12.9            |
| G711-A        | 8000             | 1       | 0.06        | 0.32            |
| G711-U        | 8000             | 1       | 0.06        | 0.33            |
| AMR-NB        | 8000             | 1       | 3.3         | 17.81           |
| AMR-WB        | 16000            | 1       | 5.6         | 37.69           |
| ADPCM         | 48000            | 2       | 0.01        | 2.69            |
| OPUS          | 48000            | 2       | 29.4        | 24.9            |
  
**Notes:**   
Encoder cpu usage is highly dependent on certain encoding settings (like bitrate or complexity)  
 1) For AAC encoder, tested under bitrate 90kbps
 2) For AMR-NB and AMR-WB encoders, test bitrate under ARM-NB: 12.2kbps, ARM-WB: 8.85kbps.
 3) For OPUS encoder test bitrate set as 90kbps, complexity set as 0.
 4) Memory usage only consider heap usage, stack usage is not included, to support all encoders the running on stack size should about 40k.

## Decoder
| Decoder Codec | Sample Rate (Hz) | Channel | Memory (KB) | CPU loading (%) |
| --            |  --              | --      | --          | --              |
| AAC           | 48000            | 2       | 51.2        | 6.75            |
| G711-A        | 8000             | 1       | 0.04        | 0.14            |
| G711-U        | 8000             | 1       | 0.04        | 0.13            |
| AMR-NB        | 8000             | 1       | 1.8         | 4.23            |
| AMR-WB        | 16000            | 1       | 5.4         | 9.5             |
| ADPCM         | 48000            | 2       | 0.11        | 2.43            |
| OPUS          | 48000            | 2       | 26.6        | 5.86            |
| MP3           | 44100            | 2       | 28          | 8.17            |
| FLAC          | 44100            | 2       | 89.4        | 8.0             |

**Notes:** 
 1) MP3 and FLAC decoders are tested with real audio data. All other codes are tested with encoded data from sin wav PCM. 
 2) For AAC decoder, tested file is encoded in AAC-LC profile, decoding AAC-Plus profile will have higher memory and CPU usage.
 3) Only the heap usage is considered here. To support all decoders, the task running the decoder should have stack size of about 20K.

#  ESP_AUDIO_CODEC Release and SoC Compatibility

The following table shows the support of ESP_AUDIO_CODEC for Espressif SoCs. The "&#10004;" means supported, and the "&#10006;" means not supported. 

|Chip         |         v2.0.0     |
|:-----------:|:------------------:|
|ESP32        |       &#10004;     |
|ESP32-S2     |       &#10004;     |
|ESP32-C3     |       &#10004;     |
|ESP32-C6     |       &#10004;     |
|ESP32-S3     |       &#10004;     |
|ESP32-P4     |       &#10004;     |

# Usage

## Encoder Usage
For sample usage, please refer to [audio_encoder_test.c](test_apps/audio_codec_test/main/audio_encoder_test.c)

## Decoder Usage
For sample usage, please refer to [audio_decoder_test.c](test_apps/audio_codec_test/main/audio_decoder_test.c)

## Simple Decoder Usage
For sample usage, please refer to [simple_decoder_test.c](test_apps/audio_codec_test/main/simple_decoder_test.c)

Links

Supports all targets

License: Custom

To add this component to your project, run:

idf.py add-dependency "espressif/esp_audio_codec^2.0.2"

or download archive

Stats

  • Archive size
    Archive size: 52.40 MB
  • Downloaded in total
    Downloaded in total 848 times
  • Downloaded this version
    This version: 59 times

Badge

espressif/esp_audio_codec version: 2.0.2
|