espressif/coze_ws_app

0.5.0

Espressif Coze WebSocket app based on ESP-GMF

# Coze WebSocket Bidirectional Streaming Chat

- [English](./README.md)

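## Overview

This example implements the Coze WebSocket OpenAPI for intelligent voice chat. It supports voice interaction with an agent through direct conversation, wake-word activation, and key-press interruption.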

## Creating the Example

### Default IDF Branch

This example is compatible with ESP-IDF release/v5.4 and later branches.

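### Prerequisites

First, obtain an `Access token` and a `BOT ID` by following the [Coze documentation](https://bytedance.larkoffice.com/docx/Da6qd87pQodvNrxdFYrcnzMxnsh).
For more WebSocket documentation, see [Bidirectional streaming chat events](https://www.coze.cn/open/docs/developer_guides/streaming_chat_event).

This example is built on the [ESP-GMF](https://github.com/espressif/esp-gmf) framework and demonstrates audio initialization and the use of the 3A algorithms (automatic gain control, acoustic echo cancellation, and noise suppression).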

## Preparation

### Hardware

- This example targets the `esp32-s3-korvo-2` development board by default. For hardware details, see the board [documentation](https://docs.espressif.com/projects/esp-adf/en/latest/design-guide/dev-boards/user-guide-esp32-s3-korvo-2.html). Other boards can be selected in `menuconfig → Example Configuration → Audio Board`.

> If you are using a custom board, select `Custom audio board` and modify the corresponding IO configuration in [esp_gmf_gpio_config.h](components/common/esp_gmf_gpio_config.h#162).

### Audio Encoding

- Uplink audio is currently sent as PCM, and downlink audio is Opus-encoded. More audio codecs will be supported later.
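To make the PCM uplink format more concrete, the sketch below computes how many bytes one frame of raw PCM occupies. The values in the usage note (16 kHz, mono, 16-bit) match the `input_audio` fields in the sample log later in this README, and 60 ms is only an illustrative frame duration. The function name `pcm_frame_bytes` is hypothetical and is not part of this example's API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes of raw PCM in one frame.
 * bytes = samples per channel * channel count * bytes per sample */
size_t pcm_frame_bytes(uint32_t sample_rate_hz, uint8_t channels,
                       uint8_t bits_per_sample, uint32_t frame_ms)
{
    size_t samples_per_channel = (size_t)sample_rate_hz * frame_ms / 1000;
    return samples_per_channel * channels * (bits_per_sample / 8);
}
```

For example, `pcm_frame_bytes(16000, 1, 16, 60)` gives 1920 bytes: 960 samples of 2 bytes each.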

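### Working Modes

> The system currently supports the following working modes:

- **Normal mode**: No wake word is needed. The user can talk to the device directly and continuously.

- **Wake-word mode**: The user wakes the device with a wake word and can then interact by voice. The default wake word is `Hi 乐鑫` ("Hi Lexin"). Other wake words can be selected in `menuconfig → ESP Speech Recognition → use wakenet → Select wake words`.

- **Key-press mode**: The user wakes the device by pressing a key and can then interact by voice. The default key is `REC`.

> **Normal mode** is enabled by default.

Each mode requires different parameters under `Component config → ESP Audio Simple Player`.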

1. In key-press mode, audio input and output are configured as `16bit`, `single channel`. Use the following configuration:

```text
CONFIG_AUDIO_SIMPLE_PLAYER_CH_CVT_DEST=1
CONFIG_AUDIO_SIMPLE_PLAYER_BIT_CVT_DEST_16BIT=y
```

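2. When a single `ES8311` handles both audio input and output with features such as echo cancellation enabled, audio input and output are configured as `16bit`, `dual channel`. Use the following configuration:

```text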
CONFIG_AUDIO_SIMPLE_PLAYER_CH_CVT_DEST=2
CONFIG_AUDIO_SIMPLE_PLAYER_BIT_CVT_DEST_16BIT=y
```

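3. When `ES8311` is used for audio output and `ES7210` for input, audio input and output are configured as `32bit`, `dual channel`. Use the following configuration (this is the default):

```text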
CONFIG_AUDIO_SIMPLE_PLAYER_CH_CVT_DEST=2
CONFIG_AUDIO_SIMPLE_PLAYER_BIT_CVT_DEST_32BIT=y
```
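The three setups above can be summarized as a small lookup, sketched below. The enum, struct, and function names are hypothetical and do not exist in the example. The channel and bit-depth values are the ones given by the `CONFIG_AUDIO_SIMPLE_PLAYER_*` options listed above.

```c
#include <stdint.h>

/* Hypothetical labels for the three hardware setups described above. */
typedef enum {
    AUDIO_SETUP_KEY_PRESS,      /* key-press mode */
    AUDIO_SETUP_ES8311_ONLY,    /* single ES8311 for input and output */
    AUDIO_SETUP_ES8311_ES7210,  /* ES8311 output, ES7210 input (default) */
} audio_setup_t;

/* Destination format of the simple player's channel/bit conversion. */
typedef struct {
    uint8_t channels;
    uint8_t bits;
} player_dest_fmt_t;

player_dest_fmt_t player_dest_fmt_for(audio_setup_t setup)
{
    switch (setup) {
    case AUDIO_SETUP_KEY_PRESS:
        return (player_dest_fmt_t){ .channels = 1, .bits = 16 };
    case AUDIO_SETUP_ES8311_ONLY:
        return (player_dest_fmt_t){ .channels = 2, .bits = 16 };
    case AUDIO_SETUP_ES8311_ES7210:
    default:
        /* Fall back to the documented default setup. */
        return (player_dest_fmt_t){ .channels = 2, .bits = 32 };
    }
}
```

In the real project these values are chosen at build time through `menuconfig`, so a runtime lookup like this is only a way to keep the three combinations straight.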

### Configuration

1. Enter the `Access token` and `BOT ID` you obtained in `menuconfig → Example Configuration`. By default, the `Access token` starts with `pat_`.
2. Enter your Wi-Fi credentials in `menuconfig → Example Configuration`.
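As a sanity check before flashing, a token can be validated against the `pat_` prefix and turned into the `Bearer ...` value that appears in the sample log (`token: Bearer pat_...`). This is a minimal sketch, not the example's own code: `build_bearer_value` is a hypothetical name, and rejecting tokens without the `pat_` prefix is an assumption based only on the default described above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "Bearer <token>" into out.
 * Returns 0 on success, -1 if the token does not start with "pat_"
 * (an assumed check) or if the output buffer is too small. */
int build_bearer_value(const char *token, char *out, size_t out_len)
{
    if (token == NULL || out == NULL || strncmp(token, "pat_", 4) != 0) {
        return -1;
    }
    int n = snprintf(out, out_len, "Bearer %s", token);
    /* snprintf returns the length it wanted to write; reject truncation. */
    return (n > 0 && (size_t)n < out_len) ? 0 : -1;
}
```

Checking the return value of `snprintf` against the buffer size avoids silently sending a truncated token.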

### Build and Flash

Before building this example, make sure the ESP-IDF environment is set up. If it is, skip to the next step. If not, run the following scripts in the ESP-IDF root directory to set up the build environment. For complete setup and usage steps, see the [ESP-IDF Programming Guide](https://docs.espressif.com/projects/esp-idf/zh_CN/latest/esp32s3/index.html).

```shell
./install.sh
. ./export.sh
```

- Select the target chip, using esp32s3 as an example:

```shell
idf.py set-target esp32s3
```

- Build the example:

```shell
idf.py build
```

- Flash the program and run the monitor tool to view the serial output (replace PORT with your port name):

```shell
idf.py -p PORT flash monitor
```

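## How to Use the Example

### Features and Usage

- After the example starts, the following log indicates that the connection to the server has been established and you can start talking:

```text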
E (1029) COZE_CHAT_WS: Failed to get SPIFFS partition information (ESP_ERR_INVALID_STATE)
I (1031) example_connect: Start example_connect.
I (1035) pp: pp rom version: e7ae62f
I (1038) net80211: net80211 rom version: e7ae62f
I (1044) wifi:wifi driver task: 3fced994, prio:23, stack:6656, core=0
I (1051) wifi:wifi firmware version: 7d2994f4b
I (1052) wifi:wifi certification version: v7.0
I (1057) wifi:config NVS flash: enabled
I (1060) wifi:config nano formatting: disabled
I (1064) wifi:Init data frame dynamic rx buffer num: 32
I (1069) wifi:Init static rx mgmt buffer num: 5
I (1074) wifi:Init management short buffer num: 32
I (1078) wifi:Init dynamic tx buffer num: 32
I (1082) wifi:Init static tx FG buffer num: 2
I (1086) wifi:Init static rx buffer size: 1600
I (1090) wifi:Init static rx buffer num: 16
I (1094) wifi:Init dynamic rx buffer num: 32
I (1099) wifi_init: rx ba win: 6
I (1101) wifi_init: accept mbox: 6
I (1104) wifi_init: tcpip mbox: 32
I (1107) wifi_init: udp mbox: 6
I (1110) wifi_init: tcp mbox: 6
I (1113) wifi_init: tcp tx win: 65535
I (1117) wifi_init: tcp rx win: 5760
I (1120) wifi_init: tcp mss: 1440
I (1123) wifi_init: WiFi IRAM OP enabled
I (1127) wifi_init: WiFi RX IRAM OP enabled
I (1131) phy_init: phy_version 701,f4f1da3a,Mar  3 2025,15:50:10
I (1178) wifi:mode : sta (74:4d:bd:9d:b6:30)
I (1179) wifi:enable tsf
I (1180) example_connect: Connecting to xtworks...
W (1180) wifi:Password length matches WPA2 standards, authmode threshold changes from OPEN to WPA2
I (1188) example_connect: Waiting for IP(s)
I (3665) wifi:new:<11,0>, old:<1,0>, ap:<255,255>, sta:<11,0>, prof:1, snd_ch_cfg:0x0
I (3665) wifi:state: init -> auth (0xb0)
I (3668) wifi:state: auth -> assoc (0x0)
I (3689) wifi:state: assoc -> run (0x10)
I (3714) wifi:connected with xtworks, aid = 58, channel 11, BW20, bssid = ec:56:23:e9:7e:f0
I (3714) wifi:security: WPA2-PSK, phy: bgn, rssi: -38
I (3716) wifi:pm start, type: 1

I (3719) wifi:dp: 1, bi: 102400, li: 3, scale listen interval from 307200 us to 307200 us
I (3727) wifi:set rx beacon pti, rx_bcn_pti: 0, bcn_timeout: 25000, mt_pti: 0, mt_time: 10000
I (3735) wifi:AP's beacon interval = 102400 us, DTIM period = 1
I (4830) esp_netif_handlers: example_netif_sta ip: 192.168.3.7, mask: 255.255.255.0, gw: 192.168.3.1
I (4830) example_connect: Got IPv4 event: Interface "example_netif_sta" address: 192.168.3.7
I (5031) example_connect: Got IPv6 event: Interface "example_netif_sta" address: fe80:0000:0000:0000:764d:bdff:fe9d:b630, type: ESP_IP6_ADDR_IS_LINK_LOCAL
I (5034) example_common: Connected to example_netif_sta
I (5039) example_common: - IPv4 address: 192.168.3.7,
I (5044) example_common: - IPv6 address: fe80:0000:0000:0000:764d:bdff:fe9d:b630, type: ESP_IP6_ADDR_IS_LINK_LOCAL
I (5054) ESP_COZE_CHAT: wss_url: ws://ws.coze.cn/v1/chat?bot_id=748006***********
I (5061) ESP_COZE_CHAT: token: Bearer pat_*********************
I (5071) ESP_COZE_CHAT: WEBSOCKET_EVENT_BEGIN
I (5076) websocket_client: Started
I (5677) ESP_COZE_CHAT: WEBSOCKET_EVENT_CONNECTED
I (5678) ESP_COZE_CHAT: WS connected
I (6491) wifi:<ba-add>idx:0 (ifx:0, ec:56:23:e9:7e:f0), tid:6, ssn:1804, winSize:64
I (6500) ESP_COZE_CHAT: Request conversation_id : 74970793********
I (6500) ESP_COZE_CHAT: WS start updata chat
I (6501) ESP_COZE_CHAT: Update chat: {
        "id":   "0a7c8af3-e729-56e7-47b1-ecb3a8adb149",
        "event_type":   "chat.update",
        "data": {
                "chat_config":  {
                        "auto_save_history":    true,
                        "conversation_id":      "74970793********",
                        "user_id":      "userid_123",
                        "meta_data":    {
                        },
                        "custom_variables":     {
                        },
                        "extra_params": {
                        },
                        "parameters":   {
                                "custom_var_1": "测试"
                        }
                },
                "input_audio":  {
                        "format":       "pcm",
                        "codec":        "pcm",
                        "sample_rate":  16000,
                        "channel":      1,
                        "bit_depth":    16
                },
                "turn_detection":       {
                        "type": "server_vad",
                        "prefix_padding_ms":    600,
                        "silence_duration_ms":  500
                },
                "output_audio": {
                        "codec":        "opus",
                        "opus_config":  {
                                "bitrate":      16000,
                                "frame_size_ms":        60,
                                "limit_config": {
                                        "period":       1,
                                        "max_frame_num":        18
                                }
                        },
                        "speech_rate":  20,
                        "voice_id":     "7426720361733144585"
                },
                "event_subscriptions":  ["conversation.audio.delta", "conversation.chat.completed", "input_audio_buffer.speech_started", "input_audio_buffer.speech_stopped", "chat.created", "error", "conversation.message.delta"]
        }
}
I (6607) ES8311: Work in Slave mode
E (6611) i2s_common: i2s_channel_disable(1216): the channel has not been enabled yet
I (6611) I2S_IF: channel mode 0 bits:32/32 channel:2 mask:3
I (6614) I2S_IF: STD Mode 1 bits:32/32 channel:2 sample_rate:16000 mask:3
I (6635) Adev_Codec: Open codec device OK
I (6638) ES7210: Work in Slave mode
I (6645) ES7210: Enable ES7210_INPUT_MIC1
I (6647) ES7210: Enable ES7210_INPUT_MIC2
I (6650) ES7210: Enable ES7210_INPUT_MIC3
I (6653) ES7210: Enable TDM mode
E (6659) i2s_common: i2s_channel_disable(1216): the channel has not been enabled yet
I (6659) I2S_IF: channel mode 0 bits:32/32 channel:2 mask:3
I (6661) I2S_IF: STD Mode 0 bits:32/32 channel:2 sample_rate:16000 mask:3
I (6668) ES7210: Bits 16
I (6676) ES7210: Enable ES7210_INPUT_MIC1
I (6680) ES7210: Enable ES7210_INPUT_MIC2
I (6683) ES7210: Enable ES7210_INPUT_MIC3
I (6686) ES7210: Enable TDM mode
I (6692) ES7210: Unmuted
I (6693) Adev_Codec: Open codec device OK
I (6693) ESP_GMF_BLOCK: The block buf:0x3c25d5dc, end:0x3c2625dc
I (6695) NEW_DATA_BUS: New block buf, num:1, item_cnt:20480, db:0x3c2625e0
I (6702) ESP_GMF_TASK: Waiting to run... [tsk:http-0x3fcd3bf0, wk:0x0, run:0]
I (6709) ESP_GMF_TASK: Waiting to run... [tsk:http-0x3fcd3bf0, wk:0x3c263e6c, run:0]
I (6717) ESP_GMF_BLOCK: The block buf:0x3c263fb8, end:0x3c268fb8
I (6722) NEW_DATA_BUS: New block buf, num:1, item_cnt:20480, db:0x3c268fbc
I (6729) ESP_GMF_TASK: Waiting to run... [tsk:http-0x3fcc9aa4, wk:0x0, run:0]
I (6735) ESP_GMF_TASK: Waiting to run... [tsk:http-0x3fcc9aa4, wk:0x3c26a848, run:0]
W (6745) AUD_SDEC_REG: Overwrote ES decoder 6
W (6748) AUD_SDEC_REG: Overwrote ES decoder 7
W (6753) AUD_SDEC_REG: Overwrote ES decoder 8
I (6758) ASP_POOL: Dest rate:16000
I (6761) ASP_POOL: Dest channels:2
I (6764) ASP_POOL: Dest bits:32
I (6767) ESP_GMF_TASK: Waiting to run... [tsk:TSK_0x3fcca54c-0x3fcca54c, wk:0x0, run:0]
I (6775) MODEL_LOADER: The storage free size is 22208 KB
I (6780) MODEL_LOADER: The partition size is 5168 KB
I (6785) MODEL_LOADER: Successfully load srmodels
I (6789) AFE_CONFIG: Set WakeNet Model: wn9_hilexin

/********** General AFE (Audio Front End) Parameter **********/
pcm_config.total_ch_num: 4
pcm_config.mic_num: 2: [ ch1, ch3 ]
pcm_config.ref_num: 1: [ ch0 ]
pcm_config.sample_rate: 16000
afe_type: SR
afe_mode: HIGH PERF
afe_perferred_core: 0
afe_perferred_priority: 5
afe_ringbuf_size: 50
memory_alloc_mode: 3
afe_linear_gain: 1.0
debug_init: false
fixed_first_channel: false

/********** AEC (Acoustic Echo Cancellation) **********/
aec_init: true
aec mode: SR_HIGH_PERF
aec_filter_length: 4

/********** SE (Speech Enhancement, Microphone Array Processing) **********/
se_init: true, model: BSS

/********** NS (Noise Suppression) **********/
ns_init: false
ns model name: WEBRTC

/********** VAD (Voice Activity Detection) **********/
vad_init: true
vad_mode: 3
vad_model_name: NULL
vad_min_speech_ms: 64
vad_min_noise_ms: 1000
vad_delay_ms: 128
vad_mute_playback: false
vad_enable_channel_trigger: false

/********** WakeNet (Wake Word Engine) **********/
wakenet_init: false
wakenet_model_name: wn9_hilexin
wakenet_model_name_2: NULL
wakenet_mode: 0

/********** AGC (Automatic Gain Control) **********/
agc_init: false
agc_mode: WAKENET
agc_compression_gain_db: 9
agc_target_level_dbfs: 9

/**************************************************/
I (6993) AFE: AFE Version: (2MIC_V250113)
I (6994) AFE: Input PCM Config: total 4 channels(2 microphone, 1 playback), sample rate:16000
I (6997) AFE: AFE Pipeline: [input] -> |AEC(SR_HIGH_PERF)| -> |SE(BSS)| -> |VAD(WebRTC)| -> [output]
I (7007) AFE_MANAGER: Feed task, ch 4, chunk 1024, buf size 8192
I (7012) GMF_AFE: Create AFE, gmf_afe-0x3c2ec720
I (7017) GMF_AFE: Create AFE, gmf_afe-0x3c2ec864
I (7021) GMF_AFE: New an object,gmf_afe-0x3c2ec864
I (7026) ESP_GMF_TASK: Waiting to run... [tsk:TSK_0x3fce1714-0x3fce1714, wk:0x0, run:0]
I (7033) ESP_GMF_TASK: Waiting to run... [tsk:TSK_0x3fce1714-0x3fce1714, wk:0x3c2ec9e4, run:0]
Build fst from commands.
Quantized MultiNet6:rnnt_ctc_1.0, name:mn6_cn, (Feb 18 2025 12:00:53)
Quantized MultiNet6 search method: 2, time out:5.8 s
I (8298) NEW_DATA_BUS: New ringbuffer:0x3c4cdb30, num:2, item_cnt:8192, db:0x3c7176f4
I (8301) NEW_DATA_BUS: New ringbuffer:0x3c4c95e0, num:1, item_cnt:20480, db:0x3c4cd9e8
I (8308) AFE_MANAGER: AFE manager suspend 1
I (8312) AFE_MANAGER: AFE manager suspend 0
I (8316) ESP_GMF_TASK: One times job is complete, del[wk:0x3c2ec9e4,ctx:0x3c2ec864, label:gmf_afe_open]
I (8325) ESP_GMF_PORT: ACQ IN, new self payload:0x3c70a0e4, port:0x3c2ec9a4, el:0x3c2ec864-gmf_afe
I (8350) ASP_POOL: Dest rate:16000
I (8350) ASP_POOL: Dest channels:2
I (8351) ASP_POOL: Dest bits:32
I (8351) ESP_GMF_TASK: Waiting to run... [tsk:TSK_0x3fce5534-0x3fce5534, wk:0x0, run:0]
I (8352) AUD_SIMP_PLAYER: Reconfig decoder by music info, rate:16000, channels:1, bits:16, bitrate:0
E (8365) ESP_GMF_AUDIO_HELPER: IS HERE, /coze.opus
I (8370) ESP_GMF_TASK: Waiting to run... [tsk:TSK_0x3fce5534-0x3fce5534, wk:0x3c728b54, run:0]
W (8378) AUD_SDEC: Not find default parser for 9
I (8378) main_task: Returned from app_main()
W (8384) ESP_OPUS_DEC: Frame duration is not set, out pcm buffer size is counted as the length of 60ms.
I (8397) ESP_GMF_TASK: One times job is complete, del[wk:0x3c728b54,ctx:0x3c71ffe8, label:aud_simp_dec_open]
I (8407) ESP_GMF_PORT: ACQ IN, new self payload:0x3c728b54, port:0x3c72f554, el:0x3c71ffe8-aud_simp_dec
I (8527) ESP_GMF_PORT: ACQ OUT, new self payload:0x3c720374, port:0x3c2ec964, el:0x3c2ec864-gmf_afe
```
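The `wss_url` line in the log above shows the shape of the connection URL: a fixed endpoint followed by the `bot_id` query parameter. The sketch below reproduces that format. The function name `build_chat_url` is hypothetical, and the URL prefix is copied from the log rather than from the example's source, so treat it as an assumption.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: builds the chat URL in the format seen in the log.
 * Returns 0 on success, -1 on an empty bot ID or a too-small buffer. */
int build_chat_url(const char *bot_id, char *out, size_t out_len)
{
    if (bot_id == NULL || out == NULL || bot_id[0] == '\0') {
        return -1;
    }
    int n = snprintf(out, out_len, "ws://ws.coze.cn/v1/chat?bot_id=%s", bot_id);
    /* Reject truncated output instead of connecting to a malformed URL. */
    return (n > 0 && (size_t)n < out_len) ? 0 : -1;
}
```

If the `WEBSOCKET_EVENT_CONNECTED` line never appears in your log, comparing the logged `wss_url` against this format is a quick way to spot a missing or mistyped `BOT ID`.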

License: Custom

To add this component to your project, run:

idf.py add-dependency "espressif/coze_ws_app^0.5.0"
