Real-time cloud API
Protocol description
The interface uses the WebSocket protocol.
All protocol fields are UTF-8 encoded.
Access requirements
APPID, APISecret, and APIKey created on the platform
An authorized virtual human avatar_id and an authorized voice (vcn)
Integration endpoint
API handshake
See the iFLYTEK Open Platform handshake authentication documentation:
https://global.xfyun.cn/doc/tts/online_tts/API.html#authentication-method
Request URL
wss://avatar-api-gp.xf-yun.com/v1/interact
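The handshake document referenced above describes how to sign the connection URL. The Python sketch below assumes the HMAC-SHA256 scheme used across iFLYTEK WebSocket APIs: an HMAC over the host, date, and request line, keyed with APISecret, then wrapped together with APIKey into a base64 authorization value that is passed alongside date and host as query parameters. The function name build_auth_url is hypothetical; confirm the exact string format against the linked document before relying on it.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate
from typing import Optional
from urllib.parse import urlencode, urlparse


def build_auth_url(request_url: str, api_key: str, api_secret: str,
                   date: Optional[str] = None) -> str:
    """Append handshake-auth query parameters (authorization, date, host) to request_url."""
    parsed = urlparse(request_url)
    host, path = parsed.netloc, parsed.path
    # RFC 1123 date in GMT, e.g. "Tue, 14 May 2024 08:00:00 GMT"
    date = date or formatdate(usegmt=True)
    # String to sign: host, date, and the request line, separated by newlines.
    signature_origin = f"host: {host}\ndate: {date}\nGET {path} HTTP/1.1"
    digest = hmac.new(api_secret.encode("utf-8"),
                      signature_origin.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    authorization_origin = (
        f'api_key="{api_key}", algorithm="hmac-sha256", '
        f'headers="host date request-line", signature="{signature}"'
    )
    authorization = base64.b64encode(authorization_origin.encode("utf-8")).decode("ascii")
    # urlencode percent-encodes the base64 '+', '/' and '=' characters.
    query = urlencode({"authorization": authorization, "date": date, "host": host})
    return f"{request_url}?{query}"
```

Pass the resulting URL to any WebSocket client. The optional date argument exists only so the output can be reproduced in tests; in normal use the current time is taken automatically.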
Request structure description
| Parameter | Definition | Type | Description |
|---|---|---|---|
| header | Protocol Header | Object | Request Header |
| parameter | Capability Parameters | Object | AI capability parameters, used to enable or disable specific AI engine capabilities |
| payload | Service Alias | Object | Request Data Packet |
Response structure description
| Parameter | Definition | Type | Description |
|---|---|---|---|
| header | Protocol Header | Object | Response Header |
| payload | Service Alias | Object | Response Data Packet |
API Message
Start initialization protocol
Request
Request example
{
"header": {
"app_id": "xxxx",
"request_id": "xxxx",
"res_key": "",//User resources are stored in the storage gateway. External links are encrypted, and the loader decrypts them before downloading
"ctrl": "start"
},
"parameter": {
"avatar": {
"stream": {
"protocol": "xrtc", // supported protocols: xrtc
"fps": 25, // the default frame rate of 25 is recommended
"bitrate": 2000, // video bitrate
"alpha": 0, // 1: enable the alpha (transparent) channel
"room_id": "" // stream room ID
},
"interactive_scene": "type=live;target_section=[0,3000]", // used by 2D avatars: interaction target control
"mask_region": "[0,51,1080,1347]",
"move_h": 12, // [-4096, +4096] horizontal pixel offset between the avatar's center and the center of the composite image; negative shifts left, positive shifts right
"move_v": 0, // [-4096, +4096] vertical offset of the avatar within the frame; 0 (default) places it at the bottom edge; negative moves it down, positive moves it up
"scale": 0.99,
"vad_mode": 0, // 0: VAD disabled; 1: VAD detection; 2: VAD noise reduction
"audio_format": 1, // 1: 16k, 2: 24k
"avatar_id": "118801001", // avatar ID
// example IDs by avatar type: live photo opo4b8621000000116; training-free rvdbd6051000000111; standard 2D 110006001
"width": 1280, // video resolution: width
"height": 720 // video resolution: height
},
"tts": { // synthesis parameters
"vcn": "", // voice
"speed": 50, // speed: [0,100], default: 50
"pitch": 50, // pitch: [0,100], default: 50
"volume": 50 // volume: [0,100], default: 50
}
},
"payload": {
"background": { // optional
"data": "xxxxx", // type "url": an external background link; type "data": base64-encoded background data
"type": "url" // url, res_id, or data
}
}
}
Header parameter
| Parameter | Definition | Type | Limitations | Required |
|---|---|---|---|---|
| app_id | APPID information applied from the platform | string | "maxLength":50 | Yes |
| ctrl | Control Parameter | string | start | Yes |
| request_id | Unique Request ID | string | "maxLength":50 | Yes |
Parameter.avatar
| Feature ID | Feature Description | Data Type | Value Range | Required | Default Value |
|---|---|---|---|---|---|
| stream | Streaming Data Segment | Object | | Yes | |
| stream.protocol | Streaming Protocol | string | xrtc | Yes | |
| stream.fps | Streaming Frame Rate | int | 13-25 | No | 25 |
| stream.bitrate | Streaming Bitrate | int | 100-5000 (unit: kbps) | No | 2000 |
| stream.alpha | Transparent Channel Streaming | int | 0: disabled; 1: transparent (alpha) channel, requires the xrtc protocol | No | 0 |
| stream.room_id | Streaming Room ID | string | String up to 32 characters | No | |
| avatar_id | Virtual Avatar ID | string | e.g. 118801001; authorization required | Yes | |
| mask_region | Virtual Avatar Cropping Parameters | string | e.g. [0,51,1080,1347] | No | |
| width | Resolution Width | int | Must be a multiple of 4 and not greater than 4096 | No | 720 |
| height | Resolution Height | int | Must be a multiple of 4 and not greater than 4096 | No | 1280 |
| scale | Virtual Avatar Scale | float | [0.1, 1.0]; size of the avatar in the background, relative to the original target video | No | 1 |
| move_h | Virtual Avatar Horizontal Offset | int | [-4096, +4096]; horizontal pixel distance between the avatar's center and the center of the composite image. Negative values shift left; positive values shift right. | No | 0 |
| move_v | Virtual Avatar Vertical Offset | int | [-4096, +4096]; vertical movement of the avatar within the display frame. 0 places the avatar at the bottom edge. Negative values move it down; positive values move it up. | No | 0 |
| interactive_scene | Avatar Interaction Parameter Control | string | e.g. type=live;target_section=[0,3000] | No | |
| audio_format | Audio Driver Sample Rate | int | 1: 16 kHz; 2: 24 kHz | No | 1 |
Parameter.tts
| Feature ID | Feature Description | Data Type | Value Range | Required | Default Value |
|---|---|---|---|---|---|
| vcn | Synthetic Voice | string | x4_yezi ... Requires Authorization to Use | No | Default Avatar Voice |
| speed | Speech Synthesis Rate | int | [0,100] | No | 50 |
| pitch | Speech Synthesis Pitch | int | [0,100] | No | 50 |
| volume | Speech Synthesis Volume | int | [0,100] | No | 50 |
Payload.background parameter
| Feature ID | Feature Description | Data Type | Value Range | Required | Default Value |
|---|---|---|---|---|---|
| data | Background Data | string | Background URL or data, depending on type | No | |
| type | Background Type | string | url, res_id, data | No | |
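The start request above can also be assembled programmatically. The following is a minimal Python sketch, not an official SDK: build_start_frame is a hypothetical helper that fills the required fields plus a few common ones, enforces the width/height, fps, and bitrate limits from the Parameter.avatar table, and uses a random UUID hex string (32 characters, within the 50-character limit) as request_id. Fields it does not set, such as mask_region or move_h, are omitted on the assumption that the server then applies the documented defaults.

```python
import json
import uuid
from typing import Optional


def build_start_frame(app_id: str, avatar_id: str, vcn: str = "",
                      width: int = 1280, height: int = 720, fps: int = 25,
                      bitrate: int = 2000, background_url: Optional[str] = None) -> str:
    """Serialize a ctrl=start message; optional fields not passed here are omitted."""
    # Range checks mirror the Parameter.avatar table.
    for name, value in (("width", width), ("height", height)):
        if value % 4 != 0 or not 0 < value <= 4096:
            raise ValueError(f"{name} must be a positive multiple of 4, at most 4096")
    if not 13 <= fps <= 25:
        raise ValueError("fps must be between 13 and 25")
    if not 100 <= bitrate <= 5000:
        raise ValueError("bitrate must be between 100 and 5000 kbps")
    frame = {
        "header": {"app_id": app_id, "request_id": uuid.uuid4().hex, "ctrl": "start"},
        "parameter": {
            "avatar": {
                "stream": {"protocol": "xrtc", "fps": fps, "bitrate": bitrate},
                "avatar_id": avatar_id,
                "width": width,
                "height": height,
            },
            "tts": {"vcn": vcn, "speed": 50, "pitch": 50, "volume": 50},
        },
    }
    if background_url:
        # Payload.background: type "url" means data holds an external link.
        frame["payload"] = {"background": {"type": "url", "data": background_url}}
    return json.dumps(frame, ensure_ascii=False)
```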
Response
Response example
// response data
{
"header": {
"code": 0,
"message": "success",
"sid": "vdh009b01e0@dx18f70f917bc0001772",
"status": 0
},
"payload": {
"avatar": {
"request_id": "req009b01e1@dx18f70f917d70001772", // request_id of the corresponding request
"period": "global",
"event_type": "stream_info", // or stream_start
"error_code": 0,
"error_message": "",
"stream_url": "xrtcs://xrtc-cn-east-2.xf-yun.com/ase0001015chu18f70f9df240442402", //stream pull URL
"stream_extend": { //extended parameters for joining
"appid": "1000000001",
"user_sign": "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiIxMDAwMDAwMDAxIiwidGltZSI6MTY0ODAxODQ2MTU0MywiaWF0IjoxNjQ4MTkxMjQyfQ.CTcOh_kCLqvvglo5VLVnjgpZzoFpzk7Un3Et0c9dhUs"
}
}
}
}
Header response
| Parameter | Definition | Type |
|---|---|---|
| code | Return Code: 0 indicates success; any other value indicates an error | int |
| message | Error Description | string |
| sid | Session ID | string |
| session | Session token used for reconnection | string |
Payload.avatar response
| Parameter | Definition | Type |
|---|---|---|
| event_type | Response event type. stream_info: returns the stream URL; stream_start: callback triggered on the first frame of the stream | string |
| stream_url | The pull-stream URL returned in the 'stream_info' event | string |
| stream_extend | Extended pull-stream information returned in the 'stream_info' event | json |
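Once the start request succeeds, the client needs the stream_url and stream_extend returned in the stream_info event to pull the stream. A minimal Python sketch, assuming each WebSocket text message carries one JSON object shaped like the response example above; parse_stream_info is a hypothetical helper, and raising on a non-zero header code is a design choice of this sketch.

```python
import json
from typing import Optional, Tuple


def parse_stream_info(message: str) -> Optional[Tuple[str, dict]]:
    """Return (stream_url, stream_extend) for a stream_info event, or None for other events."""
    data = json.loads(message)
    header = data.get("header", {})
    if header.get("code", 0) != 0:
        # Non-zero code: the request failed; keep sid for troubleshooting.
        raise RuntimeError(
            f"error {header.get('code')}: {header.get('message', '')} (sid={header.get('sid', '')})"
        )
    avatar = data.get("payload", {}).get("avatar", {})
    if avatar.get("event_type") != "stream_info":
        return None  # e.g. stream_start, which marks the first frame
    return avatar["stream_url"], avatar.get("stream_extend", {})
```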
Text-Driven protocol
Request
Request example
{
"header": {
"app_id": "xxxx",
"ctrl":"text_driver",
"request_id": "yyyyy"
},
"parameter": {
"avatar_dispatch": {
"interactive_mode": 1, // 0: append; 1: interrupt
"disable_audit": 0 // 0: content moderation enabled (default); 1: disabled. Applies to text-driven processing at the driver level
},
"tts": { // synthesis parameters
"vcn": "", // voice
"speed": 50, // speed: [0,100], default: 50
"pitch": 50, // pitch: [0,100], default: 50
"volume": 50, // volume: [0,100], default: 50
"audio": {
"sample_rate": 16000 // 16000 or 24000
}
}
},
"payload": {
"text": {
//text that drives the digital human’s narration
"content": "I am a digital human"
},
"json_text":{
"text": "iFLYTEK picture‑in‑picture test: no matter how vast the sea of people",
"cmd": [
{
"type": "background_image/background_video",//background_video not supported currently
"value": "external link"
},
{
"type": "front_image/front_video",
"value": "external link",
"position_x": x,
"position_y": y,
"layer": 1,//Video layer index; 0 represents the current virtual human video frame. Higher values appear in front.
"transparency": 0.5, // opacity; if not set, the content is fully opaque (currently unsupported, planned for future expansion)
"width": 100,
"height": 200
}
]
}
}
}
Header parameter
| Parameter | Definition | Type | Limitations | Required |
|---|---|---|---|---|
| app_id | APPID information applied from the platform | string | "maxLength":50 | Yes |
| ctrl | Control Parameter | string | text_driver | Yes |
| request_id | Unique Request ID | string | "maxLength":50 | Yes |
Parameter.avatar_dispatch
| Feature ID | Feature Description | Data Type | Value Range | Required | Default Value |
|---|---|---|---|---|---|
| interactive_mode | Driver Type | int | 0 = append; 1 = interrupt | No | 1 |
| enable_action_status | Action Status Returned in the Response | int | 0 = no return; 1 = return | No | 0 |
Parameter.tts
| Feature ID | Feature Description | Data Type | Value Range | Required | Default Value |
|---|---|---|---|---|---|
| vcn | Synthetic Voice | string | x4_yezi ... Requires Authorization to Use | No | Default voice of the avatar configuration |
| speed | Speech Synthesis Rate | int | [0,100] | No | 50 |
| pitch | Speech Synthesis Pitch | int | [0,100] | No | 50 |
| volume | Speech Synthesis Volume | int | [0,100] | No | 50 |
| audio | Synthesized Audio Settings | object | | No | |
| audio.sample_rate | Synthesized Audio Sample Rate | int | 16000 24000 | No | 16000 |
Payload.text parameter
| Feature ID | Feature Description | Data Type | Value Range | Required | Default Value |
|---|---|---|---|---|---|
| content | Text | string | Driver text, up to 2000 characters | Yes | |
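A text_driver request can be built the same way. In this hedged Python sketch, build_text_frame (a hypothetical helper) maps a boolean onto interactive_mode (1 = interrupt, 0 = append) and rejects text over the documented 2000-character limit. It counts characters with len(), that is, Unicode code points; whether the service counts characters the same way is an assumption.

```python
import json
import uuid

MAX_TEXT_CHARS = 2000  # "Driver text, up to 2000 characters"


def build_text_frame(app_id: str, text: str, interrupt: bool = True,
                     vcn: str = "", sample_rate: int = 16000) -> str:
    """Serialize a ctrl=text_driver message for the given narration text."""
    if not text:
        raise ValueError("text must not be empty")
    # len() counts Unicode code points; the service may count differently.
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    if sample_rate not in (16000, 24000):
        raise ValueError("sample_rate must be 16000 or 24000")
    frame = {
        "header": {"app_id": app_id, "ctrl": "text_driver", "request_id": uuid.uuid4().hex},
        "parameter": {
            # interactive_mode: 0 appends after current speech, 1 interrupts it.
            "avatar_dispatch": {"interactive_mode": 1 if interrupt else 0},
            "tts": {"vcn": vcn, "speed": 50, "pitch": 50, "volume": 50,
                    "audio": {"sample_rate": sample_rate}},
        },
        "payload": {"text": {"content": text}},
    }
    return json.dumps(frame, ensure_ascii=False)
```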
Response
Response example
// response data
{
"header": {
"code": 0,
"message": "success",
"sid": ""
},
"payload": {
"avatar": {
"request_id": "",
"period": "global/driver",
"event_type": "stream_start/driver_status/action_status/tts_duration/pong",
"vmr_status": 0, // 0, 1, or 2
"frame_num": 0,
"error_code": 0, //abnormal error code; connection remains open
"error_message": "" //abnormal message
}
}
}
Header response
| Parameter | Definition | Type |
|---|---|---|
| code | Return code: 0 indicates success; any other value indicates an error. | int |
| message | Error Description | string |
| sid | Session ID | string |
Payload.avatar Response
| Segment | Definition | Type |
|---|---|---|
| request_id | ID of the single drive operation | string |
| period | Event level: global or driver | string |
| event_type | Event type. driver_status: status of a drive operation; period=driver indicates a driver-level event, and vmr_status values 0, 1, and 2 mark the start, intermediate processing, and completion of the drive. request_id is the client-provided unique identifier and is generated automatically if omitted. action_status: processing status of an engine action, for example {"vmr_action_status":{"action":{"type":"A_RH_like_O","state":2}}}; request_id is the client-provided unique identifier. This field is not currently returned externally. tts_duration: duration of the synthesized audio; request_id is the client-provided unique identifier. | string |
| vmr_status | 0 = start; 1 = intermediate processing; 2 = end | int |
| frame_num | Corresponding frame count | int |
| error_code | Exception status of a single drive operation: 0 indicates success; any other value indicates an error. | int |
| error_message | Exception description of a single drive operation | string |
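Because drive operations report progress through driver_status events, a client can track each request_id through vmr_status 0, 1, and 2. The sketch below is a hypothetical Python helper; it accepts vmr_status as either an integer or a numeric string, since the response example shows a string placeholder while the table declares int.

```python
import json
from typing import Dict

# vmr_status values from the Payload.avatar response table.
DRIVER_PHASES = {0: "start", 1: "processing", 2: "end"}


def track_driver_status(message: str, state: Dict[str, str]) -> None:
    """Record the latest phase of each drive operation in state, keyed by request_id."""
    avatar = json.loads(message).get("payload", {}).get("avatar", {})
    if avatar.get("event_type") != "driver_status":
        return  # ignore stream_start, tts_duration, pong, and so on
    request_id = avatar.get("request_id", "")
    if avatar.get("error_code", 0) != 0:
        # Per the example comments, the connection stays open on a drive error.
        state[request_id] = "error: " + avatar.get("error_message", "")
        return
    # int() accepts either 2 or "2", since the examples are ambiguous about the type.
    state[request_id] = DRIVER_PHASES.get(int(avatar.get("vmr_status", -1)), "unknown")
```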
Audio drive protocol
Example
{
"header": {
"app_id": "",
"ctrl":"audio_driver",
"request_id": ""
},
"parameter": {
"avatar_dispatch": {
"audio_mode": 1, // 0: non-real-time audio (file); 1: real-time audio
"target_type": "live" // live or general: the target segment for the current speech
}
},
"payload": {
"audio": {
// driver audio
"encoding": "raw",
"sample_rate": 16000,
"channels": 1,
"bit_depth": 16,
"status": 0, // 0: start, 1: processing, 2: end
"seq": 1,
"audio": "", // base64-encoded audio
"frame_size": 0
},
"avatar": [
{
"type": "action", // type: action
"value": "A_LH_introduced_O", // action name
"tb": xxx // time offset in milliseconds, relative to the start of the sub-session
}
]
}
}
// response data
{
"header": {
"code": 0,
"message": "success",
"sid": ""
},
"payload": {
"avatar": {
"request_id": "",
"period": "global/driver",
"event_type": "stream_start/driver_status/action_status/tts_duration",
"vmr_status": 0, // 0, 1, or 2
"frame_num": 0,
"error_code": 0, //abnormal error code; connection remains open
"error_message": "" //abnormal message
}
}
}
Request
Header parameter
| Parameter | Definition | Type | Limitations | Required |
|---|---|---|---|---|
| app_id | APPID information applied from the platform | string | "maxLength":50 | Yes |
| ctrl | Control Parameter | string | audio_driver | Yes |
| request_id | Unique Request ID | string | "maxLength":50 | Yes |
Parameter.avatar_dispatch
| Feature ID | Feature Description | Data Type | Value Range | Required | Default Value |
|---|---|---|---|---|---|
| audio_mode | Audio Type | int | 0 for non-real-time audio (audio file); 1 for real-time audio | No | 0 |
Payload.audio parameter
| Feature ID | Feature Description | Data Type | Value Range | Required | Default Value |
|---|---|---|---|---|---|
| encoding | Audio Codec | string | pcm (enumerated) | No | |
| sample_rate | Sample Rate | int | 16000, 24000 | No | 16000 |
| channels | Number of Channels | int | 1 | No | 1 |
| bit_depth | Bit Depth | int | 16 | No | 16 |
| status | Data Status | int | 0: start, 1: processing, 2: end | Yes | |
| seq | Data Sequence Number | int | Minimum value: 0, maximum value: 9999999 | No | 0 |
| frame_size | Frame Size | int | Minimum value: 0, maximum value: 1024 | No | 0 |
| audio | Audio Data | string | Base64-encoded audio; minimum size 1 B, maximum size 10485760 B | Yes | |
Payload.avatar parameter
| Feature ID | Feature Description | Data Type | Value Range | Required | Default Value |
|---|---|---|---|---|---|
| type | Type | string | action (operation) | Yes | |
| value | Value for the given type | string | Action name | Yes | |
| tb | Time offset, in milliseconds relative to the start of the sub-session | int | 0-99999 | Yes | |
| te | Time offset, in milliseconds relative to the end of the sub-session | int | -1 indicates continuation until the end | No | |
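For audio_driver, the audio is sent as a sequence of frames: the first with status 0, the middle ones with status 1, and the last with status 2, each carrying a base64 chunk. The Python sketch below is illustrative only. The chunk size of 1280 bytes (640 samples, or 40 ms of 16 kHz 16-bit mono audio) is a hypothetical choice; starting seq at 1 and using encoding "raw" follow the request example; repeating the same header and parameter on every frame, and sending a single-chunk clip with status 2, are assumptions.

```python
import base64
import json
import uuid
from typing import Iterator


def audio_driver_frames(app_id: str, pcm: bytes, chunk_bytes: int = 1280,
                        sample_rate: int = 16000, realtime: bool = True) -> Iterator[str]:
    """Yield ctrl=audio_driver messages that stream 16-bit mono PCM in order."""
    if not pcm:
        raise ValueError("pcm must not be empty")
    if chunk_bytes <= 0 or chunk_bytes % 2:
        raise ValueError("chunk_bytes must be a positive, even number (16-bit samples)")
    if sample_rate not in (16000, 24000):
        raise ValueError("sample_rate must be 16000 or 24000")
    request_id = uuid.uuid4().hex  # one request_id for the whole utterance
    chunks = [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]
    last = len(chunks) - 1
    for i, chunk in enumerate(chunks):
        # status: 0 = start, 1 = processing, 2 = end; a single chunk is sent as 2.
        status = 2 if i == last else (0 if i == 0 else 1)
        frame = {
            "header": {"app_id": app_id, "ctrl": "audio_driver", "request_id": request_id},
            "parameter": {"avatar_dispatch": {"audio_mode": 1 if realtime else 0}},
            "payload": {"audio": {
                "encoding": "raw", "sample_rate": sample_rate, "channels": 1,
                "bit_depth": 16, "status": status, "seq": i + 1,
                "audio": base64.b64encode(chunk).decode("ascii"),
            }},
        }
        yield json.dumps(frame)
```

For real-time input, send each yielded message as soon as its audio is available rather than buffering the whole clip first.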
Response
Header response
| Parameter | Definition | Type |
|---|---|---|
| code | Return code: 0 indicates success; any other value indicates an error | int |
| message | Error Description | string |
| sid | Session ID | string |
Payload.avatar response
| Segment | Definition | Type |
|---|---|---|
| request_id | ID of the single drive operation | string |
| period | Event level: global or driver | string |
| event_type | Event type. driver_status: status of a drive operation; period=driver indicates a driver-level event, and vmr_status values 0, 1, and 2 mark the start, intermediate processing, and completion of the drive. request_id is the client-provided unique identifier and is generated automatically if omitted. action_status: processing status of an engine action, for example {"vmr_action_status":{"action":{"type":"A_RH_like_O","state":2}}}; request_id is the client-provided unique identifier. This field is not currently returned externally. | string |
| vmr_status | 0 = start; 1 = intermediate processing; 2 = end | int |
| frame_num | Corresponding frame count | int |
| error_code | Exception status of a single drive operation: 0 indicates success; any other value indicates an error. | int |
| error_message | Exception description of a single drive operation | string |
Standalone command protocol (currently supports real-time actions only)
Example
// request data
{
"header": {
"app_id": "",
"ctrl":"cmd",//send command separately
"request_id": ""
}
}
// response data
{
"header": {
"code": 0, // error code
"message": "success", // response message
"sid": "" // session id
},
"payload": {
"cmd_text": {
"avatar": [
{
"type": "action", // type:action
"value": "A_LH_introduced_O", // action name
"tb": 0 // trigger action immediately
}
]
}
}
}
Request
Header parameter
| Parameter | Definition | Type | Limitations | Required |
|---|---|---|---|---|
| app_id | APPID information applied from the platform | string | "maxLength":50 | Yes |
| ctrl | Control Parameter | string | cmd | Yes |
| request_id | Unique Request ID | string | "maxLength":50 | Yes |
Payload.avatar parameter
| Feature ID | Feature Description | Data Type | Value Range | Required | Default Value |
|---|---|---|---|---|---|
| type | Action type | string | action (operation) | Yes | |
| value | Action name | string | A_LH_introduced_O | Yes | |
| tb | Time offset, in milliseconds relative to the start of the sub-session | int | 0 indicates immediate action trigger | Yes | 0 |
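A real-time action can be triggered with a single cmd message. The request example above shows only the header, while the action list appears under payload.cmd_text.avatar in the example that follows it; this Python sketch assumes the action list is sent in that location. build_action_cmd is a hypothetical name.

```python
import json
import uuid


def build_action_cmd(app_id: str, action: str, tb: int = 0) -> str:
    """Serialize a ctrl=cmd message that triggers one real-time action."""
    if tb < 0:
        raise ValueError("tb must be >= 0 (0 triggers the action immediately)")
    frame = {
        "header": {"app_id": app_id, "ctrl": "cmd", "request_id": uuid.uuid4().hex},
        # Assumed location of the action list, mirroring payload.cmd_text.avatar in the example.
        "payload": {"cmd_text": {"avatar": [{"type": "action", "value": action, "tb": tb}]}},
    }
    return json.dumps(frame)
```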
Response
Header response
| Parameter | Definition | Type |
|---|---|---|
| code | Return code: 0 indicates success; any other value indicates an error. | int |
| message | Error Description | string |
| sid | Session ID | string |
Payload.avatar response
| Segment | Definition | Type |
|---|---|---|
| request_id | ID of the single drive operation | string |
| period | Event level: global or driver | string |
| event_type | action_status: processing status of the engine action, for example {"vmr_action_status":{"action":{"type":"A_RH_like_O","state":2}}}. request_id is the client-provided unique identifier. This field is not currently returned externally. | string |
| vmr_status | 0 = start; 1 = intermediate processing; 2 = end | int |
| frame_num | Corresponding frame count | int |
| error_code | Exception status of a single drive operation: 0 indicates success; any other value indicates an error. | int |
| error_message | Exception description of a single drive operation | string |
Reset (interrupt) protocol
Example
// request data
{
"header": {
"app_id": "",
"uid": "",
"session": "",
"ctrl":"reset",//restore the virtual human to the silent streaming state
"request_id": ""
}
}
// response data
{
"header": {
"code": 0,
"message": "success",
"sid": "vms000eb24a@dx18df2c3ec946f19882"
},
"payload": {
"avatar": {
"request_id": "",
"period":"driver",
"event_type":"reset",
"error_code":0,
"error_message":""
}
}
}
Request
Header parameter
| Parameter | Definition | Type | Limitations | Required |
|---|---|---|---|---|
| app_id | APPID information applied from the platform | string | "maxLength":50 | Yes |
| ctrl | Control Parameter | string | reset | Yes |
| request_id | Unique Request ID | string | "maxLength":50 | Yes |
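The reset, ping, and stop requests share the same header-only shape and differ only in ctrl. The hypothetical Python helper below builds any of the three and checks the 50-character limits from the header tables. It omits the uid and session fields that appear in the reset and ping examples but are not described in their tables; add them if your integration needs them.

```python
import json
import uuid
from typing import Optional

# Messages whose request consists of the header alone.
HEADER_ONLY_CTRLS = {"reset", "ping", "stop"}


def build_control_frame(app_id: str, ctrl: str, request_id: Optional[str] = None) -> str:
    """Serialize a header-only control message (reset, ping, or stop)."""
    if ctrl not in HEADER_ONLY_CTRLS:
        raise ValueError(f"unsupported ctrl: {ctrl!r}")
    request_id = request_id or uuid.uuid4().hex
    if len(app_id) > 50 or len(request_id) > 50:
        raise ValueError("app_id and request_id are limited to 50 characters")
    return json.dumps({"header": {"app_id": app_id, "ctrl": ctrl, "request_id": request_id}})
```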
Response
Header response
| Segment | Definition | Type |
|---|---|---|
| code | Return code: 0 indicates success; any other value indicates an error | int |
| message | Error Description | string |
| sid | Session ID | string |
Payload.avatar response
| Segment | Definition | Type |
|---|---|---|
| request_id | ID of the single drive operation | string |
| period | Event level: global or driver | string |
| event_type | reset: Returned in response to a user‑initiated reset request. | string |
| error_code | Exception status for a single execution: 0 indicates success; any other value indicates an error. | int |
| error_message | Exception description for a single execution | string |
Ping (keep-alive) protocol
Example
// request data
{
"header": {
"app_id": "",
"uid": "",
"session": "",
"ctrl":"ping",
"request_id": ""
}
}
// response data
{
"header": {
"code": 0,
"message": "success",
"sid": "vms000eb24a@dx18df2c3ec946f19882"
},
"payload": {
"avatar": {
"request_id": "",
"period":"driver",
"event_type":"pong",
"error_code":0,
"error_message":""
}
}
}
Request
Header parameter
| Parameter | Definition | Type | Limitations | Required |
|---|---|---|---|---|
| app_id | APPID information applied from the platform | string | "maxLength":50 | Yes |
| ctrl | Control Parameter | string | ping | Yes |
| request_id | Unique Request ID | string | "maxLength":50 | Yes |
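This page does not state how often ping should be sent. The Python sketch below only decides when a ping is due; the 10-second interval is a hypothetical placeholder, not a documented value, and the clock parameter exists so the logic can be tested without waiting.

```python
import time
from typing import Callable


class KeepAlive:
    """Tracks when the last ping was sent and reports when the next one is due."""

    def __init__(self, interval_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval_s = interval_s  # hypothetical interval, not from the spec
        self._clock = clock
        self._last_sent = clock()

    def mark_sent(self) -> None:
        """Call right after each ctrl=ping message is sent."""
        self._last_sent = self._clock()

    def ping_due(self) -> bool:
        return self._clock() - self._last_sent >= self.interval_s
```

In a send loop, check ping_due() periodically; when it returns True, send a ctrl=ping message and call mark_sent().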
Response
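Header response
| Parameter | Definition | Type |
|---|---|---|
| code | Return code: 0 indicates success; any other value indicates an error | int |
| message | Error Description | string |
| sid | Session ID | string |
Payload.avatar response
| Segment | Definition | Type |
|---|---|---|
| request_id | ID of the single drive operation | string |
| period | Event level: global or driver | string |
| event_type | pong: returned in response to a ping request. | string |
| error_code | Exception status for a single execution: 0 indicates success; any other value indicates an error. | int |
| error_message | Exception description for a single execution | string |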
Stop protocol
Example
// request data
{
"header": {
"app_id": "",
"ctrl":"stop",
"request_id": ""
}
}
// response data
{
"header": {
"code": 0, // error code
"message": "success", // response message
"sid": "", // session id
"session": "" // session information
},
"payload": {
"avatar": {
"request_id": "",
"period": "global",
"event_type": "stop",
"error_code":0,
"error_message":""
}
}
}
Request
Header parameter
| Parameter | Definition | Type | Limitations | Required |
|---|---|---|---|---|
| app_id | APPID information applied from the platform | string | "maxLength":50 | Yes |
| ctrl | Control Parameter | string | stop | Yes |
| request_id | Unique Request ID | string | "maxLength":50 | Yes |
Response
Header response
| Segment | Definition | Type |
|---|---|---|
| code | Return code: 0 indicates success; any other value indicates an error | int |
| message | Error Description | string |
| sid | Session ID | string |
Payload.avatar response
| Segment | Definition | Type |
|---|---|---|
| request_id | ID of the single drive operation | string |
| period | Event level: global or driver | string |
| event_type | stop: returned in response to a user‑initiated stop request. | string |
| error_code | Exception status for a single execution: 0 indicates success; any other value indicates an error. | int |
| error_message | Exception description for a single execution | string |
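The reset, pong, and stop responses share one envelope, so a client can route them in a single place. In the Python sketch below, handle_event (a hypothetical helper) returns the event type and whether the client should close the connection. Treating only the stop acknowledgement as the end of the session, and raising on a non-zero header code, are assumptions of this sketch; a non-zero payload error_code is reported without closing, in line with the response examples' note that the connection remains open.

```python
import json
from typing import Tuple


def handle_event(message: str) -> Tuple[str, bool]:
    """Return (event_type, should_close) for one server message."""
    data = json.loads(message)
    header = data.get("header", {})
    if header.get("code", 0) != 0:
        raise RuntimeError(f"request failed: {header.get('code')} {header.get('message', '')}")
    avatar = data.get("payload", {}).get("avatar", {})
    if avatar.get("error_code", 0) != 0:
        # Single-operation error: report it but keep the connection open.
        return "error", False
    event_type = avatar.get("event_type", "unknown")
    # Only the stop acknowledgement ends the session here.
    return event_type, event_type == "stop"
```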