Real-time cloud API

Protocol description

The capability interface uses the WebSocket protocol.
All protocol fields are UTF-8 encoded.

Access requirements

An APPID, APISecret, and APIKey created on the platform
An authorized virtual human avatar_id and voice (vcn)

Integration endpoint

API handshake

For handshake authentication, follow the iFLYTEK Open Platform authentication method:
https://global.xfyun.cn/doc/tts/online_tts/API.html#authentication-method

Request URL

wss://avatar-api-gp.xf-yun.com/v1/interact
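The handshake signs each connection request. The sketch below assumes the HMAC-SHA256 scheme described on the linked authentication page: a string made of the host, the date, and the request line is signed with the APISecret, combined with the APIKey into an authorization string, Base64-encoded, and appended to the request URL together with date and host. `build_auth_url` is a hypothetical helper; verify the signing details against the linked page before relying on it.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate
from typing import Optional
from urllib.parse import urlencode, urlparse


def build_auth_url(request_url: str, api_key: str, api_secret: str,
                   date: Optional[str] = None) -> str:
    """Append authorization, date, and host query parameters to request_url (assumed scheme)."""
    parsed = urlparse(request_url)
    host, path = parsed.netloc, parsed.path
    # RFC 1123 date in GMT, e.g. "Mon, 01 Jan 2024 00:00:00 GMT"
    if date is None:
        date = formatdate(usegmt=True)
    # String to sign: host, date, and the HTTP request line, one per line
    signature_origin = f"host: {host}\ndate: {date}\nGET {path} HTTP/1.1"
    digest = hmac.new(api_secret.encode("utf-8"),
                      signature_origin.encode("utf-8"),
                      hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    authorization_origin = (
        f'api_key="{api_key}", algorithm="hmac-sha256", '
        f'headers="host date request-line", signature="{signature}"'
    )
    authorization = base64.b64encode(authorization_origin.encode("utf-8")).decode("ascii")
    # urlencode percent-encodes the Base64 and date values
    query = urlencode({"authorization": authorization, "date": date, "host": host})
    return f"{request_url}?{query}"
```

The returned URL can then be opened with any WebSocket client library. Passing a fixed date makes the output deterministic, which is useful for testing.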

Request structure description

Parameter | Definition | Type | Description
header | Protocol header | object | Request header
parameter | Capability parameters | object | AI capability parameters, used to enable or disable specific AI engine capabilities
payload | Service alias | object | Request data packet

Response structure description

Parameter | Definition | Type | Description
header | Protocol header | object | Response header
payload | Service alias | object | Response data packet

API Message

Start initialization protocol

Request

Request example

{
    "header": {
        "app_id": "xxxx",
        "request_id": "xxxx",
        "res_key": "",//User resources are stored in the storage gateway. External links are encrypted, and the loader decrypts them before downloading
        "ctrl": "start"
    },
    "parameter": {
        "avatar": {
            "stream": {
                "protocol": "xrtc", // Protocols supported: xrtc
                "fps": 25, // It is recommended to use the default video frame rate of 25
                "bitrate": 2000, // video bitrate (kbps)
                "alpha": 0, // 1: enable the alpha (transparency) channel
                "room_id": "" //stream room ID
            },
            "interactive_scene": "type=live;target_section=[0,3000]", // 2D avatars only: controls the target section
            "mask_region": "[0,51,1080,1347]",
            "move_h": 12, // [-4096, +4096] horizontal pixel offset between the avatar's center and the center of the composite image; negative values shift left, positive values shift right
            "move_v": 0, // [-4096, +4096] vertical offset within the frame; 0 (default) places the avatar at the bottom edge; negative values move it down, positive values move it up
            "scale": 0.99,
            "vad_mode": 0, // 0: VAD disabled; 1: VAD detection; 2: VAD noise reduction
            "audio_format": 1, // 1: 16k, 2: 24k
            "avatar_id": "118801001", // avatar ID
            // example avatar IDs: live photo opo4b8621000000116, training-free rvdbd6051000000111, standard 2D 110006001
            "width": 1280, // video resolution:width
            "height": 720 // video resolution:height
        },
        "tts": { // synthesis parameters
            "vcn": "", // voice
            "speed": 50, // speed:[0,100],default:50
            "pitch": 50, // pitch:[0,100],default:50
            "volume": 50 // volume:[0,100],default:50
        }
      },
    "payload": {
        "background": { // optional
            "data": "xxxxx", // with type "url": an external background link; with type "data": base64-encoded background data
            "type": "url" // url, res_id, or data
        }
    }
}

Header parameter

Parameter | Definition | Type | Limitations | Required
app_id | APPID obtained from the platform | string | maxLength: 50 | Yes
ctrl | Control parameter | string | start | Yes
request_id | Unique request ID | string | maxLength: 50 | Yes

Parameter.avatar

Feature ID | Feature Description | Data Type | Value Range | Required | Default Value
stream | Streaming data segment | object | | Yes |
stream.protocol | Streaming protocol | string | xrtc | Yes |
stream.fps | Streaming frame rate | int | 13-25 | No | 25
stream.bitrate | Streaming bitrate | int | 100-5000 (kbps) | No | 2000
stream.alpha | Transparent (alpha) channel | int | 1 enables the transparent channel (xrtc protocol) | No | 0
stream.room_id | Streaming room ID | string | Up to 32 characters | No |
avatar_id | Virtual avatar ID | string | e.g. 118801001; requires authorization | Yes |
mask_region | Virtual avatar cropping region | string | e.g. [0,51,1080,1347] | No |
width | Resolution width | int | Multiple of 4, at most 4096 | No | 720
height | Resolution height | int | Multiple of 4, at most 4096 | No | 1280
scale | Virtual avatar scale | float | [0.1, 1.0]; size of the avatar in the background, relative to the original target video | No | 1
move_h | Horizontal offset of the avatar (pixels) | int | [-4096, +4096]; offset between the avatar's center and the center of the composite image. Negative values move the avatar left, positive values move it right. | No | 0
move_v | Vertical offset of the avatar (pixels) | int | [-4096, +4096]; 0 places the avatar at the bottom edge of the frame. Negative values move the avatar down, positive values move it up. | No | 0
interactive_scene | Avatar interaction control | string | e.g. type=live;target_section=[0,3000] | No |
audio_format | Audio driver sample rate | int | 1: 16k, 2: 24k | No | 1

Parameter.tts

Feature ID | Feature Description | Data Type | Value Range | Required | Default Value
vcn | Synthesized voice | string | x4_yezi ...; requires authorization | No | Avatar's default voice
speed | Speech synthesis rate | int | [0,100] | No | 50
pitch | Speech synthesis pitch | int | [0,100] | No | 50
volume | Speech synthesis volume | int | [0,100] | No | 50

Payload.background parameter

Feature ID | Feature Description | Data Type | Value Range | Required | Default Value
data | Background data | string | Background URL or data, depending on type | No |
type | Background type | string | url, res_id, data | No |
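A minimal sketch of assembling the start frame in Python, assuming the request is sent as a single JSON text frame over the WebSocket. `build_start_message` is a hypothetical helper: it sets only a subset of the fields above, leaves the rest to their defaults, omits the tts block when no voice is given, and checks the width/height constraints from the Parameter.avatar table before sending.

```python
import json
import uuid


def build_start_message(app_id: str, avatar_id: str, vcn: str = "",
                        width: int = 1280, height: int = 720) -> str:
    """Serialize a minimal ctrl=start request (hypothetical helper)."""
    # width/height: multiples of 4, at most 4096 (Parameter.avatar table)
    if width % 4 or height % 4 or width > 4096 or height > 4096:
        raise ValueError("width and height must be multiples of 4 and at most 4096")
    message = {
        "header": {
            "app_id": app_id,
            "request_id": uuid.uuid4().hex,  # 32 characters, within maxLength 50
            "ctrl": "start",
        },
        "parameter": {
            "avatar": {
                "stream": {"protocol": "xrtc", "fps": 25, "bitrate": 2000},
                "avatar_id": avatar_id,
                "width": width,
                "height": height,
            },
        },
    }
    if vcn:
        # Only send tts settings when a voice is chosen; otherwise the avatar's default voice applies
        message["parameter"]["tts"] = {"vcn": vcn, "speed": 50, "pitch": 50, "volume": 50}
    return json.dumps(message)
```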

Response

Response example

// response data
{
    "header": {
        "code": 0,
        "message": "success",
        "sid": "vdh009b01e0@dx18f70f917bc0001772",
        "status": 0
    },
    "payload": {
        "avatar": {
            "request_id": "req009b01e1@dx18f70f917d70001772", //the request_id corresponding to the request
            "period": "global",
            "event_type": "stream_info", // or stream_start
            "error_code": 0,
            "error_message": "",
            "stream_url": "xrtcs://xrtc-cn-east-2.xf-yun.com/ase0001015chu18f70f9df240442402", //stream pull URL
            "stream_extend": { //extended parameters for joining
                "appid": "1000000001",
                "user_sign": "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiIxMDAwMDAwMDAxIiwidGltZSI6MTY0ODAxODQ2MTU0MywiaWF0IjoxNjQ4MTkxMjQyfQ.CTcOh_kCLqvvglo5VLVnjgpZzoFpzk7Un3Et0c9dhUs"
            }
        }
    }
}

Header response

Parameter | Definition | Type
code | Return code: 0 indicates success; any other value indicates an error | int
message | Error description | string
sid | Session ID | string
session | Session token used for reconnection | string

Payload.avatar response

Parameter | Definition | Type
event_type | Response event type. stream_info: returns the stream URL. stream_start: callback triggered on the first frame of the stream. | string
stream_url | Pull-stream URL, returned in the stream_info event | string
stream_extend | Extended pull-stream information, returned in the stream_info event | json
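After the start request, the client waits for the stream_info event and hands stream_url and stream_extend to the xrtc player. A sketch of that check, assuming each server message arrives as one JSON text frame; `extract_stream_info` is a hypothetical name, and raising on a non-zero header code is a design choice of this sketch rather than documented behavior.

```python
import json
from typing import Optional


def extract_stream_info(frame: str) -> Optional[dict]:
    """Return stream_url and stream_extend from a stream_info event, or None for other events."""
    message = json.loads(frame)
    header = message.get("header", {})
    if header.get("code", -1) != 0:
        raise RuntimeError(f"request failed: {header.get('code')} {header.get('message')}")
    avatar = message.get("payload", {}).get("avatar", {})
    if avatar.get("event_type") != "stream_info":
        return None  # e.g. stream_start, which arrives on the first frame
    return {"stream_url": avatar["stream_url"], "stream_extend": avatar.get("stream_extend", {})}
```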

Text-driven protocol

Request

Request example

{
    "header": {
        "app_id": "xxxx",
        "ctrl":"text_driver",
        "request_id": "yyyyy"
    },
    "parameter": {
      "avatar_dispatch": {
            "interactive_mode": 1, // 0: append; 1: interrupt
            "disable_audit": 0 // 0: enable content moderation (default); 1: disable it. Applies to text-driven processing at the driver level
        },
        "tts": { // synthesis parameters
            "vcn": "", // voice
            "speed": 50, // speed:[0,100],default:50
            "pitch": 50, // pitch:[0,100],default:50
            "volume": 50, // volume:[0,100],default:50
            "audio": {
                "sample_rate": 16000 // 16000 or 24000
            }
        }
    },
    "payload": {
        "text": {
            //text that drives the digital human’s narration
            "content": "I am a digital human"
        },
        "json_text":{
          "text": "iFLYTEK picture‑in‑picture test: no matter how vast the sea of people",
          "cmd": [
              {
                  "type": "background_image", // background_video is not currently supported
                  "value": "external link"
              },
              {
                  "type": "front_image", // or front_video
                  "value": "external link",
                  "position_x": 0, // x position (placeholder value)
                  "position_y": 0, // y position (placeholder value)
                  "layer": 1, // video layer index; 0 is the current virtual human video frame, and higher values are drawn in front
                  "transparency": 0.5, // opacity; if omitted, the content is fully opaque (not yet supported, reserved for future use)
                  "width": 100,
                  "height": 200
              }
            ]
        }
    }
}

Header parameter

Parameter | Definition | Type | Limitations | Required
app_id | APPID obtained from the platform | string | maxLength: 50 | Yes
ctrl | Control parameter | string | text_driver | Yes
request_id | Unique request ID | string | maxLength: 50 | Yes

Parameter.avatar_dispatch

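Feature ID | Feature Description | Data Type | Value Range | Required | Default Value
interactive_mode | Drive mode | int | 0 = append; 1 = interrupt | No | 1
enable_action_status | Whether action status is returned in the response | int | 0 = not returned; 1 = returned | No | 0
disable_audit | Content moderation switch for text-driven processing | int | 0 = moderation enabled; 1 = moderation disabled | No | 0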

Parameter.tts

Feature ID | Feature Description | Data Type | Value Range | Required | Default Value
vcn | Synthesized voice | string | x4_yezi ...; requires authorization | No | Default voice of the avatar configuration
speed | Speech synthesis rate | int | [0,100] | No | 50
pitch | Speech synthesis pitch | int | [0,100] | No | 50
volume | Speech synthesis volume | int | [0,100] | No | 50
audio | Synthesized audio settings | object | | No |
audio.sample_rate | Synthesized audio sample rate | int | 16000, 24000 | No | 16000

Payload.text parameter

Feature ID | Feature Description | Data Type | Value Range | Required | Default Value
content | Text | string | Drive text, up to 2000 characters | Yes |
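A sketch of a text_driver frame built from the tables above. `build_text_driver` is hypothetical; it counts the 2000-character limit as Python string length (Unicode code points), which is an assumption about how the service counts characters, and it sends tts settings only when a voice is chosen.

```python
import json
import uuid

MAX_TEXT_CHARS = 2000  # limit from the Payload.text table


def build_text_driver(app_id: str, text: str, interrupt: bool = True, vcn: str = "") -> str:
    """Serialize a ctrl=text_driver request (hypothetical helper)."""
    if not text or len(text) > MAX_TEXT_CHARS:
        raise ValueError("content must be 1 to 2000 characters")
    message = {
        "header": {"app_id": app_id, "ctrl": "text_driver", "request_id": uuid.uuid4().hex},
        "parameter": {
            # 1 interrupts the current narration, 0 appends after it
            "avatar_dispatch": {"interactive_mode": 1 if interrupt else 0},
        },
        "payload": {"text": {"content": text}},
    }
    if vcn:
        message["parameter"]["tts"] = {"vcn": vcn, "speed": 50, "pitch": 50, "volume": 50}
    return json.dumps(message)
```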

Response

Response example


// response data
{
    "header": {
        "code": 0,
        "message": "success",
        "sid": ""
    },
    "payload": {
        "avatar": {
            "request_id": "",
            "period": "global", // global or driver
            "event_type": "driver_status", // stream_start, driver_status, action_status, tts_duration, or pong
            "vmr_status": 0, // 0, 1, or 2
            "frame_num": 0,
            "error_code": 0,  // error code for this drive request; the connection stays open
            "error_message": ""  // error description
        }
    }
}

Header response

Parameter | Definition | Type
code | Return code: 0 indicates success; any other value indicates an error | int
message | Error description | string
sid | Session ID | string

Payload.avatar Response

Parameter | Definition | Type
request_id | ID of a single drive request | string
period | global or driver | string
event_type | Event type (values below) | string
vmr_status | 0 = start; 1 = in progress; 2 = end | int
frame_num | Corresponding frame count | int
error_code | Exception status of a single drive request: 0 indicates success; any other value indicates an error | int
error_message | Exception description of a single drive request | string

event_type values:
- driver_status: progress of a drive request (period=driver); vmr_status 0, 1, and 2 indicate start, in progress, and completion. request_id is the client-provided identifier, or an automatically generated one if omitted.
- action_status: processing status of an engine action, for example "vmr_action_status":{"action":{"type":"A_RH_like_O","state":2}}. request_id is the client-provided identifier. This event is not currently returned.
- tts_duration: duration of the synthesized audio. request_id is the client-provided identifier.
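Because driver_status events report vmr_status 0, 1, and 2 per request_id, a client can tell when a given drive request has finished. A sketch, assuming frames have already been received as JSON strings; `completed_drives` is a hypothetical helper, and it accepts vmr_status as either a number or a numeric string.

```python
import json
from typing import Iterable, List


def completed_drives(frames: Iterable[str]) -> List[str]:
    """Return request_ids whose driver_status event reached vmr_status 2 (end)."""
    done = []
    for frame in frames:
        avatar = json.loads(frame).get("payload", {}).get("avatar", {})
        if avatar.get("event_type") != "driver_status":
            continue  # ignore tts_duration, pong, and other events
        if avatar.get("error_code", 0) != 0:
            raise RuntimeError(f"drive {avatar.get('request_id')} failed: {avatar.get('error_message')}")
        if int(avatar.get("vmr_status", -1)) == 2:
            done.append(avatar.get("request_id", ""))
    return done
```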

Audio drive protocol

Example

{
    "header": {
        "app_id": "",
        "ctrl":"audio_driver",
        "request_id": ""
    },
    "parameter": {
      "avatar_dispatch": {
            "audio_mode": 1, // 0: non-real-time audio (audio file); 1: real-time audio
            "target_type": "live" // live or general: target segment for the current synthesized speech
        }
    },
    "payload": {
        "audio": {
            // driver audio
            "encoding": "raw",
            "sample_rate": 16000,
            "channels": 1,
            "bit_depth": 16,
            "status": 0, // 0: start, 1: in progress, 2: end
            "seq": 1,
            "audio": "",  // base64-encoded audio
            "frame_size": 0
        },
        "avatar": [
           {
              "type": "action", // type:action
              "value": "A_LH_introduced_O", // action name
              "tb": 0 // time offset in milliseconds, relative to the start of the session
           }
        ]
    } 
}
// response data
{
    "header": {
        "code": 0,
        "message": "success",
        "sid": ""
    },
    "payload": {
        "avatar": {
            "request_id": "",
            "period": "global", // global or driver
            "event_type": "driver_status", // stream_start, driver_status, action_status, or tts_duration
            "vmr_status": 0, // 0, 1, or 2
            "frame_num": 0,
            "error_code": 0,  // error code for this drive request; the connection stays open
            "error_message": ""  // error description
        }
    }
}

Request

Header parameter

Parameter | Definition | Type | Limitations | Required
app_id | APPID obtained from the platform | string | maxLength: 50 | Yes
ctrl | Control parameter | string | audio_driver | Yes
request_id | Unique request ID | string | maxLength: 50 | Yes

Parameter.avatar_dispatch

Feature ID | Feature Description | Data Type | Value Range | Required | Default Value
audio_mode | Audio type | int | 0 = non-real-time audio (audio file); 1 = real-time audio | No | 0

Payload.audio parameter

Feature ID | Feature Description | Data Type | Value Range | Required | Default Value
encoding | Audio encoding | string | pcm (enumerated values) | No |
sample_rate | Audio sample rate | int | 16000, 24000 | No | 16000
channels | Number of channels | int | 1 | No | 1
bit_depth | Bit depth | int | 16 | No | 16
status | Data status | int | 0: start, 1: in progress, 2: end | Yes |
seq | Data sequence number | int | 0 to 9999999 | No | 0
frame_size | Frame size | int | 0 to 1024 | No | 0
audio | Base64-encoded audio data | string | 1 B to 10485760 B | Yes |

Payload.avatar parameter

Feature ID | Feature Description | Data Type | Value Range | Required | Default Value
type | Type | string | action | Yes |
value | Name for the given type | string | Action name | Yes |
tb | Time offset, in milliseconds relative to the start of the sub-session | int | 0-99999 | Yes |
te | Time offset, in milliseconds relative to the end of the sub-session | int | -1 continues until the end | No |
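A sketch of streaming a PCM buffer as audio_driver frames, assuming 16 kHz, 16-bit, mono audio and a hypothetical chunk size of 1280 bytes (40 ms at that format). `build_audio_frames` is not part of the documented API. The first chunk is sent with status 0, middle chunks with 1, and the last with 2; marking a single-chunk buffer with status 2 is an assumption the tables do not settle.

```python
import base64
import json
from typing import List


def build_audio_frames(app_id: str, request_id: str, pcm: bytes,
                       chunk_bytes: int = 1280) -> List[str]:
    """Split 16 kHz / 16-bit / mono PCM into audio_driver frames (hypothetical helper)."""
    if not pcm:
        raise ValueError("audio must contain at least 1 byte")
    chunks = [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]
    frames = []
    for index, chunk in enumerate(chunks):
        if index == len(chunks) - 1:
            status = 2  # last chunk (assumed to apply to a single chunk as well)
        elif index == 0:
            status = 0  # first chunk
        else:
            status = 1  # intermediate chunk
        message = {
            "header": {"app_id": app_id, "ctrl": "audio_driver", "request_id": request_id},
            "parameter": {"avatar_dispatch": {"audio_mode": 1}},  # real-time audio
            "payload": {"audio": {
                "encoding": "raw", "sample_rate": 16000, "channels": 1, "bit_depth": 16,
                "status": status,
                "seq": index + 1,  # numbering starts at 1, as in the example
                "audio": base64.b64encode(chunk).decode("ascii"),
            }},
        }
        frames.append(json.dumps(message))
    return frames
```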

Response

Header response

Parameter | Definition | Type
code | Return code: 0 indicates success; any other value indicates an error | int
message | Error description | string
sid | Session ID | string

Payload.avatar response

Parameter | Definition | Type
request_id | ID of a single drive request | string
period | global or driver | string
event_type | Event type (values below) | string
vmr_status | 0 = start; 1 = in progress; 2 = end | int
frame_num | Corresponding frame count | int
error_code | Exception status of a single drive request: 0 indicates success; any other value indicates an error | int
error_message | Exception description of a single drive request | string

event_type values:
- driver_status: progress of a drive request (period=driver); vmr_status 0, 1, and 2 indicate start, in progress, and completion. request_id is the client-provided identifier, or an automatically generated one if omitted.
- action_status: processing status of an engine action, for example "vmr_action_status":{"action":{"type":"A_RH_like_O","state":2}}. request_id is the client-provided identifier. This event is not currently returned.

Standalone command protocol (currently supports real-time actions only)

Example

// request data
{
    "header": {
        "app_id": "",
        "ctrl": "cmd", // send a command on its own
        "request_id": ""
    },
    "payload": {
        "cmd_text": {
            "avatar": [
                {
                    "type": "action", // type: action
                    "value": "A_LH_introduced_O", // action name
                    "tb": 0 // trigger the action immediately
                }
            ]
        }
    }
}
// response data
{
    "header": {
        "code": 0, // error code
        "message": "success", // response message
        "sid": "" // session id
    }
}

Request

Header parameter

Parameter | Definition | Type | Limitations | Required
app_id | APPID obtained from the platform | string | maxLength: 50 | Yes
ctrl | Control parameter | string | cmd | Yes
request_id | Unique request ID | string | maxLength: 50 | Yes

Payload.cmd_text.avatar parameter

Feature ID | Feature Description | Data Type | Value Range | Required | Default Value
type | Action type | string | action | Yes |
value | Action name | string | e.g. A_LH_introduced_O | Yes |
tb | Time offset, in milliseconds relative to the start of the sub-session | int | 0 triggers the action immediately | Yes | 0
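A sketch of a standalone action command, assuming the action list sits under payload.cmd_text.avatar as in the request example; `build_action_cmd` is a hypothetical helper, and the default tb of 0 triggers the action immediately.

```python
import json
import uuid


def build_action_cmd(app_id: str, action: str, tb: int = 0) -> str:
    """Serialize a ctrl=cmd request carrying one real-time action (hypothetical helper)."""
    message = {
        "header": {"app_id": app_id, "ctrl": "cmd", "request_id": uuid.uuid4().hex},
        "payload": {
            "cmd_text": {
                # One action entry; tb=0 fires it immediately
                "avatar": [{"type": "action", "value": action, "tb": tb}],
            },
        },
    }
    return json.dumps(message)
```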

Response

Header response

Parameter | Definition | Type
code | Return code: 0 indicates success; any other value indicates an error | int
message | Error description | string
sid | Session ID | string

Payload.avatar response

Parameter | Definition | Type
request_id | ID of a single drive request | string
period | global or driver | string
event_type | action_status: processing status of an engine action, for example "vmr_action_status":{"action":{"type":"A_RH_like_O","state":2}}. request_id is the client-provided identifier. This event is not currently returned. | string
vmr_status | 0 = start; 1 = in progress; 2 = end | int
frame_num | Corresponding frame count | int
error_code | Exception status of a single drive request: 0 indicates success; any other value indicates an error | int
error_message | Exception description of a single drive request | string

Reset (interrupt) protocol

Example

// request data
{
    "header": {
        "app_id": "",
        "uid": "",
        "session": "",
        "ctrl":"reset",//restore the virtual human to the silent streaming state
        "request_id": ""
      }
}
// response data
{
    "header": {
        "code": 0,
        "message": "success",
        "sid": "vms000eb24a@dx18df2c3ec946f19882"
    },
    "payload": {
        "avatar": {
          "request_id": "",
          "period":"driver",
          "event_type":"reset",
          "error_code":0,
          "error_message":""
         }
    }
}

Request

Header parameter

Parameter | Definition | Type | Limitations | Required
app_id | APPID obtained from the platform | string | maxLength: 50 | Yes
ctrl | Control parameter | string | reset | Yes
request_id | Unique request ID | string | maxLength: 50 | Yes

Response

Header response

Parameter | Definition | Type
code | Return code: 0 indicates success; any other value indicates an error | int
message | Error description | string
sid | Session ID | string

Payload.avatar response

Parameter | Definition | Type
request_id | ID of a single request | string
period | global or driver | string
event_type | reset: returned in response to a user-initiated reset request | string
error_code | Exception status for a single execution: 0 indicates success; any other value indicates an error | int
error_message | Exception description for a single execution | string
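Reset, like the ping and stop commands on this page, carries only a header. A sketch of building such header-only frames and recognizing the matching acknowledgement by event_type (reset, pong, or stop); `build_control` and `is_ack` are hypothetical helpers, and the optional uid and session header fields shown in the examples are omitted here.

```python
import json
import uuid

HEADER_ONLY_CTRLS = ("reset", "ping", "stop")


def build_control(app_id: str, ctrl: str) -> str:
    """Serialize a header-only control request (hypothetical helper)."""
    if ctrl not in HEADER_ONLY_CTRLS:
        raise ValueError(f"unsupported header-only ctrl: {ctrl}")
    return json.dumps({"header": {"app_id": app_id, "ctrl": ctrl, "request_id": uuid.uuid4().hex}})


def is_ack(frame: str, event_type: str) -> bool:
    """True if frame is a successful response with the given event_type."""
    avatar = json.loads(frame).get("payload", {}).get("avatar", {})
    return avatar.get("event_type") == event_type and avatar.get("error_code", 0) == 0
```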

Ping (keep-alive) protocol

Example

// request data
{
    "header": {
        "app_id": "",
        "uid": "",
        "session": "",
        "ctrl":"ping",
        "request_id": ""
      }
}
// response data
{
    "header": {
        "code": 0,
        "message": "success",
        "sid": "vms000eb24a@dx18df2c3ec946f19882"
    },
    "payload": {
        "avatar": {
          "request_id": "",
          "period":"driver",
          "event_type":"pong",
          "error_code":0,
          "error_message":""
        }
    }
}

Request

Header parameter

Parameter | Definition | Type | Limitations | Required
app_id | APPID obtained from the platform | string | maxLength: 50 | Yes
ctrl | Control parameter | string | ping | Yes
request_id | Unique request ID | string | maxLength: 50 | Yes
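A keep-alive sketch using asyncio: it sends a ping frame at a fixed interval until told to stop. The interval is a caller-chosen assumption, since no required ping period is given here, and `send` stands in for whatever WebSocket client method sends a text frame; `keep_alive` and its parameters are hypothetical.

```python
import asyncio
import json
import uuid
from typing import Awaitable, Callable


async def keep_alive(app_id: str, send: Callable[[str], Awaitable[None]],
                     interval: float, stop: asyncio.Event) -> int:
    """Send a ctrl=ping frame every `interval` seconds until `stop` is set; return pings sent."""
    sent = 0
    while not stop.is_set():
        frame = json.dumps({"header": {"app_id": app_id, "ctrl": "ping",
                                       "request_id": uuid.uuid4().hex}})
        await send(frame)
        sent += 1
        try:
            # Sleep for the interval, but wake early if stop is set meanwhile
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass
    return sent
```

Running the loop as a separate task keeps pings flowing while the main task sends drive requests and reads responses; each ping should be answered by an event_type "pong".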

Response

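Header response

Parameter | Definition | Type
code | Return code: 0 indicates success; any other value indicates an error | int
message | Error description | string
sid | Session ID | string

Payload.avatar response

Parameter | Definition | Type
request_id | ID of a single request | string
period | driver | string
event_type | pong: returned in response to a ping request | string
error_code | Exception status for a single execution: 0 indicates success; any other value indicates an error | int
error_message | Exception description for a single execution | string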

Stop protocol

Example

// request data
{
    "header": {
        "app_id": "",
        "ctrl":"stop",
        "request_id": ""
      }
}
// response data
{
    "header": {
        "code": 0,             // error code
        "message": "success",  // response message
        "sid": "",             // session id
        "session": ""          // session information
    },
    "payload": {
        "avatar": {
            "request_id": "",
            "period": "global",
            "event_type": "stop",
            "error_code":0,
            "error_message":""
        }
    } 
}

Request

Header parameter

Parameter | Definition | Type | Limitations | Required
app_id | APPID obtained from the platform | string | maxLength: 50 | Yes
ctrl | Control parameter | string | stop | Yes
request_id | Unique request ID | string | maxLength: 50 | Yes

Response

Header response

Parameter | Definition | Type
code | Return code: 0 indicates success; any other value indicates an error | int
message | Error description | string
sid | Session ID | string

Payload.avatar response

Parameter | Definition | Type
request_id | ID of a single request | string
period | global or driver | string
event_type | stop: returned in response to a user-initiated stop request | string
error_code | Exception status for a single execution: 0 indicates success; any other value indicates an error | int
error_message | Exception description for a single execution | string

Demos

Virtual Broadcast API demo (Java)

Virtual Broadcast API demo (Python 3)