# Online TTS WebAPI Document

# Description of the Interface

The streaming interface for Online TTS converts text into speech and offers a range of distinctive speakers (voice libraries) to choose from. Click here to experience the speech effect online. The capability is exposed to developers as a general-purpose interface over a WebSocket API, which supports streaming and suits AI service scenarios that require streaming data transmission. Compared with an SDK, the API is lightweight and language-independent; compared with an HTTP API, the WebSocket protocol natively supports cross-origin requests.

The previous general version of the WebAPI (http[s]://api.xfyun.cn/v1/service/v1/tts) is no longer open to new users. Users who already selected that version can continue to use it, but are encouraged to try the new streaming interface and migrate to it as soon as possible.

Available Voices

iFLYTEK provides a variety of different voices in multiple languages for synthesizing speech from text.

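| Language | Name/Parameter | Gender |
| --- | --- | --- |
| Japanese | qianhui | Female |
| Japanese | x2_zhongcun | Male |
| Indonesian | x2_suid | Male |
| Russian | allabent | Female |
| French | mariane | Female |
| German | leonie | Female |
| Arabic | mohamed | Female |
| Urdu | baili | Female |
| Hindi | abha | Female |
| Vietnamese | xiaoyun | Female |
| Thai | yingying | Female |
| Malay | malayrole | Female |
| English | x_John | Male |
| English | x_Steve | Male |
| English | x_Catherine | Female |
| Chinese | x_xiaoyang_story | Female |
| Chinese | x_xiaolin | Female |
| Chinese | x_xiaoyan | Female |
| Chinese | x_xiaoyuan | Female |
| Chinese | x_laoma | Female |
| Chinese | x_xiaoxi | Female |
| Chinese | x_xiaomei | Female |
| Chinese | x_xiaofeng | Female |
| Chinese | x_xiaoxue | Female |
| Chinese | x_yifeng | Male |
| Chinese | x_john_ce | Male |
| Chinese | x_catherine_ce | Female |
| Chinese | x_steve_ce | Male |
| Hokkien dialect | jiaona | Female |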

  • In addition to the voices above, iFLYTEK can build a custom Brand Voice that reflects your brand persona, giving you the opportunity to offer unique and exclusive NTTS voices to your customers.

  • For minority languages, the text must be Unicode-encoded (UTF-16 little endian), and the text encoding parameter must be set to tte = unicode.

  • Minority-language synthesis works only after the corresponding speaker has been enabled at the console; otherwise error 11200 is reported.
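
The encoding rule for minority languages can be sketched in Go as follows. This is a minimal illustration, not iFLYTEK code: the helper name `encodeMinorityText` is hypothetical, and the sketch assumes that no byte order mark is expected, which this document does not state. The result is the kind of base64 string that goes into the `text` field of the request.

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// encodeMinorityText is a hypothetical helper: it encodes text as UTF-16
// little endian (assumption: without a byte order mark) and then
// base64-encodes the resulting bytes.
func encodeMinorityText(text string) string {
	units := utf16.Encode([]rune(text)) // UTF-16 code units, surrogate pairs included
	buf := make([]byte, 2*len(units))
	for i, u := range units {
		binary.LittleEndian.PutUint16(buf[2*i:], u)
	}
	return base64.StdEncoding.EncodeToString(buf)
}

func main() {
	// "Hi" is 48 00 69 00 in UTF-16 little endian.
	fmt.Println(encodeMinorityText("Hi")) // prints SABpAA==
}
```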

# Interface Demo

Interface demo: please click here to download.
At present, demos are provided for only some development languages. For other languages, please develop against the interface document below. Developers are welcome to visit the iFLYTEK Open Platform Community and share their own demos there.

# Requirements for the Interface

Integrating the streaming API for online speech synthesis requires the following.

| Content | Description |
| --- | --- |
| Request Protocol | ws[s] (wss is strongly recommended for better security) |
| Request URL | ws[s]://tts-api-sg.xf-yun.com/v2/tts |
| Request Line | GET /v2/tts HTTP/1.1 |
| Interface Authentication | Signature mechanism; refer to [Interface Authentication](#Interface Authentication) below for details. |
| Character Encoding | UTF8, GB2312, GBK, BIG5, UNICODE, GB18030 |
| Response Format | Unified JSON format |
| Development Language | Any language that can make WebSocket requests to iFLYTEK cloud services |
| Operating System | Any operating system |
| Audio Attributes | Sampling rate: 16k or 8k |
| Audio Format | pcm, mp3, speex (8k), speex-wb (16k) |
| Text Length | A single call must carry less than 8000 bytes (about 2000 Chinese characters) |
| Speaker | Multiple languages, including Chinese, English, Japanese, Indonesian, Russian and French; click Here to experience the speaker effects online. |

# Interface Calling Flow

  • Compute an hmac-sha256 signature with the interface key and send the WebSocket handshake request to the server. Refer to [Interface Authentication](#Interface Authentication) below for details.
  • After the handshake succeeds, the client can upload and receive data simultaneously over the WebSocket connection. Once all data has been uploaded, the client should upload the end-of-data marker once. Refer to [Interface Data Transmitting and Receiving](#Interface Data Transmitting and Receiving) below.
  • Close the WebSocket connection after the server has returned the marker indicating that all results have been delivered.

*Note: Observe the following when using WebSocket:

  1. The server supports websocket-version 13. Make sure the client framework supports this version.
  2. Every frame returned by the server is a TextMessage, which corresponds to opcode = 1 in a native WebSocket protocol frame. Make sure the client parses frames as this type; if it cannot, upgrade the client framework or switch to another one.
  3. A framing problem occurs when one JSON data packet reaches the client split across multiple frames and the client fails to parse the JSON. In most cases the cause is a flaw in how the client framework parses the WebSocket protocol, so first upgrade the framework and, if the problem persists, switch to another framework.
  4. When the client closes the connection at the end of a session, it should send WebSocket close code 1000 to the server where possible (ignore this requirement if the client framework offers no interface for sending a code on close).

# Whitelist

The IP whitelist is disabled by default, which means the service does not restrict the caller's IP.
When this interface is called:

  • If the IP whitelist is disabled, the caller's IP is treated as unrestricted and is not checked.
  • If the IP whitelist is enabled, the server checks whether the caller's IP is in the whitelist configured on the iFLYTEK Open Platform, and refuses requests from any IP that is not.

Rules for IP Whitelist

  • Edit the whitelist under console > IP whitelist for the corresponding service. Changes take effect about five minutes after they are saved.
  • IP whitelists must be set separately for each service of each APPID.
  • The whitelist must contain public Internet IPs, not LAN IPs.
  • If {"message":"Your IP address is not allowed"} is returned during the handshake phase, the server is refusing service because the IP whitelist is set incorrectly or has not yet taken effect.

# Interface Authentication

In the handshake phase, the requester is required to sign the request, and the server-side verifies the validity of the request through the signature.

# Authentication Method

Authentication parameters are appended to the request URL as query parameters. Sample URL:

wss://tts-api-sg.xf-yun.com/v2/tts?authorization=aG1hYyB1c2VybmFtZT0iZGE0ZjMyOWUyZmQwMGQ1NjE4NjVjNjRkZjU3NDNiMjAiLCBhbGdvcml0aG09ImhtYWMtc2hhMjU2IiwgaGVhZGVycz0iaG9zdCBkYXRlIHJlcXVlc3QtbGluZSIsIHNpZ25hdHVyZT0ic1RtbzRobDBMdmRLWTRLRjltcGJKV0htRFFzNC8xZ2ZPdUgwZnBZbVdnbz0i&date=Thu%2C%2001%20Aug%202019%2001%3A53%3A21%20GMT&host=tts-api.xfyun.cn

Authentication Parameters:

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| host | string | Yes | Request host | tts-api.xfyun.cn |
| date | string | Yes | Current timestamp, in RFC1123 format | Thu, 01 Aug 2019 01:53:21 GMT |
| authorization | string | Yes | Signature information encoded with base64 (the signature is computed with hmac-sha256) | Refer to the rules for generation of authorization parameters below. |

· Rules for Generation of date Parameters

date must be expressed in UTC+0 (GMT) and formatted per RFC1123, for example Thu, 01 Aug 2019 01:53:21 GMT.
The server checks the clock skew of date and allows a maximum deviation of 300 seconds. Requests that exceed this deviation are rejected.
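
A minimal Go sketch of the date rule, for illustration only. It relies on the standard library constant `http.TimeFormat`, which renders an RFC1123 date with a literal `GMT` zone; the names `formatDate`, `withinSkew`, `maxSkew` and `sampleTime` are hypothetical, and the skew check only mimics on the client side the tolerance the server is described as applying.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// maxSkew mirrors the 300-second tolerance described above.
const maxSkew = 300 * time.Second

// sampleTime is the timestamp used in this document's examples.
var sampleTime = time.Date(2019, time.August, 1, 1, 53, 21, 0, time.UTC)

// formatDate renders t as an RFC1123 date in GMT.
// http.TimeFormat expects a UTC time, hence the explicit conversion.
func formatDate(t time.Time) string {
	return t.UTC().Format(http.TimeFormat)
}

// withinSkew reports whether date (as produced by formatDate) lies within
// maxSkew of now, in either direction.
func withinSkew(date string, now time.Time) bool {
	t, err := time.Parse(http.TimeFormat, date)
	if err != nil {
		return false
	}
	d := now.Sub(t)
	if d < 0 {
		d = -d
	}
	return d <= maxSkew
}

func main() {
	date := formatDate(sampleTime)
	fmt.Println(date) // Thu, 01 Aug 2019 01:53:21 GMT
	fmt.Println(withinSkew(date, sampleTime.Add(4*time.Minute))) // true
	fmt.Println(withinSkew(date, sampleTime.Add(6*time.Minute))) // false
}
```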

· Rules for Generation of authorization Parameters

1)Get the interface keys APIKey and APISecret.
Both are 32-character strings. They can be viewed in the speech synthesis (stream version) service after creating an application on the iFLYTEK Open Platform.
2)Before base64 encoding, the authorization parameter (authorization_origin) has the following format.

api_key="$api_key",algorithm="hmac-sha256",headers="host date request-line",signature="$signature"

Here, api_key is the APIKey obtained from the console, algorithm is the signing algorithm (only hmac-sha256 is supported), and headers lists the parameters that participate in the signature. signature is the string obtained by signing those parameters with the algorithm and encoding the result with base64. See below for details.

*Note: headers are the parameters participating in signature. Please note that they are fixed parameter names ("host date request-line"), instead of the values of these parameters.

3)The rules for the original field of signature (signature_origin) are as follows:

The original field of signature is composed of three parameters, i.e. host, date and request-line which are concatenated in a certain format. The concatenation format is (\n is a new line character, ’:’ is followed by a space):

host: $host\ndate: $date\n$request-line

Suppose

Request URL = wss://tts-api-sg.xf-yun.com/v2/tts
date = Thu, 01 Aug 2019 01:53:21 GMT

Then, the original field of signature (signature_origin) is:

host: tts-api.xfyun.cn
date: Thu, 01 Aug 2019 01:53:21 GMT
GET /v2/tts HTTP/1.1

4)Sign signature_origin with the hmac-sha256 algorithm, using apiSecret as the key, to obtain the signed digest signature_sha.

signature_sha=hmac-sha256(signature_origin,$apiSecret)

Here, apiSecret is the APISecret obtained from the console.

5)Encode the signature_sha with base64 to get the final signature.

signature=base64(signature_sha)

Suppose

APISecret = secretxxxxxxxx2df7900c09xxxxxxxx	
date = Thu, 01 Aug 2019 01:53:21 GMT

Then, the signature is

signature=sTmo4hl0LvdKY4KF9mpbJWHmDQs4/1gfOuH0fpYmWgo=

6)Using the values above, assemble the authorization string before base64 encoding (authorization_origin). See the example below.

api_key="keyxxxxxxxx8ee279348519exxxxxxxx", algorithm="hmac-sha256", headers="host date request-line", signature="sTmo4hl0LvdKY4KF9mpbJWHmDQs4/1gfOuH0fpYmWgo="

7)Finally, encode the authorization_origin with base64 to get the authorization parameter.

authorization = base64(authorization_origin)
Example:
authorization=aG1hYyB1c2VybmFtZT0iZGE0ZjMyOWUyZmQwMGQ1NjE4NjVjNjRkZjU3NDNiMjAiLCBhbGdvcml0aG09ImhtYWMtc2hhMjU2IiwgaGVhZGVycz0iaG9zdCBkYXRlIHJlcXVlc3QtbGluZSIsIHNpZ25hdHVyZT0ic1RtbzRobDBMdmRLWTRLRjltcGJKV0htRFFzNC8xZ2ZPdUgwZnBZbVdnbz0i

# Examples of Authentication (golang)

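    import (
        "crypto/hmac"
        "crypto/sha256"
        "encoding/base64"
        "fmt"
        "net/http"
        "net/url"
        "strings"
        "time"
    )

    // HmacWithShaTobase64 signs data with HMAC-SHA256 using key and returns
    // the base64-encoded digest. Only hmac-sha256 is supported, so the
    // algorithm argument is informational.
    func HmacWithShaTobase64(algorithm, data, key string) string {
        mac := hmac.New(sha256.New, []byte(key))
        mac.Write([]byte(data))
        return base64.StdEncoding.EncodeToString(mac.Sum(nil))
    }

    // assembleAuthUrl builds the signed handshake URL.
    // hosturl: for example wss://tts-api-sg.xf-yun.com/v2/tts
    func assembleAuthUrl(hosturl, apiKey, apiSecret string) (string, error) {
        ul, err := url.Parse(hosturl)
        if err != nil {
            return "", err
        }
        // Signing date: RFC1123 with a GMT zone (http.TimeFormat needs a UTC time)
        date := time.Now().UTC().Format(http.TimeFormat)
        // Fields participating in the signature: host, date and request-line
        signString := []string{"host: " + ul.Host, "date: " + date, "GET " + ul.Path + " HTTP/1.1"}
        // signature_origin: the fields joined with newlines
        sign := strings.Join(signString, "\n")
        // signature = base64(hmac-sha256(signature_origin, apiSecret))
        sha := HmacWithShaTobase64("hmac-sha256", sign, apiSecret)
        // authorization_origin; URL encoding is not needed at this point
        authUrl := fmt.Sprintf("api_key=\"%s\", algorithm=\"%s\", headers=\"%s\", signature=\"%s\"", apiKey,
            "hmac-sha256", "host date request-line", sha)
        // authorization = base64(authorization_origin)
        authorization := base64.StdEncoding.EncodeToString([]byte(authUrl))
        v := url.Values{}
        v.Add("host", ul.Host)
        v.Add("date", date)
        v.Add("authorization", authorization)
        // Append the URL-encoded query parameters to the request URL
        return hosturl + "?" + v.Encode(), nil
    }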

# Authentication Results

If the handshake succeeds, HTTP status code 101 is returned, indicating that the protocol upgrade succeeded. If authentication fails, an HTTP status code and an error message that depend on the type of error are returned. The table below describes the errors in detail.

| HTTP Code | Description | Error Message | Solution |
| --- | --- | --- | --- |
| 401 | The authorization request parameter is missing. | {"message":"Unauthorized"} | Check that the authorization parameter is present. Refer to [Authorization Parameters](#Authorization Parameters) |
| 401 | The signature parameters cannot be parsed. | {"message":"HMAC signature cannot be verified"} | Check that every signature parameter is present and correct, and that the copied api_key is correct. |
| 401 | The signature authentication fails. | {"message":"HMAC signature does not match"} | There are many possible causes:<br>1. Check that api_key and api_secret are correct.<br>2. Check that host, date and request-line are concatenated as the protocol requires when computing the signature.<br>3. Check that the base64 length of the signature is normal (normally 44 bytes). |
| 403 | The clock skew check fails. | {"message":"HMAC signature cannot be verified, a valid date or x-date header is required for HMAC Authentication"} | Check that the clock of the machine sending the request is accurate. This error is reported when the deviation exceeds 5 minutes. |

Example of returned messages upon failure of handshake:

    HTTP/1.1 401 Forbidden
    Date: Thu, 06 Dec 2018 07:55:16 GMT
    Content-Length: 116
    Content-Type: text/plain; charset=utf-8
    {
        "message": "HMAC signature does not match"
    }

# Interface Data Transmitting and Receiving

After the handshake succeeds, a connection is established between the client and the server, over which the client can upload and receive data at the same time.

The client sends the text data and parameters only once per session, and the server pushes each synthesis result to the client as it becomes available. When the engine has finished synthesizing all the data, it returns an end-of-data marker, as follows:

{
  "data":{
      ....#other parameters
      "status":2
  }  
}

# Request Parameters

All request data is sent as a JSON string.

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| common | object | Yes | Common parameters, refer to the content below |
| business | object | Yes | Business parameters, refer to the content below |
| data | object | Yes | Business data stream parameters, refer to the content below |

# Description of Common Parameters (common)

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| app_id | string | Yes | APPID obtained from the platform |

# Description of Business Parameters (business)

| Parameter Name | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| aue | string | Yes | Audio encoding, optional values:<br>raw: uncompressed pcm<br>lame: mp3 (sfl = 1 must also be passed when aue = lame)<br>speex-org-wb;7: standard open-source speex (speex_wideband, i.e. 16k); the number is the compression level (8 by default)<br>speex-org-nb;7: standard open-source speex (speex_narrowband, i.e. 8k); the number is the compression level (8 by default)<br>speex;7: iFLYTEK custom speex (8k), compression level 1 to 10, 7 by default<br>speex-wb;7: iFLYTEK custom speex (16k), compression level 1 to 10, 7 by default<br>The number is the compression level and must be passed. Refer to Description of Audio Formats for the standard open-source speex and iFLYTEK custom speex encodings. | "raw"<br>"speex-org-wb;7" |
| sfl | int | No | Used together with aue = lame to enable streaming return of mp3 audio<br>Value: 1 (enabled) | 1 |
| auf | string | No | Audio sampling rate, optional values:<br>audio/L16;rate=8000: audio synthesized at 8k<br>audio/L16;rate=16000: audio synthesized at 16k<br>If auf is not passed, audio is synthesized at 16k | "audio/L16;rate=16000" |
| vcn | string | Yes | Speaker. Add the speaker at the console for trial or purchase; its parameter value is displayed once it has been added. | "xiaoyan" |
| speed | int | No | Speech rate, range [0-100], 50 by default | 50 |
| volume | int | No | Volume, range [0-100], 50 by default | 50 |
| pitch | int | No | Pitch, range [0-100], 50 by default | 50 |
| bgs | int | No | Background sound of the synthesized audio<br>0: without background sound (default)<br>1: with background sound | 0 |
| tte | string | Yes | Text encoding format:<br>GB2312<br>GBK<br>BIG5<br>UNICODE (must be used for minority languages; the text to synthesize must be encoded as UTF-16 little endian. Refer to [java demos](#Examples of Calls) for details)<br>GB18030<br>UTF8 | "UTF8" |
| reg | string | No | Pronunciation mode for English:<br>0: automatic judgment; when unsure, English is read as words (default)<br>1: all English is read letter by letter<br>2: automatic judgment; when unsure, English is read letter by letter<br>By default, English is read as words. | "2" |
| rdn | string | No | Pronunciation mode for numbers in the synthesized audio:<br>0: automatic judgment (default)<br>1: full value<br>2: complete string<br>3: string first | "0" |

# Description of Business Data Stream Parameters (data)

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | Yes | Text content, encoded with base64. The base64-encoded text must be shorter than 8000 bytes (about 2000 Chinese characters) |
| status | int | Yes | Data status, fixed at 2 |

*Note: Streaming synthesis accepts the text in a single transmission only and does not support segmented transmission, so status must be 2.
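
Several business parameters carry constraints that can be checked on the client before a request is sent. The Go sketch below illustrates two of them: the [0-100] range of speed, volume and pitch, and the rule that aue = lame requires sfl = 1. The type `businessParams` and its `validate` method are hypothetical names introduced for this example; they are not part of any iFLYTEK SDK.

```go
package main

import (
	"errors"
	"fmt"
)

// businessParams holds a subset of the business parameters (hypothetical type).
type businessParams struct {
	Aue    string
	Sfl    int
	Speed  int
	Volume int
	Pitch  int
}

// validate checks the documented constraints and returns the first violation.
func (b businessParams) validate() error {
	if b.Aue == "lame" && b.Sfl != 1 {
		return errors.New("aue=lame requires sfl=1")
	}
	ranged := []struct {
		name  string
		value int
	}{{"speed", b.Speed}, {"volume", b.Volume}, {"pitch", b.Pitch}}
	for _, p := range ranged {
		if p.value < 0 || p.value > 100 {
			return fmt.Errorf("%s=%d is outside [0, 100]", p.name, p.value)
		}
	}
	return nil
}

func main() {
	p := businessParams{Aue: "lame", Speed: 50, Volume: 50, Pitch: 50}
	fmt.Println(p.validate()) // aue=lame requires sfl=1
	p.Sfl = 1
	fmt.Println(p.validate()) // <nil>
}
```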

Examples of request parameters:

    {
        "common": {
            "app_id": "123456"
        },
        "business": {
            "vcn": "xiaoyan",
            "aue": "raw",
            "speed": 50
        },
        "data": {
            "status": 2,
            "encoding": "gbk",
            "text": "exSI6ICJlbiIsCgkgICAgInBvc2l0aW9uIjogImZhbHNlIgoJf..."
        }
    }
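
A request frame of this shape could be assembled in Go roughly as follows. This is a sketch under stated assumptions: the JSON field names come from the parameter tables in this document, while the Go type names, the `buildRequest` helper and the fixed choices of aue = raw, tte = UTF8 and speed = 50 are illustrative only.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// The JSON field names follow the parameter tables; the Go type names are illustrative.
type common struct {
	AppID string `json:"app_id"`
}

type business struct {
	Aue   string `json:"aue"`
	Vcn   string `json:"vcn"`
	Tte   string `json:"tte"`
	Speed int    `json:"speed"`
}

type data struct {
	Status int    `json:"status"`
	Text   string `json:"text"`
}

type request struct {
	Common   common   `json:"common"`
	Business business `json:"business"`
	Data     data     `json:"data"`
}

// buildRequest assembles the single request frame of one synthesis session.
func buildRequest(appID, vcn, text string) ([]byte, error) {
	req := request{
		Common:   common{AppID: appID},
		Business: business{Aue: "raw", Vcn: vcn, Tte: "UTF8", Speed: 50},
		Data: data{
			Status: 2, // the whole text is sent at once, so status is always 2
			Text:   base64.StdEncoding.EncodeToString([]byte(text)),
		},
	}
	return json.Marshal(req)
}

func main() {
	frame, err := buildRequest("123456", "xiaoyan", "Hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(frame))
}
```

The resulting bytes would be sent as one text message over the WebSocket connection once the handshake has succeeded.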

Example of end-of-data-upload marker:

    {
        "data": {
            "status": 2
        }
    }

Description of Returned Parameters:

| Parameter Name | Type | Description |
| --- | --- | --- |
| code | int | Return code; 0 means success and any other value means failure. Refer to [Error Codes](#Error Code). |
| message | string | Error message |
| data | object | |
| data.audio | string | Synthesized audio clip, encoded with base64 |
| data.status | int | Current audio stream status: 0 means synthesis has started, 1 means synthesis is in progress, 2 means synthesis has finished |
| data.ced | string | Synthesis progress: the number of bytes of text synthesized so far<br>Note: synthesis is segmented by sentence, so if the text contains only one sentence, every returned result carries the same ced. |
| sid | string | Current session id; returned only in the first frame |

Examples of Returned Parameters

    {
        "code":0,
        "message":"success",
        "sid":"ttsxxxxxxxxxxx",
        "data":{
            "audio":"QAfe..........",
            "ced":"14",
            "status":2
        }
    }
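
On the receiving side, each text frame can be decoded and its audio appended until a frame with data.status = 2 arrives. The Go sketch below shows that loop on two hand-written sample frames; the `response` type and the `appendFrame` helper are hypothetical, and the audio payloads are placeholder bytes rather than real synthesis output. Reading frames from an actual WebSocket connection is left out.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// response mirrors the returned parameters documented above.
type response struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
	Sid     string `json:"sid"`
	Data    *struct {
		Audio  string `json:"audio"`
		Ced    string `json:"ced"`
		Status int    `json:"status"`
	} `json:"data"`
}

// appendFrame decodes one text frame, appends its audio to buf, and reports
// whether this was the final frame (data.status == 2).
func appendFrame(buf, frame []byte) ([]byte, bool, error) {
	var r response
	if err := json.Unmarshal(frame, &r); err != nil {
		return buf, false, err
	}
	if r.Code != 0 {
		return buf, false, fmt.Errorf("tts error %d: %s", r.Code, r.Message)
	}
	if r.Data == nil {
		return buf, false, nil // frame without data and with code 0: ignore it
	}
	audio, err := base64.StdEncoding.DecodeString(r.Data.Audio)
	if err != nil {
		return buf, false, err
	}
	return append(buf, audio...), r.Data.Status == 2, nil
}

func main() {
	frames := []string{
		`{"code":0,"message":"success","sid":"ttsxxxxxxxxxxx","data":{"audio":"AAEC","ced":"14","status":1}}`,
		`{"code":0,"message":"success","data":{"audio":"AwQ=","ced":"14","status":2}}`,
	}
	var audio []byte
	for _, f := range frames {
		var done bool
		var err error
		audio, done, err = appendFrame(audio, []byte(f))
		if err != nil {
			panic(err)
		}
		fmt.Println(len(audio), done)
	}
}
```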

# Notes

  1. The server may return a frame with empty data and error code 0. The client can ignore such frames without parsing them.
  2. Synthesis frames are large, so the server may deliver one message to the client as multiple WebSocket frames, which the client must join. Most frameworks already do this, but some do not, and parsing then fails.
  3. If the synthesized audio is meaningless noise, the character encoding used by the client most likely does not match the parameters. Make sure the value passed in tte matches the actual text encoding.
  4. If the synthesized audio does not sound as desired, try changing the speaker (some speakers require permission).
  5. Text in minority languages must be Unicode-encoded, with tte = Unicode.

# Error Code

| Error Code | Error Message | Description | Solution |
| --- | --- | --- | --- |
| 10005 | licc fail | APPID authorization fails | Verify that the APPID is correct and that the synthesis service is activated. |
| 10006 | Get audio rate fail | A required request parameter is missing | Check that the parameter named in the error message is uploaded correctly. |
| 10007 | get invalid rate | Invalid request parameter value | Check that the parameter value named in the error message is within the allowed range. |
| 10010 | AIGES_ERROR_NO_LICENSE | Insufficient engine authorization | Submit a work order at the console to contact technical support. |
| 10109 | AIGES_ERROR_INVALID_DATA | Invalid request text length | Check whether the text length exceeds the limit. |
| 10019 | service read buffer timeout, session timeout | Session timeout | Check whether the connection was left open after all data was sent. |
| 10101 | engine inavtive | Engine session ended | Check whether the engine has already ended the session while the client is still sending data, for example when all audio data has been sent but the WebSocket connection is still open and empty audio is being sent. |
| 10313 | appid cannot be empty | APPID cannot be empty | Check that the common parameter is uploaded correctly and that its app_id field is present and not empty. |
| 10317 | invalid version | Invalid version | Contact technical support. |
| 11200 | auth no license | Not authorized | Check whether an unauthorized speaker is used or the total number of calls has exceeded the upper limit. |
| 11201 | auth no enough license | Daily flow control limit exceeded | Contact the business department to increase the number of daily calls. |
| 10160 | parse request json error | Invalid request data format | Check that the request data is valid JSON. |
| 10161 | parse base64 string error | base64 decoding fails | Check that the sent data is base64-encoded. |
| 10163 | param validate error:/common 'app_id' param is required | Required parameter missing or parameter invalid | Check that the parameter named in the error message is uploaded correctly. |
| 10200 | read data timeout | Read data timeout | Check whether no data has been sent for a cumulative 10 s while the connection is still open. |
| 10222 | context deadline exceeded | Network error | Check whether the network is working normally. |

# Demos

API demo java

API demo python3

# Q&A

# 11200 authorization error is reported for WebAPI online synthesis

Answer: This problem is generally caused by the use of an unauthorized speaker. Please go to the console and check whether the speaker has not been added or its authorization has expired. Error 11200 is also reported when the total synthesis usage exceeds the upper limit.

# Audio formats that WebAPI online synthesis can save

Answer: Audio can be saved in pcm, mp3 and speex formats.

# Website to query error codes and corresponding solutions

Answer: Query of Error Codes and Corresponding Solutions

# What is the limit for the bytes in the online speech synthesis?

Answer: The WebAPI interface accepts at most 8000 bytes per request. Longer text should be split by paragraph and synthesized with multiple requests.
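
One way to split long text by paragraph is sketched below in Go. The helper `splitByParagraph` is hypothetical, and the byte budget passed to it is an assumption: if the 8000-byte limit applies to the base64-encoded text, the raw budget has to be smaller, since base64 expands data by a factor of 4/3 (a budget of about 5900 raw bytes stays under 8000 encoded bytes).

```go
package main

import (
	"fmt"
	"strings"
)

// splitByParagraph groups paragraphs (separated by "\n") into chunks of at
// most maxBytes bytes each. A single paragraph longer than maxBytes is
// returned as its own oversized chunk and must be shortened by the caller.
func splitByParagraph(text string, maxBytes int) []string {
	var chunks []string
	var cur strings.Builder
	for _, p := range strings.Split(text, "\n") {
		p = strings.TrimSpace(p)
		if p == "" {
			continue // skip blank lines between paragraphs
		}
		// +1 accounts for the newline that rejoins paragraphs inside a chunk.
		if cur.Len() > 0 && cur.Len()+1+len(p) > maxBytes {
			chunks = append(chunks, cur.String())
			cur.Reset()
		}
		if cur.Len() > 0 {
			cur.WriteByte('\n')
		}
		cur.WriteString(p)
	}
	if cur.Len() > 0 {
		chunks = append(chunks, cur.String())
	}
	return chunks
}

func main() {
	text := "First paragraph.\nSecond paragraph.\n\nThird."
	for i, c := range splitByParagraph(text, 34) {
		fmt.Printf("%d: %q\n", i, c)
	}
}
```

Each chunk would then be sent in its own synthesis session, and the returned audio concatenated in order.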

# Contributing to the Documentation

Is something missing or incorrect? Please let us know by contacting openplatform@iflytek.com. If you know how to fix it right away, don't hesitate to create a request to help us improve this document.