Short Form ASR WebAPI Document (Automatic Speech Recognition)

Description of the Interface

The streaming interface for Short Form ASR (Automatic Speech Recognition) converts speech of up to 60 seconds to text in real time. Recognition results are returned while the audio is still being uploaded.
The advanced dynamic correction feature is now available for free, and multiple minority languages are supported online.

Dynamic Correction:

  • Dynamic Correction Disabled: The recognition result is returned in real time, and the returned result each time is an addition to the previous result;
  • Dynamic Correction Enabled: The recognition result is returned in real time, and the returned result each time may be an addition to the previous result, or a replacement of the previous result (i.e. correction);
  • Compared with the Dynamic Correction Disabled status, results are returned at a finer granularity in the Dynamic Correction Enabled status, so the text appears on screen more smoothly;
  • To use the Dynamic Correction feature, enable it at console-Short Form ASR-advanced features, then set the corresponding parameters. Refer to [Request Parameters](#Request Parameters);
  • The Dynamic Correction feature is only supported in Chinese;
  • The format of the returned result in the Dynamic Correction Disabled status is different from that in the Dynamic Correction Enabled status. Refer to [Returned Parameters of Dynamic Correction](#Returned Parameters of Dynamic Correction).

Supported Languages

| Language | Parameter |
| --- | --- |
| Chinese | zh_cn |
| English | en_us |
| Malay | ms_MY |
| Hindi | hi_in |
| Russian | ru-ru |
| Japanese | ja_jp |
| Korean | ko_kr |
| Vietnamese | vi_VN |
| Thai | th_TH |
| Bulgarian | bg_bg |
| French | fr_fr |
| German | de_DE |
| Arabic | ar_il |
| Indonesian | id_ID |
| Bengali | bn_BD |
| Spanish | es_es |

The URL for minority languages is different from that for Chinese and English. Refer to [Requirements for the Interface](#Requirements for the Interface).

To set the parameters for minority languages, refer to [Request Parameters](#Request Parameters).

This feature is a common interface provided to developers through the Websocket API. The Websocket API supports streaming and suits AI service scenarios that require streaming data transmission, such as recognizing while speaking. Compared with the SDK, the API is lightweight and language-independent; compared with the HTTP API, the Websocket protocol has the advantage of natively supporting cross-domain requests.

Interface Demos

Demos are available for download; see [Demos](#Demos).
At present, demos are provided for only some development languages. For other languages, please develop against the interface document below. Developers are welcome to visit the iFLYTEK Open Platform Community and share their demos there.

Requirements for the Interface

The following requirements must be met to integrate the streaming API for Short Form ASR (Automatic Speech Recognition).

| Content | Description |
| --- | --- |
| Request Protocol | ws[s] (wss is strongly recommended for better security) |
| Request URL | ws://iat-api-sg.xf-yun.com/v2/iat. Note: The server IP is not fixed. To keep your integration stable, call the interface by domain name instead of by IP address. |
| Request Line | GET /v2/iat HTTP/1.1 |
| Interface Authentication | Signature mechanism. Refer to [Interface Authentication](#Interface Authentication) below for details. |
| Character Encoding | UTF-8 |
| Response Format | Unified JSON format |
| Development Language | Any language that can make Websocket requests to iFLYTEK cloud services |
| Operating System | Any operating system |
| Audio Attributes | Sampling rate: 16k or 8k; Bit depth: 16bit; mono |
| Audio Format | pcm; speex (8k); speex-wb (16k); mp3 (Chinese Mandarin and English only. Please stay tuned for other dialects and minority languages) |
| Audio Duration | 60s max. |
| Language | Chinese, English, minority languages and Chinese dialects, which can be added at console-Short Form ASR (Automatic Speech Recognition)-dialect / language for trial or purchase. |

Interface Calling Flow

  • Calculate the signature based on hmac-sha256 through the interface key, and send the Websocket protocol handshake request to the server-side. Refer to [Interface Authentication](#Interface Authentication) below for details.
  • After the handshake succeeds, the client can upload and receive data simultaneously through the Websocket connection. After the data is completely uploaded, the client should upload the end-of-data marker once. Refer to [Interface Data Transmitting and Receiving](#Interface Data Transmitting and Receiving) below.
  • Disconnect the Websocket after receiving the server-side’s marker indicating all the results are returned.

Note: Observe the following when using Websocket:

1. The websocket-version supported by the server-side is 13. Ensure the framework used by the client supports this version.
2. All frames returned by the server-side are of type TextMessage, which corresponds to opcode = 1 in the native websocket protocol frame. Ensure that the client parses frames as this type. Otherwise, try upgrading the client framework version or switching to another framework.
3. A framing problem may occur, in which one json data packet is returned to the client in multiple frames and the client fails to parse the json. In most cases, this is because the client's framework has a problem in parsing the websocket protocol. Try upgrading the framework version first, or switch to another framework.
4. If the client needs to close the connection after the session is over, try to ensure that the websocket close code sent to the server-side is 1000 (ignore this requirement if the client framework does not provide an interface for sending a close code when closing the connection).

Whitelist

The IP whitelist is disabled by default, which means this service does not restrict the caller's IP.
When this interface is called:

  • If the IP whitelist is disabled, the caller's IP is unrestricted and the interface does not check it.
  • If the IP whitelist is enabled, the server-side checks whether the caller's IP is included in the IP whitelist configured on the iFLYTEK Open Platform. Requests from an IP that is not in the whitelist are rejected.

Rules for IP Whitelist

  • Edit the whitelist at console-IP whitelist for the corresponding service. Changes take effect about five minutes after they are saved.
  • IP whitelists must be set separately for each service of each APPID.
  • The IP whitelist must contain Internet (public) IPs instead of LAN IPs.
  • If {"message":"Your IP address is not allowed"} is returned during the handshake phase, the server-side has refused the request because the IP whitelist is not set correctly or has not yet taken effect.

Interface Authentication

In the handshake phase, the requester is required to sign the request, and the server-side verifies the validity of the request through the signature.

Authentication Method

Add the authentication-related parameters after the request URL. Sample url:

wss://iat-api-sg.xf-yun.com/v2/iat?authorization=YXBpX2tleT0ia2V5eHh4eHh4eHg4ZWUyNzkzNDg1MTlleHh4eHh4eHgiLCBhbGdvcml0aG09ImhtYWMtc2hhMjU2IiwgaGVhZGVycz0iaG9zdCBkYXRlIHJlcXVlc3QtbGluZSIsIHNpZ25hdHVyZT0iSHAzVHk0WmtTQm1MOGpLeU9McFFpdjlTcjVudm1lWUVIN1dzTC9aTzJKZz0i&date=Wed%2C%2010%20Jul%202019%2007%3A35%3A43%20GMT&host=iat-api.xfyun.cn

Authentication Parameters:

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| host | string | Yes | Request host | iat-api.xfyun.cn |
| date | string | Yes | Current timestamp, in RFC1123 format | Wed, 10 Jul 2019 07:35:43 GMT |
| authorization | string | Yes | Signature-related information encoded with base64 (the signature is calculated with hmac-sha256) | Refer to the rules for generation of authorization parameters below. |

· Rules for Generation of date Parameters

date must be in the UTC+0 (GMT) time zone and in RFC1123 format (e.g. Wed, 10 Jul 2019 07:35:43 GMT).
The server-side will check the clock skew for Date, allowing the maximum deviation of 300 seconds. Once such value is exceeded, the request will be rejected.
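
The date rules above can be sketched in Python as follows. This is a minimal illustration that uses only the standard library; the helper names `rfc1123_date` and `within_skew` are hypothetical, and `within_skew` is a local sanity check that assumes the server compares the date against its own clock in the same way.

```python
from datetime import datetime, timezone
from email.utils import formatdate, parsedate_to_datetime

MAX_SKEW_SECONDS = 300  # maximum deviation stated in the rules above


def rfc1123_date(timestamp=None):
    """Format a POSIX timestamp (default: now) as an RFC1123 date in GMT."""
    return formatdate(timeval=timestamp, usegmt=True)


def within_skew(date_value, now=None):
    """Return True if date_value deviates from now by at most MAX_SKEW_SECONDS."""
    sent = parsedate_to_datetime(date_value).timestamp()
    if now is None:
        now = datetime.now(timezone.utc).timestamp()
    return abs(now - sent) <= MAX_SKEW_SECONDS


# The timestamp used in the examples of this document
example = datetime(2019, 7, 10, 7, 35, 43, tzinfo=timezone.utc).timestamp()
print(rfc1123_date(example))  # Wed, 10 Jul 2019 07:35:43 GMT
```

Calling `rfc1123_date()` without arguments gives the value to use for the date parameter of a new request.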

· Rules for Generation of authorization Parameters

  1. Get the interface keys APIKey and APISecret.
    They are both 32-character strings and can be viewed at the console of the iFLYTEK Open Platform after you create a WebAPI platform application and add the Short Form ASR (Automatic Speech Recognition) service.
  2. The format of the authorization parameter before it is encoded with base64 (authorization_origin) is as follows.
api_key="$api_key", algorithm="hmac-sha256", headers="host date request-line", signature="$signature"

Where api_key is the APIKey obtained from the console, algorithm is the signing algorithm (only hmac-sha256 is supported), and headers lists the parameters participating in the signature. signature is the string obtained by signing those parameters with the algorithm and encoding the result with base64. See below for details.

*Note: headers are the parameters participating in signature. Please note that they are fixed parameter names ("host date request-line"), instead of the values of these parameters.

  3. The rules for the original field of signature (signature_origin) are as follows:

The original field of signature is composed of three parameters, i.e. host, date and request-line which are concatenated in a certain format. The concatenation format is (\n is a new line character, ’:’ is followed by a space):

host: $host\ndate: $date\n$request-line

Suppose

Request url = wss://iat-api-sg.xf-yun.com/v2/iat
date = Wed, 10 Jul 2019 07:35:43 GMT

Then, the original field of signature (signature_origin) is:

host: iat-api.xfyun.cn
date: Wed, 10 Jul 2019 07:35:43 GMT
GET /v2/iat HTTP/1.1
  4. Sign the signature_origin with the hmac-sha256 algorithm and apiSecret to obtain the signed digest, signature_sha.
signature_sha=hmac-sha256(signature_origin,$apiSecret)

Where, apiSecret is the APISecret got from the console.

  5. Encode the signature_sha with base64 to get the final signature.

signature=base64(signature_sha)

Suppose

APISecret = secretxxxxxxxx2df7900c09xxxxxxxx	
date = Wed, 10 Jul 2019 07:35:43 GMT

Then, signature is

signature=Hp3Ty4ZkSBmL8jKyOLpQiv9Sr5nvmeYEH7WsL/ZO2Jg=

  6. Using the above information, concatenate the authorization string before it is encoded with base64 (authorization_origin). See the example below.

api_key="keyxxxxxxxx8ee279348519exxxxxxxx", algorithm="hmac-sha256", headers="host date request-line", signature="Hp3Ty4ZkSBmL8jKyOLpQiv9Sr5nvmeYEH7WsL/ZO2Jg="

*Note: headers are the parameters participating in signature. Please note that they are fixed parameter names ("host date request-line"), instead of the values of these parameters.

  7. Finally, encode the authorization_origin with base64 to get the authorization parameter.
authorization = base64(authorization_origin)
Example:
authorization=YXBpX2tleT0ia2V5eHh4eHh4eHg4ZWUyNzkzNDg1MTlleHh4eHh4eHgiLCBhbGdvcml0aG09ImhtYWMtc2hhMjU2IiwgaGVhZGVycz0iaG9zdCBkYXRlIHJlcXVlc3QtbGluZSIsIHNpZ25hdHVyZT0iSHAzVHk0WmtTQm1MOGpLeU9McFFpdjlTcjVudm1lWUVIN1dzTC9aTzJKZz0i
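
The generation steps above can also be sketched in Python. This is a minimal, non-authoritative sketch using only the standard library; the function name `build_auth_url` is hypothetical, and the key and secret in the usage lines are the placeholder values from this document, not working credentials.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate
from urllib.parse import parse_qs, quote, urlencode, urlparse


def build_auth_url(request_url, api_key, api_secret, date=None):
    """Return request_url with the host, date and authorization parameters appended."""
    parsed = urlparse(request_url)
    host = parsed.netloc
    date = date or formatdate(usegmt=True)  # RFC1123 date in GMT
    # signature_origin: "host: $host\ndate: $date\n$request-line"
    signature_origin = f"host: {host}\ndate: {date}\nGET {parsed.path} HTTP/1.1"
    # signature_sha = hmac-sha256(signature_origin, apiSecret)
    signature_sha = hmac.new(api_secret.encode("utf-8"),
                             signature_origin.encode("utf-8"),
                             hashlib.sha256).digest()
    signature = base64.b64encode(signature_sha).decode("ascii")
    authorization_origin = (
        f'api_key="{api_key}", algorithm="hmac-sha256", '
        f'headers="host date request-line", signature="{signature}"'
    )
    authorization = base64.b64encode(authorization_origin.encode("utf-8")).decode("ascii")
    # quote_via=quote encodes spaces as %20, as in the sample url
    query = urlencode({"authorization": authorization, "date": date, "host": host},
                      quote_via=quote)
    return f"{request_url}?{query}"


url = build_auth_url("wss://iat-api-sg.xf-yun.com/v2/iat",
                     "keyxxxxxxxx8ee279348519exxxxxxxx",
                     "secretxxxxxxxx2df7900c09xxxxxxxx",
                     date="Wed, 10 Jul 2019 07:35:43 GMT")
params = parse_qs(urlparse(url).query)
print(params["host"][0], params["date"][0])
```

Passing `date` explicitly is only for reproducible examples; in normal use it is omitted so that the current time is signed.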

Examples of Authentication (golang)

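    import (
        "crypto/hmac"
        "crypto/sha256"
        "encoding/base64"
        "fmt"
        "net/http"
        "net/url"
        "strings"
        "time"
    )

    // assembleAuthUrl returns hosturl with the host, date and authorization query parameters appended.
    // @hosturl : like wss://iat-api-sg.xf-yun.com/v2/iat
    // @apiKey : apiKey
    // @apiSecret : apiSecret
    func assembleAuthUrl(hosturl string, apiKey, apiSecret string) (string, error) {
        ul, err := url.Parse(hosturl)
        if err != nil {
            return "", err
        }
        // Signing date in RFC1123 format with the GMT suffix (time.RFC1123 would print "UTC" here)
        date := time.Now().UTC().Format(http.TimeFormat)
        // Fields participating in signature: host, date and request-line
        signString := []string{"host: " + ul.Host, "date: " + date, "GET " + ul.Path + " HTTP/1.1"}
        // signature_origin: the fields joined with "\n"
        sign := strings.Join(signString, "\n")
        // Signature result, encoded with base64
        sha := HmacWithShaTobase64("hmac-sha256", sign, apiSecret)
        // authorization_origin; URL encoding is not required yet
        authUrl := fmt.Sprintf("api_key=\"%s\", algorithm=\"%s\", headers=\"%s\", signature=\"%s\"", apiKey,
            "hmac-sha256", "host date request-line", sha)
        // Encode authorization_origin with base64
        authorization := base64.StdEncoding.EncodeToString([]byte(authUrl))
        v := url.Values{}
        v.Add("host", ul.Host)
        v.Add("date", date)
        v.Add("authorization", authorization)
        // Append the URL-encoded query string to the request URL
        return hosturl + "?" + v.Encode(), nil
    }

    // HmacWithShaTobase64 signs data with HMAC-SHA256 using key and returns the base64-encoded digest.
    // The algorithm argument is informational; only hmac-sha256 is supported.
    func HmacWithShaTobase64(algorithm, data, key string) string {
        mac := hmac.New(sha256.New, []byte(key))
        mac.Write([]byte(data))
        return base64.StdEncoding.EncodeToString(mac.Sum(nil))
    }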

Authentication Results

If the handshake succeeds, an HTTP 101 status code is returned, indicating that the protocol upgrade is successful. If the handshake fails, a different HTTP status code is returned depending on the type of error, along with an error message. See the table below for details.

| HTTP Code | Description | Error Message | Solution |
| --- | --- | --- | --- |
| 401 | The authorization request parameter is missing. | {"message":"Unauthorized"} | Check if the authorization parameter is present. Refer to [Interface Authentication](#Interface Authentication) |
| 401 | The parsing of signature parameters fails. | {"message":"HMAC signature cannot be verified"} | Check if each signature parameter is present and correct, and if the copied api_key is correct. |
| 401 | The signature authentication fails. | {"message":"HMAC signature does not match"} | There are many possible reasons for failure of signature authentication. 1. Check if api_key and api_secret are correct. 2. Check if the parameters required for the calculation of signature (host, date and request-line) are concatenated according to the protocol requirements. 3. Check if the base64 length of signature is normal (normally 44 bytes). |
| 403 | The clock skew check fails. | {"message":"HMAC signature cannot be verified, a valid date or x-date header is required for HMAC Authentication"} | Check if the clock of the machine sending the request is accurate. This error is reported when the deviation is more than 5 minutes. |

Example of returned messages upon failure of handshake:

    HTTP/1.1 401 Forbidden
    Date: Thu, 06 Dec 2018 07:55:16 GMT
    Content-Length: 116
    Content-Type: text/plain; charset=utf-8
    {
        "message": "HMAC signature does not match"
    }

Interface Data Transmitting and Receiving

After the handshake succeeds, the Websocket connection will be established between the client and the server-side, through which, the client can upload and receive data simultaneously.
When the server-side has a recognition result, it will push the result to the client through the Websocket connection.
If data is sent at too short an interval, the engine may recognize it incorrectly.
It is recommended to send audio at an interval of 40ms, and the number of audio bytes sent each time (i.e. the frameSize in the java demo) should be an integer multiple of the size of one audio frame.

// The connection succeeded; start sending data.
int frameSize = 1280; // an integer multiple of the size of one audio frame; note that the number of bytes in one frame differs between audio formats. Refer to the recommendations below
int interval = 40;    // interval between sends, in ms
int status = 0;       // audio status
try (FileInputStream fs = new FileInputStream(file)) {
    byte[] buffer = new byte[frameSize];
    // sending audios
    // ...
}

Note that the number of bytes in one frame differs between audio formats. We recommend:

  1. For the uncompressed PCM format, send audio every 40ms, 1280B each time;
  2. For the iFLYTEK custom speex format, send audio every 40ms. If the compression level for 16k audio is 7, send an integer multiple of 61B each time;
  3. For the standard open source speex format, send audio every 40ms. If the compression level for 16k audio is 7, send an integer multiple of 60B each time.

The frame size in bytes for each compression level is as follows:

| iFLYTEK custom speex (compression level) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| speex 8k | 7 | 11 | 16 | 21 | 21 | 29 | 29 | 39 | 39 | 47 | 63 |
| speex-wb 16k | 11 | 16 | 21 | 26 | 33 | 43 | 53 | 61 | 71 | 87 | 107 |

| Standard open source speex (compression level) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| speex 8k | 6 | 10 | 15 | 20 | 20 | 28 | 28 | 38 | 38 | 46 | 62 |
| speex-wb 16k | 10 | 15 | 20 | 25 | 32 | 42 | 52 | 60 | 70 | 86 | 106 |

The entire session can last up to 60s. If the session exceeds 60s, or if no data is sent for more than 10s, the server-side actively closes the connection. After the data upload is completed, the client should upload the end-of-data marker once to indicate that the session has ended. See the description of the data parameter below for details.
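
The recommended 1280B for PCM follows from the audio attributes: 16000 samples per second, 2 bytes per sample, one channel, sent every 40ms. The arithmetic can be sketched in Python as below; the function names are hypothetical, and the 8k figure is derived by the same arithmetic rather than stated in this document.

```python
def pcm_bytes_per_interval(sample_rate_hz, interval_ms=40, bit_depth=16, channels=1):
    """Bytes of uncompressed PCM produced in interval_ms."""
    return sample_rate_hz * (bit_depth // 8) * channels * interval_ms // 1000


def is_whole_frames(chunk_bytes, frame_bytes):
    """True if chunk_bytes is a positive integer multiple of frame_bytes."""
    return chunk_bytes > 0 and chunk_bytes % frame_bytes == 0


print(pcm_bytes_per_interval(16000))  # 1280
print(pcm_bytes_per_interval(8000))   # 640
print(is_whole_frames(122, 61))       # True: two 61B frames of iFLYTEK custom speex-wb at level 7
```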

Request Parameters

All request data is a json string.

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| common | object | Yes | Common parameters, which are only uploaded in the first frame request after a successful handshake. See below for more information. |
| business | object | Yes | Business parameters, which are only uploaded in the first frame request after a successful handshake. See below for more information. |
| data | object | Yes | Business data stream parameters, which must be uploaded in all requests after a successful handshake. See below for more information. |

Description of Common Parameters

common

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| app_id | string | Yes | APPID obtained from the platform |

Business Parameters

business

| Parameter Name | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| language | string | Yes | Language. zh_cn: Chinese (supports simple English recognition); en_us: English; ja_jp: Japanese; ko_kr: Korean; ru-ru: Russian; fr_fr: French; es_es: Spanish; th_TH: Thai; vi_VN: Vietnamese; de_DE: German; ar_il: Arabic; bg_bg: Bulgarian. Note: A minority language cannot be used without authorization; error 11200 is reported in that case. You can add it at console-Short Form ASR (Automatic Speech Recognition)-dialect / language for trial or purchase. The interface for minority languages is also different from that for Chinese and English. Refer to [Requirements for the Interface](#Requirements for the Interface) for more information. | "zh_cn" |
| domain | string | Yes | Application field. iat: daily expressions; medical: medical field. Note: The medical field cannot be used without authorization. You can add it at console-Short Form ASR (Automatic Speech Recognition) (streaming version)-advanced features for trial or purchase. If this parameter is set without authorization, no error is reported, but the setting does not take effect. | "iat" |
| accent | string | Yes | Dialect, currently supported only when language is Chinese. mandarin: Mandarin Chinese (also used for other languages). Other dialects can be added at console-Short Form ASR (Automatic Speech Recognition)-dialect / language for trial or purchase, and the parameter value of a dialect is displayed after it is added. A dialect cannot be used without authorization; error 11200 is reported in that case. | "mandarin" |
| vad_eos | int | No | Silence period for endpoint detection, in milliseconds (2000 by default). The engine considers the audio ended after this period of silence. | 3000 |
| dwa | string | No | Dynamic correction (supported in Chinese Mandarin only). wpgs: enables the streaming result return feature. Note: This extended feature cannot be used without authorization. You can activate it for free at console-Short Form ASR (Automatic Speech Recognition)-advanced features. If this parameter is set without authorization, no error is reported, but the setting does not take effect. | "wpgs" |
| pd | string | No | Domain personalization (supported in Chinese only). game: game; health: health; shopping: shopping; trip: travel. Note: This extended feature cannot be used without authorization. You can add it at console-Short Form ASR (Automatic Speech Recognition)-advanced features for trial or purchase. If this parameter is set without authorization, no error is reported, but the setting does not take effect. | "game" |
| ptt | int | No | Whether to add punctuation (supported in Chinese only). 1: Enabled (default); 0: Disabled | 0 |
| rlang | string | No | Character set of the returned text (supported in Chinese only). zh-cn: Simplified Chinese (default); zh-hk: Chinese-Hong Kong. Note: The Chinese-Hong Kong feature cannot be used without authorization. You can activate it at console-Short Form ASR (Automatic Speech Recognition)-advanced features. If this parameter is set to Chinese-Hong Kong without authorization, no error is reported, but the setting does not take effect. | "zh-cn" |
| vinfo | int | No | Whether to return the offsets of the starting and ending endpoint frames for each returned clause. The offset is the number of frames that have passed since the beginning of the audio. 0: Disabled (default); 1: Enabled. When enabled, the data.result.vad field is added to the returned result. See the returned results below for details. Note: This feature cannot be used while dynamic correction is activated and in use. | 1 |
| nunum | int | No | Whether to return numbers in Arabic numeral format (supported in Chinese, English and Japanese); enabled by default. 0: Disabled; 1: Enabled | 0 |
| speex_size | int | No | Length of the speex audio frame, used for speex audio only. 1. It must be specified when the audio uses the standard open source speex encoding; 2. It should not be set when the audio uses the iFLYTEK custom speex encoding. | 70 |
| nbest | int | No | Value range [1,5]. Setting this parameter returns multiple sentence candidates when their pronunciations are similar. Multiple candidates affect performance; the response is delayed by about 200ms. Note: This extended feature cannot be used without authorization. You can activate it for free at console-Short Form ASR (Automatic Speech Recognition)-advanced features. If this parameter is set without authorization, no error is reported, but the setting does not take effect. | 3 |
| wbest | int | No | Value range [1,5]. Setting this parameter returns multiple word and expression candidates when their pronunciations are similar. Multiple candidates affect performance; the response is delayed by about 200ms. Note: This extended feature cannot be used without authorization. You can activate it for free at console-Short Form ASR (Automatic Speech Recognition)-advanced features. If this parameter is set without authorization, no error is reported, but the setting does not take effect. | 5 |

Note: Whether multiple candidates are returned depends on the engine and is not guaranteed. Even if multiple candidates are set, only a single result is returned when the engine does not recognize any alternative words or sentences.
Note: The above common and business parameters only need to be attached during the first frame request after a successful handshake.

Business Data Stream Parameters

data

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| status | int | Yes | Audio status. 0: first audio frame; 1: middle audio frame; 2: last audio frame (the last frame must be sent) |
| format | string | Yes | Audio sampling rate, 16k or 8k. 16k audio: audio/L16;rate=16000; 8k audio: audio/L16;rate=8000 |
| encoding | string | Yes | Audio data format. raw: uncompressed audio (supports mono pcm); speex: speex-compressed audio (8k); speex-wb: speex-compressed audio (16k); lame: mp3 format (supported in Chinese Mandarin and English only, not currently supported in dialects and minority languages). For speex and speex-wb, the audio before compression must also be mono pcm with a sampling rate of 16k or 8k. |
| audio | string | Yes | Audio content, encoded with base64 |

Examples of request parameters:

    {
        "common": {
            "app_id": "123456"
        },
        "business": {
            "language": "zh_cn",
            "domain": "iat",
            "accent": "mandarin"
        },
        "data": {
            "status": 0,
            "format": "audio/L16;rate=16000",
            "encoding": "raw",
            "audio": "exSI6ICJlbiIsCgkgICAgInBvc2l0aW9uIjogImZhbHNlIgoJf..."
        }
    }

Example of end-of-data-upload marker:

    {
    "data":{
      "status":2
        }
    }
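
The request frames described above can be assembled as in the following Python sketch. It is an illustration under assumptions: the generator name `build_frames` is hypothetical, the audio is assumed to be non-empty raw 16k PCM, and sending the frames (for example every 40ms over a Websocket client) is left to the caller.

```python
import base64
import json


def build_frames(audio, app_id, frame_size=1280, business=None):
    """Yield JSON request strings for raw 16k PCM audio, then the end-of-data marker."""
    business = business or {"language": "zh_cn", "domain": "iat", "accent": "mandarin"}
    chunks = [audio[i:i + frame_size] for i in range(0, len(audio), frame_size)]
    for index, chunk in enumerate(chunks):
        frame = {
            "data": {
                "status": 0 if index == 0 else 1,  # 0: first frame, 1: middle frame
                "format": "audio/L16;rate=16000",
                "encoding": "raw",
                "audio": base64.b64encode(chunk).decode("ascii"),
            }
        }
        if index == 0:
            # common and business are attached to the first frame only
            frame["common"] = {"app_id": app_id}
            frame["business"] = business
        yield json.dumps(frame)
    # End-of-data marker, sent once after all audio has been uploaded
    yield json.dumps({"data": {"status": 2}})


frames = [json.loads(f) for f in build_frames(b"\x00" * 3000, "123456")]
print(len(frames), [f["data"]["status"] for f in frames])
```

With 3000 bytes of audio and a 1280-byte frame size, this produces three audio frames followed by the end-of-data marker.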

Returned Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| sid | string | Current session id, which is only returned in the first frame response after a successful handshake |
| code | int | Returned code. 0 means success, and any other code means failure. Refer to [Error Code](#Error Code) |
| message | string | Error message |
| data | object | ASR results |
| data.status | int | Marker indicating whether the recognition result has ended. 0: the first block of the recognition result; 1: an intermediate block of the recognition result; 2: the last block of the recognition result |
| data.result | object | ASR recognition result |
| data.result.sn | int | Serial number of the returned result |
| data.result.ls | bool | Whether it is the last slice of the result |
| data.result.bg | int | Reserved field, can be ignored |
| data.result.ed | int | Reserved field, can be ignored |
| data.result.ws | array | ASR result |
| data.result.ws.bg | int | Offset of the starting endpoint frame, in frames (1 frame = 10ms). This field is only valid when vinfo = 1 is set |
| data.result.ws.cw | array | Chinese word segmentation |
| data.result.ws.cw.w | string | Words and expressions |
| data.result.ws.cw other fields (sc/wb/wc/we/wp) | int/string | Reserved fields, can be ignored |

Returned Parameters of Dynamic Correction

If the dynamic correction feature is enabled and dwa = wpgs is set (supported in Chinese only), the following fields are also returned:
Note: For the parsing of dynamic correction results, please refer to the java demo at the bottom of the page.

| Parameter | Type | Description |
| --- | --- | --- |
| data.result.pgs | string | This field appears when wpgs is enabled. The value "apd" means that this slice of the result is appended to the previous results; the value "rpl" means that it replaces part of the previous results, and the replacement range is given by the rg field |
| data.result.rg | array | Replacement range; this field appears when wpgs is enabled. For example, the value [2,5] means that the results to be replaced are those from the second to the fifth returns |

Vinfo returned parameters

If vinfo = 1 is set, the following fields are also returned (if dwa = wpgs is enabled and set as well, vinfo does not take effect):

| Parameter | Type | Description |
| --- | --- | --- |
| data.result.vad | object | Endpoint frame offset information |
| data.result.vad.ws | array | Endpoint frame offset results |
| data.result.vad.ws.bg | int | Offset of the starting endpoint frame, in frames (1 frame = 10ms) |
| data.result.vad.ws.ed | int | Offset of the ending endpoint frame, in frames (1 frame = 10ms) |
| data.result.vad.ws.eg | number | Can be ignored |

Example of returned parameter (dynamic correction dwa = wpgs)
Note: For the parsing of dynamic correction results, please refer to the java demo at the bottom of the page

	{
	  "code": 0,
	  "message": "success",
	  "sid": "iatxxxxxxxxxxxxx",
	  "data": {
	    "result": {
	      "bg": 0,
	      "ed": 0,
	      "ls": false,
	      "pgs": "rpl",
	      "rg": [
	        1,
	        1
	      ],
	      "sn": 2,
	      "ws": [
	        {
	          "bg": 0,
	          "cw": [
	            {
	              "sc": 0,
	              "w": "test"
	            }
	          ]
	        },
	        {
	          "bg": 0,
	          "cw": [
	            {
	              "sc": 0,
	              "w": "test"
	            }
	          ]
	        }
	      ]
	    },
	    "status": 1
	  }
	}
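
One way to apply the pgs and rg fields is sketched below in Python. It assumes, based on the description above, that rg is an inclusive range of earlier sn values and that the first entry of each cw array is the preferred candidate; the function name `apply_result` is hypothetical, and the java demo remains the reference for parsing.

```python
def apply_result(segments, result):
    """Merge one data.result object into segments (a dict keyed by sn) and return the full text."""
    text = "".join(ws["cw"][0]["w"] for ws in result["ws"])
    if result.get("pgs") == "rpl":
        first, last = result["rg"]
        # Drop the earlier results that this one replaces
        for sn in range(first, last + 1):
            segments.pop(sn, None)
    segments[result["sn"]] = text
    return "".join(segments[sn] for sn in sorted(segments))


segments = {}
first = {"sn": 1, "pgs": "apd", "ws": [{"cw": [{"w": "tes"}]}]}
second = {"sn": 2, "pgs": "rpl", "rg": [1, 1],
          "ws": [{"cw": [{"w": "test"}]}, {"cw": [{"w": "test"}]}]}
print(apply_result(segments, first))   # tes
print(apply_result(segments, second))  # testtest
```

The second result replaces the first (rg = [1, 1]), so only its text remains in the assembled output.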

Example of returned parameters (vinfo=1)

{
  "code": 0,
  "message": "success",
  "sid": "iatxxxxxxxxxxxxxx",
  "data": {
    "result": {
      "bg": 0,
      "ed": 0,
      "ls": false,
      "sn": 1,
      "vad": {
        "ws": [
          {
            "bg": 40,
            "ed": 366,
            "eg": 63.58
          }
        ]
      },
      "ws": [
        {
          "bg": 53,
          "cw": [
            {
              "sc": 0,
              "w": "April"
            }
          ]
        },
        {...},
        {
          "bg": 293,
          "cw": [
            {
              "sc": 0,
              "w": "competitor"
            }
          ]
        }
      ]
    },
    "status": 1
  }
}
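
Since 1 frame = 10ms, the bg and ed values can be converted to milliseconds as in this short Python sketch. The function names are hypothetical, and taking the first cw entry as the word to report is an assumption.

```python
FRAME_MS = 10  # 1 frame = 10ms, as stated above


def frames_to_ms(frames):
    """Convert a frame offset to milliseconds."""
    return frames * FRAME_MS


def word_offsets(result):
    """Return (word, start_ms) pairs, using the first candidate of each ws entry."""
    return [(ws["cw"][0]["w"], frames_to_ms(ws["bg"])) for ws in result["ws"]]


def vad_segments(result):
    """Return (start_ms, end_ms) pairs from data.result.vad.ws."""
    return [(frames_to_ms(seg["bg"]), frames_to_ms(seg["ed"]))
            for seg in result.get("vad", {}).get("ws", [])]


result = {
    "vad": {"ws": [{"bg": 40, "ed": 366, "eg": 63.58}]},
    "ws": [{"bg": 53, "cw": [{"sc": 0, "w": "April"}]},
           {"bg": 293, "cw": [{"sc": 0, "w": "competitor"}]}],
}
print(vad_segments(result))  # [(400, 3660)]
print(word_offsets(result))  # [('April', 530), ('competitor', 2930)]
```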

Examples of returned parameters (multiple sentence candidates nbest)

{
  "code": 0,
  "message": "success",
  "sid": "iatxxxxxxxxxxxxx",
  "data": {
    "result": {
      "bg": 0,
      "ed": 0,
      "ls": false,
      "sn": 1,
      "ws": [
        {
          "bg": 35,
          "cw": [
            {
              "sc": 0,
              "w": "打电话给梁玉生"
            },
            {
              "sc": 0,
              "w": "打电话给梁玉升"
            }
          ]
        }
      ]
    },
    "status": 0
  }
}

Example of returned parameters (word level multiple candidate wbest)

{
  "code": 0,
  "message": "success",
  "sid": "iatxxxxxxxxxxxxxx",
  "data": {
    "result": {
      "bg": 0,
      "ed": 0,
      "ls": false,
      "sn": 1,
      "ws": [
        {...},
        {
          "bg": 159,
          "cw": [
            {
              "sc": 0,
              "w": "梁"
            }
          ]
        },
        {
          "bg": 191,
          "cw": [
            {
              "sc": 0,
              "w": "玉"
            },
            {
              "sc": 0,
              "w": "育"
            }
          ]
        },
        {
          "bg": 215,
          "cw": [
            {
              "sc": 0,
              "w": "生"
            },
            {
              "sc": 0,
              "w": "升"
            }
          ]
        }
      ]
    },
    "status": 0
  }
}
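
The candidate lists in the examples above can be read as in the following Python sketch. It assumes that the first entry of each cw array is the preferred candidate, which this document does not state explicitly; the function names are hypothetical.

```python
def top_text(result):
    """Join the first candidate of every ws entry."""
    return "".join(ws["cw"][0]["w"] for ws in result["ws"])


def alternatives(result):
    """Return, for every ws entry, the list of all candidate strings."""
    return [[cw["w"] for cw in ws["cw"]] for ws in result["ws"]]


result = {
    "ws": [
        {"bg": 159, "cw": [{"sc": 0, "w": "梁"}]},
        {"bg": 191, "cw": [{"sc": 0, "w": "玉"}, {"sc": 0, "w": "育"}]},
        {"bg": 215, "cw": [{"sc": 0, "w": "生"}, {"sc": 0, "w": "升"}]},
    ]
}
print(top_text(result))      # 梁玉生
print(alternatives(result))  # [['梁'], ['玉', '育'], ['生', '升']]
```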

Error Code

| Error Code | Error Message | Description | Solution |
| --- | --- | --- | --- |
| 10005 | licc fail | APPID authorization fails | Verify that the APPID is correct and that the ASR service is activated. |
| 10006 | Get audio rate fail | Failure in getting a certain parameter | Check if the parameter in the error message is uploaded correctly. |
| 10007 | get invalid rate | Invalid parameter value | Check if the parameter value in the error message is within the value range. |
| 10010 | AIGES_ERROR_NO_LICENSE | Insufficient authorization for the engine | Submit a work order at the console to contact the technical personnel. |
| 10014 | AIGES_ERROR_TIME_OUT | Session timeout | |
| 10019 | service read buffer timeout, session timeout | Session timeout | Check if the connection was left open after the data was completely sent. |
| 10043 | Syscall AudioCodingDecode error | Audio decoding fails | Check the aue parameter. If it is speex, make sure the audio is speex audio, compressed by segment, and consistent with the frame size. |
| 10101 | engine inavtive | Engine session ended | Check if the engine has ended the session while the client is still sending data, for example when the audio data has been completely sent but the websocket connection is not closed and empty audio is still being sent. |
| 10114 | session timeout | Session timeout | Check if the entire session has exceeded 60s. |
| 10139 | invalid param | Parameter error | Engine decoding error |
| 10313 | appid cannot be empty | APPID cannot be empty | Check if the common parameter is uploaded correctly, and if the app_id parameter in common is uploaded correctly and is not empty. |
| 10317 | invalid version | Invalid version | Contact the technical personnel. |
| 11200 | auth no license | Not authorized | Check if an unauthorized feature is used, or if the total number of calls has exceeded the upper limit. |
| 11201 | auth no enough license | Daily flow exceeds the control value | Contact the business department to increase the number of daily calls. |
| 10160 | parse request json error | Invalid request data format | Check if the request data is valid json. |
| 10161 | parse base64 string error | base64 decoding fails | Check if the sent data is encoded with base64. |
| 10163 | param validate error:/common 'app_id' param is required | Required parameters are missing or parameters are invalid | Check if the parameter in the error message is correct. |
| 10200 | read data timeout | Read data timeout | Check if no data has been sent for 10s in total while the connection has not been closed. |
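
A client can surface these codes by checking the code field of every returned message. The following Python sketch shows one possible approach; the `AsrError` class and the `parse_response` function are hypothetical names, and treating a missing code field as a failure is an assumption.

```python
import json


class AsrError(Exception):
    """Raised when a returned message carries a non-zero code."""

    def __init__(self, code, message, sid=None):
        super().__init__(f"ASR error {code}: {message} (sid={sid})")
        self.code = code
        self.message = message
        self.sid = sid


def parse_response(raw):
    """Return (result, is_last) for one returned message, or raise AsrError."""
    msg = json.loads(raw)
    if msg.get("code") != 0:
        raise AsrError(msg.get("code"), msg.get("message", ""), msg.get("sid"))
    data = msg.get("data") or {}
    # data.status == 2 marks the last block of the recognition result
    return data.get("result"), data.get("status") == 2


ok = '{"code": 0, "message": "success", "sid": "iatxxx", "data": {"result": {"sn": 1, "ls": true, "ws": []}, "status": 2}}'
result, is_last = parse_response(ok)
print(result["sn"], is_last)  # 1 True
```

When `is_last` is true, the client can close the Websocket connection, as described in the calling flow.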

Demos

Short Form ASR (Automatic Speech Recognition) Stream API demo java

Short Form ASR (Automatic Speech Recognition) Stream API demo python3

Short Form ASR (Automatic Speech Recognition) Stream API demo go

Short Form ASR (Automatic Speech Recognition) Stream API demo nodejs

Short Form ASR (Automatic Speech Recognition) Stream API demo C#

Q&A

Where can I find the APIKey of Short Form ASR (Automatic Speech Recognition)?

Answer: Find the Short Form ASR (Automatic Speech Recognition)(stream) service of the corresponding application at console-My Application, and then you can see APIKey.

How many concurrent channels does the Short Form ASR (Automatic Speech Recognition) Web api support?

Answer: 50 channels are supported by default.

Why does WebAPI streaming ASR return an empty result, wrong content or an incomplete result?

Answer: The reasons may be as follows:
1. The audio format is incorrect. Use the Cool Edit Pro tool (to be downloaded from the webpage) to view the audio format. The WebAPI streaming ASR supports the pcm, speex and speex-wb formats; the audio sampling rate should be 16k or 8k, the bit depth should be 16-bit, and the audio should be mono.
2. The audio contains silence or noise in the middle that lasts longer than the rear endpoint setting (2000ms by default if not set). Use the Cool Edit Pro tool to view the audio content, and set the rear endpoint (vad_eos) to a larger value, up to a maximum of 10000ms. Incomplete recognition is expected when the silence or noise exceeds the maximum value of the rear endpoint.

Contributing to the Documentation

Is something missing/incorrect? Please let us know by contacting openplatform@iflytek.com. If you know how to fix it straight away, don’t hesitate to create a request to help us improve our document.