Real-time ASR (Automatic Speech Recognition)

Description of the Interface

Real-time ASR is based on the deep fully convolutional neural network (DFCNN) framework and converts audio stream data (up to 5 hours) into text stream results in real time. It provides a basis for information processing and data mining and is best suited to everyday conversation.

  • Supported Audio Type: Mono 16bit inter PCM
  • Supported Audio Sampling Rate: 16k

Interface Demo

To download the interface demo, please click Here. At present, demos are provided for only some development languages. For other languages, please refer to the interface document below.

Requirements for the Interface

The following requirements must be met when integrating the Real-time ASR streaming API.

Content | Description
Request Protocol | ws[s]
Request URL | ws://ist-api-sg.xf-yun.com/v2/ist
Request Line | GET /v2/ist HTTP/1.1
Interface Authentication | Signature mechanism; refer to Interface Authentication below for details.
Character Encoding | UTF-8
Response Format | Unified JSON
Development Language | Any language that can make WebSocket requests to iFLYTEK cloud services.
Operating System | Any operating system
Audio Attributes | Sampling rate: 16k; Bit length: 16bit; Mono
Audio Format | pcm
Audio Length | Within 5 hours

Interface Calling Flow

  • Calculate the signature with hmac-sha256 using the interface key, and send the WebSocket handshake request to the server. Refer to Interface Authentication below for details.
  • After the handshake succeeds, the client can upload and receive data simultaneously through the WebSocket connection. Once all data has been uploaded, the client should upload the end-of-data marker once. Refer to Interface Data Transmitting and Receiving below.
  • Close the WebSocket connection after receiving the end-of-results marker from the server.

*Note: Observe the following when using WebSocket:

  1. The server supports WebSocket version 13. Ensure the framework used by the client supports this version.
  2. All frames returned by the server are of type TextMessage, which corresponds to opcode = 1 in the native WebSocket protocol frame. Make sure the client parses frames as this type. If it does not, try upgrading the client framework or switching to a different framework.
  3. A framing problem may occur in which one JSON data packet reaches the client split across multiple frames, so the client fails to parse the JSON. In most cases this is caused by a problem in how the client framework parses the WebSocket protocol. Try upgrading the framework version first, or switch to a different framework.
  4. When the client closes the connection after the session is over, try to ensure that the close code sent to the server is WebSocket code 1000 (if the client framework does not provide an interface for sending a code when closing, this requirement can be ignored).

Interface Authentication

In the handshake phase, the requester is required to sign the request, and the server-side verifies the legitimacy of the request through the signature.

Authentication Method

Authentication is performed by appending authentication-related parameters to the request URL. Sample URL:

ws[s]://ist-api-sg.xf-yun.com/v2/ist?authorization=aG1hYyB1c2VybmFtZT0iMTAwSU1FIiwgYWxnb3JpdGhtPSJobWFjLXNoYTI1NiIsIGhlYWRlcnM9Imhvc3QgZGF0ZSByZXF1ZXN0LWxpbmUiLCBzaWduYXR1cmU9IlVSbnk4M3o1elJsNWF1ODl1YXhUL1dGdUtWejZVNkdkWDdDV25SMGdueWc9Ig%3D%3D&date=Tue%2C+18+Dec+2018+09%3A08%3A49+UTC&host=ist-api-sg.xf-yun.com

Authentication Parameters:

Parameter | Type | Required | Description | Example
host | string | Yes | Request host | ist-api-sg.xf-yun.com
date | string | Yes | Current timestamp, in RFC1123 format | Wed, 10 Jul 2019 07:35:43 GMT
authorization | string | Yes | Signature-related information encoded with base64 (the signature is calculated with hmac-sha256). | See Rules for Generation of Authorization Parameters below.

· Rules for Generation of the Date Parameter

The date must be in the UTC+0 or GMT time zone and in RFC1123 format (for example, Wed, 10 Jul 2019 07:35:43 GMT). The server checks the clock skew of date and allows a maximum deviation of 300 seconds. Requests that exceed this deviation are rejected.
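
As an illustration of the date rule, the following minimal Python sketch formats a timestamp as an RFC1123 GMT string using only the standard library and checks a date against the 300-second tolerance. The helper names rfc1123_date and within_allowed_skew are hypothetical, and the skew check only mirrors the rule described here; it is not the server's actual implementation.

```python
from email.utils import formatdate, parsedate_to_datetime

MAX_SKEW_SECONDS = 300  # maximum clock deviation stated in the rule above


def rfc1123_date(epoch_seconds=None):
    """Format a Unix timestamp (default: now) as an RFC1123 date in GMT."""
    return formatdate(timeval=epoch_seconds, usegmt=True)


def within_allowed_skew(date_str, reference_epoch_seconds):
    """Return True if date_str is within MAX_SKEW_SECONDS of the reference time."""
    sent = parsedate_to_datetime(date_str).timestamp()
    return abs(reference_epoch_seconds - sent) <= MAX_SKEW_SECONDS


if __name__ == "__main__":
    # Prints the current time in the same form as "Wed, 10 Jul 2019 07:35:43 GMT"
    print(rfc1123_date())
```

Generating the date from the system clock in GMT, rather than from a formatted local time, avoids time zone mistakes that would otherwise push the request outside the allowed deviation.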

· Rules for Generation of Authorization Parameters

1)Get the interface keys APIKey and APISecret. Both are 32-character strings and can be viewed in the console of the iFLYTEK Open Platform after creating a WebAPI platform application and adding the Real-time ASR (Automatic Speech Recognition) service.

2)Before base64 encoding, the authorization parameter (authorization_origin) has the following format.

api_key="$api_key", algorithm="hmac-sha256", headers="host date request-line", signature="$signature"

Here, api_key is the APIKey obtained from the console, algorithm is the signing algorithm (only hmac-sha256 is supported), and headers lists the parameters that participate in the signature. signature is the string obtained by signing those parameters with the algorithm and encoding the result with base64. See below for details.

*Note: Headers are the parameters participating in signature. Please note that they are fixed-parameter names ("host date request-line"), instead of the values of these parameters.

3)The original signature string (signature_origin) is built as follows:

It is composed of three parameters, host, date, and request-line, concatenated in the following format (\n is a line break character, and each ':' is followed by a space):

host: $host\ndate: $date\n$request-line

Suppose

Requested url = wss://ist-api-sg.xf-yun.com/v2/ist
date = Fri, 25 Feb 2022 03:01:13 GMT

Then, the original field of signature (signature_origin) is:

host: ist-api-sg.xf-yun.com
date: Fri, 25 Feb 2022 03:01:13 GMT
GET /v2/ist HTTP/1.1
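
The concatenation rule can be sketched in Python as follows. This is a minimal illustration using the request URL and date from the example; the function name build_signature_origin is hypothetical.

```python
from urllib.parse import urlparse


def build_signature_origin(request_url, date, method="GET"):
    """Concatenate host, date and request-line into signature_origin."""
    parsed = urlparse(request_url)
    # Each field name is followed by ": " and fields are separated by "\n"
    return "host: {}\ndate: {}\n{} {} HTTP/1.1".format(
        parsed.netloc, date, method, parsed.path
    )


if __name__ == "__main__":
    print(build_signature_origin(
        "wss://ist-api-sg.xf-yun.com/v2/ist",
        "Fri, 25 Feb 2022 03:01:13 GMT",
    ))
```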

4)Sign signature_origin with the hmac-sha256 algorithm, using apiSecret as the key, to obtain the signed digest, signature_sha.

signature_sha=hmac-sha256(signature_origin,$apiSecret)

Here, apiSecret is the APISecret obtained from the console.

5)Encode the signature_sha with base64 to get the final signature.

signature=base64(signature_sha)

Suppose

APISecret = e6d4824ba9xxxxxxff2b66f7c6738ead
date = Fri, 25 Feb 2022 03:01:13 GMT

Then, the signature will be

signature=Vcban+QQerK4GVKqGjmx2ZolNtoZUl808/DgrfGB/c8=
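
Steps 4 and 5 can be sketched with the Python standard library as shown below. The function name sign and the secret value "example-secret" are placeholders chosen for illustration; because the APISecret in the example above is partially masked, this sketch cannot reproduce that exact signature.

```python
import base64
import hashlib
import hmac


def sign(signature_origin, api_secret):
    """HMAC-SHA256 over signature_origin keyed with api_secret, then base64."""
    signature_sha = hmac.new(
        api_secret.encode("utf-8"),
        signature_origin.encode("utf-8"),
        digestmod=hashlib.sha256,
    ).digest()
    return base64.b64encode(signature_sha).decode("ascii")


if __name__ == "__main__":
    origin = (
        "host: ist-api-sg.xf-yun.com\n"
        "date: Fri, 25 Feb 2022 03:01:13 GMT\n"
        "GET /v2/ist HTTP/1.1"
    )
    # "example-secret" is a placeholder, not a real APISecret
    print(sign(origin, "example-secret"))
```

A SHA-256 digest is 32 bytes long, so the base64-encoded signature is always 44 characters, which is a quick sanity check when debugging signature failures.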

6)Using the information above, assemble the authorization string before base64 encoding (authorization_origin). See the example below:

api_key="4c18179638d2e487b50f3cfd129ffaca", algorithm="hmac-sha256", headers="host date request-line", signature="Vcban+QQerK4GVKqGjmx2ZolNtoZUl808/DgrfGB/c8="

7)Finally, encode the authorization_origin with base64 to get the authorization parameter.

authorization = base64(authorization_origin)
Example:
authorization=YXBpX2tleT0iNGMxODE3OTYzOGQyZTQ4N2I1MGYzY2ZkMTI5ZmZhY2EiLCBhbGdvcml0aG09ImhtYWMtc2hhMjU2IiwgaGVhZGVycz0iaG9zdCBkYXRlIHJlcXVlc3QtbGluZSIsIHNpZ25hdHVyZT0iVmNiYW4rUVFlcks0R1ZLcUdqbXgyWm9sTnRvWlVsODA4L0RncmZHQi9jOD0i
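
Steps 6 and 7, together with the final URL assembly, can be sketched as follows. This is a minimal illustration: the function names build_authorization and build_auth_url are hypothetical, and the api_key and signature values are the ones from the example in step 6.

```python
import base64
from urllib.parse import urlencode


def build_authorization(api_key, signature):
    """Assemble authorization_origin and encode it with base64."""
    authorization_origin = (
        'api_key="{}", algorithm="hmac-sha256", '
        'headers="host date request-line", signature="{}"'
    ).format(api_key, signature)
    return base64.b64encode(authorization_origin.encode("utf-8")).decode("ascii")


def build_auth_url(request_url, host, date, authorization):
    """Append the URL-encoded authentication parameters to the request URL."""
    query = urlencode({"authorization": authorization, "date": date, "host": host})
    return request_url + "?" + query


if __name__ == "__main__":
    auth = build_authorization(
        "4c18179638d2e487b50f3cfd129ffaca",
        "Vcban+QQerK4GVKqGjmx2ZolNtoZUl808/DgrfGB/c8=",
    )
    print(build_auth_url(
        "wss://ist-api-sg.xf-yun.com/v2/ist",
        "ist-api-sg.xf-yun.com",
        "Fri, 25 Feb 2022 03:01:13 GMT",
        auth,
    ))
```

The query values must be URL-encoded because the base64 string can contain "+", "/" and "=", and the date contains spaces, commas and colons.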

Code Examples for Generating the Authentication URL

golang:

package wsauth // hypothetical package name

import (
    "crypto/hmac"
    "crypto/sha256"
    "encoding/base64"
    "fmt"
    "net/url"
    "strings"
    "time"
)

// assembleAuthUrl builds the authenticated request URL.
//@hosturl : request URL, like wss://ist-api-sg.xf-yun.com/v2/ist
//@method : request method, GET, POST ...
//@apiKey : apiKey
//@apiSecret : apiSecret
func assembleAuthUrl(hosturl string, method, apiKey, apiSecret string) (string, error) {
    ul, err := url.Parse(hosturl)
    if err != nil {
        // Stop here: ul is nil when parsing fails
        return "", err
    }
    // Signing date
    date := time.Now().UTC().Format(time.RFC1123)
    // Fields participating in the signature: host, date, request-line
    signString := []string{"host: " + ul.Host, "date: " + date, method + " " + ul.Path + " HTTP/1.1"}
    // Concatenate the fields into the string to be signed
    sign := strings.Join(signString, "\n")
    // Signature result
    sha := HmacWithShaTobase64("hmac-sha256", sign, apiSecret)
    // Build the authorization string; URL encoding is not required at this point
    authUrl := fmt.Sprintf("api_key=\"%s\", algorithm=\"%s\", headers=\"%s\", signature=\"%s\"", apiKey,
        "hmac-sha256", "host date request-line", sha)
    // Encode the authorization string with base64
    authorization := base64.StdEncoding.EncodeToString([]byte(authUrl))

    v := url.Values{}
    v.Add("host", ul.Host)
    v.Add("date", date)
    v.Add("authorization", authorization)
    // Append the URL-encoded query string to the request URL
    callurl := hosturl + "?" + v.Encode()
    return callurl, nil
}

// HmacWithShaTobase64 signs data with HMAC-SHA256 and returns the base64-encoded result.
// The algorithm argument is kept for compatibility; only hmac-sha256 is supported.
func HmacWithShaTobase64(algorithm, data, key string) string {
    mac := hmac.New(sha256.New, []byte(key))
    mac.Write([]byte(data))
    encodeData := mac.Sum(nil)
    return base64.StdEncoding.EncodeToString(encodeData)
}

java:

package com.iflytek.webgatews.wsclient;

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.SimpleDateFormat;
import java.util.Base64;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/**
 * @Author:sjliu7
 * @Date:2019/7/31 15:23
 */
public class AuthUtils {

    /**
     * Generate the authenticated URL for the WebSocket interface
     * @param requestUrl request URL, e.g. wss://ist-api-sg.xf-yun.com/v2/ist
     * @param method request method, e.g. GET
     * @param apiKey APIKey from the console
     * @param apiSecret APISecret from the console
     * @return final requestUrl
     */
    public static String assembleRequestUrl(String requestUrl, String method, String apiKey, String apiSecret) {
        // java.net.URL does not handle ws/wss, so swap in http/https for parsing only
        String httpRequestUrl = requestUrl.replace("ws://", "http://").replace("wss://", "https://");
        try {
            URL url = new URL(httpRequestUrl);
            SimpleDateFormat format = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss z", Locale.US);
            format.setTimeZone(TimeZone.getTimeZone("UTC"));
            String date = format.format(new Date());
            String host = url.getHost();
            // signature_origin: host, date and request-line separated by "\n"
            String signatureOrigin = "host: " + host + "\n"
                    + "date: " + date + "\n"
                    + method + " " + url.getPath() + " HTTP/1.1";
            Charset charset = StandardCharsets.UTF_8;
            Mac mac = Mac.getInstance("HmacSHA256");
            SecretKeySpec spec = new SecretKeySpec(apiSecret.getBytes(charset), "HmacSHA256");
            mac.init(spec);
            byte[] digest = mac.doFinal(signatureOrigin.getBytes(charset));
            String sha = Base64.getEncoder().encodeToString(digest);
            // authorization_origin in the api_key format described above
            String authorization = String.format("api_key=\"%s\", algorithm=\"%s\", headers=\"%s\", signature=\"%s\"", apiKey, "hmac-sha256", "host date request-line", sha);
            String authBase = Base64.getEncoder().encodeToString(authorization.getBytes(charset));
            return String.format("%s?authorization=%s&host=%s&date=%s", requestUrl,
                    URLEncoder.encode(authBase, "UTF-8"), URLEncoder.encode(host, "UTF-8"), URLEncoder.encode(date, "UTF-8"));
        } catch (Exception e) {
            // Keep the original exception as the cause
            throw new RuntimeException("assemble requestUrl error:" + e.getMessage(), e);
        }
    }
}

javascript:

// Requires the CryptoJS library and a runtime that provides btoa (e.g. a browser)
function assembleRequestUrl(host, path, apiKey, apiSecret) {
  var url = "wss://" + host + path
  // RFC1123 date in GMT (toGMTString is deprecated)
  var date = new Date().toUTCString()
  var algorithm = 'hmac-sha256'
  var headers = 'host date request-line'
  var signatureOrigin = `host: ${host}\ndate: ${date}\nGET ${path} HTTP/1.1`
  var signatureSha = CryptoJS.HmacSHA256(signatureOrigin, apiSecret)
  var signature = CryptoJS.enc.Base64.stringify(signatureSha)
  var authorizationOrigin = `api_key="${apiKey}", algorithm="${algorithm}", headers="${headers}", signature="${signature}"`
  var authorization = btoa(authorizationOrigin)
  // URL-encode the query values: the base64 string and the date contain reserved characters
  url = `${url}?authorization=${encodeURIComponent(authorization)}&date=${encodeURIComponent(date)}&host=${encodeURIComponent(host)}`
  return url
}

python:

import base64
import hashlib
import hmac
from time import time
from urllib.parse import urlencode
from wsgiref.handlers import format_date_time


class AssembleHeaderException(Exception):
    def __init__(self, msg):
        super().__init__(msg)
        self.message = msg


class Url:
    def __init__(self, host, path, schema):
        self.host = host
        self.path = path
        self.schema = schema


# split "wss://host/path" into schema, host and path
def parse_url(request_url):
    stidx = request_url.find("://")
    if stidx < 0:
        raise AssembleHeaderException("invalid request url:" + request_url)
    host = request_url[stidx + 3:]
    schema = request_url[:stidx + 3]
    # find() returns -1 when there is no path, so the check below can fire
    edidx = host.find("/")
    if edidx <= 0:
        raise AssembleHeaderException("invalid request url:" + request_url)
    path = host[edidx:]
    host = host[:edidx]
    return Url(host, path, schema)


# build websocket auth request url
def assemble_ws_auth_url(request_url, method="GET", api_key="", api_secret=""):
    u = parse_url(request_url)
    host = u.host
    path = u.path
    # RFC1123 date in GMT, e.g. "Thu, 12 Dec 2019 01:57:27 GMT"
    date = format_date_time(time())
    signature_origin = "host: {}\ndate: {}\n{} {} HTTP/1.1".format(host, date, method, path)
    # HMAC-SHA256 keyed with api_secret, then base64
    signature_sha = hmac.new(api_secret.encode('utf-8'), signature_origin.encode('utf-8'),
                             digestmod=hashlib.sha256).digest()
    signature_sha = base64.b64encode(signature_sha).decode(encoding='utf-8')
    authorization_origin = "api_key=\"%s\", algorithm=\"%s\", headers=\"%s\", signature=\"%s\"" % (
        api_key, "hmac-sha256", "host date request-line", signature_sha)
    authorization = base64.b64encode(authorization_origin.encode('utf-8')).decode(encoding='utf-8')
    values = {
        "host": host,
        "date": date,
        "authorization": authorization
    }

    # urlencode escapes the reserved characters in the query values
    return request_url + "?" + urlencode(values)

Authentication Results

If the handshake succeeds, HTTP status code 101 is returned, indicating that the protocol upgrade was successful. If the handshake fails, a different HTTP status code is returned depending on the type of error, along with an error message. See the table below for details.

HTTP Code | Description | Error Message | Solution
401 | The authorization request parameters are missing. | {"message":"Unauthorized"} | Check that the authorization parameters are present. Refer to Authentication Parameters.
401 | Parsing of the signature parameters fails. | {"message":"HMAC signature cannot be verified"} | Check that each signature parameter is present and correct, and that the copied api_key is correct.
401 | The signature authentication fails. | {"message":"HMAC signature does not match"} | There are several possible causes. 1. Check that api_key and api_secret are correct. 2. Check that the parameters required to calculate the signature (host, date, and request-line) are concatenated according to the protocol requirements. 3. Check that the base64 length of the signature is normal (normally 44 bytes).
403 | The clock skew check fails. | {"message":"HMAC signature cannot be verified, a valid date or x-date header is required for HMAC Authentication"} | Check that the system time of the requesting machine is accurate. This error is reported when the deviation is more than 5 minutes.
403 | The IP whitelist check fails. | {"message":"Your IP address is not allowed"} | Disable the IP whitelist in the console, or check that the IP address set in the whitelist is the WAN (external) IP address of the current machine.

Example of returned messages upon failure of handshake:

    HTTP/1.1 401 Forbidden
    Date: Thu, 06 Dec 2018 07:55:16 GMT
    Content-Length: 116
    Content-Type: text/plain; charset=utf-8
    {
        "message": "HMAC signature does not match"
    }

Interface Data Transmitting and Receiving

  • After the handshake succeeds, a WebSocket connection is established between the client and the server, through which the client can upload and receive data simultaneously. For uncompressed PCM, send 1280 bytes of audio every 40 ms. When the server has a recognition result, it pushes the result to the client over the same WebSocket connection.
  • End-of-data marker: the last audio frame must carry the marker status=2:
{
    "data":{
        ......
        "status":2
    }
}
  • If no data is sent for more than 10s, the server will actively close the connection.

  • If the connection is closed for any reason during the session, the session ends and cannot be recovered. To continue using the transcription service, the client must re-establish the connection and open a new session.

  • The server uses the status field to mark the last frame of result data. status=2 means that all results have been sent; the client should stop sending data and disconnect immediately after processing that frame. See the details below:

    {
    "data":{
     "result":{..},
     "status":2
    }
    }
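
The pacing rule above follows from the audio attributes: 16000 samples per second at 2 bytes per sample is 32000 bytes per second, so 40 ms of audio is 1280 bytes. The sketch below splits a PCM buffer into such frames and tags each with a status value. The function name iter_audio_frames is hypothetical, and sending the end-of-data marker as a separate final frame with empty audio is an assumption of this sketch rather than a requirement stated in this document.

```python
SAMPLE_RATE_HZ = 16000
BYTES_PER_SAMPLE = 2      # 16-bit mono PCM
FRAME_INTERVAL_MS = 40

# 16000 * 2 * 40 // 1000 = 1280 bytes per 40 ms frame
FRAME_BYTES = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * FRAME_INTERVAL_MS // 1000

STATUS_FIRST, STATUS_MIDDLE, STATUS_LAST = 0, 1, 2


def iter_audio_frames(pcm, frame_bytes=FRAME_BYTES):
    """Yield (status, chunk) pairs covering the whole PCM buffer."""
    for offset in range(0, len(pcm), frame_bytes):
        status = STATUS_FIRST if offset == 0 else STATUS_MIDDLE
        yield status, pcm[offset:offset + frame_bytes]
    # Assumption: the end-of-data marker is sent as its own frame with empty audio
    yield STATUS_LAST, b""


if __name__ == "__main__":
    silence = bytes(3000)  # 3000 bytes of zeroed PCM as stand-in audio
    for status, chunk in iter_audio_frames(silence):
        print(status, len(chunk))
```

A real sender would pause for about 40 ms (for example with time.sleep(0.04)) between frames so that audio is uploaded at roughly real-time speed.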
    

Request Parameters

All request data is sent as JSON strings.

Parameter | Type | Required | Description
common | object | Yes | Common parameters, uploaded only in the first frame after a successful handshake. See below for more information.
business | object | Yes | Business parameters, uploaded only in the first frame after a successful handshake. See below for more information.
data | object | Yes | Business data stream parameters, uploaded in every frame after a successful handshake. See below for more information.

Description of Common Parameter

Parameter | Type | Required | Description
app_id | string | Yes | AppID obtained from the platform

Description of Business Parameter

Parameter | Type | Required | Description
language | string | Yes | Used together with the domain and accent parameters to select the engine type. For the valid combinations, see the parameter list below.
domain | string | Yes | Used together with the language and accent parameters to select the engine type. For the valid combinations, see the parameter list below.
accent | string | Yes | Used together with the language and domain parameters to select the engine type. For the valid combinations, see the parameter list below.
dwa | string | No | Engine extension parameter. wpgs: enables the streaming result return feature. Note: This extended feature must be activated for your AppID before use; you can activate it in the console under Real-time ASR (Automatic Speech Recognition).
punc | int | No | Punctuation control (punctuation is returned by default). Pass punc=0 to disable punctuation.
nunum | int | No | Number format control. 0: Disabled; 1: Enabled (default: 1).

List of Parameters:

Language | language | accent | domain | Note
Chinese | zh_cn | mandarin | ist_open | sms_ed_open
English | en_us | mandarin | ist_open | sms_en_open
Arabic | ar_il | mandarin | ist_huanyu | sms_en_open
French | fr_fr | mandarin | ist_open | sms_en_open
Indonesian | id_id | mandarin | ist_hy | sms_en_open
Thai | th_TH | mandarin | ist_open | sms_en_open
Vietnamese | vi_vn | mandarin | ist_open | sms_en_open

  • Data Parameter Description:

Parameter Name | Type | Required | Description
frame_id | int | No | Serial number of the audio frame, incremented from 1. It is reset if a server error is received. It must be passed when using the breakpoint resume feature.
status | int | Yes | Audio status. 0: first audio frame; 1: middle audio frame; 2: last audio frame
format | string | Yes | Audio sampling rate: audio/L16;rate=16000. Real-time ASR only supports 16k.
encoding | string | Yes | raw
audio | string | Yes | Audio content, encoded with base64.
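
The request parameters described above can be assembled into JSON frames as in the following minimal sketch. The function name build_frame and the app_id value "your_app_id" are placeholders for illustration; the business values are taken from the Chinese row of the parameter list, and the optional frame_id field is omitted.

```python
import base64
import json

AUDIO_FORMAT = "audio/L16;rate=16000"


def build_frame(audio_chunk, status, app_id=None, business=None):
    """Build one JSON request frame for the given audio chunk and status."""
    frame = {
        "data": {
            "status": status,          # 0: first, 1: middle, 2: last
            "format": AUDIO_FORMAT,
            "encoding": "raw",
            "audio": base64.b64encode(audio_chunk).decode("ascii"),
        }
    }
    if status == 0:
        # common and business are uploaded only with the first frame
        frame["common"] = {"app_id": app_id}
        frame["business"] = business
    return json.dumps(frame)


if __name__ == "__main__":
    business = {"language": "zh_cn", "accent": "mandarin", "domain": "ist_open"}
    # "your_app_id" is a placeholder; bytes(1280) stands in for 40 ms of audio
    print(build_frame(bytes(1280), 0, app_id="your_app_id", business=business))
    print(build_frame(bytes(1280), 1))
    print(build_frame(b"", 2))
```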

Example of Returned Parameters

Default (enable wpgs)

{
  "code": 0,
  "message": "success",
  "sid": "iat000704fa@dx16ade44e4d87a1c802",
  "context_id":"135e2a1c",
  "data": {
    "result": {
      "bg": 0,
      "ed": 0,
      "ls": false,
      "pgs": "rpl",
      "rg": [
        1,
        1
      ],
      "sn": 2,
      "ws": [
        {
          "bg": 0,
          "cw": [
            {
              "sc": 0,
              "w": "Hello"
            }
          ]
        },
        {
          "bg": 0,
          "cw": [
            {
              "sc": 0,
              "w": "Great"
            }
          ]
        }
      ]
    },
    "status": 1
  }
}

Returned Parameters Description

Parameter | Type | Description
sid | string | Unique ID for each session
context_id | string | Business context ID. It uniquely identifies a complete session (including successful reconnection requests).
code | int | Error code; 0 indicates success.
message | string | Error message
data | object | Recognition results
data.status | int | Marker indicating whether recognition has ended. 1: recognition in progress; 2: recognition completed
data.result | object | Recognition result
data.result.sn | int | Serial number of the returned result.
data.result.pgs | string | This field appears when wpgs is enabled. "apd" means that this slice is a final result appended to the previous results; "rpl" means that it replaces part of the previous results, and the replacement range is given by the rg field.
data.result.rg | array | Replacement range. This field appears when wpgs is enabled.
data.result.ls | bool | Whether this is the last slice of the result.
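
One way a client might combine the pgs, rg and sn fields into a running transcript is sketched below. This is a hedged illustration, not the platform's reference logic. It rests on several assumptions: rg is read as an inclusive range of sn values to discard, the text of a slice is taken from the first candidate in each cw array of the ws list (as laid out in the example response), and words are joined without a separator. The class name TranscriptAssembler is hypothetical.

```python
def extract_text(result):
    """Join the first candidate word of every ws entry (assumed layout)."""
    words = []
    for ws in result.get("ws", []):
        candidates = ws.get("cw", [])
        if candidates:
            words.append(candidates[0].get("w", ""))
    return "".join(words)


class TranscriptAssembler:
    def __init__(self):
        self.segments = {}  # sn -> text of that slice

    def feed(self, message):
        """Apply one server message; return True once data.status is 2."""
        if message.get("code") != 0:
            raise RuntimeError("server error {}: {}".format(
                message.get("code"), message.get("message")))
        data = message.get("data", {})
        result = data.get("result")
        if result:
            if result.get("pgs") == "rpl":
                # Assumption: rg = [first_sn, last_sn], both inclusive
                first_sn, last_sn = result["rg"]
                for sn in range(first_sn, last_sn + 1):
                    self.segments.pop(sn, None)
            self.segments[result["sn"]] = extract_text(result)
        return data.get("status") == 2

    def text(self):
        return "".join(self.segments[sn] for sn in sorted(self.segments))
```

For example, feeding a slice with sn=1 followed by a slice with sn=2, pgs="rpl" and rg=[1, 1] leaves only the text of the second slice in the transcript.
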

Error Codes

  • 10163: Parameter verification failed due to a client-side error. Adjust the request parameters according to the description in the returned message.
  • 10313: app_id was not sent in the first frame of the request, or the app_id sent does not match api_key. Check that the common parameter is uploaded correctly and that app_id in common is correct and not empty.
  • 10043: Audio decoding failed. Make sure the encoding format of the transmitted audio is consistent with the request parameters.
  • 11201: Usage of the interface has exceeded the purchased limit. Purchase additional quota to continue using it.