Speech to Text | API Documentation

Speech to Text API

POST https://stt.infer.nt-ai.cloud/predict

Header

X-API-Key string required

Your API key

multipart/form-data

form-data body

Request Body

files file required
Raw audio files, sent as multipart form data under the key name files.
The maximum size of each file is 50 MB.
Each file should be shorter than 1 minute.
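The limits above can be checked on the client before uploading. The following is a minimal sketch, not part of the API: it assumes the input is an uncompressed WAV file (so the standard-library wave module can read it) and that "50 MB" means 50 * 1024 * 1024 bytes. The helper names wav_duration_seconds and check_audio_file are hypothetical.

```python
import os
import wave

# Assumption: 50 MB is interpreted as 50 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 50 * 1024 * 1024
# The documentation asks for files shorter than 1 minute.
MAX_DURATION_SECONDS = 60.0


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_audio_file(path):
    """Return a list of problems found; an empty list means the file passes."""
    problems = []
    if os.path.getsize(path) > MAX_FILE_BYTES:
        problems.append("file is larger than 50 MB")
    if wav_duration_seconds(path) >= MAX_DURATION_SECONDS:
        problems.append("audio is not shorter than 1 minute")
    return problems
```

Other audio formats would need a different way to read the duration; the size check applies to any format.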

Optional

The following fields are optional and are also sent as multipart form data.

files_speakers file

Speaker files. A maximum of 5 files can be provided. Each file must not exceed 20 MB.

boosting_words string

Improves recognition accuracy for specific words. A maximum of 10 words can be provided, e.g., สวัสดี.
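The request described above could be assembled roughly as follows. This is a sketch using only the Python standard library, under stated assumptions: each boosting word is sent as its own boosting_words form field (the documentation does not say how multiple words are encoded), every file part is labeled application/octet-stream, and the helper names build_multipart and build_request are hypothetical. The field names files, files_speakers and boosting_words, the X-API-Key header, and the URL come from this page.

```python
import uuid
import urllib.request

API_URL = "https://stt.infer.nt-ai.cloud/predict"
MAX_SPEAKER_FILES = 5
MAX_BOOSTING_WORDS = 10


def build_multipart(files, speaker_files=(), boosting_words=()):
    """Encode the form fields as multipart/form-data.

    files and speaker_files are sequences of (filename, bytes) pairs.
    Returns (body_bytes, content_type_header_value).
    """
    if not files:
        raise ValueError("at least one entry in 'files' is required")
    if len(speaker_files) > MAX_SPEAKER_FILES:
        raise ValueError("at most 5 speaker files are allowed")
    if len(boosting_words) > MAX_BOOSTING_WORDS:
        raise ValueError("at most 10 boosting words are allowed")

    boundary = uuid.uuid4().hex
    parts = []

    def add_file(field, filename, content):
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")

    for name, content in files:
        add_file("files", name, content)
    for name, content in speaker_files:
        add_file("files_speakers", name, content)
    for word in boosting_words:
        # Assumption: one form field per word; the API may expect another encoding.
        header = (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="boosting_words"\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + word.encode("utf-8") + b"\r\n")

    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(api_key, files, speaker_files=(), boosting_words=()):
    """Build (but do not send) the POST request for the endpoint."""
    body, content_type = build_multipart(files, speaker_files, boosting_words)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": content_type},
    )
```

Passing the returned object to urllib.request.urlopen would send the request; the sketch stops short of that so the encoding can be inspected first.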

Responses

application/json

Schema

Example (from schema)

Array [

An array of transcription process results.

object

status string

success | failed

The status of the transcription process.

duration string

The total duration of the audio file in seconds (e.g., 20.856).

filename string

The name of the audio file.

result Array [

An array containing the individual transcript segments. Each object in this array represents a segment of the transcription.

object

start_time string

The start time of the segment in the audio file, in seconds (e.g., 1.6382252559726962).

end_time string

The end time of the segment in the audio file, in seconds (e.g., 3.9761092150170647).

speaker string

The identifier of the speaker, in SPEAKER_{number} format (e.g., SPEAKER_00).

transcript string

The transcribed text.

]

]

[
  {
    "filename": "Record.wav",
    "status": "success",
    "result": [
      {
        "speaker": "SPEAKER_00",
        "transcript": "วิสัย",
        "start_time": 1.6382252559726962,
        "end_time": 3.9761092150170647
      },
      {
        "speaker": "SPEAKER_00",
        "transcript": "บริษัทผู้พัฒนาแพลตฟอร์ม มีเป้าหมายหลักในการเป็นศูนย์กลางการให้บริการปัญญาประดิษฐ์",
        "start_time": 4.658703071672355,
        "end_time": 20.870307167235495
      }
    ],
    "duration": 20.856
  }
]
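A response shaped like the example above could be consumed as in the following sketch. The function name summarize_response is hypothetical. Because the schema lists the time fields as strings while the example shows numbers, the sketch passes them through float(), which accepts both.

```python
import json


def summarize_response(payload):
    """Turn the JSON response array into printable lines, one per segment."""
    lines = []
    for item in json.loads(payload):
        if item.get("status") != "success":
            # Report files that were not transcribed and move on.
            lines.append(f"{item.get('filename')}: {item.get('status')}")
            continue
        for segment in item.get("result", []):
            # float() handles both JSON numbers and numeric strings.
            start = float(segment["start_time"])
            end = float(segment["end_time"])
            lines.append(
                f"{item['filename']} [{start:.2f}-{end:.2f}] "
                f"{segment['speaker']}: {segment['transcript']}"
            )
    return lines
```

Each successful file contributes one line per transcript segment, with times rounded to two decimals; a file whose status is not success contributes a single line with its status.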
