Speech-to-Text API Documentation

High-quality speech recognition service supporting transcription of a wide range of audio and video formats


Introduction


The Speech-to-Text API provides high-quality speech recognition services, supporting transcription of various audio/video formats. This documentation details how to perform speech-to-text operations through the API.

Service Principles

Important Note: The file transcription service processes tasks submitted through the API on a best-effort basis. Once a task is submitted and its file is ready, it enters the queue (PENDING status). Queue time depends on the queue length and file duration and is typically a few minutes. Once processing begins, recognition runs at hundreds of times real-time speed.

API Basic Information

API Base URL:
https://api.speech-to-text.cn/api/v1/
Authentication Method:
API Key (X-API-Key Header)
Supported Formats:
Various audio and video formats
Recommended Method:
URL-based asynchronous submission is strongly recommended, especially for larger files

API Key Authentication

All API requests require an API key in the HTTP header for authentication:

X-API-Key: your_api_key_here

If the API key is invalid or disabled, the system will return a 401 error.
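As a minimal Python sketch of the authentication step, the helpers below read the key from an environment variable and build the headers each request needs. The variable name STT_API_KEY and the helper names are illustrative assumptions, not part of the API.

```python
import os


def load_api_key(env_var: str = "STT_API_KEY") -> str:
    """Read the API key from the environment (STT_API_KEY is a hypothetical name)."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"environment variable {env_var} is not set")
    return key


def auth_headers(api_key: str, json_body: bool = False) -> dict:
    """Headers required on every request; add Content-Type only for JSON bodies."""
    headers = {"X-API-Key": api_key}
    if json_body:
        headers["Content-Type"] = "application/json"
    return headers
```

Keeping the key out of source code means the same client can switch keys (for example after disabling a leaked one) without code changes.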

API Endpoints

1. Asynchronous Transcription - URL Method (Recommended)

POST

URL

https://api.speech-to-text.cn/api/v1/recognition/async/

Headers

X-API-Key: {{api_key}}
Content-Type: application/json

Body (raw - JSON)

{
  "url": "https://example.com/audio.mp3"
}

Example response (200 OK)

{
  "code": 200,
  "message": "Task submitted successfully",
  "data": {
    "task_id": "550e8400-e29b-41d4-a716-446655440000",
    "dashscope_task_id": "dsope-12345678",
    "task_status": "DOWNLOADING",
    "submit_time": "2025-03-16 10:30:45.123",
    "consumed_time": 0.123,
    "message": "DOWNLOADING"
  }
}
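The URL-based asynchronous submission can be sketched with Python's standard library as follows. It assumes the JSON response shape shown above (code, message, data.task_id); the function names are hypothetical. Note that urllib.request.urlopen raises urllib.error.HTTPError for non-2xx HTTP statuses, such as the 401 returned for an invalid key, so callers may want to catch that as well.

```python
import json
import urllib.request

ASYNC_URL = "https://api.speech-to-text.cn/api/v1/recognition/async/"


def build_async_url_request(api_key: str, media_url: str) -> urllib.request.Request:
    """Build a POST that submits a media URL for asynchronous transcription."""
    body = json.dumps({"url": media_url}).encode("utf-8")
    return urllib.request.Request(
        ASYNC_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def submit_async_url(api_key: str, media_url: str, timeout: float = 30.0) -> str:
    """Submit the task and return its task_id; raise if the body reports an error code."""
    req = build_async_url_request(api_key, media_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    if payload.get("code") != 200:
        raise RuntimeError(f"submission failed: {payload.get('code')} {payload.get('message')}")
    return payload["data"]["task_id"]
```

For example, submit_async_url(key, "https://example.com/audio.mp3") returns the task_id string, which is what the query endpoint expects.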

2. Asynchronous Transcription - File Upload Method

POST

URL

https://api.speech-to-text.cn/api/v1/recognition/async/

Headers

X-API-Key: {{api_key}}

Body (form-data)

Key	Value	Description
file	[Select File]	Audio or video file

Returns the same response format as the URL method above: the task ID and status.
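Because the standard library has no multipart helper, this sketch encodes a single form field named file by hand. The boundary handling, the application/octet-stream part type, and the helper names are assumptions; a third-party HTTP client would normally do this for you. Passing the synchronous URL as url reuses the same request builder for the synchronous upload endpoint.

```python
import os
import uuid
import urllib.request
from typing import Optional, Tuple

ASYNC_URL = "https://api.speech-to-text.cn/api/v1/recognition/async/"
SYNC_URL = "https://api.speech-to-text.cn/api/v1/recognition/sync/"


def build_multipart(field: str, filename: str, content: bytes,
                    boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode a single file field as multipart/form-data; return (body, content_type)."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def build_upload_request(api_key: str, path: str,
                         url: str = ASYNC_URL) -> urllib.request.Request:
    """Build a POST that uploads the file at `path` in the form field "file"."""
    with open(path, "rb") as fh:
        content = fh.read()  # loads the whole file into memory; large files should use the URL method
    body, content_type = build_multipart("file", os.path.basename(path), content)
    return urllib.request.Request(
        url,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": content_type},
        method="POST",
    )
```

Reading the file fully into memory is the simplest approach and is one more reason the documentation steers large files toward the URL method.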

3. Synchronous Transcription - URL Method (Recommended only for short audio)

POST

URL

https://api.speech-to-text.cn/api/v1/recognition/sync/

Headers

X-API-Key: {{api_key}}
Content-Type: application/json

Body (raw - JSON)

{
  "url": "https://example.com/short-audio.mp3"
}

Example response (200 OK)

{
  "code": 200,
  "message": "Recognition successful",
  "data": {
    "task_id": "550e8400-e29b-41d4-a716-446655440000",
    "task_status": "SUCCEEDED",
    "submit_time": "2025-03-16 10:30:45.123",
    "end_time": "2025-03-16 10:31:15.456",
    "duration": 30000,
    "transcription_text": "This is the recognized text content.",
    "consumed_points": 1,
    "consumed_time": 30.123
  }
}
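For the synchronous URL method the connection stays open until recognition finishes, so the client needs a generous timeout. This sketch assumes the response fields shown above; the 300-second socket timeout and the function names are arbitrary choices for illustration. The same extract_transcription check also works on a SUCCEEDED response from the query endpoint.

```python
import json
import urllib.request

SYNC_URL = "https://api.speech-to-text.cn/api/v1/recognition/sync/"


def extract_transcription(payload: dict) -> str:
    """Return transcription_text, or raise if the call failed or the task did not succeed."""
    if payload.get("code") != 200:
        raise RuntimeError(f"API error {payload.get('code')}: {payload.get('message')}")
    data = payload.get("data") or {}
    if data.get("task_status") != "SUCCEEDED":
        raise RuntimeError(f"task not succeeded: {data.get('task_status')}")
    return data["transcription_text"]


def transcribe_sync(api_key: str, media_url: str, timeout: float = 300.0) -> str:
    """Call the sync URL endpoint; `timeout` is a per-socket-operation limit in seconds."""
    req = urllib.request.Request(
        SYNC_URL,
        data=json.dumps({"url": media_url}).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_transcription(json.loads(resp.read().decode("utf-8")))
```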

4. Synchronous Transcription - File Upload Method (Recommended only for short audio)

POST

URL

https://api.speech-to-text.cn/api/v1/recognition/sync/

Headers

X-API-Key: {{api_key}}

Body (form-data)

Key	Value	Description
file	[Select File]	Audio or video file

Returns the same response format as the synchronous URL method above, including the recognition results.

5. Query Transcription Results

GET

URL

https://api.speech-to-text.cn/api/v1/recognition/status/

Headers

X-API-Key: {{api_key}}

Query Params

Key	Value	Description
task_id	550e8400-e29b-41d4-a716-446655440000	Task ID

Example response (200 OK)

{
  "code": 200,
  "message": "Query successful",
  "data": {
    "task_id": "550e8400-e29b-41d4-a716-446655440000",
    "dashscope_task_id": "dsope-12345678",
    "task_status": "SUCCEEDED",
    "submit_time": "2025-03-16 10:30:45.123",
    "end_time": "2025-03-16 10:31:15.456",
    "duration": 30000,
    "transcription_text": "This is the recognized text content.",
    "consumed_points": 1,
    "consumed_time": 0.056
  }
}
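Querying a task is a GET request with task_id in the query string. A minimal sketch, assuming the response shape above; status_url and query_status are hypothetical names:

```python
import json
import urllib.parse
import urllib.request

STATUS_URL = "https://api.speech-to-text.cn/api/v1/recognition/status/"


def status_url(task_id: str) -> str:
    """Append task_id as a URL-encoded query parameter."""
    return STATUS_URL + "?" + urllib.parse.urlencode({"task_id": task_id})


def query_status(api_key: str, task_id: str, timeout: float = 30.0) -> dict:
    """GET the task status and return the `data` object of the response."""
    req = urllib.request.Request(status_url(task_id), headers={"X-API-Key": api_key})
    with urllib.request.urlopen(req, timeout=timeout) as resp:  # no body, so this is a GET
        payload = json.loads(resp.read().decode("utf-8"))
    if payload.get("code") != 200:
        raise RuntimeError(f"query failed: {payload.get('code')} {payload.get('message')}")
    return payload["data"]
```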

Task Status Explanation

During processing, a task passes through the following statuses; SUCCEEDED and FAILED are final:

DOWNLOADING: System is downloading or processing the file
PENDING: File is ready, task is queued for processing
SUCCEEDED: Transcription task completed successfully
FAILED: Transcription task failed
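The statuses can be modeled as an enum so that polling code has a single place to decide whether a task is finished. This is an illustrative sketch; TaskStatus and is_terminal are hypothetical names. Any status string not listed here makes TaskStatus(...) raise ValueError, which surfaces undocumented values instead of silently ignoring them.

```python
from enum import Enum


class TaskStatus(str, Enum):
    DOWNLOADING = "DOWNLOADING"  # file is being downloaded or processed
    PENDING = "PENDING"          # file is ready, task is queued
    SUCCEEDED = "SUCCEEDED"      # transcription completed
    FAILED = "FAILED"            # transcription failed

    @property
    def is_terminal(self) -> bool:
        """True once the task has reached a final state."""
        return self in (TaskStatus.SUCCEEDED, TaskStatus.FAILED)
```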

Supported Audio Formats

The system supports the following audio and video formats:

.aac, .amr, .avi, .flac, .flv, .m4a, .mkv, .mov, .mp3, .mp4, .mpeg, .ogg, .opus, .wav, .webm, .wma, .wmv
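A client-side pre-check against the list above can save a round trip for obviously unsupported files. This is a sketch only: it checks the file extension, not the actual content, and it should not be applied to video platform links, which usually have no file extension.

```python
import os

SUPPORTED_EXTENSIONS = frozenset({
    ".aac", ".amr", ".avi", ".flac", ".flv", ".m4a", ".mkv", ".mov", ".mp3",
    ".mp4", ".mpeg", ".ogg", ".opus", ".wav", ".webm", ".wma", ".wmv",
})


def is_supported(filename: str) -> bool:
    """Case-insensitive check of the file extension against the documented list."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in SUPPORTED_EXTENSIONS
```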

Optimization Suggestions

1 Automatic Audio Extraction

The system automatically extracts audio from video files, then compresses and optimizes it; no manual processing is required.

2 Large File Processing Strategy

For large files, URL-based asynchronous submission is strongly recommended
Avoid uploading large files directly; provide a publicly accessible URL instead
Large files may take longer to process, so please be patient

Error Code Explanation

Error Code	Description
200	Success
1001	Parameter error
1002	Unauthorized (invalid or disabled API key)
1003	Access forbidden
1004	Resource not found
429	Request frequency limit
1005	Server error
1006	Insufficient balance
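A sketch of mapping the codes above to a Python exception. It assumes these values arrive in the code field of the response body; treating 429 and 1005 as retryable is an assumption, not documented behavior, and SpeechToTextError is a hypothetical name.

```python
ERROR_CODES = {
    200: "Success",
    1001: "Parameter error",
    1002: "Unauthorized (invalid or disabled API key)",
    1003: "Access forbidden",
    1004: "Resource not found",
    429: "Request frequency limit",
    1005: "Server error",
    1006: "Insufficient balance",
}

RETRYABLE_CODES = {429, 1005}  # assumption: rate limits and server errors may succeed on retry


class SpeechToTextError(Exception):
    def __init__(self, code, message):
        self.code = code
        self.retryable = code in RETRYABLE_CODES
        super().__init__(f"{code}: {message}")


def check_response(payload: dict) -> dict:
    """Return the `data` object on success; otherwise raise SpeechToTextError."""
    code = payload.get("code")
    if code == 200:
        return payload.get("data") or {}
    description = ERROR_CODES.get(code, "Unknown error")
    raise SpeechToTextError(code, payload.get("message") or description)
```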

Usage Notes

1 Interface Selection Recommendations

Audio under 5 minutes: the synchronous interface can be used
Audio over 5 minutes: the asynchronous interface is strongly recommended
Large files of any duration: always use the asynchronous interface
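The selection rule can be expressed as a small function. The 5-minute threshold comes from the recommendations above; treating exactly 5 minutes as asynchronous, and leaving it to the caller to decide what counts as a large file, are assumptions.

```python
ASYNC_URL = "https://api.speech-to-text.cn/api/v1/recognition/async/"
SYNC_URL = "https://api.speech-to-text.cn/api/v1/recognition/sync/"
SYNC_MAX_SECONDS = 5 * 60  # audio under 5 minutes may use the synchronous interface


def choose_endpoint(duration_seconds: float, is_large_file: bool = False) -> str:
    """Use the sync endpoint only for short audio that is not a large file."""
    if is_large_file or duration_seconds >= SYNC_MAX_SECONDS:
        return ASYNC_URL
    return SYNC_URL
```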

2 Asynchronous Processing Flow

Call the asynchronous interface to submit a task
Save the returned task_id
Use task_id to periodically query results (recommended interval 10-30 seconds)
When status is SUCCEEDED or FAILED, process the final result
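The polling flow above can be sketched as a loop that stops on SUCCEEDED or FAILED. The status fetcher is passed in (for example, a function wrapping the query endpoint), which keeps the loop testable; the 15-second default sits inside the recommended 10-30 second range, while the one-hour max_wait is an arbitrary cap. The function name is hypothetical.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}


def wait_for_result(
    fetch_status: Callable[[str], dict],
    task_id: str,
    interval: float = 15.0,
    max_wait: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status(task_id) until task_status is SUCCEEDED or FAILED."""
    waited = 0.0
    while True:
        data = fetch_status(task_id)
        if data.get("task_status") in TERMINAL_STATUSES:
            return data  # caller inspects SUCCEEDED vs FAILED
        if waited >= max_wait:
            raise TimeoutError(
                f"task {task_id} still {data.get('task_status')} after {waited:.0f}s"
            )
        sleep(interval)
        waited += interval
```

Injecting sleep lets tests run instantly; in production the default time.sleep is used.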

3 URL Requirements

Ensure the URL is publicly accessible, since our servers need to download the content
URLs should point directly to media files; links from mainstream video platforms are also supported, with the media extracted automatically

4 File Size Limitations

Maximum file size: 12 GB per audio/video file
Maximum audio duration: 12 hours
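A pre-submission check against these limits might look like the sketch below. Whether "12 GB" means 12 × 10^9 or 12 × 2^30 bytes is not specified, so it uses the stricter decimal reading; check_limits is a hypothetical name.

```python
from typing import List

MAX_FILE_BYTES = 12 * 10**9          # 12 GB, read as decimal gigabytes (the stricter reading)
MAX_DURATION_SECONDS = 12 * 60 * 60  # 12 hours


def check_limits(size_bytes: int, duration_seconds: float) -> List[str]:
    """Return a list of limit violations; an empty list means the file is within limits."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file size {size_bytes} bytes exceeds {MAX_FILE_BYTES} bytes")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append(f"duration {duration_seconds:.0f}s exceeds {MAX_DURATION_SECONDS}s")
    return problems
```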

Frequently Asked Questions

Question: How do I obtain an API key?

Answer: Create a new API key in the API Key Management page of your user dashboard.

Question: How does billing work?

Answer: Registration includes a 5-minute free quota. After that, billing is based on audio duration, and any duration under 1 minute is billed as 1 minute. The price is approximately $0.00139 per minute.
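The billing rule can be illustrated with a rough cost estimate. Two points are assumptions: that every partial minute rounds up (not only durations under 1 minute), and that remaining free-quota minutes are simply subtracted from the billable total. The price constant is the approximate figure quoted above, and the function names are hypothetical.

```python
import math

PRICE_PER_MINUTE_USD = 0.00139  # approximate price from the FAQ


def billable_minutes(duration_seconds: float) -> int:
    """Round up to whole minutes, so anything under a minute counts as one."""
    if duration_seconds <= 0:
        return 0
    return math.ceil(duration_seconds / 60)


def estimate_cost(duration_seconds: float, free_minutes_left: int = 0) -> float:
    """Approximate charge in USD after deducting any remaining free minutes."""
    minutes = max(0, billable_minutes(duration_seconds) - free_minutes_left)
    return minutes * PRICE_PER_MINUTE_USD
```

For a 10-minute recording (600 seconds) with no free quota left, this estimates 10 × $0.00139, about $0.0139.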

Question: What audio formats are supported?

Answer: Multiple audio and video formats are supported, including: .aac, .amr, .avi, .flac, .flv, .m4a, .mkv, .mov, .mp3, .mp4, .mpeg, .ogg, .opus, .wav, .webm, .wma, .wmv

Question: What URL formats are supported?

Answer: Both regular file URLs and links from mainstream video platforms such as TikTok and Instagram are supported; for platform links, the media is extracted automatically.

Question: Will I be charged if extraction fails?

Answer: No. Failed extractions are not charged.

Question: How should I transcribe large files?

Answer: For large files, we recommend the asynchronous interface with the URL method, which avoids timeouts during upload.

Question: Do I need to extract audio from my video files?

Answer: No, the system will automatically extract audio from videos and perform optimization processing.

Question: What are the billing standards?

Answer: Billing is based on audio duration, and any duration under 1 minute is billed as 1 minute. Each account has an initial free quota.

Question: What should I do if a task remains in PENDING status?

Answer: PENDING indicates that the task has been submitted but is still waiting in the queue for processing. Please be patient. If processing has not started after 30 minutes, please contact customer service.

Contact Us

If you have any questions, please contact us through the following methods:

[email protected]

Website

https://speech-to-text.cn/