Introduction
The Speech-to-Text API provides high-quality speech recognition and supports transcription of a wide range of audio and video formats. This documentation describes how to perform speech-to-text operations through the API.
Service Principles
Important Note: The file transcription service processes tasks submitted through the API on a best-effort basis. After submission, a task enters the queue with status PENDING. Queue time depends on queue length and file duration and is typically a few minutes. Once processing begins, recognition runs at hundreds of times real-time speed.
API Basic Information
- API Base URL: https://api.speech-to-text.cn/api/v1/
- Authentication Method: API Key (X-API-Key header)
- Supported Formats: Various audio and video formats
- Recommended Method: URL-based asynchronous submission is strongly recommended, especially for larger files
API Key Authentication
All API requests require an API key in the HTTP header for authentication:
X-API-Key: your_api_key_here
If the API key is invalid or disabled, the system will return a 401 error.
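As a minimal sketch of attaching the key, the snippet below builds (but does not send) a request using Python's standard library. The helper name `build_request` and the placeholder key are illustrative assumptions, not part of the API.

```python
import urllib.request

BASE_URL = "https://api.speech-to-text.cn/api/v1/"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Return a Request for BASE_URL + path that carries the X-API-Key header."""
    req = urllib.request.Request(BASE_URL + path)
    # HTTP header names are case-insensitive; urllib normalizes their
    # capitalization internally.
    req.add_header("X-API-Key", api_key)
    return req


req = build_request("recognition/status/", "your_api_key_here")
```

Passing `req` to `urllib.request.urlopen` would send it; a rejected key would then surface as an `urllib.error.HTTPError`.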
API Endpoints
1. Asynchronous Transcription - URL Method (Recommended)
POST https://api.speech-to-text.cn/api/v1/recognition/async/
Headers
X-API-Key: {{api_key}}
Content-Type: application/json
Body (raw - JSON)
{ "url": "https://example.com/audio.mp3" }
Response (200 OK)
{
  "code": 200,
  "message": "Task submitted successfully",
  "data": {
    "task_id": "550e8400-e29b-41d4-a716-446655440000",
    "dashscope_task_id": "dsope-12345678",
    "task_status": "DOWNLOADING",
    "submit_time": "2025-03-16 10:30:45.123",
    "consumed_time": 0.123,
    "message": "DOWNLOADING"
  }
}
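The request and response above could be handled roughly as follows. This is a sketch using only the standard library; `build_async_submit` and `extract_task_id` are hypothetical helper names, and the response layout is assumed to match the example shown.

```python
import json
import urllib.request

ASYNC_URL = "https://api.speech-to-text.cn/api/v1/recognition/async/"


def build_async_submit(api_key: str, media_url: str) -> urllib.request.Request:
    """Build the POST request that submits a media URL for transcription."""
    body = json.dumps({"url": media_url}).encode("utf-8")
    return urllib.request.Request(
        ASYNC_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def extract_task_id(response_body: bytes) -> str:
    """Pull data.task_id out of a submission response body."""
    payload = json.loads(response_body)
    return payload["data"]["task_id"]
```

To send the request, pass it to `urllib.request.urlopen` and feed the bytes read from the response to `extract_task_id`.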
2. Asynchronous Transcription - File Upload Method
POST https://api.speech-to-text.cn/api/v1/recognition/async/
Headers
X-API-Key: {{api_key}}
Body (form-data)
Key | Value | Description |
---|---|---|
file | [Select File] | Audio or video file |
The response has the same structure as for the URL method: it returns the task ID and status.
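A file upload needs a multipart/form-data body with a single `file` field. The sketch below assembles one by hand with the standard library; the function name, the generic `application/octet-stream` part type, and the random boundary are assumptions made for illustration.

```python
import uuid
import urllib.request

ASYNC_URL = "https://api.speech-to-text.cn/api/v1/recognition/async/"


def build_upload_request(api_key: str, filename: str, content: bytes) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying one form field named 'file'."""
    boundary = uuid.uuid4().hex
    # Note: filename is inserted as-is, so it must not contain double quotes.
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        ASYNC_URL,
        data=body,
        headers={
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

This version holds the whole file in memory, which is one more reason to prefer URL submission for large files.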
3. Synchronous Transcription - URL Method (Recommended only for short audio)
POST https://api.speech-to-text.cn/api/v1/recognition/sync/
Headers
X-API-Key: {{api_key}}
Content-Type: application/json
Body (raw - JSON)
{ "url": "https://example.com/short-audio.mp3" }
Response (200 OK)
{
  "code": 200,
  "message": "Recognition successful",
  "data": {
    "task_id": "550e8400-e29b-41d4-a716-446655440000",
    "task_status": "SUCCEEDED",
    "submit_time": "2025-03-16 10:30:45.123",
    "end_time": "2025-03-16 10:31:15.456",
    "duration": 30000,
    "transcription_text": "This is the recognized text content.",
    "consumed_points": 1,
    "consumed_time": 30.123
  }
}
4. Synchronous Transcription - File Upload Method (Recommended only for short audio)
POST https://api.speech-to-text.cn/api/v1/recognition/sync/
Headers
X-API-Key: {{api_key}}
Body (form-data)
Key | Value | Description |
---|---|---|
file | [Select File] | Audio or video file |
The response has the same structure as for the synchronous URL method: it returns the recognition results.
5. Query Transcription Results
GET https://api.speech-to-text.cn/api/v1/recognition/status/
Headers
X-API-Key: {{api_key}}
Query Params
Key | Value | Description |
---|---|---|
task_id | 550e8400-e29b-41d4-a716-446655440000 | Task ID |
Response (200 OK)
{
  "code": 200,
  "message": "Query successful",
  "data": {
    "task_id": "550e8400-e29b-41d4-a716-446655440000",
    "dashscope_task_id": "dsope-12345678",
    "task_status": "SUCCEEDED",
    "submit_time": "2025-03-16 10:30:45.123",
    "end_time": "2025-03-16 10:31:15.456",
    "duration": 30000,
    "transcription_text": "This is the recognized text content.",
    "consumed_points": 1,
    "consumed_time": 0.056
  }
}
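A status query passes `task_id` as a query parameter. The following sketch builds the GET request and reads the two fields a caller usually needs; both helper names are hypothetical, and `transcription_text` is treated as optional on the assumption that it is absent until the task succeeds.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional, Tuple

STATUS_URL = "https://api.speech-to-text.cn/api/v1/recognition/status/"


def build_status_request(api_key: str, task_id: str) -> urllib.request.Request:
    """Build the GET request for a task's status, with task_id in the query string."""
    query = urllib.parse.urlencode({"task_id": task_id})
    return urllib.request.Request(f"{STATUS_URL}?{query}", headers={"X-API-Key": api_key})


def read_result(response_body: bytes) -> Tuple[str, Optional[str]]:
    """Return (task_status, transcription_text or None) from a status response."""
    data = json.loads(response_body)["data"]
    return data["task_status"], data.get("transcription_text")
```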
Task Status Explanation
During processing, a task can be in one of the following statuses:
- DOWNLOADING: System is downloading or processing the file
- PENDING: File is ready, task is queued for processing
- SUCCEEDED: Transcription task completed successfully
- FAILED: Transcription task failed
Supported Audio Formats
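The system supports multiple audio and video formats, including: .aac, .amr, .avi, .flac, .flv, .m4a, .mkv, .mov, .mp3, .mp4, .mpeg, .ogg, .opus, .wav, .webm, .wma, .wmv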
Optimization Suggestions
1. Automatic Audio Extraction
The system automatically extracts audio from video files and compresses and optimizes it, so no manual preprocessing is required.
2. Large File Processing Strategy
- For large files, URL-based asynchronous submission is strongly recommended
- Avoid uploading large files directly; prefer an accessible URL
- Large files may take longer to process, so please be patient
Error Code Explanation
Error Code | Description |
---|---|
200 | Success |
1001 | Parameter error |
1002 | Unauthorized (invalid or disabled API key) |
1003 | Access forbidden |
1004 | Resource not found |
429 | Request frequency limit |
1005 | Server error |
1006 | Insufficient balance |
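One way to use this table in client code is to map the `code` field to its description and raise on anything other than 200. The sketch below assumes error responses use the same `code`/`data` envelope as the success examples, which this documentation does not show explicitly; `ApiError` and `unwrap` are hypothetical names.

```python
ERROR_CODES = {
    200: "Success",
    1001: "Parameter error",
    1002: "Unauthorized (invalid or disabled API key)",
    1003: "Access forbidden",
    1004: "Resource not found",
    429: "Request frequency limit",
    1005: "Server error",
    1006: "Insufficient balance",
}


class ApiError(Exception):
    """Raised when a response carries a non-success code."""

    def __init__(self, code, description):
        super().__init__(f"API error {code}: {description}")
        self.code = code


def unwrap(payload: dict) -> dict:
    """Return payload['data'] on success; raise ApiError otherwise."""
    code = payload.get("code")
    if code != 200:
        raise ApiError(code, ERROR_CODES.get(code, "Unknown error"))
    return payload.get("data", {})
```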
Usage Notes
1. Interface Selection Recommendations
- For audio under 5 minutes: the synchronous interface can be used
- For audio over 5 minutes: the asynchronous interface is strongly recommended
- For larger files of any duration: always use the asynchronous interface
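The duration rule above can be expressed as a small helper. This is only an illustration: the function name is made up, and audio of exactly 5 minutes (a case the guidance does not cover) is routed to the asynchronous interface as the conservative choice.

```python
SYNC_PATH = "recognition/sync/"
ASYNC_PATH = "recognition/async/"

SYNC_LIMIT_SECONDS = 5 * 60  # the 5-minute threshold from the recommendation


def choose_endpoint(duration_seconds: float) -> str:
    """Pick the endpoint path for audio of the given duration."""
    if duration_seconds < SYNC_LIMIT_SECONDS:
        return SYNC_PATH
    # Exactly 5 minutes or longer: use the asynchronous interface.
    return ASYNC_PATH
```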
2. Asynchronous Processing Flow
- Call the asynchronous interface to submit a task
- Save the returned task_id
- Use the task_id to query results periodically (recommended interval: 10 to 30 seconds)
- When the status is SUCCEEDED or FAILED, process the final result
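The flow above amounts to a polling loop. The sketch below keeps the HTTP call out of the loop by taking a `fetch_status` callable that returns the `data` object of a status response; that callable, the 15-second default interval, and the one-hour timeout are assumptions chosen for illustration.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}


def wait_for_result(
    fetch_status: Callable[[], dict],
    interval: float = 15.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status() until the task reaches a terminal status.

    Returns the final data dict; raises TimeoutError if the deadline passes
    while the task is still DOWNLOADING or PENDING.
    """
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_status()
        if data["task_status"] in TERMINAL_STATUSES:
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval)
```

Injecting `sleep` keeps the loop testable without real delays; in normal use the default `time.sleep` applies.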
3. URL Requirements
- Ensure the URL is publicly accessible, because our servers need to download the content
- URLs should point directly to media files; automatic extraction from mainstream video platforms is also supported
4. File Size Limitations
- Maximum file size: 12 GB (audio or video)
- Maximum audio duration: 12 hours
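A client can check these limits before submitting. In this sketch the function name is hypothetical, and "12 GB" is interpreted as 12 × 10^9 bytes, the stricter of the two common readings; the service may use a different definition.

```python
from typing import List

MAX_BYTES = 12 * 10**9     # assumption: decimal gigabytes
MAX_SECONDS = 12 * 3600    # 12 hours


def limit_violations(size_bytes: int, duration_seconds: float) -> List[str]:
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 12 GB")
    if duration_seconds > MAX_SECONDS:
        problems.append("audio is longer than 12 hours")
    return problems
```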
Frequently Asked Questions
Question: How do I obtain an API key?
Answer: Create a new API key on the API Key Management page of your user dashboard.
Question: How is usage billed?
Answer: Registration includes a free quota of 5 minutes. After that, billing is based on audio duration, with less than 1 minute counted as 1 minute. The price is approximately $0.00139 per minute.
Question: What audio formats are supported?
Answer: Multiple audio and video formats are supported, including: .aac, .amr, .avi, .flac, .flv, .m4a, .mkv, .mov, .mp3, .mp4, .mpeg, .ogg, .opus, .wav, .webm, .wma, .wmv
Question: What URL formats are supported?
Answer: Regular file URLs are supported, as is automatic extraction from mainstream video platforms such as TikTok and Instagram.
Question: Will I be charged if extraction fails?
Answer: No. Failed extractions are not charged.
Question: How should I handle large file transcriptions?
Answer: For large files, we recommend combining the asynchronous interface with the URL method to avoid timeouts during upload.
Question: Do I need to extract audio from my video files?
Answer: No, the system will automatically extract audio from videos and perform optimization processing.
Question: What are the billing standards?
Answer: Billing is based on audio duration, with less than 1 minute counted as 1 minute. Each account has an initial free quota.
Question: What should I do if a task remains in PENDING status?
Answer: PENDING indicates that the task has been submitted but is still waiting in the queue for processing. Please be patient. If processing has not started after 30 minutes, please contact customer service.
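The billing rule described in the answers above can be worked through numerically. The sketch below reads "less than 1 minute counted as 1 minute" as rounding every partial minute up, which is an interpretation rather than a documented formula; the function names and the `free_minutes_left` parameter are hypothetical, and the price is the approximate figure quoted above.

```python
import math

PRICE_PER_MINUTE_USD = 0.00139  # approximate price quoted in the FAQ


def billed_minutes(duration_seconds: float) -> int:
    """Round the audio duration up to whole minutes (assumed billing rule)."""
    if duration_seconds <= 0:
        return 0
    return math.ceil(duration_seconds / 60)


def estimated_cost(duration_seconds: float, free_minutes_left: int = 0) -> float:
    """Estimate the charge in USD after applying any remaining free minutes."""
    chargeable = max(0, billed_minutes(duration_seconds) - free_minutes_left)
    return chargeable * PRICE_PER_MINUTE_USD
```

Under this reading, a 61-second clip bills as 2 minutes, and a 10-minute file with no free quota left costs roughly $0.0139.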
Contact Us
If you have any questions, please contact us through the following methods: