Supported Formats and Limits
This page covers the Real-time transcription API (WebSocket).
- For limits on Batch SaaS, see the Batch SaaS supported formats and limits.
- For limits on Flow Voice AI, see the Flow Voice AI supported formats and limits.
Supported File Types
The following input types are supported for transcription:
Raw audio streaming:
- PCM F32 LE: raw audio stream (32-bit float)
- PCM S16 LE: raw audio stream (16-bit signed int)
- mu-law
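Both PCM formats are headerless, little-endian streams of samples that differ only in sample width (4 bytes per sample for F32, 2 for S16). The following is a minimal Python sketch of packing the same samples in each format. It assumes float input normalized to [-1.0, 1.0]; the helper names are illustrative and not part of any Speechmatics SDK.

```python
import struct

def to_f32le(samples: list[float]) -> bytes:
    """Pack float samples as 32-bit little-endian floats (PCM F32 LE)."""
    return struct.pack(f"<{len(samples)}f", *samples)

def to_s16le(samples: list[float]) -> bytes:
    """Convert float samples to 16-bit signed little-endian integers (PCM S16 LE)."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip to the normalized range
        ints.append(int(round(s * 32767)))  # scale into the int16 range
    return struct.pack(f"<{len(ints)}h", *ints)

samples = [0.0, 0.5, -0.5, 1.0]
f32 = to_f32le(samples)    # 4 bytes per sample
s16 = to_s16le(samples)    # 2 bytes per sample
print(len(f32), len(s16))  # 16 8
```

At the same sample rate, S16 LE therefore needs half the bandwidth of F32 LE.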
Files:
- wav
- mp3
- aac
- ogg
- mpeg
- amr
- m4a
- mp4
- flac
This list is exhaustive: any file format not listed above is not supported.
Only files whose type can be determined by inspecting their data are supported.
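Because the type must be determinable from the data itself, what matters is the file's leading "magic" bytes, not its extension. The following is a rough client-side pre-check sketch. The byte signatures are well-known container markers, but this is not Speechmatics' own detection logic. It will misclassify some edge cases; for example, an ID3 tag can precede audio that is not MP3.

```python
from typing import Optional

def sniff_audio_container(header: bytes) -> Optional[str]:
    """Best-effort guess of an audio container from its first bytes (hypothetical helper)."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"OggS":
        return "ogg"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:5] == b"#!AMR":
        return "amr"
    if header[4:8] == b"ftyp":
        return "mp4/m4a"     # ISO base media file; the brand at bytes 8-12 tells them apart
    if header[:3] == b"ID3":
        return "mp3"         # assumption: ID3v2 tag followed by MPEG audio frames
    if len(header) >= 2 and header[0] == 0xFF:
        if header[1] & 0xF6 == 0xF0:
            return "aac"     # ADTS: 12-bit sync word with layer bits 00
        if header[1] & 0xE0 == 0xE0:
            return "mpeg audio"  # MPEG audio frame sync (11 bits set)
    return None              # unrecognized: likely to be rejected

with_header = b"RIFF\x24\x00\x00\x00WAVEfmt "
print(sniff_audio_container(with_header))  # wav
```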
Rate Limiting and Fair Usage
Speechmatics Batch SaaS applies rate limiting and fair queueing to provide a consistently high quality of service to all users.
If you make a large number of requests in a short period of time, some of these requests may fail with the response HTTP 429 - Rate Limited. To minimize the possibility of encountering rate limiting errors, we recommend that you do not exceed the following rates:
- 10 new jobs per second (POST API calls)
- 50 job status requests per second (GET API calls). For job status updates in production, Speechmatics recommends using Notifications rather than polling.
Aside from rate limiting, there is no limit to the number of jobs that you can submit. However, Speechmatics Batch SaaS applies a fair queueing policy which means that if you have a large number of jobs in progress at one time, the most recently submitted jobs may take longer to complete.
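One way to stay under the recommended rates, and to recover from occasional 429 responses, is to pace requests on the client and back off exponentially when rate-limited. The following is a minimal, library-agnostic sketch. `send` stands in for whatever HTTP call you make and is assumed to return the status code. The retry count and base delay are arbitrary example values, not Speechmatics recommendations.

```python
import time
from typing import Callable

class RequestPacer:
    """Spaces calls so that at most `rate` start per second (fixed-interval pacing)."""

    def __init__(self, rate: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0

    def wait(self) -> None:
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)  # too early: wait for the next slot
            now = self.next_allowed
        self.next_allowed = now + self.interval

def call_with_backoff(send: Callable[[], int], max_retries: int = 5,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Retry `send` while it returns HTTP 429, doubling the delay each time."""
    status = send()
    for attempt in range(max_retries):
        if status != 429:
            break
        sleep(base_delay * (2 ** attempt))
        status = send()
    return status

# Example: one pacer per recommended rate
job_pacer = RequestPacer(rate=10)     # new jobs (POST)
status_pacer = RequestPacer(rate=50)  # job status (GET)
```

Keeping the POST and GET pacers separate mirrors the two independent limits above. Exponential backoff then handles the 429 responses that can still occur.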
Real-time Transcription Usage Limits
Speechmatics limits the number of hours of audio users can process each month to help manage load on our servers. The current limits by account type, for both monthly audio hours and concurrent sessions, are listed in the table below:
| Account type | Max. hours per month | Max. concurrent sessions |
|---|---|---|
| Free Tier | 4 | 2 |
| Paid Tier | 6,000 | 20 |
| Enterprise | Custom | Custom |
Please reach out to Support if you need to increase the above limits.
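If your application opens sessions from several workers, capping concurrency on the client avoids exceeding the per-account session limit. The following is a minimal asyncio sketch. `run_sessions` and `fake_session` are hypothetical placeholders for your own WebSocket session code, and the cap of 2 simply mirrors the Free Tier value in the table.

```python
import asyncio
from typing import Awaitable, Callable, Iterable

async def run_sessions(jobs: Iterable[Callable[[], Awaitable[None]]],
                       max_concurrent: int) -> None:
    """Run session coroutines, never more than `max_concurrent` at once."""
    sem = asyncio.Semaphore(max_concurrent)

    async def guarded(job: Callable[[], Awaitable[None]]) -> None:
        async with sem:  # waits while the concurrency cap is reached
            await job()

    await asyncio.gather(*(guarded(job) for job in jobs))

state = {"active": 0, "peak": 0}

async def fake_session() -> None:
    """Hypothetical stand-in for one Real-time session."""
    state["active"] += 1
    state["peak"] = max(state["peak"], state["active"])
    await asyncio.sleep(0.01)  # stands in for streaming audio
    state["active"] -= 1

asyncio.run(run_sessions([fake_session] * 5, max_concurrent=2))
print(state["peak"])  # 2
```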
Session Limits
Real-time SaaS sessions will be automatically ended if any of the following criteria are met:
- Session duration reaches 48 hours
- No audio data (AddAudio messages) sent for 1 hour
- No audio or ping/pong messages sent for 3 minutes
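A client can track these three limits locally to decide when to rotate a session or send a keepalive. The following is a minimal sketch. `SessionClock` is a hypothetical helper, and timestamps are seconds from any monotonic clock. It assumes `last_activity` is updated on every audio chunk *and* every ping/pong, because either one resets the 3-minute timer.

```python
from dataclasses import dataclass

MAX_SESSION_S = 48 * 3600  # session duration limit
AUDIO_IDLE_S = 60 * 60     # no AddAudio messages
ANY_IDLE_S = 3 * 60        # no audio and no ping/pong

@dataclass
class SessionClock:
    started: float
    last_audio: float      # last AddAudio message sent
    last_activity: float   # last audio OR ping/pong

    def next_limit(self, now: float) -> tuple[str, float]:
        """Return the limit that will end the session first and the seconds left."""
        deadlines = {
            "max duration (48 h)": self.started + MAX_SESSION_S,
            "no audio (1 h)": self.last_audio + AUDIO_IDLE_S,
            "no audio or ping/pong (3 min)": self.last_activity + ANY_IDLE_S,
        }
        name = min(deadlines, key=deadlines.get)
        return name, deadlines[name] - now

clock = SessionClock(started=0.0, last_audio=0.0, last_activity=0.0)
print(clock.next_limit(60.0))  # ('no audio or ping/pong (3 min)', 120.0)
```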
When a session is automatically ended, the Real-time SaaS service sends an in-band error followed by a closing handshake with code 1008. For more information, see the Real-time API Reference.
Guidance for Users
Clients can disconnect a session before it is automatically terminated and immediately connect a new one; new sessions typically start in less than a second. If a seamless transition is required, connect the new session a few seconds before disconnecting the old one.
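The overlap described above comes down to ordering: open the replacement first, keep both sessions alive briefly, then close the old one. The following is a minimal asyncio sketch of that ordering. `FakeSession`, `hand_over`, and the overlap duration are hypothetical. A real client would also need to feed audio to both sessions during the overlap and would likely need to reconcile the duplicate transcripts that this produces.

```python
import asyncio

class FakeSession:
    """Hypothetical stand-in for a Real-time WebSocket session."""

    def __init__(self, name: str, log: list[str]):
        self.name, self.log = name, log

    async def connect(self) -> None:
        self.log.append(f"open {self.name}")

    async def close(self) -> None:
        self.log.append(f"close {self.name}")

async def hand_over(old: FakeSession, new: FakeSession, overlap_s: float) -> FakeSession:
    """Connect the replacement first, wait `overlap_s`, then close the old session."""
    await new.connect()
    await asyncio.sleep(overlap_s)  # your audio sender would feed both sessions here (not shown)
    await old.close()
    return new

log: list[str] = []
old = FakeSession("A", log)
asyncio.run(old.connect())
current = asyncio.run(hand_over(old, FakeSession("B", log), overlap_s=0.01))
print(log)  # ['open A', 'open B', 'close A']
```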
Because unpredictable network issues can cause WebSocket connections to drop, we recommend handling session termination gracefully for long-running sessions.
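A common way to handle dropped connections in long-running sessions is a reconnect loop with capped exponential backoff and random jitter. The following is a generic sketch. `connect` is a hypothetical callable that opens a new Real-time session and raises `ConnectionError` on failure. The delay values are illustrative, not Speechmatics guidance.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def connect_with_retry(connect: Callable[[], T], max_attempts: int = 6,
                       base_delay: float = 0.5, max_delay: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `connect` until it succeeds, sleeping with capped exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter avoids synchronized reconnects
    raise AssertionError("not reached")
```

Jitter keeps many clients that dropped at the same moment from all reconnecting at once.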
Data Retention Limits
Speechmatics Real-Time SaaS does not store audio files, transcripts, or configuration data.