Latency settings
Balance speed and accuracy in your Realtime transcription by adjusting latency settings.
Configuration options
Add these parameters to your StartRecognition message:
{
  "type": "transcription",
  "transcription_config": {
    "max_delay": 0.7,
    "max_delay_mode": "flexible",
    "enable_partials": true,
    "language": "en",
    "operating_point": "enhanced"
  }
}
max_delay
: Time in seconds (0.7-4.0, default: 4.0) between the end of speech and delivery of the final transcript
max_delay_mode
: Mode setting (fixed or flexible, default: flexible) that controls how numeral formatting is handled
enable_partials
: Boolean (default: false) that enables partial transcripts for faster feedback
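As an illustration of the ranges and defaults listed above, the following minimal Python sketch validates the three latency parameters before they are merged into a transcription_config. The helper name latency_settings is hypothetical and not part of any SDK; only the parameter names, ranges, and defaults come from this page.

```python
import json

# Limits and defaults as documented on this page.
MIN_MAX_DELAY = 0.7
MAX_MAX_DELAY = 4.0
DELAY_MODES = ("fixed", "flexible")


def latency_settings(max_delay=4.0, max_delay_mode="flexible", enable_partials=False):
    """Hypothetical helper: return the latency-related config fields after validation."""
    if not MIN_MAX_DELAY <= max_delay <= MAX_MAX_DELAY:
        raise ValueError(
            f"max_delay must be between {MIN_MAX_DELAY} and {MAX_MAX_DELAY} seconds"
        )
    if max_delay_mode not in DELAY_MODES:
        raise ValueError(f"max_delay_mode must be one of {DELAY_MODES}")
    if not isinstance(enable_partials, bool):
        raise TypeError("enable_partials must be a boolean")
    return {
        "max_delay": max_delay,
        "max_delay_mode": max_delay_mode,
        "enable_partials": enable_partials,
    }


# Merge the validated latency fields with the other fields from the example above.
transcription_config = {
    "language": "en",
    "operating_point": "enhanced",
    **latency_settings(max_delay=0.7, enable_partials=True),
}
print(json.dumps(transcription_config, indent=2))
```

Validating on the client side is a design choice in this sketch; it surfaces an out-of-range value before the message is sent rather than relying on a server-side error.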
Speed vs. accuracy trade-offs
Choose the right max_delay setting for your use case:
Setting | Accuracy Impact | Recommended Use Cases |
---|---|---|
0.7-1.5s | < 5% degradation | Conversational AI, voice assistants |
2.0s | ~1% degradation | Live captioning, broadcast media |
4.0s | No degradation | Highest accuracy needs with partial transcripts |
Lower latency settings trade some accuracy for speed. Test thoroughly with your specific audio.
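One way to keep these recommendations in application code is a small lookup table. The sketch below is illustrative only: the use-case keys and the function name recommended_max_delay are assumptions made for this example, and the values simply mirror the table above.

```python
# Recommended max_delay ranges in seconds, copied from the table above.
# The dictionary keys are hypothetical labels chosen for this example.
RECOMMENDED_MAX_DELAY = {
    "conversational_ai": (0.7, 1.5),
    "voice_assistant": (0.7, 1.5),
    "live_captioning": (2.0, 2.0),
    "broadcast_media": (2.0, 2.0),
    "highest_accuracy": (4.0, 4.0),
}


def recommended_max_delay(use_case, prefer="accuracy"):
    """Return the upper end of the range for accuracy, or the lower end for speed."""
    if use_case not in RECOMMENDED_MAX_DELAY:
        raise KeyError(f"unknown use case: {use_case!r}")
    if prefer not in ("accuracy", "speed"):
        raise ValueError("prefer must be 'accuracy' or 'speed'")
    low, high = RECOMMENDED_MAX_DELAY[use_case]
    return high if prefer == "accuracy" else low


print(recommended_max_delay("conversational_ai", prefer="speed"))  # 0.7
print(recommended_max_delay("live_captioning"))                    # 2.0
```

The returned value is only a starting point; the right setting still depends on testing with your own audio.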
Partial transcripts
Get preliminary results faster while waiting for final, more accurate transcripts.
How partial transcripts work
- Delivered in under 500ms (vs. final transcripts at your configured max_delay)
- Updated continuously as more speech context becomes available
- Enabled with enable_partials: true in your configuration
Limitations
- Accuracy is typically 10-25% lower than final transcripts
- Punctuation and capitalization may be incorrect
- Confidence scores are not meaningful and should be ignored
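Because partials are provisional, a client typically shows only the latest partial and discards it once a final arrives. The sketch below is one minimal way to do that in Python. It assumes your message handler has already separated partial text from final text; the class TranscriptView and its method names are hypothetical and not part of the API.

```python
class TranscriptView:
    """Hypothetical display buffer: committed finals plus the latest partial."""

    def __init__(self):
        self.finals = []   # final segments, never revised
        self.partial = ""  # most recent partial, replaced on every update

    def on_partial(self, text):
        # A new partial supersedes the previous one instead of being appended.
        self.partial = text

    def on_final(self, text):
        # A final commits its text and clears the provisional partial.
        self.finals.append(text)
        self.partial = ""

    def render(self):
        parts = self.finals + ([self.partial] if self.partial else [])
        return " ".join(parts)


view = TranscriptView()
view.on_partial("I am 30")
print(view.render())  # I am 30
view.on_final("I am 35.")
print(view.render())  # I am 35.
```

Keeping partials out of the committed list means their lower accuracy and unreliable punctuation never reach stored transcripts.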
Numeral formatting
Improve transcript readability with properly formatted numbers, dates, and currencies.
Flexible mode
When using max_delay_mode: "flexible" (default):
- System waits until an entity (number, date, currency) is fully spoken
- Ensures proper formatting of complex numerical expressions
- Slightly increases latency only when entities are detected
Fixed mode
For applications with strict latency requirements:
- Set max_delay_mode: "fixed" to enforce consistent timing
- System won't wait for entities to complete before returning results
Fixed mode reduces accuracy and readability of numbers, currencies, and dates.
Example output comparison
Finals only (default)
With only final transcripts (default configuration):
(Final): I am 35.
Partials with flexible mode
With enable_partials: true and max_delay_mode: "flexible":
(Partial): I
(Partial): I am
(Partial): I am third
(Partial): I am 30
(Final): I am 35.
Note how the system corrects "30" to "35" in the final transcript.
Partials with fixed mode
With enable_partials: true and max_delay_mode: "fixed":
(Partial): I
(Final): I am
(Partial): third
(Final): 30
(Partial): five
(Final): five.
Final output: "I am 30 five." The number isn't properly formatted because "30" was finalized before "five" was spoken.