Latency settings

Balance speed and accuracy in your Realtime transcription by adjusting latency settings.

Configuration options

Add these parameters to your StartRecognition message:

{
  "type": "transcription",
  "transcription_config": {
    "max_delay": 0.7,
    "max_delay_mode": "flexible",
    "enable_partials": true,
    "language": "en",
    "operating_point": "enhanced"
  }
}

max_delay: Maximum delay in seconds (0.7 to 4.0, default: 4.0) between the end of speech and delivery of the final transcript
max_delay_mode: Either fixed or flexible (default: flexible). Controls whether the system may wait slightly longer than max_delay to finish formatting numbers, dates, and currencies
enable_partials: Boolean (default: false). Set to true to receive partial transcripts for faster feedback
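The documented ranges and defaults can be checked on the client before the message is sent. The following is a minimal sketch in Python, not part of any official SDK: the helper name build_transcription_config is hypothetical, and only the three latency parameters described above are covered. Other StartRecognition fields are omitted.

```python
import json

MIN_DELAY, MAX_DELAY = 0.7, 4.0       # documented max_delay range, in seconds
VALID_MODES = ("fixed", "flexible")   # documented max_delay_mode values


def build_transcription_config(max_delay=4.0, max_delay_mode="flexible",
                               enable_partials=False):
    """Hypothetical helper: validate latency settings and return a config dict."""
    if not MIN_DELAY <= max_delay <= MAX_DELAY:
        raise ValueError(
            f"max_delay must be between {MIN_DELAY} and {MAX_DELAY} seconds")
    if max_delay_mode not in VALID_MODES:
        raise ValueError(f"max_delay_mode must be one of {VALID_MODES}")
    return {
        "max_delay": max_delay,
        "max_delay_mode": max_delay_mode,
        "enable_partials": bool(enable_partials),
    }


# Low-latency settings matching the example above.
config = build_transcription_config(max_delay=0.7, enable_partials=True)
print(json.dumps({"transcription_config": config}, indent=2))
```

Calling the helper with no arguments returns the documented defaults, and a value such as max_delay=0.5 raises ValueError instead of being sent to the service.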

Speed vs. accuracy trade-offs

Choose the right max_delay setting for your use case:

Setting	Accuracy Impact	Recommended Use Cases
0.7-1.5s	< 5% degradation	Conversational AI, voice assistants
2.0s	~1% degradation	Live captioning, broadcast media
4.0s	No degradation	Highest accuracy needs; enable partial transcripts if you also need fast feedback

WARNING

Lower latency settings trade some accuracy for speed. Test thoroughly with your specific audio.
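The table above can be encoded as a small lookup so that an application can report the expected accuracy impact of a chosen max_delay. This is an illustrative sketch only: the helper name describe_max_delay is hypothetical, and values that fall between the table rows return None because the table gives no figure for them.

```python
# Rows copied from the table above:
# (min seconds, max seconds, accuracy impact, recommended use cases)
TRADE_OFFS = [
    (0.7, 1.5, "< 5% degradation", "Conversational AI, voice assistants"),
    (2.0, 2.0, "~1% degradation", "Live captioning, broadcast media"),
    (4.0, 4.0, "No degradation", "Highest accuracy needs"),
]


def describe_max_delay(max_delay):
    """Hypothetical helper: return the table row covering max_delay, or None."""
    for low, high, impact, use_cases in TRADE_OFFS:
        if low <= max_delay <= high:
            return {"accuracy_impact": impact, "use_cases": use_cases}
    return None  # the table gives no figure for in-between values


for value in (1.0, 2.0, 3.0, 4.0):
    print(value, describe_max_delay(value))
```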

Partial transcripts

Get preliminary results faster while waiting for final, more accurate transcripts.

How partial transcripts work

Delivered in under 500 ms, whereas final transcripts can take up to your configured max_delay
Updated continuously as more speech context becomes available
Enabled with enable_partials: true in your configuration

Limitations

Accuracy is typically 10-25% lower than final transcripts
Punctuation and capitalization may be incorrect
Confidence scores are not meaningful and should be ignored
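Because partials are provisional, a client should treat each one as a replacement for the previous partial and commit text only when a final arrives. The sketch below illustrates that pattern. It assumes incoming messages are dicts whose "message" field is "AddPartialTranscript" or "AddTranscript" and whose text sits under metadata.transcript; that message shape is an assumption not stated on this page, so check it against the API reference. The class name LiveTranscript is hypothetical.

```python
class LiveTranscript:
    """Keeps committed finals plus the single most recent partial."""

    def __init__(self):
        self.finals = []    # committed text, never rewritten
        self.partial = ""   # provisional text, replaced on every update

    def handle(self, msg):
        kind = msg.get("message")
        text = msg.get("metadata", {}).get("transcript", "")
        if kind == "AddPartialTranscript":
            # Replace rather than append: each partial supersedes the last.
            # Confidence values on partials are deliberately ignored.
            self.partial = text
        elif kind == "AddTranscript":
            self.finals.append(text)
            self.partial = ""   # the final supersedes the pending partial

    def display(self):
        parts = [p.strip() for p in self.finals + [self.partial]]
        return " ".join(p for p in parts if p)


live = LiveTranscript()
live.handle({"message": "AddPartialTranscript", "metadata": {"transcript": "I am 30"}})
print(live.display())
live.handle({"message": "AddTranscript", "metadata": {"transcript": "I am 35."}})
print(live.display())
```

Keeping finals and the current partial in separate fields means a user interface can style provisional text differently and never has to undo committed text.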

Numeral formatting

Improve transcript readability with properly formatted numbers, dates, and currencies.

Flexible mode

When using max_delay_mode: "flexible" (default):

System waits until an entity (number, date, currency) is fully spoken
Ensures proper formatting of complex numerical expressions
Slightly increases latency only when entities are detected

Fixed mode

For applications with strict latency requirements:

Set max_delay_mode: "fixed" to enforce consistent timing
System won't wait for entities to complete before returning results

WARNING

Fixed mode reduces accuracy and readability of numbers, currencies, and dates.

Example output comparison

Finals only (default)

With only final transcripts (default configuration):

(Final): I am 35.

Partials with flexible mode

With enable_partials: true and max_delay_mode: "flexible":

(Partial): I
(Partial): I am
(Partial): I am third
(Partial): I am 30
(Final): I am 35.

Note how the system corrects "30" to "35" in the final transcript.

Partials with fixed mode

With enable_partials: true and max_delay_mode: "fixed":

(Partial): I
(Final): I am
(Partial): third
(Final): 30
(Partial): five
(Final): five.

Final output: "I am 30 five." Because fixed mode finalized "30" before "five" was spoken, the number is split across two finals and is not formatted as "35".
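The two sequences above can be replayed to see how the finals combine. This is a small illustrative sketch: the (kind, text) tuples are a simplified stand-in for real transcript messages, and join_finals is a hypothetical helper.

```python
def join_finals(events):
    """Concatenate only the final segments, ignoring partials."""
    return " ".join(text for kind, text in events if kind == "final")


flexible = [
    ("partial", "I"),
    ("partial", "I am"),
    ("partial", "I am third"),
    ("partial", "I am 30"),
    ("final", "I am 35."),
]

fixed = [
    ("partial", "I"),
    ("final", "I am"),
    ("partial", "third"),
    ("final", "30"),
    ("partial", "five"),
    ("final", "five."),
]

print(join_finals(flexible))  # I am 35.
print(join_finals(fixed))     # I am 30 five.
```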
