Quickstart

Learn how to convert streaming audio to text.

This guide shows you how to use the WebSocket API to transcribe live speech.

TIP

The easiest way to try real-time transcription is via the web portal.

Using the Real-time SaaS WebSocket API

1. Create an API key

Create an API key in the portal; you'll use it to securely access the API. Store the key as a managed secret.

INFO

Enterprise customers may need to contact Support to get their API keys.
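To keep the key out of your source, you can load it from the environment at startup. This is a minimal sketch, and the variable name SPEECHMATICS_API_KEY is an assumption — use whatever name your secret manager exposes:

```javascript
// Read the API key from an environment variable instead of hard-coding it.
// The env parameter defaults to process.env but is injectable for testing.
function getApiKey(env = process.env) {
  const key = env.SPEECHMATICS_API_KEY;
  if (!key) {
    throw new Error("SPEECHMATICS_API_KEY is not set");
  }
  return key;
}
```

Call this once at startup and fail fast if the key is missing, rather than letting an undefined key surface later as an authentication error.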

2. Pick and install a library

Check out our JavaScript client or Python client to get started.

npm install @speechmatics/real-time-client @speechmatics/auth

3. Insert your API key

Paste your API key in place of YOUR_API_KEY in the code below.

import https from "node:https";
import { createSpeechmaticsJWT } from "@speechmatics/auth";
import { RealtimeClient } from "@speechmatics/real-time-client";

const apiKey = "YOUR_API_KEY";

const client = new RealtimeClient();

const streamURL = "https://media-ice.musicradio.com/LBCUKMP3";

const stream = https.get(streamURL, (response) => {
  // Forward each chunk of the audio stream to the recognizer
  response.on("data", (chunk) => {
    client.sendAudio(chunk);
  });

  response.on("end", () => {
    console.log("Stream ended");
    client.stopRecognition({ noTimeout: true });
  });

  response.on("error", (error) => {
    console.error("Stream error:", error);
    client.stopRecognition();
  });
});

stream.on("error", (error) => {
  console.error("Request error:", error);
  client.stopRecognition();
});

client.addEventListener("receiveMessage", ({ data }) => {
  if (data.message === "AddTranscript") {
    for (const result of data.results) {
      if (result.type === "word") {
        process.stdout.write(" ");
      }
      process.stdout.write(`${result.alternatives?.[0]?.content}`);
      if (result.is_eos) {
        process.stdout.write("\n");
      }
    }
  } else if (data.message === "EndOfTranscript") {
    process.stdout.write("\n");
    process.exit(0);
  } else if (data.message === "Error") {
    process.stdout.write(`\n${JSON.stringify(data)}\n`);
    process.exit(1);
  }
});

createSpeechmaticsJWT({
  type: "rt",
  apiKey,
  ttl: 60, // 1 minute
}).then((jwt) => {
  client.start(jwt, {
    transcription_config: {
      language: "en",
      operating_point: "enhanced",
      max_delay: 1.0,
      transcript_filtering_config: {
        remove_disfluencies: true,
      },
    },
  });
});

Transcript outputs

The API returns transcripts in JSON format. You can receive two types of output: Final and Partial transcripts. Choose the type based on your latency and accuracy needs.

Final transcripts

Final transcripts are the definitive result.

  • They reflect the best transcription for the spoken audio.
  • Once displayed, they are not updated.
  • Words arrive incrementally, with some delay.

You control the latency and accuracy tradeoff using the max_delay setting in your transcription_config. Larger values of max_delay increase accuracy by giving the system more time to process audio context.
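To illustrate the tradeoff, here are two contrasting transcription_config sketches. The specific max_delay values are illustrative only — check the API reference for the allowed range:

```javascript
// Lower max_delay: finals arrive sooner, with less audio context.
const lowLatencyConfig = {
  transcription_config: {
    language: "en",
    max_delay: 0.7,
  },
};

// Higher max_delay: the engine waits longer before committing finals,
// which gives it more context and generally improves accuracy.
const accuracyConfig = {
  transcription_config: {
    language: "en",
    max_delay: 4.0,
  },
};
```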

TIP

Best for accurate, completed transcripts where some delay is acceptable.

Partial transcripts

Partial transcripts are low-latency and can update later as more conversation context arrives.

  • You must enable them using enable_partials in your transcription_config.
  • Partials are emitted quickly (typically less than 500ms).
  • The engine may revise them as more audio is processed.

You can combine partials with finals for a responsive user experience — show partials first, then replace them with finals as they arrive.
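That pattern can be sketched as a small view that keeps committed finals and overlays the latest partial. This is a simplified illustration, not the client library's API: it assumes messages shaped like the AddPartialTranscript and AddTranscript events, reduced here to plain text, and it only works if enable_partials is set in your transcription_config:

```javascript
// Maintain a display string from committed finals plus the latest partial.
function createTranscriptView() {
  let finalText = "";
  let partialText = "";

  // Join the words of a message's results into a single string
  const textOf = (data) =>
    (data.results ?? [])
      .map((r) => r.alternatives?.[0]?.content ?? "")
      .join(" ");

  return {
    handleMessage(data) {
      if (data.message === "AddPartialTranscript") {
        partialText = textOf(data); // provisional; may be revised
      } else if (data.message === "AddTranscript") {
        finalText += (finalText ? " " : "") + textOf(data); // committed
        partialText = ""; // the final supersedes the pending partial
      }
    },
    render() {
      return partialText ? `${finalText} ${partialText}`.trim() : finalText;
    },
  };
}
```

Call render() after each message to refresh the UI: the user sees provisional text immediately, and it settles into the final transcript as AddTranscript messages arrive.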


TIP

Use partials for: real-time captions, voice interfaces, or any case where speed matters.