TTS
Implement Text-to-Speech with batch synthesis and, optionally, realtime streaming.
Batch mode synthesises the whole audio buffer in a single request; realtime mode streams in both directions, taking text in and sending audio chunks out as they are produced.
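To make the difference concrete, here is a minimal sketch of what a consumer of the realtime mode might do to end up with the same kind of single buffer that batch mode returns. It assumes the streamed chunks arrive in order as `Uint8Array` slices of one encoded stream; `concatChunks` is a hypothetical helper for illustration, not part of the SDK.

```typescript
// Hypothetical helper (not an SDK function): join streamed audio chunks,
// in arrival order, into one contiguous byte array.
function concatChunks(chunks: Uint8Array[]): Uint8Array {
  // Size the output once so each chunk is copied exactly one time.
  const total = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);
  const out = new Uint8Array(total);

  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}
```

In practice a realtime client would usually play or forward each chunk as it arrives rather than waiting for the last one; buffering like this is mainly useful for saving the result.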
Define the capability
import { defineCapabilityTTS } from "@scrydon/sdk-authoring/integrations/define";

const ttsCapability = defineCapabilityTTS({
  models: [
    { id: "tts-standard", name: "Standard TTS", voices: 6, benchmarks: [{ name: "MOS", score: 3.5, source: "Internal" }] },
    { id: "tts-hd", name: "HD TTS", voices: 6, benchmarks: [{ name: "MOS", score: 4.2, source: "Internal" }] },
  ],
  defaultModel: "tts-standard",

  // Batch mode: send text, get audio back
  async synthesize(request) {
    const response = await fetch("https://api.example.com/v1/synthesize", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${request.apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        text: request.text,
        voice: request.voice,
        model: request.model,
        speed: request.speed,
      }),
    });

    // Fail loudly instead of returning an error body as if it were audio
    if (!response.ok) {
      throw new Error(`TTS request failed with status ${response.status}`);
    }

    return {
      audioBuffer: await response.arrayBuffer(),
      format: "mp3",
      mimeType: "audio/mpeg",
    };
  },

  // Realtime streaming (optional)
  realtime: {
    protocol: "websocket",
    async createSession(config) {
      // Mirror the STT realtime pattern
    },
    features: { streamingInput: true, streamingOutput: true },
  },
});

Model metadata
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | yes | Unique model identifier (e.g. "tts-hd") |
| name | string | yes | Display name in the UI |
| voices | number | no | Number of available voice presets |
| benchmarks | BenchmarkScore[] | no | MOS scores (higher is better) |
The standard TTS benchmark is MOS (Mean Opinion Score), rated on a scale of 1 to 5, where higher is better. The voices field tells the UI how many voice presets to surface.
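As a rough illustration of how this metadata might be consumed, the sketch below ranks models by their MOS benchmark. The `ModelMeta` and `BenchmarkScore` types are reconstructed from the table and the example config, so the SDK's real definitions may differ; `mosOf` and `bestByMos` are hypothetical helpers, not SDK functions.

```typescript
// Hypothetical types, reconstructed from the table and the example config.
// The SDK's actual definitions may differ.
interface BenchmarkScore {
  name: string;
  score: number;
  source: string;
}

interface ModelMeta {
  id: string;
  name: string;
  voices?: number;
  benchmarks?: BenchmarkScore[];
}

// Return a model's MOS, or undefined when it reports none.
function mosOf(model: ModelMeta): number | undefined {
  return model.benchmarks?.find((b) => b.name === "MOS")?.score;
}

// Pick the model with the highest MOS. Models without a MOS rank last;
// on a tie, the earlier model in the list wins.
function bestByMos(models: ModelMeta[]): ModelMeta | undefined {
  let best: ModelMeta | undefined;
  let bestScore = -Infinity;
  for (const model of models) {
    const score = mosOf(model) ?? -Infinity;
    if (best === undefined || score > bestScore) {
      best = model;
      bestScore = score;
    }
  }
  return best;
}

// The two models from the capability definition.
const exampleModels: ModelMeta[] = [
  { id: "tts-standard", name: "Standard TTS", voices: 6, benchmarks: [{ name: "MOS", score: 3.5, source: "Internal" }] },
  { id: "tts-hd", name: "HD TTS", voices: 6, benchmarks: [{ name: "MOS", score: 4.2, source: "Internal" }] },
];
```

With the two example models, `bestByMos(exampleModels)` selects `tts-hd`, since 4.2 is higher than 3.5. Because both `voices` and `benchmarks` are optional, any code reading them should handle their absence, as `mosOf` does.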