TTS

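TTS capabilities support two modes. Batch mode synthesises a complete audio buffer from the input text in a single request. The optional realtime mode streams text in and audio out in chunks over a bidirectional connection.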

Define the capability

import { defineCapabilityTTS } from "@scrydon/sdk-authoring/integrations/define";

const ttsCapability = defineCapabilityTTS({
  models: [
    { id: "tts-standard", name: "Standard TTS", voices: 6, benchmarks: [{ name: "MOS", score: 3.5, source: "Internal" }] },
    { id: "tts-hd",       name: "HD TTS",       voices: 6, benchmarks: [{ name: "MOS", score: 4.2, source: "Internal" }] },
  ],
  defaultModel: "tts-standard",

  // Batch mode — send text, get audio back
  async synthesize(request) {
    const response = await fetch("https://api.example.com/v1/synthesize", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${request.apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        text: request.text,
        voice: request.voice,
        model: request.model,
        speed: request.speed,
      }),
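    });
    // Surface provider errors instead of returning an error body as audio
    if (!response.ok) {
      throw new Error(`TTS request failed: ${response.status} ${response.statusText}`);
    }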
    return {
      audioBuffer: await response.arrayBuffer(),
      format: "mp3",
      mimeType: "audio/mpeg",
    };
  },

  // Realtime streaming (optional)
  realtime: {
    protocol: "websocket",
    async createSession(config) {
      // Open the provider's WebSocket session here, following the same pattern as the STT realtime capability
    },
    features: { streamingInput: true, streamingOutput: true },
  },
});
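The `format` and `mimeType` that `synthesize` returns must describe the same encoding. The sketch below derives the MIME type from the format so the two cannot drift apart, and it rejects empty buffers. It makes several assumptions: the helper `toSynthesisResult`, the `SynthesisResult` type, and the list of formats are hypothetical and not confirmed by the SDK. Check which encodings your provider actually returns.

```typescript
// Hypothetical result shape mirroring the object returned by `synthesize` above.
// The format list is an assumption; adjust it to what your provider supports.
type AudioFormat = "mp3" | "wav" | "ogg" | "flac";

interface SynthesisResult {
  audioBuffer: ArrayBuffer;
  format: AudioFormat;
  mimeType: string;
}

// Commonly used MIME types for each format.
const MIME_TYPES: Record<AudioFormat, string> = {
  mp3: "audio/mpeg",
  wav: "audio/wav",
  ogg: "audio/ogg",
  flac: "audio/flac",
};

// Build the result so `format` and `mimeType` always agree.
function toSynthesisResult(audioBuffer: ArrayBuffer, format: AudioFormat): SynthesisResult {
  if (audioBuffer.byteLength === 0) {
    throw new Error("Provider returned an empty audio buffer");
  }
  return { audioBuffer, format, mimeType: MIME_TYPES[format] };
}
```

With this helper, the final `return` inside `synthesize` could become `return toSynthesisResult(await response.arrayBuffer(), "mp3");`.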

Model metadata

Field	Type	Required	Description
`id`	string	yes	Unique model identifier (e.g. `"tts-hd"`)
`name`	string	yes	Display name in the UI
`voices`	number	no	Number of available voice presets
`benchmarks`	`BenchmarkScore[]`	no	MOS scores — higher is better
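The fields in the table can be written as TypeScript types. The sketch below infers `BenchmarkScore` from the example models (`name`, `score`, `source`). The type names, the field optionality, and the `pickBestModel` helper are assumptions for illustration and may not match the SDK's exports. Because a higher MOS is better, the helper picks the model with the largest score for a given benchmark.

```typescript
// Hypothetical types inferred from the table and the example models above.
interface BenchmarkScore {
  name: string;   // e.g. "MOS"
  score: number;  // for MOS: 1–5, higher is better
  source: string; // who measured it, e.g. "Internal"
}

interface TTSModel {
  id: string;
  name: string;
  voices?: number;
  benchmarks?: BenchmarkScore[];
}

// Return the model with the highest score for `benchmark`, or undefined if
// no model reports it. Models without that benchmark are skipped.
function pickBestModel(models: TTSModel[], benchmark = "MOS"): TTSModel | undefined {
  let best: TTSModel | undefined;
  let bestScore = -Infinity;
  for (const model of models) {
    const entry = model.benchmarks?.find((b) => b.name === benchmark);
    if (entry !== undefined && entry.score > bestScore) {
      best = model;
      bestScore = entry.score;
    }
  }
  return best;
}
```

Applied to the two models in the example, this selects `tts-hd` (MOS 4.2 versus 3.5).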

The standard TTS benchmark is MOS (Mean Opinion Score, on a 1–5 scale), where higher is better. `voices` tells the UI how many voice presets to surface.
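A MOS is the arithmetic mean of listener ratings on a five-point scale (1 = bad, 5 = excellent), as in ITU-T P.800 absolute category rating tests. The sketch below is illustrative only, and `meanOpinionScore` is a hypothetical helper. The benchmark values in the capability definition are static numbers you supply; this function does not reproduce how the example's "Internal" scores were measured. As an example, five ratings of 4, 5, 4, 3 and 5 average to 4.2.

```typescript
// Compute a Mean Opinion Score from individual listener ratings.
// Each rating must be an integer on the 1–5 scale.
function meanOpinionScore(ratings: number[]): number {
  if (ratings.length === 0) {
    throw new Error("MOS needs at least one rating");
  }
  for (const r of ratings) {
    if (!Number.isInteger(r) || r < 1 || r > 5) {
      throw new RangeError(`Rating out of range: ${r}`);
    }
  }
  const total = ratings.reduce((sum, r) => sum + r, 0);
  return total / ratings.length;
}
```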
