SpatialReal services accept mono 16-bit PCM (s16le) audio. The sample rate is configured at SDK initialization and must match the audio your application sends.

Audio Input Requirements

| Property | Value |
|---|---|
| Sample Rate | One of 8000, 16000, 22050, 24000, 32000, 44100, 48000 Hz |
| Channels | 1 (mono) |
| Bit Depth | 16-bit |
| Format | Raw PCM bytes |
| PCM Encoding | s16le |
If your source audio is stereo, uses floating-point samples or a compressed codec, or has an unsupported sample rate, convert it before sending it to SpatialReal.
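As an illustration of the conversion step, the sketch below downmixes interleaved stereo Float32 samples (the layout Web Audio-style pipelines commonly produce) to mono and quantizes them to s16le bytes. The function name `stereoFloatToMonoS16le` is hypothetical and not part of the SpatialReal SDK. It assumes the input is already at a supported sample rate and does no resampling.

```typescript
// Hypothetical helper, not part of the SpatialReal SDK.
// Input: interleaved stereo samples [L0, R0, L1, R1, ...] in the range -1.0..1.0.
// Output: mono signed 16-bit little-endian PCM bytes.
function stereoFloatToMonoS16le(interleaved: Float32Array): Uint8Array {
  const frames = Math.floor(interleaved.length / 2);
  const out = new Uint8Array(frames * 2); // 2 bytes per mono sample
  const view = new DataView(out.buffer);
  for (let i = 0; i < frames; i++) {
    // Downmix by averaging the left and right samples.
    const mixed = (interleaved[2 * i] + interleaved[2 * i + 1]) / 2;
    // Clamp so out-of-range input cannot wrap around after scaling.
    const clamped = Math.max(-1, Math.min(1, mixed));
    // Scale to the signed 16-bit range (-32768..32767).
    const int16 = clamped < 0 ? Math.round(clamped * 32768) : Math.round(clamped * 32767);
    view.setInt16(i * 2, int16, true); // true = little-endian
  }
  return out;
}
```

Averaging the two channels is the simplest downmix. Compressed codecs and sample-rate changes need a real decoder or resampler and are out of scope for this sketch.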

What Is s16le?

  • s16 means each audio sample is a signed 16-bit integer.
  • le means the bytes are stored in little-endian order.
  • mono means the stream contains a single channel only.
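
To make the byte layout concrete, here is a small sketch that encodes and decodes s16le samples with the standard `DataView` API. The helper names are illustrative only and do not come from the SDK.

```typescript
// Illustrative helpers, not part of the SpatialReal SDK.

// Encode one signed 16-bit sample as two bytes, low byte first.
function encodeS16leSample(sample: number): Uint8Array {
  const bytes = new Uint8Array(2);
  new DataView(bytes.buffer).setInt16(0, sample, true); // true = little-endian
  return bytes;
}

// Decode a mono s16le byte stream back into signed 16-bit samples.
function decodeS16le(bytes: Uint8Array): Int16Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const samples = new Int16Array(Math.floor(bytes.byteLength / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true);
  }
  return samples;
}

// encodeS16leSample(0x1234) yields the bytes [0x34, 0x12]: low byte first.
// encodeS16leSample(-2) yields [0xFE, 0xFF]: two's complement, low byte first.
```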

Supported Sample Rates

| Sample Rate | Common Use Case |
|---|---|
| 8000 Hz | Telephony, low-bandwidth voice |
| 16000 Hz | Speech recognition, voice assistants (default) |
| 22050 Hz | Low-quality speech synthesis |
| 24000 Hz | Common TTS output (OpenAI, ElevenLabs) |
| 32000 Hz | Wideband audio |
| 44100 Hz | CD-quality audio |
| 48000 Hz | Professional audio, WebRTC default |
Sending audio at a sample rate that doesn’t match your SDK configuration will produce distorted or silent playback. Always ensure the configured rate matches your audio source.
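
One way to catch a mismatch early is to check the rates in your own code before streaming. The following is a minimal sketch: `checkSampleRate` and `SUPPORTED_SAMPLE_RATES` are hypothetical names for an application-side guard, and the SDK itself may or may not perform a similar check.

```typescript
// Hypothetical application-side guard, not an SDK API.
// The list mirrors the supported rates documented above.
const SUPPORTED_SAMPLE_RATES: readonly number[] = [
  8000, 16000, 22050, 24000, 32000, 44100, 48000,
];

// Returns null when the rates are usable, otherwise a description of the problem.
function checkSampleRate(configuredRate: number, sourceRate: number): string | null {
  if (SUPPORTED_SAMPLE_RATES.indexOf(configuredRate) === -1) {
    return `Configured rate ${configuredRate} Hz is not a supported sample rate`;
  }
  if (configuredRate !== sourceRate) {
    return `Source audio is ${sourceRate} Hz but the SDK is configured for ${configuredRate} Hz`;
  }
  return null;
}
```

Calling `checkSampleRate(16000, 24000)` returns a message rather than `null`, which signals that the audio should be resampled or the SDK configuration changed before sending.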

How to Choose a Sample Rate

  • Use 16000 Hz (default) for most speech-driven integrations — it balances quality and bandwidth well.
  • Match your TTS provider’s native output rate to avoid resampling. For example, many TTS services output at 24000 Hz.
  • In RTC Mode, the sample rate is typically governed by the RTC framework (e.g., LiveKit defaults to 48000 Hz).
The sample rate only affects audio sent to the driving service. The avatar’s rendered animation quality is independent of sample rate.
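
The bandwidth side of this trade-off follows directly from the format: mono 16-bit PCM uses 2 bytes per sample, so the raw data rate is the sample rate times 2 bytes per second. The helpers below are a sketch of that arithmetic with hypothetical names, and they ignore any transport or framing overhead.

```typescript
// Illustrative arithmetic for mono s16le audio, not an SDK API.
const BYTES_PER_SAMPLE = 2; // 16-bit samples, one channel

// Raw PCM data rate in bytes per second.
function pcmBytesPerSecond(sampleRate: number): number {
  return sampleRate * BYTES_PER_SAMPLE;
}

// Playback duration in milliseconds of a raw PCM chunk.
function pcmDurationMs(byteLength: number, sampleRate: number): number {
  return (byteLength / pcmBytesPerSecond(sampleRate)) * 1000;
}

// pcmBytesPerSecond(16000) is 32000 bytes/s (256 kbit/s).
// pcmBytesPerSecond(48000) is 96000 bytes/s (768 kbit/s).
// pcmDurationMs(3200, 16000) is 100 ms.
```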

Configuration

Set the sample rate when initializing the client-side SDK through the audioFormat parameter.
await AvatarSDK.initialize({
  appId: 'YOUR_APP_ID',
  audioFormat: { channelCount: 1, sampleRate: 16000 },
  // ...
});

Before Sending Audio

Before sending audio to the SDK or service, make sure your pipeline outputs:
  • a supported sample rate
  • mono audio
  • 16-bit signed PCM samples
  • raw PCM bytes rather than WAV headers or compressed audio frames
For a practical overview of common audio encodings and container formats, see the FFmpeg audio types reference.
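
If your source is a WAV file, the checklist above means the RIFF header must be removed before sending. The sketch below walks the RIFF chunks, verifies that the file holds mono 16-bit integer PCM, and returns the raw bytes of the `data` chunk. `extractRawPcmFromWav` and `WavPcm` are hypothetical names, not SDK APIs. The sketch assumes a well-formed file with the `fmt ` chunk before `data`, and it rejects WAVE_FORMAT_EXTENSIBLE files even when they contain PCM.

```typescript
// Hypothetical helper, not part of the SpatialReal SDK.
interface WavPcm {
  sampleRate: number; // taken from the fmt chunk; must match the SDK configuration
  pcm: Uint8Array;    // raw s16le bytes without any header
}

function extractRawPcmFromWav(wav: Uint8Array): WavPcm {
  const view = new DataView(wav.buffer, wav.byteOffset, wav.byteLength);
  // Read a 4-character chunk tag at the given offset.
  const tag = (o: number): string =>
    String.fromCharCode(wav[o], wav[o + 1], wav[o + 2], wav[o + 3]);

  if (wav.byteLength < 12 || tag(0) !== "RIFF" || tag(8) !== "WAVE") {
    throw new Error("Not a RIFF/WAVE file");
  }

  let sampleRate = 0;
  let offset = 12; // the first chunk follows the 12-byte RIFF header
  while (offset + 8 <= wav.byteLength) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true);
    const body = offset + 8;
    if (id === "fmt " && body + 16 <= wav.byteLength) {
      const format = view.getUint16(body, true);    // 1 = integer PCM
      const channels = view.getUint16(body + 2, true);
      const bits = view.getUint16(body + 14, true);
      if (format !== 1 || channels !== 1 || bits !== 16) {
        throw new Error("WAV is not mono 16-bit PCM; convert it first");
      }
      sampleRate = view.getUint32(body + 4, true);
    } else if (id === "data") {
      if (sampleRate === 0) throw new Error("fmt chunk missing before data");
      const end = Math.min(body + size, wav.byteLength);
      return { sampleRate, pcm: wav.subarray(body, end) };
    }
    offset = body + size + (size % 2); // chunks are padded to even length
  }
  throw new Error("No data chunk found");
}
```

The returned `sampleRate` can then be compared against the rate configured at SDK initialization before any of the `pcm` bytes are sent.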