s16le) audio. The sample rate is configured at SDK initialization and must match the audio your application sends.
Audio Input Requirements
| Property | Value |
|---|---|
| Sample Rate | One of 8000, 16000, 22050, 24000, 32000, 44100, 48000 |
| Channels | 1 (mono) |
| Bit Depth | 16-bit |
| Format | Raw PCM bytes |
| PCM Encoding | s16le |
If your source audio is stereo, uses floating-point samples or a compressed codec, or has an unsupported sample rate, convert it before sending it to SpatialReal.
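As a rough illustration of such a conversion, the sketch below downmixes interleaved stereo floating-point samples to mono and quantizes them to s16le bytes using only the Python standard library. The function name `float_stereo_to_s16le_mono` and the assumption that the input samples lie in [-1.0, 1.0] are illustrative and not part of the SpatialReal SDK. Sample-rate conversion is not shown and would need a separate resampler.

```python
import struct

def float_stereo_to_s16le_mono(interleaved):
    """Downmix interleaved stereo floats (L, R, L, R, ...) to mono s16le bytes.

    Assumes each input sample is a float in the range [-1.0, 1.0].
    """
    if len(interleaved) % 2 != 0:
        raise ValueError("interleaved stereo data must hold an even number of samples")

    out = bytearray()
    for i in range(0, len(interleaved), 2):
        # Average the left and right channels to get one mono sample.
        mono = (interleaved[i] + interleaved[i + 1]) / 2.0
        # Clamp so out-of-range input cannot overflow the 16-bit range.
        mono = max(-1.0, min(1.0, mono))
        # "<h" = little-endian signed 16-bit integer.
        out += struct.pack("<h", int(round(mono * 32767)))
    return bytes(out)

# Four stereo frames in, four mono samples (8 bytes) out.
pcm = float_stereo_to_s16le_mono([0.0, 0.0, 1.0, 1.0, -1.0, -1.0, 0.5, -0.5])
print(len(pcm))  # 8
```

Scaling by 32767 rather than 32768 keeps the output symmetric around zero; other pipelines may choose a different scaling convention.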
What Is s16le?
- s16 means each audio sample is a signed 16-bit integer.
- le means the bytes are stored in little-endian order.
- mono means the stream contains a single channel only.
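To make the byte layout concrete, here is a small sketch using Python's standard `struct` module. It packs one sample value in little-endian and big-endian order so the difference is visible; the sample value 1000 is arbitrary.

```python
import struct

sample = 1000  # 0x03E8 as a 16-bit value

# "<h" packs a signed 16-bit integer in little-endian order (s16le).
little = struct.pack("<h", sample)
# ">h" packs the same value in big-endian order, shown only for contrast.
big = struct.pack(">h", sample)

print(little.hex())  # e803 (low byte first)
print(big.hex())     # 03e8 (high byte first)

# Decoding s16le bytes recovers the original sample value.
decoded = struct.unpack("<h", little)[0]

# A signed 16-bit sample can hold values from -32768 to 32767.
S16_MIN, S16_MAX = -32768, 32767
```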
Supported Sample Rates
| Sample Rate | Common Use Case |
|---|---|
| 8000 Hz | Telephony, low-bandwidth voice |
| 16000 Hz | Speech recognition, voice assistants (default) |
| 22050 Hz | Low-quality speech synthesis |
| 24000 Hz | Common TTS output (OpenAI, ElevenLabs) |
| 32000 Hz | Wideband audio |
| 44100 Hz | CD-quality audio |
| 48000 Hz | Professional audio, WebRTC default |
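Because the stream is 16-bit mono, the raw data rate follows directly from the sample rate: bytes per second = sample rate × 2 bytes × 1 channel. The sketch below works this out for each supported rate. The 20 ms chunk duration is only an assumed example for illustration; this document does not specify a chunk size.

```python
SUPPORTED_SAMPLE_RATES = (8000, 16000, 22050, 24000, 32000, 44100, 48000)
BYTES_PER_SAMPLE = 2  # 16-bit samples
CHANNELS = 1          # mono

def bytes_per_second(sample_rate):
    """Raw PCM data rate for 16-bit mono audio at the given sample rate."""
    return sample_rate * BYTES_PER_SAMPLE * CHANNELS

def chunk_bytes(sample_rate, duration_ms):
    """Size in bytes of a chunk holding duration_ms of audio (whole samples only)."""
    samples = sample_rate * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS

for rate in SUPPORTED_SAMPLE_RATES:
    # 20 ms is an assumed example duration, not a documented requirement.
    print(f"{rate:>5} Hz: {bytes_per_second(rate):>6} bytes/s, "
          f"{chunk_bytes(rate, 20):>4} bytes per 20 ms")
```

At the default 16000 Hz this gives 32000 bytes per second, while 48000 Hz triples that to 96000 bytes per second.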
How to Choose a Sample Rate
- Use 16000 Hz (default) for most speech-driven integrations; it balances quality and bandwidth well.
- Match your TTS provider's native output rate to avoid resampling. For example, many TTS services output at 24000 Hz.
- In RTC Mode, the sample rate is typically governed by the RTC framework (e.g., LiveKit defaults to 48000 Hz).
The sample rate only affects audio sent to the driving service. The avatar’s rendered animation quality is independent of sample rate.
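The selection guidance above could be expressed as a small helper. This is a sketch only: `pick_sample_rate` is a hypothetical function, and falling back to the 16000 Hz default when the source rate is unsupported is an assumption about a sensible policy, not documented SDK behavior.

```python
SUPPORTED_SAMPLE_RATES = (8000, 16000, 22050, 24000, 32000, 44100, 48000)
DEFAULT_SAMPLE_RATE = 16000

def pick_sample_rate(source_rate):
    """Return (rate_to_configure, needs_resampling) for a given source rate.

    If the source (for example, a TTS provider) already outputs a supported
    rate, use it as-is so no resampling is needed. Otherwise fall back to the
    default rate and flag that the audio must be resampled first.
    """
    if source_rate in SUPPORTED_SAMPLE_RATES:
        return source_rate, False
    return DEFAULT_SAMPLE_RATE, True

print(pick_sample_rate(24000))  # (24000, False)
print(pick_sample_rate(11025))  # (16000, True)
```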
Configuration
Set the sample rate when initializing the client-side SDK (Web, iOS, or Android) through the AudioFormat parameter.
Before Sending Audio
Before sending audio to the SDK or service, make sure your pipeline outputs:
- a supported sample rate
- mono audio
- 16-bit signed PCM samples
- raw PCM bytes rather than WAV headers or compressed audio frames
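If your source audio arrives as a WAV file, one way to apply this checklist is to validate the header and extract only the sample data. The sketch below uses Python's standard `wave` module; the helper names `wav_to_raw_pcm` and `make_test_wav` are hypothetical and not part of the SpatialReal SDK.

```python
import io
import wave

SUPPORTED_SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 44100, 48000}

def wav_to_raw_pcm(wav_bytes):
    """Validate a WAV file held in memory and return (sample_rate, raw_pcm_bytes).

    Raises ValueError if the audio is not mono, not 16-bit, or uses an
    unsupported sample rate. The returned bytes contain no WAV header.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            raise ValueError(f"expected 16-bit samples, got {wav.getsampwidth() * 8}-bit")
        rate = wav.getframerate()
        if rate not in SUPPORTED_SAMPLE_RATES:
            raise ValueError(f"unsupported sample rate: {rate}")
        # readframes returns only sample data; 16-bit WAV data is little-endian.
        return rate, wav.readframes(wav.getnframes())

def make_test_wav(rate, channels, frames):
    """Build a small 16-bit WAV file in memory for demonstration purposes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(frames)
    return buf.getvalue()

wav_file = make_test_wav(16000, 1, b"\x01\x00\x02\x00")
rate, pcm = wav_to_raw_pcm(wav_file)
print(rate, len(wav_file), len(pcm))  # 16000 48 4
```

The 44-byte difference between the file size and the payload is the WAV header, which is what must not be sent along with the samples.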

