Audio Format and Sample Rate

SpatialReal services accept mono 16-bit PCM (s16le) audio. The sample rate is configured at SDK initialization and must match the audio your application sends.

Audio Input Requirements

Property	Value
Sample Rate	One of `8000`, `16000`, `22050`, `24000`, `32000`, `44100`, `48000`
Channels	`1` (mono)
Bit Depth	`16-bit`
Format	Raw PCM bytes
PCM Encoding	`s16le`

If your source audio uses stereo channels, floating point samples, compressed codecs, or an unsupported sample rate, convert it before sending it to SpatialReal.
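As a rough illustration of the conversion step, the sketch below downmixes interleaved stereo floating point samples to mono and quantizes them to signed 16-bit integers. The function name `stereoFloatToMonoInt16` is hypothetical and not part of any SpatialReal SDK, and the input is assumed to be interleaved as left, right, left, right with values in the range -1.0 to 1.0. It does not resample, so the source rate must already be a supported one.

```typescript
// Hypothetical helper (not an SDK API): interleaved stereo Float32 -> mono Int16.
// Assumes input layout [L0, R0, L1, R1, ...] with samples in -1.0..1.0.
function stereoFloatToMonoInt16(interleaved: Float32Array): Int16Array {
  const frames = Math.floor(interleaved.length / 2);
  const mono = new Int16Array(frames);
  for (let i = 0; i < frames; i++) {
    // Downmix by averaging the left and right samples of each frame.
    const mixed = (interleaved[2 * i] + interleaved[2 * i + 1]) / 2;
    // Clamp to the valid float range before scaling.
    const clamped = Math.max(-1, Math.min(1, mixed));
    // Scale negatives by 32768 and positives by 32767 to cover the full int16 range.
    mono[i] = clamped < 0 ? Math.round(clamped * 32768) : Math.round(clamped * 32767);
  }
  return mono;
}
```

Compressed codecs (MP3, Opus, AAC) need a real decoder first; this sketch only covers audio that is already decoded to float samples.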

What Is `s16le`?

`s16` means each audio sample is a signed 16-bit integer.
`le` means the bytes of each sample are stored in little-endian order (least significant byte first).
mono is not part of the `s16le` name; it means the stream contains a single channel only.
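To make the byte layout concrete, here is a minimal sketch that serializes 16-bit samples into little-endian bytes with the standard `DataView.setInt16` call. The helper name `int16ToS16leBytes` is an illustrative assumption, not an SDK function.

```typescript
// Hypothetical helper: write each signed 16-bit sample as two little-endian bytes.
function int16ToS16leBytes(samples: Int16Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    // The third argument `true` selects little-endian byte order.
    view.setInt16(i * 2, samples[i], true);
  }
  return bytes;
}

// The sample 258 is 0x0102, so it is stored as the bytes 0x02, 0x01.
// The sample -2 is 0xFFFE in two's complement, so it is stored as 0xFE, 0xFF.
```

Using `DataView` keeps the byte order explicit instead of relying on the platform's native endianness.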

Supported Sample Rates

Sample Rate	Common Use Case
`8000` Hz	Telephony, low-bandwidth voice
`16000` Hz	Speech recognition, voice assistants (default)
`22050` Hz	Low-quality speech synthesis
`24000` Hz	Common TTS output (OpenAI, ElevenLabs)
`32000` Hz	Wideband audio
`44100` Hz	CD-quality audio
`48000` Hz	Professional audio, WebRTC default

Sending audio at a sample rate that doesn’t match your SDK configuration will produce distorted or silent playback. Always ensure the configured rate matches your audio source.
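One way to catch a mismatch early is a small guard that runs before any audio is sent. The sketch below is an assumption about how an application might do this; `checkSampleRate` is a hypothetical helper, and the list of rates is copied from the table above.

```typescript
// Supported rates, copied from the table in this section.
const SUPPORTED_SAMPLE_RATES: readonly number[] = [
  8000, 16000, 22050, 24000, 32000, 44100, 48000,
];

// Hypothetical guard: returns an error message, or null when everything lines up.
function checkSampleRate(configuredRate: number, sourceRate: number): string | null {
  if (SUPPORTED_SAMPLE_RATES.indexOf(configuredRate) === -1) {
    return `Unsupported configured sample rate: ${configuredRate} Hz`;
  }
  if (configuredRate !== sourceRate) {
    return `Source audio is ${sourceRate} Hz but the SDK is configured for ${configuredRate} Hz`;
  }
  return null;
}
```

For example, `checkSampleRate(16000, 24000)` reports a mismatch, which signals that the source should be resampled or the SDK configuration changed.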

How to Choose a Sample Rate

Use 16000 Hz (default) for most speech-driven integrations; it balances quality and bandwidth well.
Match your TTS provider’s native output rate to avoid resampling. For example, many TTS services output at 24000 Hz.
In RTC Mode, the sample rate is typically governed by the RTC framework (e.g., LiveKit defaults to 48000 Hz).
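The bandwidth side of this trade-off follows directly from the format: mono 16-bit PCM uses two bytes per sample, so the raw data rate is the sample rate times two. The sketch below computes that figure; `pcmBytesPerSecond` is a hypothetical helper, and the result excludes any transport or framing overhead.

```typescript
const BYTES_PER_SAMPLE = 2; // 16-bit samples
const CHANNEL_COUNT = 1;    // mono

// Hypothetical helper: raw PCM payload rate in bytes per second.
function pcmBytesPerSecond(sampleRate: number): number {
  return sampleRate * CHANNEL_COUNT * BYTES_PER_SAMPLE;
}

// pcmBytesPerSecond(16000) is 32000 bytes/s
// pcmBytesPerSecond(24000) is 48000 bytes/s
// pcmBytesPerSecond(48000) is 96000 bytes/s
```

Moving from 16000 Hz to 48000 Hz therefore triples the raw audio payload.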

The sample rate only affects audio sent to the driving service. The avatar’s rendered animation quality is independent of sample rate.

Configuration

Set the sample rate when initializing the client-side SDK through the `audioFormat` parameter.

Web

await AvatarSDK.initialize({
  appId: 'YOUR_APP_ID',
  audioFormat: { channelCount: 1, sampleRate: 16000 },
  // ...
});

iOS

AvatarSDK.initialize(
    appID: "YOUR_APP_ID",
    configuration: Configuration(
        environment: .intl,
        audioFormat: AudioFormat(sampleRate: 16000),
        drivingServiceMode: .sdk,
        logLevel: .warning
    )
)

Android

AvatarSDK.initialize(
    context = applicationContext,
    appId = "YOUR_APP_ID",
    configuration = Configuration(
        environment = Environment.Intl,
        audioFormat = AudioFormat(16000),
        drivingServiceMode = DrivingServiceMode.SDK,
        logLevel = LogLevel.INFO
    )
)

Before Sending Audio

Before sending audio to the SDK or service, make sure your pipeline outputs:

a supported sample rate
mono audio
16-bit signed PCM samples
raw PCM bytes rather than WAV headers or compressed audio frames
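A common slip with the last item is passing a whole `.wav` file as if it were raw PCM. As a minimal sketch, the check below looks for the `RIFF` and `WAVE` markers that open a WAV file and confirms the buffer holds whole 16-bit samples. Both function names are hypothetical and not part of the SDK.

```typescript
// Hypothetical check: WAV files begin with "RIFF" at byte 0 and "WAVE" at byte 8.
function startsWithWavHeader(bytes: Uint8Array): boolean {
  if (bytes.length < 12) return false;
  const tag = (offset: number): string =>
    String.fromCharCode(bytes[offset], bytes[offset + 1], bytes[offset + 2], bytes[offset + 3]);
  return tag(0) === "RIFF" && tag(8) === "WAVE";
}

// Hypothetical check: mono s16le data is always a whole number of 2-byte samples.
function hasWholeSamples(bytes: Uint8Array): boolean {
  return bytes.length % 2 === 0;
}
```

If `startsWithWavHeader` returns true, the header needs to be parsed and removed so that only the sample data is sent.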

For a practical overview of common audio encodings and container formats, see the FFmpeg audio types reference.