Spatialreal currently provides avatar-only services, focusing on generating and rendering real-time avatar animations from audio input. Voice conversation logic, speech synthesis, and other agent functionality are managed by your application or third-party services.
Agent Mode is coming soon — a fully managed voice agent solution with built-in conversation logic, speech synthesis, and avatar rendering.

Choose Your Integration Mode

Spatialreal offers four distinct integration modes to suit different architectural needs, latency requirements, and development preferences.

SDK Mode

In this mode, the client-side application manages the audio input. The developer passes the audio to the Spatialreal Client SDK, which handles the server interaction to retrieve animation data and render the avatar.
1. Pass Audio: The developer passes audio to the Spatialreal SDK on the client.
2. Inference: The SDK calls the inference service.
3. Render: The SDK receives drive parameters and plays the avatar.
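On the web, this flow could look roughly like the sketch below. Everything Spatialreal-specific here is an assumption made for illustration: the package name @spatialreal/client-sdk, the SpatialrealClient class, and its attach/sendAudioTrack methods are not the actual SDK API, and /api/spatialreal-token is a placeholder for whatever authentication endpoint your own backend exposes.

```typescript
// Hypothetical sketch of SDK Mode on a web client.
// The package name, class, and methods below are illustrative assumptions,
// not the actual Spatialreal Client SDK API.
import { SpatialrealClient } from "@spatialreal/client-sdk"; // hypothetical package

async function startAvatar(canvas: HTMLCanvasElement): Promise<void> {
  // Fetch a short-lived token from your own backend; authentication is the only
  // server-side piece in SDK Mode. "/api/spatialreal-token" is a placeholder path.
  const { token } = await (await fetch("/api/spatialreal-token")).json();

  // 1. Initialize the client and bind it to a render target.
  const client = new SpatialrealClient({ token, avatarId: "my-avatar" }); // hypothetical options
  await client.attach(canvas); // hypothetical method

  // 2. Pass microphone (or TTS) audio to the SDK; the SDK calls the inference service.
  const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
  client.sendAudioTrack(mic.getAudioTracks()[0]); // hypothetical method

  // 3. The SDK receives drive parameters and plays the animated avatar on the canvas.
}
```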
Best Suited For:
  • Client-Centric Logic: Scenarios where the voice agent logic resides primarily on the device.
  • Moderate Latency: Projects where ultra-low latency is not the absolute priority.
  • Simplified Architecture: Minimal server-side development required (only for authentication), allowing most logic to remain on the client.

View SDK Mode Guide


Host Mode

In Host Mode, the developer acts as the bridge. You use the Spatialreal Server SDK to stream audio to the service and receive streaming drive parameters back. It is then your responsibility to transport this data to the client.
1. Stream Audio: The developer sends streaming audio via the Server SDK.
2. Receive Parameters: Spatialreal returns streaming drive parameters to the developer's server.
3. Transport: The developer transmits audio and parameters to the Client SDK over a custom transport layer.
4. Render: The Client SDK renders the avatar.
The custom transport layer must deliver data to the client without duplication, loss, or reordering.
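A minimal server-side sketch of this flow is shown below, using a plain WebSocket as the custom transport layer. The @spatialreal/server-sdk package, the SpatialrealSession class, and its sendAudio/driveParameters names are assumptions for illustration, not the actual Server SDK API; the audio source and framing helpers are likewise stand-ins for your own code.

```typescript
// Hypothetical sketch of Host Mode on the developer's server (Node).
// "@spatialreal/server-sdk", SpatialrealSession, and their methods/events are
// illustrative assumptions, not the real Server SDK API.
import { WebSocketServer } from "ws";
import { SpatialrealSession } from "@spatialreal/server-sdk"; // hypothetical package

// Stand-in for your voice agent's audio output (e.g. streamed TTS chunks).
async function* agentAudioChunks(): AsyncGenerator<Buffer> {
  yield Buffer.alloc(0); // placeholder
}

// Minimal framing so the client can tell audio apart from drive parameters.
function frame(kind: "audio" | "params", payload: Buffer): Buffer {
  return Buffer.concat([Buffer.from([kind === "audio" ? 0 : 1]), payload]);
}

// The WebSocket below is the "custom transport layer": it must deliver frames
// in order, without loss or duplication.
const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", async (clientSocket) => {
  const session = await SpatialrealSession.connect({
    apiKey: process.env.SPATIALREAL_API_KEY ?? "",
  });

  // Spatialreal streams drive parameters back; relay them to the Client SDK.
  session.on("driveParameters", (params: Buffer) => {
    clientSocket.send(frame("params", params));
  });

  // Stream agent audio to Spatialreal, and mirror it to the client so audio
  // playback stays in sync with the avatar animation.
  for await (const chunk of agentAudioChunks()) {
    session.sendAudio(chunk); // hypothetical method
    clientSocket.send(frame("audio", chunk));
  }

  session.close();
});
```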
Best Suited For:
  • Custom Transport: Teams that already have a reliable, controllable transport layer.
  • Deep Integration: Developers willing to handle server-side adaptation for maximum control.
  • Low-Latency Demands: When you need to optimize the network path manually.

View Host Mode Guide


RTC Mode

This mode leverages Real-Time Communication (RTC) infrastructure (currently supporting LiveKit and Agora). The developer streams audio to Spatialreal, but instead of returning data to the developer, Spatialreal pushes the drive parameters directly into an RTC room/channel.
1. Stream Audio: The developer streams audio via the Server SDK.
2. Push to RTC: The Spatialreal service pushes audio and avatar drive data to an RTC room.
3. Subscribe & Play: The client joins the room using the Spatialreal RTC Client to subscribe, parse, and play.
The stream contains binary drive parameters and audio, not a pre-rendered video feed. It therefore cannot be played by standard video players and must be rendered by the Spatialreal Client.
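The server-side half of this flow might look like the sketch below. The @spatialreal/server-sdk package, the SpatialrealSession class, and the rtc configuration shape are assumptions for illustration, not the actual API.

```typescript
// Hypothetical sketch of RTC Mode, server side. Package name, class, and the
// rtc options are illustrative assumptions, not the real Spatialreal API.
import { SpatialrealSession } from "@spatialreal/server-sdk"; // hypothetical package

async function driveAvatarIntoRoom(audioChunks: AsyncIterable<Buffer>): Promise<void> {
  const session = await SpatialrealSession.connect({
    apiKey: process.env.SPATIALREAL_API_KEY ?? "",
    rtc: {
      provider: "livekit",                 // or "agora"
      url: process.env.LIVEKIT_URL ?? "",
      roomName: "demo-room",               // Spatialreal pushes audio + drive data here
      identity: "spatialreal-avatar",
    },
  });

  // Stream the agent's audio; Spatialreal publishes the resulting audio and
  // avatar drive data directly into the configured RTC room.
  for await (const chunk of audioChunks) {
    session.sendAudio(chunk); // hypothetical method
  }
  session.close();
}
```

On the client, the counterpart is equally small; again, @spatialreal/rtc-client and its join/render methods are illustrative assumptions rather than the real client API.

```typescript
// Hypothetical sketch of the client side in RTC Mode (browser).
import { SpatialrealRtcClient } from "@spatialreal/rtc-client"; // hypothetical package

async function playAvatar(canvas: HTMLCanvasElement): Promise<void> {
  const client = new SpatialrealRtcClient({ provider: "livekit" }); // hypothetical options
  await client.join({ url: "wss://your-livekit-host", token: "<room token from your backend>" });
  // The subscribed stream is binary drive data plus audio, not video, so it is
  // rendered by the Spatialreal client rather than a standard video player.
  await client.render(canvas); // hypothetical method
}
```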
Best Suited For:
  • Existing RTC Users: Teams already using LiveKit or Agora for voice agents but not using their specific agent frameworks (e.g., LiveKit Agents or TEN Framework).
  • Server-Managed State: Scenarios requiring server-side management of conversation state (e.g., handling interruptions).
  • Ultra-Low Latency: Leveraging established RTC networks for minimal delay.

View RTC Mode Guide


Framework Plugin

This is the most streamlined approach for modern voice agent frameworks. Developers use a provided plugin that sits inside the voice agent pipeline (e.g., LiveKit Agents, TEN Framework).
1. Intercept Audio: The plugin intercepts audio from the agent pipeline.
2. Send to Spatialreal: It sends the audio to Spatialreal and configures the RTC push parameters.
3. Push & Render: Spatialreal pushes data to the RTC room (the client side is identical to RTC Mode).
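Conceptually, the integration reduces to configuring and attaching a single plugin, as in the sketch below. The @spatialreal/agents-plugin package, the SpatialrealAvatarPlugin class, and its options are all illustrative assumptions; the concrete attachment call depends on the plugin/avatar API of the framework you use, so it is shown only as a comment.

```typescript
// Hypothetical sketch of the Framework Plugin approach. The package, class, and
// options below are illustrative assumptions, not the real plugin API, and the
// attachment step follows whatever your agent framework exposes.
import { SpatialrealAvatarPlugin } from "@spatialreal/agents-plugin"; // hypothetical package

// Configure once: which avatar to drive and where Spatialreal should push
// the resulting audio + drive data.
const avatar = new SpatialrealAvatarPlugin({
  apiKey: process.env.SPATIALREAL_API_KEY ?? "",
  avatarId: "my-avatar",
  rtc: { provider: "livekit", roomName: "demo-room" }, // hypothetical options
});

// Attach the plugin to your voice agent session (framework-specific call,
// e.g. something like agentSession.use(avatar) or avatar.start(agentSession)).
// The plugin intercepts the agent's outgoing audio, forwards it to Spatialreal,
// and handles interruption logic, so the agent code itself does not change.
```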
Best Suited For:
  • Framework Users: Teams already using frameworks like LiveKit Agents or TEN Framework.
  • Rapid Integration: Low implementation cost; the plugin automatically handles signal processing and conversation state (interruption logic).
  • Migration: Easy to switch if you are currently using other avatar services within these frameworks.

Comparison

Mode      | Characteristic               | Latency   | Integration Effort | Ideal Scenario
SDK Mode  | Client-centric               | Moderate  | Low                | Teams that want minimal server-side changes.
Host Mode | Custom transport layer       | Low       | High               | Apps requiring total control over data transport.
RTC Mode  | Transport via Agora/LiveKit  | Ultra-Low | Medium             | Existing RTC users needing server-side state control.
Plugin    | Voice agent framework        | Ultra-Low | Low                | Users of LiveKit Agents or TEN Framework.

Next Steps