Agent Mode is coming soon — a fully managed voice agent solution with built-in conversation logic, speech synthesis, and avatar rendering.
Choose Your Integration Mode
Spatialreal offers four distinct integration modes to suit different architectural needs, latency requirements, and development preferences.
SDK Mode
Client-centric integration with minimal server-side changes
Host Mode
Full control with custom transport layer
RTC Mode
Ultra-low latency via LiveKit or Agora
Framework Plugin
Seamless integration with LiveKit Agents or TEN Framework
SDK Mode
In this mode, the client-side application manages the audio input. The developer passes the audio to the Spatialreal Client SDK, which handles the server interaction to retrieve animation data and render the avatar.
Best Suited For:
- Client-Centric Logic: Scenarios where the voice agent logic resides primarily on the device.
- Moderate Latency: Projects where ultra-low latency is not the absolute priority.
- Simplified Architecture: Minimal server-side development required (only for authentication), allowing most logic to remain on the client.
View SDK Mode Guide
Host Mode
In Host Mode, the developer acts as the bridge. You use the Spatialreal Server SDK to stream audio to the service and receive streaming drive parameters back. It is then your responsibility to transport this data to the client.
Best Suited For:
- Custom Transport: Teams that already have a reliable, controllable transport layer.
- Deep Integration: Developers willing to handle server-side adaptation for maximum control.
- Strict Latency Requirements: When you need to optimize the network path manually.
View Host Mode Guide
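
As a rough illustration of the transport responsibility described above, the TypeScript sketch below relays opaque binary chunks over a channel you own. It does not call the Spatialreal Server SDK. `createRelay`, `parsePacket`, and the 4-byte sequence header are hypothetical names and an assumed framing, used only to show one way to preserve ordering on a custom transport.

```typescript
// Hypothetical sketch: NOT part of the Spatialreal SDK and NOT its wire format.
// Each chunk is treated as opaque bytes (for example, drive parameters received
// from the service) and is prefixed with a 4-byte big-endian sequence number.

type Send = (packet: Uint8Array) => void;

// Server side: wrap each chunk and hand it to your own transport (WebSocket, etc.).
function createRelay(send: Send) {
  let seq = 0;
  return {
    push(chunk: Uint8Array): void {
      const packet = new Uint8Array(4 + chunk.length);
      new DataView(packet.buffer).setUint32(0, seq, false); // false = big-endian
      packet.set(chunk, 4);
      seq = (seq + 1) >>> 0; // wrap around at 2^32
      send(packet);
    },
  };
}

// Client side: recover the sequence number and the untouched payload.
function parsePacket(packet: Uint8Array): { seq: number; payload: Uint8Array } {
  if (packet.length < 4) {
    throw new Error("packet too short");
  }
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  return { seq: view.getUint32(0, false), payload: packet.subarray(4) };
}
```

On the client, the sequence number lets you detect dropped or reordered packets before passing the payload on for rendering. Whether that check is needed depends on the guarantees of the transport you choose.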
RTC Mode
This mode leverages Real-Time Communication (RTC) infrastructure (currently supporting LiveKit and Agora). The developer streams audio to Spatialreal, but instead of returning data to the developer, Spatialreal pushes the drive parameters directly into an RTC room/channel.
The stream contains binary drive parameters and audio, not a pre-rendered video feed. Standard video players therefore cannot play it; the Spatialreal Client is required to render it.
Best Suited For:
- Existing RTC Users: Teams already using LiveKit or Agora for voice agents but not using their specific agent frameworks (e.g., LiveKit Agents or TEN Framework).
- Server-Managed State: Scenarios requiring server-side management of conversation state (e.g., handling interruptions).
- Ultra-Low Latency: Leveraging established RTC networks for minimal delay.
View RTC Mode Guide
Framework Plugin
This is the most streamlined approach for modern voice agent frameworks. Developers use a provided plugin that sits inside the voice agent pipeline (e.g., LiveKit Agents, TEN Framework).
Best Suited For:
- Framework Users: Teams already using frameworks like LiveKit Agents or TEN Framework.
- Rapid Integration: Low implementation cost; the plugin automatically handles signal processing and conversation state (interruption logic).
- Migration: Easy to switch if you are currently using other avatar services within these frameworks.
Comparison
| Mode | Characteristic | Latency | Integration Effort | Ideal Scenario |
|---|---|---|---|---|
| SDK Mode | Client-centric | Moderate | Low | Teams that want minimal server-side changes. |
| Host Mode | Custom Transport Layer | Low | High | Apps requiring total control over data transport. |
| RTC Mode | Transport via Agora/LiveKit | Ultra-Low | Medium | Existing RTC users needing server-side state control. |
| Framework Plugin | Voice Agent Framework | Ultra-Low | Low | Users of LiveKit Agents or TEN Framework. |
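
The comparison can also be read as a simple decision order. The TypeScript sketch below encodes one such reading. The `Requirements` fields, the `suggestMode` function, and the order of the checks are assumptions made for illustration, not an official selection rule.

```typescript
// Hypothetical helper: the field names and the precedence of the checks are
// assumptions derived from the comparison table, not a Spatialreal API.

type ModeName = "SDK Mode" | "Host Mode" | "RTC Mode" | "Framework Plugin";

interface Requirements {
  // Already building on LiveKit Agents or TEN Framework.
  usesAgentFramework: boolean;
  // Already using LiveKit or Agora, without their agent frameworks.
  usesRtc: boolean;
  // Already operating a reliable, controllable transport layer.
  hasCustomTransport: boolean;
}

function suggestMode(req: Requirements): ModeName {
  if (req.usesAgentFramework) {
    return "Framework Plugin"; // ultra-low latency, low integration effort
  }
  if (req.usesRtc) {
    return "RTC Mode"; // ultra-low latency, medium integration effort
  }
  if (req.hasCustomTransport) {
    return "Host Mode"; // low latency, high integration effort
  }
  return "SDK Mode"; // moderate latency, low integration effort
}

// Example: a team on Agora that does not use an agent framework.
const suggestion = suggestMode({
  usesAgentFramework: false,
  usesRtc: true,
  hasCustomTransport: false,
});
console.log(suggestion);
```

Treat the result as a starting point. A team with both an RTC deployment and a custom transport, for example, may still prefer Host Mode if it needs full control over the data path.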

