
What is RTC Mode?

RTC Mode enables real-time voice communication with avatars through LiveKit. It builds on top of Host Mode — the @spatialwalk/avatarkit-rtc adapter package handles the RTC connection and feeds audio/animation data to the SDK automatically.
Web Only: RTC Mode is currently available for Web applications only. iOS and Android support is planned for a future release. In the meantime, native mobile platforms can use Host Mode with your own RTC implementation.

AvatarKit Voice Agent Demo

A full reference repository with implementation details, including different frontend UI options and multiple backend agent patterns.

When to Use

  • Real-time voice conversation — users talk to an avatar via microphone
  • Low-latency interaction — WebRTC provides sub-second latency
  • Server-side AI — your RTC server processes audio and generates responses

Packages Required

Package                      Purpose               Required
@spatialwalk/avatarkit       Avatar rendering SDK  Yes
@spatialwalk/avatarkit-rtc   RTC adapter           Yes
livekit-client@2.16.1        LiveKit RTC SDK       Yes
Critical: We are only compatible with livekit-client versions below 2.17, because 2.17 enables a single PeerConnection ("single PC") by default. Support for 2.17+ is planned for a future release; for now, please install version 2.16.1 to avoid compatibility issues.
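Assuming you use npm, installing all three packages with the LiveKit version pinned might look like:

```shell
# Pin livekit-client below 2.17 (see the compatibility note above)
npm install @spatialwalk/avatarkit @spatialwalk/avatarkit-rtc livekit-client@2.16.1
```

With yarn or pnpm the same pinned version applies; the important part is avoiding an unpinned `livekit-client` that would resolve to 2.17+.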

LiveKit Browser Compatibility

Browser   Minimum Version
Chrome    94+
Firefox   117+
Safari    15.4+
Edge      94+

How It Works

RTC Mode uses the SDK in Host Mode internally. The AvatarPlayer acts as a bridge:
  1. Initializes the avatar SDK with DrivingServiceMode.host
  2. Connects to the RTC server via the chosen provider
  3. Publishes your microphone audio to the RTC server
  4. Receives animation and audio data from the RTC server
  5. Feeds animation data into the avatar SDK for rendering; audio is played through the native WebRTC audio track by the RTC provider
You don’t need to call yieldAudioData() or yieldFramesData() manually — the adapter handles this.
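The bridge pattern described above can be sketched with a self-contained mock. Note this is NOT the real AvatarKit or adapter API — the interfaces below are hypothetical stand-ins that only illustrate who calls what (the method names `yieldAudioData`/`yieldFramesData` are taken from the SDK's Host Mode surface mentioned above):

```typescript
// Hypothetical stand-in for the SDK running in Host Mode.
interface HostModeSdk {
  yieldAudioData(chunk: Uint8Array): void;   // the adapter, not your app,
  yieldFramesData(frames: Uint8Array): void; // calls these internally
}

// Hypothetical stand-in for the RTC provider's data channel.
interface RtcProvider {
  onAnimationData(handler: (frames: Uint8Array) => void): void;
}

// Plays the role of the AvatarPlayer bridge: subscribe to animation data
// from the RTC server and forward it into the SDK for rendering.
// (Audio is not forwarded here — it plays through the WebRTC audio track.)
class BridgeSketch {
  constructor(sdk: HostModeSdk, rtc: RtcProvider) {
    rtc.onAnimationData((frames) => sdk.yieldFramesData(frames));
  }
}

// Tiny in-memory demo of the wiring:
const fed: number[] = [];
const sdk: HostModeSdk = {
  yieldAudioData: () => {},
  yieldFramesData: (f) => fed.push(f.length),
};
let emit: (f: Uint8Array) => void = () => {};
const rtc: RtcProvider = { onAnimationData: (h) => { emit = h; } };

new BridgeSketch(sdk, rtc);
emit(new Uint8Array(3)); // simulate animation data arriving from the RTC server
// `fed` now contains the frame payload the SDK received — no manual yield calls
```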

What RTC Mode Does NOT Use

Although RTC Mode builds on Host Mode internally, the following SDK/Host Mode features are not used in RTC Mode:
Feature                                Used in SDK/Host Mode                          Used in RTC Mode
initializeAudioContext()               Yes — required for Web Audio API playback      No — audio is played via WebRTC tracks
Internal audio player                  Yes — SDK decodes and plays audio internally   No — RTC provider handles audio playback
yieldAudioData() / yieldFramesData()   Yes — you call these manually                  No — the adapter calls them internally
start() / send() / close()             Yes (SDK Mode only)                            No
Audio path difference: In SDK Mode and Host Mode, the SDK plays avatar audio through its internal audio player (Web Audio API). In RTC Mode, avatar audio arrives as a native WebRTC audio track and is played by the browser’s WebRTC stack directly — the SDK’s internal audio player is not involved. This distinction matters for audio processing (e.g., echo cancellation, noise suppression).

Server-Side Setup

Your backend is responsible for sending audio to the avatar service and having the resulting avatar stream published to your RTC room. Two approaches are available:
Platform   Framework plugin                                                                  Server SDK + egress
LiveKit    LiveKit Agents plugin — hooks into your agent pipeline and publishes to the room  LiveKit Server (section 2) — use AvatarKit Server SDK with LiveKit egress config
  • Framework plugin: Best if you already use LiveKit Agents. The plugin handles audio → SpatialReal → RTC publish for you.
  • Server SDK + egress: Use the Golang or Python Server SDK, create a session with LiveKit egress config, and send audio. The avatar service publishes audio and animation directly to the RTC room; your server does not relay that data.
Client-side setup is the same either way: the client joins the room and uses @spatialwalk/avatarkit-rtc to render the avatar. See the client guides below.
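A minimal sketch of that client-side flow, assuming livekit-client 2.16.x: `Room`, `RoomEvent`, and `Track` are real livekit-client exports, but the commented `AvatarPlayer` usage at the end is a hypothetical shape — consult the LiveKit Client guide for the adapter's actual constructor and options.

```typescript
import { Room, RoomEvent, Track } from "livekit-client";

async function joinAvatarRoom(url: string, token: string): Promise<void> {
  const room = new Room();

  // Avatar audio arrives as a normal WebRTC audio track; attaching it to a
  // media element lets the browser's WebRTC stack play it directly
  // (no initializeAudioContext() / Web Audio API involved).
  room.on(RoomEvent.TrackSubscribed, (track) => {
    if (track.kind === Track.Kind.Audio) {
      document.body.appendChild(track.attach());
    }
  });

  await room.connect(url, token);

  // Publish the user's microphone so the server-side agent can hear them.
  await room.localParticipant.setMicrophoneEnabled(true);

  // Hypothetical: hand the connected room to the adapter, which initializes
  // the avatar SDK in Host Mode and feeds animation data into it.
  // const player = new AvatarPlayer({ room, container: avatarContainerElement });
}
```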

Get Started

Client (browser)
  • LiveKit Client — Connect with LiveKit RTC

Server (backend)
  • LiveKit Server — Plugin or Server SDK with LiveKit egress