# Server API Reference

Source: https://docs.spatialreal.ai/api-reference/api-reference

Server-side API Reference

## Obtain a session token

Call the server endpoint `POST /v1/console/session-tokens`.

### Console API host

In the examples below, replace `` with the host for your region:

| Region         | ``                                       |
| -------------- | ---------------------------------------- |
| `ap-northeast` | `console.ap-northeast.spatialwalk.cloud` |
| `us-west`      | `console.us-west.spatialwalk.cloud`      |

### Request parameters

* **Body**
  * **expireAt**: int, expiration time as a Unix timestamp in seconds. Range: now \< expireAt \< now + 24 hours
  * **modelVersion**: string, reserved field; omit
* **Header**
  * **X-Api-Key**: your API key

### Request example

```bash theme={null}
# Fill in the host in the URL with your console API domain,
# fill in the X-Api-Key header with your API key,
# and set expireAt to the desired expiration time (Unix seconds).
curl --location --request POST 'https:///v1/console/session-tokens' \
  --header 'X-Api-Key: ' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "expireAt": 1754824283,
    "modelVersion": ""
  }'
```

### Response example

```json theme={null}
{
  "sessionToken": "..."
}
```

# Authentication

Source: https://docs.spatialreal.ai/api-reference/auth

Authentication Flow

## Before you start

Before you begin, make sure you have your app-id and api-key. AvatarKit authentication requires a server-side component, so you need to implement an authentication endpoint on your own server.

## Connection flow

1. The client sends an authentication request to your business server.
2. The business server asks the SpatialReal server to generate a session token, passing the expiration time in the request body and the api-key in the request header.
3. The SpatialReal server returns the session token to your business server.
4. The business server returns the session token to the client.
5. The client initializes AvatarKit with the session token.
6. AvatarKit uses the session token to open a connection to the SpatialReal server.

```mermaid theme={null}
sequenceDiagram
  participant Client as Client
  participant BizServer as Business Server
  participant SW as SpatialReal Server
  participant Kit as AvatarKit

  Client->>BizServer: Authentication request
  BizServer->>SW: Generate session token (with api-key, expiration)
  SW-->>BizServer: Return session token
  BizServer-->>Client: Return session token
  Client->>Kit: Initialize with session token
  Kit->>SW: Create connection request (session token)
  SW-->>Kit: Connection response
  Note over SW,Kit: After token expiration, new connections are rejected<br/>Existing connections are not affected
```

## Token expiration

A new connection attempted after the token's configured expiration time is rejected. Connections that are already established are not affected.

## Notes

* Avoid leaking your api-key; use it only on the server.
* The session token is designed to be single-use. Use a new token for each connection.

> For detailed authentication API docs, see the [API Reference](/api-reference/api-reference)

# Regions

Source: https://docs.spatialreal.ai/api-reference/regions

Available regions, how to configure them, and how to fall back to another region for higher availability.

## Globally distributed service

Our avatar service runs in multiple regions. Choose the region closest to your infrastructure to minimize latency.

## Available regions

Use these region names when selecting a deployment:

* **ap-northeast**
* **us-west**

## Console API hosts

When a doc example uses ``, substitute the host for your region:

| Region         | ``                                       |
| -------------- | ---------------------------------------- |
| `ap-northeast` | `console.ap-northeast.spatialwalk.cloud` |
| `us-west`      | `console.us-west.spatialwalk.cloud`      |

## Console and Ingress endpoint URLs

When a doc example uses the console endpoint URL or the ingress endpoint URL, substitute the URL for your region:

| Region         | Console Endpoint URL                                        | Ingress Endpoint URL (v2)                                   |
| -------------- | ----------------------------------------------------------- | ----------------------------------------------------------- |
| `ap-northeast` | `https://console.ap-northeast.spatialwalk.cloud/v1/console` | `wss://api.ap-northeast.spatialwalk.cloud/v2/driveningress` |
| `us-west`      | `https://console.us-west.spatialwalk.cloud/v1/console`      | `wss://api.us-west.spatialwalk.cloud/v2/driveningress`      |

## Cross-region portability

The following identifiers/credentials are **not region-specific** and can be used across regions:

* **App ID**
* **API Key**
* **Avatar ID**
* **Session Token** (generated by the console API)

This means you can switch regions or run multi-region without recreating apps, rotating keys, or regenerating avatars.

## High availability with regional fallback

Services in different regions run on **separate infrastructure**, so you can improve availability by implementing fallback:

* **Primary + backup**: if requests to the primary region fail, retry against another region.
* **Data sync**: since the identifiers/credentials listed above are portable, you can keep using them when failing over to another region.

# Audio Format and Sample Rate

Source: https://docs.spatialreal.ai/concepts/audio

Audio input format requirements and sample rate configuration across platforms.

SpatialReal services accept **mono 16-bit PCM (`s16le`)** audio. The sample rate is configured at SDK initialization and must match the audio your application sends.

## Audio Input Requirements

| Property         | Value                                                               |
| ---------------- | ------------------------------------------------------------------- |
| Sample Rate (Hz) | One of `8000`, `16000`, `22050`, `24000`, `32000`, `44100`, `48000` |
| Channels         | `1` (mono)                                                          |
| Bit Depth        | `16-bit`                                                            |
| Format           | Raw PCM bytes                                                       |
| PCM Encoding     | `s16le`                                                             |

If your source audio uses stereo channels, floating-point samples, compressed codecs, or an unsupported sample rate, convert it before sending it to SpatialReal.

## What Is [`s16le`](https://trac.ffmpeg.org/wiki/audio%20types)?

* `s16` means each audio sample is a signed 16-bit integer.
* `le` means the bytes are stored in little-endian order.
* `mono` means the stream contains a single channel only.
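As an illustration of the format above, the following dependency-free sketch converts floating-point samples in the range [-1.0, 1.0] (optionally interleaved stereo) into mono `s16le` bytes. The `to_mono_s16le` name is hypothetical and not part of any SpatialReal SDK; downmixing by averaging the channels and clipping before scaling are common choices, assumed here rather than required by SpatialReal. It does not resample, so the input must already be at the configured sample rate.

```python
import struct
from typing import Sequence


def to_mono_s16le(samples: Sequence[float], channels: int = 1) -> bytes:
    """Convert interleaved float samples in [-1.0, 1.0] to mono s16le bytes.

    Hypothetical helper (not part of any SpatialReal SDK): it downmixes by
    averaging channels, clips to [-1.0, 1.0], and scales to signed 16-bit
    integers. It does not resample.
    """
    if channels < 1:
        raise ValueError("channels must be >= 1")
    if len(samples) % channels != 0:
        raise ValueError("sample count must be a multiple of channels")

    mono = []
    for i in range(0, len(samples), channels):
        # Average one interleaved frame (e.g. L, R) into a single mono sample.
        value = sum(samples[i:i + channels]) / channels
        # Clip so that scaling can never overflow the int16 range.
        value = max(-1.0, min(1.0, value))
        mono.append(int(round(value * 32767)))

    # "<" selects little-endian byte order and "h" a signed 16-bit integer: s16le.
    return struct.pack(f"<{len(mono)}h", *mono)


if __name__ == "__main__":
    stereo = [0.5, 0.5, -0.25, -0.25]  # two stereo frames, interleaved L, R
    pcm = to_mono_s16le(stereo, channels=2)
    print(len(pcm), pcm.hex())
```

Each mono sample becomes exactly two bytes, so one second of 16000 Hz audio is 32000 bytes. Averaging keeps a stereo downmix from clipping when both channels are loud.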
## Supported Sample Rates

| Sample Rate | Common Use Case                                |
| ----------- | ---------------------------------------------- |
| `8000` Hz   | Telephony, low-bandwidth voice                 |
| `16000` Hz  | Speech recognition, voice assistants (default) |
| `22050` Hz  | Low-quality speech synthesis                   |
| `24000` Hz  | Common TTS output (OpenAI, ElevenLabs)         |
| `32000` Hz  | Wideband audio                                 |
| `44100` Hz  | CD-quality audio                               |
| `48000` Hz  | Professional audio, WebRTC default             |

Sending audio at a sample rate that doesn't match your SDK configuration produces distorted or silent playback. Always make sure the configured rate matches your audio source.

## How to Choose a Sample Rate

* Use `16000` Hz (default) for most speech-driven integrations: it balances quality and bandwidth well.
* Match your TTS provider's native output rate to avoid resampling. For example, many TTS services output at `24000` Hz.
* In RTC Mode, the sample rate is typically governed by the RTC framework (e.g., LiveKit defaults to `48000` Hz).

The sample rate only affects audio sent to the driving service. The avatar's rendered animation quality is independent of sample rate.

## Configuration

Set the sample rate when initializing the client-side SDK through the audio format parameter (`audioFormat` on Web, `AudioFormat` on iOS and Android).

```typescript theme={null}
await AvatarSDK.initialize({
  appId: 'YOUR_APP_ID',
  audioFormat: { channelCount: 1, sampleRate: 16000 },
  // ...
});
```

```swift theme={null}
AvatarSDK.initialize(
  appID: "YOUR_APP_ID",
  configuration: Configuration(
    environment: .intl,
    audioFormat: AudioFormat(sampleRate: 16000),
    drivingServiceMode: .sdk,
    logLevel: .warning
  )
)
```

```kotlin theme={null}
AvatarSDK.initialize(
  context = applicationContext,
  appId = "YOUR_APP_ID",
  configuration = Configuration(
    environment = Environment.Intl,
    audioFormat = AudioFormat(16000),
    drivingServiceMode = DrivingServiceMode.SDK,
    logLevel = LogLevel.INFO
  )
)
```

## Before Sending Audio

Before sending audio to the SDK or service, make sure your pipeline outputs:

* a supported sample rate
* mono audio
* 16-bit signed PCM samples
* raw PCM bytes rather than WAV headers or compressed audio frames

For a practical overview of common audio encodings and container formats, see the [FFmpeg audio types reference](https://trac.ffmpeg.org/wiki/audio%20types).

# Events

Source: https://docs.spatialreal.ai/concepts/events

Available events to handle in your application, and how to handle them.

AvatarKit client SDKs expose event callbacks so your app can react to connection changes, conversation state transitions, and errors in real time.

## Event Callbacks

| Callback              | Payload             | Location           | Description                                                       |
| --------------------- | ------------------- | ------------------ | ----------------------------------------------------------------- |
| `onFirstRendering`    | None                | `AvatarView`       | Fires once when the avatar renders its first frame.               |
| `onConnectionState`   | `ConnectionState`   | `AvatarController` | Fires when the WebSocket connection state changes. SDK mode only. |
| `onConversationState` | `ConversationState` | `AvatarController` | Fires when the avatar's playback state changes.                   |
| `onError`             | `AvatarError`       | `AvatarController` | Fires when a runtime error occurs.                                |

## ConnectionState

Indicates the current state of the WebSocket connection to the driving service.


| State          | Description                                                                   |
| -------------- | ----------------------------------------------------------------------------- |
| `disconnected` | No active connection.                                                         |
| `connecting`   | Connection is being established.                                              |
| `connected`    | Connection is active and ready.                                               |
| `failed`       | Connection failed. On iOS and Android, includes `code` and `message` details. |

## ConversationState

Tracks the avatar's current playback state.

| State     | Description                                                      |
| --------- | ---------------------------------------------------------------- |
| `idle`    | No active conversation; the avatar shows a breathing animation. |
| `playing` | Avatar is actively playing audio and animation.                  |
| `paused`  | Playback is paused.                                              |

## Usage Examples

```typescript theme={null}
const controller = avatarView.controller;

// First frame rendered
avatarView.onFirstRendering = () => {
  console.log('First frame rendered');
};

// Connection state (SDK mode only)
controller.onConnectionState = (state) => {
  console.log('Connection:', state);
};

// Conversation state
controller.onConversationState = (state) => {
  console.log('Conversation:', state);
};

// Error handling
controller.onError = (error) => {
  console.error('Error:', error.code, error.message);
};
```

```swift theme={null}
let controller = avatarView.controller

// First frame rendered
avatarView.onFirstRendering = {
  print("First frame rendered")
}

// Connection state (SDK mode only)
controller.onConnectionState = { state in
  switch state {
  case .connected:
    print("Connected")
  case .failed(let code, let message):
    print("Failed: \(code) \(message)")
  default:
    break
  }
}

// Conversation state
controller.onConversationState = { state in
  print("Conversation: \(state)")
}

// Error handling
controller.onError = { error in
  print("Error: \(error.localizedDescription)")
}
```

```kotlin theme={null}
val controller = avatarView.controller

// First frame rendered
avatarView.onFirstRendering = {
  Log.d("Avatar", "First frame rendered")
}

// Connection state (SDK mode only)
controller?.onConnectionState = { state ->
  when (state) {
    is ConnectionState.Connected -> Log.d("Avatar", "Connected")
    is ConnectionState.Failed -> Log.e("Avatar", "Failed: ${state.message}")
    else -> {}
  }
}

// Conversation state
controller?.onConversationState = { state ->
  Log.d("Avatar", "Conversation: $state")
}

// Error handling
controller?.onError = { error ->
  Log.e("Avatar", "Error: ${error.message}")
}
```

# Integration Modes

Source: https://docs.spatialreal.ai/concepts/integrations

Choose the right integration approach for your use case

## Choose Your Integration Mode

SpatialReal offers four distinct integration modes to suit different architectural needs, latency requirements, and development preferences.

* **SDK Mode**: Client-centric integration with minimal server-side changes
* **RTC Mode**: Ultra-low latency via LiveKit or Agora
* **Framework Plugin**: Seamless integration with LiveKit Agents or TEN Framework
* **Host Mode**: Full control with custom transport layer

***

## SDK Mode

In this mode, the client-side application manages the audio input. The developer passes the audio to the SpatialReal Client SDK, which handles the server interaction to retrieve animation data and render the avatar.

1. Developer passes audio to the SpatialReal SDK on the client.
2. SDK calls the inference service.
3. SDK receives drive parameters and plays the avatar.

**Best Suited For:**

* **Client-Centric Logic:** Scenarios where the voice agent logic resides primarily on the device.
* **Moderate Latency:** Projects where ultra-low latency is not the absolute priority.
* **Simplified Architecture:** Minimal server-side development required (only for authentication), allowing most logic to remain on the client.

```mermaid actions={false} theme={null}
---
config:
  "look": "handDrawn"
  "theme": "base"
  "themeVariables":
    "background": "#ffffff"
    "textColor": "#111827"
    "lineColor": "#64748b"
    "primaryColor": "#e8f4fd"
    "primaryTextColor": "#111827"
    "primaryBorderColor": "#2196F3"
    "secondaryColor": "#f3e5f5"
    "secondaryTextColor": "#111827"
    "secondaryBorderColor": "#9C27B0"
    "tertiaryColor": "#fff3e0"
    "tertiaryTextColor": "#111827"
    "tertiaryBorderColor": "#FF9800"
    "clusterBkg": "#f8fafc"
    "clusterBorder": "#cbd5e1"
    "edgeLabelBackground": "#ffffff"
    "actorBkg": "#e8f4fd"
    "actorBorder": "#2196F3"
    "actorTextColor": "#111827"
    "noteBkgColor": "#fff7ed"
    "noteTextColor": "#111827"
    "signalColor": "#64748b"
    "signalTextColor": "#111827"
---
flowchart LR
  subgraph Client ["🖥️ Client"]
    App["Client App (Voice Logic)"]
    SDK["SpatialReal SDK"]
  end
  Cloud["☁️ SpatialReal Cloud"]
  App -- "Audio Data" --> SDK
  SDK -- "Send Audio" --> Cloud
  Cloud -- "Drive Params" --> SDK
  SDK -- "Render & Play" --> App
  style Client fill:#e8f4fd,stroke:#2196F3,stroke-width:2px,color:#000
  style Cloud fill:#fff3e0,stroke:#FF9800,stroke-width:2px,color:#000
```
***

## RTC Mode

This mode leverages Real-Time Communication (RTC) infrastructure (currently LiveKit and Agora). The developer streams audio to SpatialReal, but instead of returning data to the developer, SpatialReal pushes the drive parameters directly into an RTC room/channel.

1. Developer streams audio via Server SDK.
2. SpatialReal Service pushes audio and avatar drive data to an RTC Room.
3. Client joins the room using the SpatialReal RTC Client to subscribe, parse, and play.

The stream contains **binary drive parameters and audio**, not a pre-rendered video feed. Therefore, it cannot be played by standard video players; it requires the SpatialReal Client to render.

**Best Suited For:**

* **Existing RTC Users:** Teams already using LiveKit or Agora for voice agents but *not* using their specific agent frameworks (e.g., LiveKit Agents or TEN Framework).
* **Server-Managed State:** Scenarios requiring server-side management of conversation state (e.g., handling interruptions).
* **Ultra-Low Latency:** Leveraging established RTC networks for minimal delay.

```mermaid actions={false} theme={null}
---
config:
  "look": "handDrawn"
  "theme": "base"
  "themeVariables":
    "background": "#ffffff"
    "textColor": "#111827"
    "lineColor": "#64748b"
    "primaryColor": "#e8f4fd"
    "primaryTextColor": "#111827"
    "primaryBorderColor": "#2196F3"
    "secondaryColor": "#f3e5f5"
    "secondaryTextColor": "#111827"
    "secondaryBorderColor": "#9C27B0"
    "tertiaryColor": "#fff3e0"
    "tertiaryTextColor": "#111827"
    "tertiaryBorderColor": "#FF9800"
    "clusterBkg": "#f8fafc"
    "clusterBorder": "#cbd5e1"
    "edgeLabelBackground": "#ffffff"
    "actorBkg": "#e8f4fd"
    "actorBorder": "#2196F3"
    "actorTextColor": "#111827"
    "noteBkgColor": "#fff7ed"
    "noteTextColor": "#111827"
    "signalColor": "#64748b"
    "signalTextColor": "#111827"
---
flowchart BT
  subgraph Server ["🔧 Developer Backend"]
    DevServer["Server Logic"]
    ServerSDK["SpatialReal Server SDK"]
    DevServer --> ServerSDK
  end
  subgraph Cloud ["☁️ SpatialReal Cloud"]
    Service["Avatar Service"]
    RTCPub["RTC Egress"]
    Service --> RTCPub
  end
  subgraph RTC ["📡 RTC Provider"]
    Room["LiveKit Room / Agora Channel"]
  end
  subgraph Client ["🖥️ Client"]
    RTCClient["SpatialReal RTC Client"]
  end
  ServerSDK -->|Audio Stream| Service
  RTCPub -->|Drive Params + Audio| Room
  Room -->|Subscribe| RTCClient
  style Client fill:#e8f4fd,stroke:#2196F3,stroke-width:2px,color:#000
  style RTC fill:#f3e5f5,stroke:#9C27B0,stroke-width:2px,color:#000
  style Cloud fill:#fff3e0,stroke:#FF9800,stroke-width:2px,color:#000
  style Server fill:#e8eaf6,stroke:#3F51B5,stroke-width:2px,color:#000
```
***

## Framework Plugin

This is the most streamlined approach for modern voice agent frameworks. Developers use a provided plugin that sits inside the voice agent pipeline (e.g., LiveKit Agents, TEN Framework).

1. The plugin intercepts audio from the agent pipeline.
2. It sends audio to SpatialReal and configures the RTC push parameters.
3. SpatialReal pushes data to the RTC room (the client side is identical to RTC Mode).

**Best Suited For:**

* **Framework Users:** Teams already using frameworks like LiveKit Agents or TEN Framework.
* **Rapid Integration:** Low implementation cost; the plugin automatically handles signal processing and conversation state (interruption logic).
* **Migration:** Easy to switch if you are currently using other avatar services within these frameworks.

```mermaid actions={false} theme={null}
---
config:
  "look": "handDrawn"
  "theme": "base"
  "themeVariables":
    "background": "#ffffff"
    "textColor": "#111827"
    "lineColor": "#64748b"
    "primaryColor": "#e8f4fd"
    "primaryTextColor": "#111827"
    "primaryBorderColor": "#2196F3"
    "secondaryColor": "#f3e5f5"
    "secondaryTextColor": "#111827"
    "secondaryBorderColor": "#9C27B0"
    "tertiaryColor": "#fff3e0"
    "tertiaryTextColor": "#111827"
    "tertiaryBorderColor": "#FF9800"
    "clusterBkg": "#f8fafc"
    "clusterBorder": "#cbd5e1"
    "edgeLabelBackground": "#ffffff"
    "actorBkg": "#e8f4fd"
    "actorBorder": "#2196F3"
    "actorTextColor": "#111827"
    "noteBkgColor": "#fff7ed"
    "noteTextColor": "#111827"
    "signalColor": "#64748b"
    "signalTextColor": "#111827"
---
flowchart BT
  subgraph Agent ["🤖 Voice Agent"]
    Pipeline["Agent Pipeline (LiveKit / TEN)"]
    Plugin["SpatialReal Plugin"]
  end
  subgraph Cloud ["☁️ SpatialReal Cloud"]
    Service["Inference Service"]
    RTCPub["RTC Egress"]
  end
  subgraph RTC ["📡 RTC Provider"]
    Room["LiveKit Room / Agora Channel"]
  end
  subgraph Client ["🖥️ Client"]
    RTCClient["SpatialReal RTC Client"]
  end
  Pipeline -. Audio .-> Plugin
  Plugin -- Audio & Config --> Service
  Service --> RTCPub
  RTCPub -- RTC Push --> Room
  Room -- Subscribe --> RTCClient
  style Client fill:#e8f4fd,stroke:#2196F3,stroke-width:2px,color:#000
  style RTC fill:#f3e5f5,stroke:#9C27B0,stroke-width:2px,color:#000
  style Cloud fill:#fff3e0,stroke:#FF9800,stroke-width:2px,color:#000
  style Agent fill:#e8eaf6,stroke:#3F51B5,stroke-width:2px,color:#000
```
***

## Host Mode

In Host Mode, the developer acts as the bridge. You use the SpatialReal Server SDK to stream audio to the service and receive streaming drive parameters back. It is then your responsibility to transport this data to the client.

1. Developer sends streaming audio via Server SDK.
2. SpatialReal returns streaming drive parameters to the developer's server.
3. Developer transmits audio and parameters to the Client SDK via a custom transport layer.
4. Client SDK renders the avatar.

The custom transport layer must deliver data **without duplication, loss, or reordering**.

**Best Suited For:**

* **Custom Transport:** Teams that already have a reliable, controllable transport layer.
* **Deep Integration:** Developers willing to handle server-side adaptation for maximum control.
* **Strict Latency Requirements:** When you need to optimize the network path manually.

```mermaid actions={false} theme={null}
---
config:
  "look": "handDrawn"
  "theme": "base"
  "themeVariables":
    "background": "#ffffff"
    "textColor": "#111827"
    "lineColor": "#64748b"
    "primaryColor": "#e8f4fd"
    "primaryTextColor": "#111827"
    "primaryBorderColor": "#2196F3"
    "secondaryColor": "#f3e5f5"
    "secondaryTextColor": "#111827"
    "secondaryBorderColor": "#9C27B0"
    "tertiaryColor": "#fff3e0"
    "tertiaryTextColor": "#111827"
    "tertiaryBorderColor": "#FF9800"
    "clusterBkg": "#f8fafc"
    "clusterBorder": "#cbd5e1"
    "edgeLabelBackground": "#ffffff"
    "actorBkg": "#e8f4fd"
    "actorBorder": "#2196F3"
    "actorTextColor": "#111827"
    "noteBkgColor": "#fff7ed"
    "noteTextColor": "#111827"
    "signalColor": "#64748b"
    "signalTextColor": "#111827"
---
flowchart LR
  subgraph Client ["🖥️ Client"]
    ClientSDK["SpatialReal Client SDK"]
  end
  subgraph Server ["🔧 Developer Backend"]
    DevServer["Server Logic"]
    ServerSDK["SpatialReal Server SDK"]
  end
  Cloud["☁️ SpatialReal Cloud"]
  DevServer -- "Audio Stream" --> ServerSDK
  ServerSDK -- "Send Audio" --> Cloud
  Cloud -- "Drive Params" --> ServerSDK
  ServerSDK -- "Animation Stream" --> DevServer
  DevServer == "Custom Transport: Audio + Animation" ==> ClientSDK
  style Client fill:#e8f4fd,stroke:#2196F3,stroke-width:2px,color:#000
  style Server fill:#e8eaf6,stroke:#3F51B5,stroke-width:2px,color:#000
  style Cloud fill:#fff3e0,stroke:#FF9800,stroke-width:2px,color:#000
```
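Host Mode's delivery requirement (no duplicates, no gaps, strict ordering) is commonly met by numbering messages on the sender and releasing them in order on the receiver. The sketch below is a hypothetical, transport-agnostic Python illustration: the `ReorderBuffer` class and the 0-based `seq` numbering are assumptions, not part of the SpatialReal Server or Client SDKs. It drops duplicates and holds early messages until the gap before them is filled; recovering a message that is truly lost still needs retransmission or a timeout in your transport.

```python
from typing import Dict, Generic, List, TypeVar

T = TypeVar("T")


class ReorderBuffer(Generic[T]):
    """Release sequence-numbered messages in order, exactly once.

    Hypothetical helper for a custom Host Mode transport: the sender is
    assumed to number messages 0, 1, 2, ... in the order the server SDK
    produced them.
    """

    def __init__(self) -> None:
        self._next_seq = 0
        self._pending: Dict[int, T] = {}

    def push(self, seq: int, payload: T) -> List[T]:
        """Accept one message and return every payload now deliverable in order."""
        # Already delivered or already buffered: a duplicate, so drop it.
        if seq < self._next_seq or seq in self._pending:
            return []
        self._pending[seq] = payload

        ready: List[T] = []
        # Drain the contiguous run that starts at the next expected number.
        while self._next_seq in self._pending:
            ready.append(self._pending.pop(self._next_seq))
            self._next_seq += 1
        return ready

    @property
    def missing(self) -> List[int]:
        """Sequence numbers still missing below the highest buffered one."""
        if not self._pending:
            return []
        highest = max(self._pending)
        return [s for s in range(self._next_seq, highest) if s not in self._pending]


if __name__ == "__main__":
    buf: ReorderBuffer[str] = ReorderBuffer()
    print(buf.push(1, "chunk-1"))  # held back: chunk 0 has not arrived yet
    print(buf.push(0, "chunk-0"))  # releases chunk-0, then chunk-1
    print(buf.push(1, "chunk-1"))  # duplicate, dropped
```

The `missing` list is what a receiver could report back to request retransmission; keeping delivery logic separate from the transport keeps the same buffer usable over WebSocket, a data channel, or any other carrier you choose.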
***

## Still Not Sure?

Check the integration overview for a side-by-side comparison and runnable example projects.

***

## Want a Quick Taste?

Follow the quick start guide to get a simple demo up and running in minutes.

***

## Next Steps

* Get started with the simplest integration approach
* Learn how to implement custom transport
* Set up real-time communication with LiveKit or Agora
* Explore sample implementations

# Client Lifecycle

Source: https://docs.spatialreal.ai/concepts/lifecycle

SDK initialization, avatar loading, connection management, and resource cleanup.

AvatarKit follows a four-stage lifecycle, **Initialize → Load → Render → Connect**, followed by cleanup when the avatar is no longer needed. Each stage maps to a core component.

## Overview

```mermaid actions={false} theme={null}
---
config:
  "look": "handDrawn"
  "theme": "base"
  "themeVariables":
    "background": "#ffffff"
    "textColor": "#111827"
    "lineColor": "#64748b"
    "primaryColor": "#e8f4fd"
    "primaryTextColor": "#111827"
    "primaryBorderColor": "#2196F3"
    "secondaryColor": "#f3e5f5"
    "secondaryTextColor": "#111827"
    "secondaryBorderColor": "#9C27B0"
    "tertiaryColor": "#fff3e0"
    "tertiaryTextColor": "#111827"
    "tertiaryBorderColor": "#FF9800"
    "clusterBkg": "#f8fafc"
    "clusterBorder": "#cbd5e1"
    "edgeLabelBackground": "#ffffff"
    "actorBkg": "#e8f4fd"
    "actorBorder": "#2196F3"
    "actorTextColor": "#111827"
    "noteBkgColor": "#fff7ed"
    "noteTextColor": "#111827"
    "signalColor": "#64748b"
    "signalTextColor": "#111827"
---
graph LR
  A["INITIALIZE<br/>AvatarSDK.initialize()"]
  B["LOAD<br/>AvatarManager.load()"]
  C["RENDER<br/>AvatarView(avatar)"]
  D["CONNECT<br/>controller.start()"]
  A --> B
  B --> C
  C --> D
```

| Stage      | Component          | What happens                                                                                              |
| ---------- | ------------------ | --------------------------------------------------------------------------------------------------------- |
| Initialize | `AvatarSDK`        | Configures app ID, environment, audio format, and driving mode. Must be called once before any other API. |
| Load       | `AvatarManager`    | Downloads and caches avatar assets. Returns an `Avatar` instance.                                         |
| Render     | `AvatarView`       | Creates the rendering surface and its associated `AvatarController`.                                      |
| Connect    | `AvatarController` | Opens a WebSocket to the driving service. The avatar begins responding to audio.                          |

## Stage 1: Initialize

Call `AvatarSDK.initialize()` once at app startup. This sets global configuration that all subsequent operations depend on.

```typescript theme={null}
await AvatarSDK.initialize({
  appId: 'YOUR_APP_ID',
  audioFormat: { channelCount: 1, sampleRate: 16000 },
  drivingServiceMode: 'sdk',
});
```

```swift theme={null}
AvatarSDK.initialize(
  appID: "YOUR_APP_ID",
  configuration: Configuration(
    environment: .intl,
    audioFormat: AudioFormat(sampleRate: 16000),
    drivingServiceMode: .sdk,
    logLevel: .warning
  )
)
```

```kotlin theme={null}
AvatarSDK.initialize(
  context = applicationContext,
  appId = "YOUR_APP_ID",
  configuration = Configuration(
    environment = Environment.Intl,
    audioFormat = AudioFormat(16000),
    drivingServiceMode = DrivingServiceMode.SDK,
    logLevel = LogLevel.INFO
  )
)
```

Set `AvatarSDK.sessionToken` before connecting. The token authenticates your session with the driving service.

## Stage 2: Load Avatar

Use `AvatarManager` to download and cache avatar assets. Loading is asynchronous and supports progress tracking.

```typescript theme={null}
const avatar = await AvatarManager.load('AVATAR_ID', (progress) => {
  console.log('Loading:', progress);
});
```

```swift theme={null}
let avatar = try await AvatarManager.shared.load(id: "AVATAR_ID") { progress in
  print("Loading: \(progress)")
}
```

```kotlin theme={null}
val avatar = AvatarManager.load("AVATAR_ID") { progress ->
  Log.d("Avatar", "Loading: $progress")
}
```

Loaded assets are cached locally. Subsequent loads for the same avatar ID skip the download. Use `AvatarManager.clear(id:)` or `clearAll()` to manage the cache.

## Stage 3: Render

Create an `AvatarView` with the loaded avatar. The view automatically creates an `AvatarController` you can access to manage the connection and send audio.

```typescript theme={null}
const avatarView = new AvatarView(avatar, containerElement);
const controller = avatarView.controller;
```

```swift theme={null}
let avatarView = AvatarView(avatar: avatar)
let controller = avatarView.controller
```

```kotlin theme={null}
val avatarView = AvatarView(context)
avatarView.init(avatar, lifecycleScope)
val controller = avatarView.controller
```

## Stage 4: Connect and Interact

Call `controller.start()` to open the WebSocket connection. Once connected, send audio with `controller.send()` (SDK Mode) or `controller.yield()` (Host Mode).

```mermaid actions={false} theme={null}
---
config:
  "look": "handDrawn"
  "theme": "base"
  "themeVariables":
    "background": "#ffffff"
    "textColor": "#111827"
    "lineColor": "#000000"
    "primaryColor": "#e8f4fd"
    "primaryTextColor": "#111827"
    "primaryBorderColor": "#2196F3"
    "secondaryColor": "#f3e5f5"
    "secondaryTextColor": "#000000"
    "secondaryBorderColor": "#9C27B0"
    "tertiaryColor": "#fff3e0"
    "tertiaryTextColor": "#111827"
    "tertiaryBorderColor": "#FF9800"
    "clusterBkg": "#f8fafc"
    "clusterBorder": "#cbd5e1"
    "edgeLabelBackground": "#ffffff"
    "actorBkg": "#e8f4fd"
    "actorBorder": "#2196F3"
    "actorTextColor": "#111827"
    "noteBkgColor": "#fff7ed"
    "noteTextColor": "#111827"
    "signalColor": "#64748b"
    "signalTextColor": "#111827"
---
sequenceDiagram
  autonumber
  participant C as Controller
  participant S as Connection/Voice State
  C->>S: controller.start()
  Note right of S: onConnectionState: connecting
  S-->>C: onConnectionState: connected
  S-->>C: onConversationState: idle
  rect rgb(240, 240, 240)
    Note over C,S: Audio Streaming
    C->>S: send(audio, end: false)
    S-->>C: onConversationState: playing
    C->>S: ... multiple chunks ...
    C->>S: send(audio, end: false)
    S-->>C: onConversationState: playing
    C->>S: send(audio chunk, end: true)
  end
  S-->>C: onConversationState: playing
  S-->>C: onConversationState: idle
```
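The streaming pattern in the diagram (several `send(audio, end: false)` calls followed by one call with `end: true`) can be sketched as a generator that splits a PCM buffer into fixed-duration chunks and flags the last one. This is a hypothetical Python illustration of the chunking logic only: the `chunk_pcm` name is made up, the actual `send()` calls belong to the Web, iOS, and Android client SDKs, and the 100 ms chunk duration is an arbitrary assumption rather than a SpatialReal recommendation.

```python
from typing import Iterator, Tuple

BYTES_PER_SAMPLE = 2  # mono s16le: one 16-bit sample per frame


def chunk_pcm(pcm: bytes, sample_rate: int = 16000,
              chunk_ms: int = 100) -> Iterator[Tuple[bytes, bool]]:
    """Yield (chunk, end) pairs; `end` is True only for the final chunk.

    Hypothetical helper: chunk sizes are whole samples, so a 16-bit sample
    is never split across two chunks.
    """
    if len(pcm) % BYTES_PER_SAMPLE != 0:
        raise ValueError("s16le data must contain a whole number of samples")
    chunk_bytes = sample_rate * chunk_ms // 1000 * BYTES_PER_SAMPLE
    if chunk_bytes <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")

    if not pcm:
        # Assumption: an empty final chunk is still used to mark the end.
        yield b"", True
        return
    for offset in range(0, len(pcm), chunk_bytes):
        # The last slice may be shorter than chunk_bytes.
        yield pcm[offset:offset + chunk_bytes], offset + chunk_bytes >= len(pcm)


if __name__ == "__main__":
    one_second = bytes(16000 * BYTES_PER_SAMPLE)  # 1 s of silence at 16 kHz
    for chunk, end in chunk_pcm(one_second):
        # A client app would hand each pair to its SDK's send call here.
        print(len(chunk), end)
```

At 16000 Hz, a 100 ms chunk is 1600 samples, or 3200 bytes; the sample rate passed in must match the one configured in `AvatarSDK.initialize()`.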
Call `controller.interrupt()` to stop the current playback immediately. Call `controller.close()` when done.

## Cleanup

Clean up resources when the avatar is no longer needed.

```typescript theme={null}
avatarView.dispose();
```

```swift theme={null}
// AvatarView automatically releases resources when removed from the view hierarchy.
// No explicit cleanup call needed.
```

```kotlin theme={null}
avatarView.dispose()
```

On Web and Android, always call `avatarView.dispose()` when the view is no longer needed. On iOS, resources are released automatically when the view is deallocated.

# Agent Quickstart Walkthrough

Source: https://docs.spatialreal.ai/guide/agent-quickstart-walkthrough

Step-by-step code walkthrough for the LiveKit Agent + SpatialReal quickstart

This page walks through the quickstart in backend and frontend steps, with each key file in a copyable code block. Use the clone-and-run flow first, then come back here for implementation details.

The backend runs two processes: a token server and an agent worker.

```bash title="backend/.env" theme={null}
LIVEKIT_URL=wss://your-project.livekit.cloud # https://cloud.livekit.io
LIVEKIT_API_KEY=your_api_key # https://cloud.livekit.io
LIVEKIT_API_SECRET=your_api_secret # https://cloud.livekit.io
GOOGLE_API_KEY=your_google_api_key # https://aistudio.google.com/api-keys
E2E_GOOGLE_MODEL=gemini-2.5-flash-native-audio-preview-12-2025
E2E_GOOGLE_VOICE=Puck
SPATIALREAL_API_KEY=your_api_key # https://app.spatialreal.ai/apps
SPATIALREAL_APP_ID=your_app_id # https://app.spatialreal.ai/apps
SPATIALREAL_AVATAR_ID=6aed28f9-674c-4ffb-89ee-b447b28aa3ed # https://app.spatialreal.ai/avatars/library
```

```toml title="backend/pyproject.toml" theme={null}
[project]
name = "spatialreal-agent-quickstart-backend"
version = "0.1.0"
requires-python = ">=3.10,<3.15"
dependencies = [
    "flask>=3.0.0",
    "flask-cors>=4.0.0",
    "python-dotenv>=1.0.0",
    "livekit-api>=1.1.0",
    "livekit-agents==1.4.5",
    "livekit-plugins-google==1.4.5",
    "livekit-plugins-spatialreal==1.4.5",
]
```
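Because both backend processes read their settings from this file, a missing value usually surfaces only as a confusing runtime error. The stdlib-only check below is a hypothetical convenience, not part of the quickstart: the `missing_env` name is made up, and the list of required variables is an assumption built from what `token_server.py` and `agent.py` read, plus the `SPATIALREAL_*` values that the SpatialReal plugin is assumed to pick up from the environment. `E2E_GOOGLE_MODEL` and `E2E_GOOGLE_VOICE` are left out because `agent.py` falls back to defaults for them.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Assumed set of variables the backend needs; adjust to your deployment.
REQUIRED_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "GOOGLE_API_KEY",
    "SPATIALREAL_API_KEY",
    "SPATIALREAL_APP_ID",
    "SPATIALREAL_AVATAR_ID",
)


def missing_env(required: Iterable[str] = REQUIRED_VARS,
                env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    source = os.environ if env is None else env
    return [name for name in required if not source.get(name, "").strip()]


if __name__ == "__main__":
    # In a real process, call load_dotenv() first so backend/.env is applied.
    missing = missing_env()
    print("Missing variables:", ", ".join(missing) if missing else "none")
```

For example, calling `missing_env()` right after `load_dotenv()` at the top of `token_server.py` or `agent.py` would let the process fail fast with a readable message instead of a failed LiveKit or SpatialReal request.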
`token_server.py` issues a LiveKit access token for the browser and dispatches the agent to the room.
```python title="backend/token_server.py" theme={null}
import asyncio
import os
from datetime import timedelta
from uuid import uuid4

from dotenv import load_dotenv
from flask import Flask, jsonify, request
from flask_cors import CORS
from livekit import api

load_dotenv()

app = Flask(__name__)
CORS(app)

LIVEKIT_URL = os.getenv("LIVEKIT_URL")
LIVEKIT_API_KEY = os.getenv("LIVEKIT_API_KEY")
LIVEKIT_API_SECRET = os.getenv("LIVEKIT_API_SECRET")


async def create_room_and_dispatch(room_name: str) -> None:
    lkapi = api.LiveKitAPI(LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET)
    try:
        try:
            await lkapi.room.create_room(api.CreateRoomRequest(name=room_name))
        except Exception:
            # The room may already exist; dispatch the agent either way.
            pass
        # Dispatch the worker registered as "voice-assistant" in agent.py.
        await lkapi.agent_dispatch.create_dispatch(
            api.CreateAgentDispatchRequest(room=room_name, agent_name="voice-assistant")
        )
    finally:
        await lkapi.aclose()


@app.route("/token", methods=["POST"])
def token():
    if not LIVEKIT_API_KEY or not LIVEKIT_API_SECRET:
        return jsonify({"error": "LiveKit credentials not configured"}), 500

    body = request.get_json() or {}
    room_name = body.get("room", "voice-agent-room")
    requested_identity = body.get("identity")
    # Use the caller's identity if it is a non-empty string, else generate one.
    identity = (
        requested_identity.strip()
        if isinstance(requested_identity, str) and requested_identity.strip()
        else f"browser-{uuid4().hex[:8]}"
    )

    jwt = (
        api.AccessToken(LIVEKIT_API_KEY, LIVEKIT_API_SECRET)
        .with_identity(identity)
        .with_name(identity)
        .with_ttl(timedelta(hours=1))
        .with_grants(
            api.VideoGrants(
                room_join=True,
                room=room_name,
                can_publish=True,
                can_subscribe=True,
                can_publish_data=True,
            )
        )
        .to_jwt()
    )

    try:
        asyncio.run(create_room_and_dispatch(room_name))
    except Exception as exc:
        print(f"Warning: Failed to dispatch agent: {exc}")

    return jsonify(
        {
            "token": jwt,
            "url": LIVEKIT_URL,
            "room": room_name,
            "identity": identity,
        }
    )


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080, debug=True)
```
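To exercise the token server on its own before wiring up the frontend, you can call it from a small script. This is a hypothetical stdlib-only sketch: the `build_token_request` and `fetch_livekit_token` names are illustrative, and the defaults assume the server above is running locally on port 8080. The request fields (`room`, `identity`) and response fields (`token`, `url`, `room`, `identity`) mirror the `/token` handler.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_token_request(endpoint: str, room: str,
                        identity: Optional[str] = None) -> urllib.request.Request:
    """Build a POST request matching what the /token route reads."""
    body: Dict[str, Any] = {"room": room}
    if identity:
        # When omitted, the server generates an identity like "browser-1a2b3c4d".
        body["identity"] = identity
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        # The route parses the body with request.get_json(), so send JSON.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_livekit_token(endpoint: str = "http://localhost:8080/token",
                        room: str = "voice-agent-room",
                        identity: Optional[str] = None,
                        timeout: float = 10.0) -> Dict[str, Any]:
    """POST to the token server and return its JSON: token, url, room, identity."""
    req = build_token_request(endpoint, room, identity)
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With `token_server.py` running, `fetch_livekit_token()["token"]` returns the LiveKit JWT. Keep in mind that, as the handler shows, every call also dispatches the agent to the room, so repeated test calls may start several agent jobs.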
`agent.py` runs the realtime LLM and starts SpatialReal avatar publishing in the same room.
```python title="backend/agent.py" theme={null}
import os

from dotenv import load_dotenv
from livekit.agents import Agent, AgentSession, AutoSubscribe, JobContext, WorkerOptions, cli
from livekit.plugins import google
from livekit.plugins.spatialreal import AvatarSession

load_dotenv()


class VoiceAssistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a helpful voice assistant. Keep replies short and natural."
        )


async def entrypoint(ctx: JobContext) -> None:
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    session = AgentSession(
        llm=google.realtime.RealtimeModel(
            model=os.getenv("E2E_GOOGLE_MODEL", "gemini-2.5-flash"),
            voice=os.getenv("E2E_GOOGLE_VOICE", "Puck"),
            api_key=os.getenv("GOOGLE_API_KEY"),
        )
    )

    # Start the SpatialReal avatar in the same room before starting the agent session.
    avatar = AvatarSession()
    await avatar.start(session, room=ctx.room)

    await session.start(agent=VoiceAssistant(), room=ctx.room)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, agent_name="voice-assistant"))
```
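The frontend reads the token endpoint and room name from `.env`. The Vite dev server proxies `/token` to the local token server on port 8080; the proxy is used when `VITE_TOKEN_ENDPOINT` is unset, because `App.tsx` then falls back to `/token`.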
```bash title="frontend/.env" theme={null}
VITE_SPATIALREAL_APP_ID=your_app_id # https://app.spatialreal.ai/apps
VITE_SPATIALREAL_AVATAR_ID=6aed28f9-674c-4ffb-89ee-b447b28aa3ed # https://app.spatialreal.ai/avatars/library
VITE_TOKEN_ENDPOINT=http://localhost:8080/token
VITE_ROOM_NAME=voice-agent-room
```

```ts title="frontend/vite.config.ts" theme={null}
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'
import { avatarkitVitePlugin } from '@spatialwalk/avatarkit/vite'

export default defineConfig({
  plugins: [react(), avatarkitVitePlugin()],
  server: {
    port: 3000,
    proxy: {
      '/token': {
        target: 'http://localhost:8080',
        changeOrigin: true,
      },
    },
  },
})
```
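Optionally, Vite's documented `ImportMetaEnv` augmentation gives `import.meta.env` typed access to these variables. A minimal sketch, assuming the file lives at `frontend/src/vite-env.d.ts` (the Vite React template's default location); marking the variables optional mirrors the missing-variable checks and fallbacks in `App.tsx`.

```typescript
/// <reference types="vite/client" />

// Typed access to the variables defined in frontend/.env (assumed file: src/vite-env.d.ts)
interface ImportMetaEnv {
  readonly VITE_SPATIALREAL_APP_ID?: string
  readonly VITE_SPATIALREAL_AVATAR_ID?: string
  // App.tsx falls back to '/token' (proxied by vite.config.ts) when this is unset
  readonly VITE_TOKEN_ENDPOINT?: string
  // App.tsx falls back to 'voice-agent-room' when this is unset
  readonly VITE_ROOM_NAME?: string
}

interface ImportMeta {
  readonly env: ImportMetaEnv
}
```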
This frontend uses AvatarKit UI components (`SpatialRealAvatarProvider`, `SpatialRealAvatarCanvas`, etc.). To set up the same UI stack:

1. Complete the shadcn base setup: [shadcn manual installation](https://ui.shadcn.com/docs/installation/manual)
2. Install the dependencies and component files:
```bash theme={null}
pnpm add @spatialwalk/avatarkit @livekit/components-react @livekit/components-styles
npx shadcn@latest add https://ui.spatialreal.ai/r/spatialreal-avatar.json
```
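`App.tsx` imports from `livekit-client`, and the LiveKit Client guide in these docs states that only `livekit-client` versions below `2.17` are supported. A minimal sketch of a guard that flags an incompatible version early; `isSupportedLiveKitClientVersion` is a hypothetical helper that compares only the major and minor numbers, and how you obtain the installed version string (for example, from your lockfile or `package.json`) is left to you.

```typescript
// Hypothetical guard for the documented constraint: livekit-client must be below 2.17
// (2.17 enabled single peer connection mode by default).
const FIRST_UNSUPPORTED = { major: 2, minor: 17 }

function isSupportedLiveKitClientVersion(version: string): boolean {
  // Accepts "2.16.1", "v2.16.1", or "2.16.1-rc.0"; only major.minor are compared
  const match = /^v?(\d+)\.(\d+)/.exec(version.trim())
  if (!match) throw new Error(`Unrecognized version string: ${version}`)
  const major = Number(match[1])
  const minor = Number(match[2])
  if (major !== FIRST_UNSUPPORTED.major) return major < FIRST_UNSUPPORTED.major
  return minor < FIRST_UNSUPPORTED.minor
}
```

Comparing only major and minor treats any `2.17.x` prerelease as unsupported, which errs on the safe side.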
This frontend is built with [AvatarKit UI](/sdk-reference/web-sdk/avatarkit-ui). `App.tsx` requests a token, mounts `SpatialRealAvatarProvider`, and controls the microphone state.
```tsx title="frontend/src/App.tsx" theme={null}
import { useState } from 'react'
import '@livekit/components-styles'
import { Button } from '@/components/ui/button'
import { Track } from 'livekit-client'
import {
  SpatialRealAvatarCanvas,
  SpatialRealAvatarError,
  SpatialRealAvatarFrame,
  SpatialRealAvatarLoading,
  SpatialRealAvatarProvider,
  SpatialRealAvatarStatus,
  useSpatialRealAvatarContext,
} from '@/components/spatialreal-avatar'

type AvatarConnection = {
  url: string
  token: string
  roomName: string
}

type TokenResponse = {
  url: string
  token: string
  room: string
}

function AvatarPanel({ onExit }: { onExit: () => void }) {
  const avatar = useSpatialRealAvatarContext()
  const [pending, setPending] = useState(false)

  const micPublication = avatar.room?.localParticipant.getTrackPublication(Track.Source.Microphone)
  const hasPublishedMic = Boolean(micPublication?.track)
  const isMicMuted = micPublication?.isMuted ?? false

  const toggleMicrophone = async () => {
    if (pending || !avatar.isConnected) return
    setPending(true)
    try {
      if (!hasPublishedMic) {
        await avatar.startPublishingMicrophone()
      } else if (isMicMuted) {
        await micPublication?.unmute()
      } else {
        await micPublication?.mute()
      }
    } finally {
      setPending(false)
    }
  }

  const disconnect = async () => {
    if (pending) return
    setPending(true)
    try {
      await avatar.disconnect()
    } finally {
      setPending(false)
      onExit()
    }
  }

  return (
      {avatar.error
        ? avatar.error.message
        : avatar.isConnected
          ? !hasPublishedMic
            ? 'Connected. Mic is off, enable mic to talk.'
            : isMicMuted
              ? 'Connected. Mic is muted.'
              : 'Connected. Mic is on, start speaking.'
          : 'Connecting...'}
  )
}

export default function App() {
  const appId = import.meta.env.VITE_SPATIALREAL_APP_ID
  const avatarId = import.meta.env.VITE_SPATIALREAL_AVATAR_ID
  const tokenEndpoint = import.meta.env.VITE_TOKEN_ENDPOINT || '/token'
  const roomName = import.meta.env.VITE_ROOM_NAME || 'voice-agent-room'

  const [connection, setConnection] = useState<AvatarConnection | null>(null)
  const [connecting, setConnecting] = useState(false)
  const [status, setStatus] = useState('Click Connect to start')

  const requestConnection = async () => {
    if (connecting || connection) return
    if (!appId || !avatarId) {
      setStatus('Missing VITE_SPATIALREAL_APP_ID or VITE_SPATIALREAL_AVATAR_ID in .env')
      return
    }

    setConnecting(true)
    setStatus('Requesting token...')
    try {
      const response = await fetch(tokenEndpoint, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ room: roomName }),
      })
      if (!response.ok) throw new Error('Failed to fetch token')

      const payload = (await response.json()) as TokenResponse
      if (!payload.url || !payload.token || !payload.room) {
        throw new Error('Token response is missing url, token, or room')
      }

      setConnection({
        url: payload.url,
        token: payload.token,
        roomName: payload.room,
      })
      setStatus('Connecting avatar...')
    } catch (error) {
      setStatus(error instanceof Error ? error.message : 'Failed to request token')
    } finally {
      setConnecting(false)
    }
  }

  if (!appId || !avatarId) {
    return
Missing required environment variables. Check `.env`.
  }

  return (
{connection ? ( setStatus('Connected. Click Enable Mic to talk.')} onDisconnected={() => setStatus('Disconnected')} onAvatarError={(error) => setStatus(error.message)} > { setConnection(null) setStatus('Disconnected') }} /> ) : (
{status}
)}
  )
}
```
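The nested ternary that drives the status text in `AvatarPanel` is effectively a small decision table. A sketch that lifts it into a pure function so the branch order can be unit-tested; the name `avatarStatusLabel` and its input shape are hypothetical, while the strings and the order of checks are copied from the component above.

```typescript
// Hypothetical extraction of AvatarPanel's status ternary; strings and branch order match App.tsx.
interface AvatarStatusInput {
  errorMessage?: string // avatar.error?.message
  isConnected: boolean // avatar.isConnected
  hasPublishedMic: boolean // a microphone track has been published
  isMicMuted: boolean // the published microphone track is muted
}

function avatarStatusLabel(s: AvatarStatusInput): string {
  if (s.errorMessage !== undefined) return s.errorMessage // errors take priority
  if (!s.isConnected) return 'Connecting...'
  if (!s.hasPublishedMic) return 'Connected. Mic is off, enable mic to talk.'
  if (s.isMicMuted) return 'Connected. Mic is muted.'
  return 'Connected. Mic is on, start speaking.'
}
```

Inside the component it would be called with `avatar.error?.message`, `avatar.isConnected`, `hasPublishedMic`, and `isMicMuted`.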
Use this sequence to trace issues across backend, agent, and frontend.
```text theme={null}
frontend -> /token -> token_server creates JWT + dispatches agent
agent worker joins room -> starts Gemini session + AvatarSession
frontend connects with token -> avatar renders -> mic publish/unpublish drives conversation
```
# Host Mode

Source: https://docs.spatialreal.ai/guide/host-mode

Self-managed networking with AvatarKit Server SDK

## What is Host Mode?

In Host Mode, **your application** manages the network connection to AvatarKit's server-side SDK. Your server sends encoded messages to your client, and the client SDK receives and **decodes them internally** for synchronized playback and rendering.

```mermaid theme={null}
flowchart LR
  A["🖥️ AvatarKit<br/>Server SDK"] -->|Encoded Messages| B["Your App<br/>(Network Layer)"]
  B -->|yieldAudioData| C["AvatarKit<br/>Client SDK"]
  B -->|yieldFramesData| C
  C -->|Decode & Render| D["🖥️ Avatar Rendering"]
```

Host Mode **requires** AvatarKit's server-side SDK to generate the encoded messages. The data passed to `yieldAudioData()` and `yieldFramesData()` consists of encoded messages from the server SDK, not raw audio or animation data you create yourself.

## When to Use

* **Custom network layer**: you manage the connection between your client and AvatarKit's server SDK yourself
* **RTC integration**: messages are relayed through a real-time communication server (LiveKit, Agora, etc.)
* **Proxy architecture**: your backend acts as a relay between the client and AvatarKit server

## Requirements

| Requirement              | Description                                                                  |
| ------------------------ | ---------------------------------------------------------------------------- |
| **App ID**               | Obtained from [SpatialReal Studio](https://app.spatialreal.ai)               |
| **Session Token**        | **Not required** on the client side                                          |
| **AvatarKit Server SDK** | Your backend must integrate with AvatarKit's server SDK to generate messages |

## SDK Mode vs Host Mode

| Aspect               | SDK Mode                                         | Host Mode                                          |
| -------------------- | ------------------------------------------------ | -------------------------------------------------- |
| **Network**          | Client SDK connects to AvatarKit server directly | Your app relays messages from AvatarKit Server SDK |
| **Message Decoding** | Handled internally                               | Handled internally (same)                          |
| **Session Token**    | Required (client-side)                           | Not required (client-side)                         |
| **Server SDK**       | Not needed                                       | **Required** on your backend                       |
| **Key Methods**      | `send()`, `start()`, `close()`                   | `yieldAudioData()`, `yieldFramesData()`            |
| **Use Case**         | Simplest integration                             | Custom networking / RTC relay                      |

## Key Concepts

### ConversationId Management

ConversationId links audio and animation messages for a single conversation session:

1. Call `yieldAudioData()`; it returns a `conversationId`.
2. Use that `conversationId` when calling `yieldFramesData()`.
3. Messages with a **mismatched** conversationId are **discarded**.
4. Use `getCurrentConversationId()` to retrieve the current active session ID.

**Important:** Always use the conversationId returned by `yieldAudioData()` when sending animation messages. Mismatched IDs cause messages to be silently dropped.

### Fallback Mechanism

If you provide empty animation data (an empty array or `undefined`), the SDK automatically enters **audio-only mode** for that session. Once in audio-only mode, any subsequent animation data for that session is ignored; only audio continues playing.

## Get Started

# Introduction

Source: https://docs.spatialreal.ai/guide/introduction

Step-by-step guides for integrating SpatialReal avatars into your application

The **Guide** walks you through integrating SpatialReal avatars by **integration mode**. Choose the mode that matches your architecture, then follow the linked guides for your platform (Web, iOS, Android) or backend (LiveKit, Server SDK).

## Integration Modes

* **SDK Mode**: Client sends audio; SDK handles networking and rendering. Best for simple, client-centric apps.
* **RTC Mode**: Real-time voice via LiveKit. Use framework plugins or Server SDK with egress.
* **Host Mode**: You control the transport. Server SDK sends audio and relays animation to the client SDK.
* Fully managed voice agent (coming soon).

## Where to Start

* **New to SpatialReal?** Start with [Overview → Speech-to-Avatar Quickstart](/overview/speech-to-avatar), then open the guide for your chosen mode.
* **Already chose a mode?** Use the sidebar: pick **SDK Mode**, **RTC Mode**, or **Host Mode**, then follow the platform-specific quickstarts (Web, iOS, Android) or server guides (LiveKit).
* **Want working code?** Check out our [Demo Projects](/overview/demo-projects) for complete examples you can run immediately.
## Quickstart Code Walkthroughs

* Understand the SDK Mode quickstart code structure and runtime flow.
* Understand the LiveKit agent quickstart backend and frontend architecture.
* A full reference repository with implementation details, including different frontend UI options and multiple backend agent patterns.

## Quick Links

| Mode     | Client / Platform                                                                            | Server                                                                                                                    |
| -------- | -------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| **SDK**  | [Web](/guide/sdk-mode-web) · [iOS](/ios-sdk/quickstart) · [Android](/android-sdk/quickstart) | Session token from your backend only                                                                                      |
| **RTC**  | [LiveKit Client](/guide/rtc-livekit-client)                                                  | [LiveKit Server](/guide/rtc-livekit-server)                                                                               |
| **Host** | [Web](/web-sdk/host-mode) · [iOS](/ios-sdk/host-mode) · [Android](/android-sdk/host-mode)    | [Python](/server/python-sdk/python-host-mode) · [Golang](/server/go-sdk/go-host-mode) · [JS](/server/js-sdk/js-host-mode) |

# LiveKit Client

Source: https://docs.spatialreal.ai/guide/rtc-livekit-client

Real-time voice communication with avatars using LiveKit

```mermaid theme={null}
flowchart LR
  A["🎤 Microphone"] --> B["AvatarPlayer"]
  B <-->|WebRTC| C["LiveKit Server"]
  B --> D["AvatarKit SDK"]
  D --> E["🖥️ Avatar Rendering"]
```

**Web Only:** RTC Mode is currently available for Web applications only.

**Do not call `initializeAudioContext()`**: it is not needed in RTC Mode. Avatar audio is delivered as a native WebRTC audio track by the LiveKit client SDK, not through the AvatarKit SDK's internal audio player. The `AvatarPlayer` adapter only feeds **animation data** to the SDK for rendering; audio playback is handled entirely by LiveKit's WebRTC stack.

## Installation

**Critical:** We are only compatible with `livekit-client` versions below `2.17`, because `2.17` enabled single peer connection (single PC) mode by default.
We are working on adding support for `2.17`+ in a future release; for now, install version `2.16.1` to avoid compatibility issues.

```bash theme={null}
pnpm add @spatialwalk/avatarkit @spatialwalk/avatarkit-rtc livekit-client@2.16.1
```

```bash theme={null}
npm install @spatialwalk/avatarkit @spatialwalk/avatarkit-rtc livekit-client@2.16.1
```

```bash theme={null}
yarn add @spatialwalk/avatarkit @spatialwalk/avatarkit-rtc livekit-client@2.16.1
```

You also need to configure your build tool for WASM files; see [Build Tool Configuration](/guide/sdk-mode-web#build-tool-configuration).

## Authentication

| Credential        | How to Obtain                                                 | Notes                             |
| ----------------- | ------------------------------------------------------------- | --------------------------------- |
| **App ID**        | [SpatialReal Studio](https://app.spatialreal.ai) → Create App | For SDK initialization            |
| **Session Token** | Your backend → AvatarKit Server                               | For avatar loading (max 24 hours) |
| **LiveKit Token** | Your backend → LiveKit Server                                 | For RTC room connection           |

## Quick Start

For a complete client + server implementation, see [AvatarKit Voice Agent Demo](https://github.com/spatialwalk/avatarkit-voice-agent-demo/tree/main/livekit-cascade-voice-agent).

```typescript theme={null}
import { AvatarSDK, AvatarManager, AvatarView, DrivingServiceMode, Environment } from '@spatialwalk/avatarkit'

await AvatarSDK.initialize('your-app-id', {
  environment: Environment.intl,
  drivingServiceMode: DrivingServiceMode.host, // MUST be host for RTC
})
AvatarSDK.setSessionToken('your-session-token')
```

```typescript theme={null}
const avatar = await AvatarManager.shared.load('avatar-id')
const container = document.getElementById('avatar-container')!
const avatarView = new AvatarView(avatar, container)
```

```typescript theme={null}
import { AvatarPlayer, LiveKitProvider } from '@spatialwalk/avatarkit-rtc'

const provider = new LiveKitProvider()
const player = new AvatarPlayer(provider, avatarView, {
  logLevel: 'warning',
})
```

```typescript theme={null}
await player.connect({
  url: 'wss://your-livekit-server.com',
  token: 'your-livekit-token',
  roomName: 'room-name',
})
```

```typescript theme={null}
// Start microphone publishing
await player.startPublishing()

// Stop microphone
await player.stopPublishing()

// Disconnect when done
await player.disconnect()
```

## AvatarPlayer API

### Constructor

```typescript theme={null}
new AvatarPlayer(provider: LiveKitProvider, avatarView: AvatarView, options?: AvatarPlayerOptions)
```

### AvatarPlayerOptions

```typescript theme={null}
interface AvatarPlayerOptions {
  /** Start speaking transition frames, default 5 (~200ms at 25fps) */
  transitionStartFrameCount?: number
  /** End speaking transition frames, default 40 (~1600ms at 25fps) */
  transitionEndFrameCount?: number
  /** Log level: 'info' | 'warning' | 'error' | 'none', default 'warning' */
  logLevel?: LogLevel
}
```

### Connection

```typescript theme={null}
// Connect to LiveKit server
await player.connect(config: LiveKitConnectionConfig)

// Disconnect and clean up
await player.disconnect()

// Reconnect using last config (useful after stalls)
await player.reconnect()

// Check connection status
player.isConnected          // boolean
player.getConnectionState() // ConnectionState
```

#### LiveKitConnectionConfig

```typescript theme={null}
interface LiveKitConnectionConfig {
  url: string      // LiveKit server URL (wss://...)
  token: string    // Auth token from your backend
  roomName: string // Room name
}
```

### Microphone Control

```typescript theme={null}
// Start microphone (requests permission automatically)
await player.startPublishing()

// Stop microphone
await player.stopPublishing()
```

### Custom Audio Publishing

For non-microphone audio sources such as audio elements or the Web Audio API.

```typescript theme={null}
// Publish a custom audio track
await player.publishAudio(track: MediaStreamTrack)

// Stop custom audio
await player.unpublishAudio()
```

| Audio Source       | How to Obtain Track                                                      |
| ------------------ | ------------------------------------------------------------------------ |
| `