# Server API Reference

Source: https://docs.spatialreal.ai/api-reference/api-reference

Server-side API Reference

## Obtain a session token

Call the server endpoint `POST /v1/console/session-tokens`.

### Console API host

In the examples below, replace `<console-api-host>` with the host for your region:

| Region         | `<console-api-host>`                     |
| -------------- | ---------------------------------------- |
| `ap-northeast` | `console.ap-northeast.spatialwalk.cloud` |
| `us-west`      | `console.us-west.spatialwalk.cloud`      |

### Request parameters

* **Body**
  * **expireAt**: int, expiration time as a Unix timestamp in seconds. Range: now \< expireAt \< now + 24 hours
  * **modelVersion**: string, reserved field; omit
* **Header**
  * **X-Api-Key**: your API key

### Request example

```bash theme={null}
# Replace <console-api-host> with your console API domain
# and <your-api-key> with your API key.
# Set expireAt to the desired expiration time.
curl --location --request POST 'https://<console-api-host>/v1/console/session-tokens' \
  --header 'X-Api-Key: <your-api-key>' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "expireAt": 1754824283,
    "modelVersion": ""
  }'
```

### Response example

```json theme={null}
{
  "sessionToken": "..."
}
```

# Authentication

Source: https://docs.spatialreal.ai/api-reference/auth

Authentication Flow

## Before you start

Before you begin, make sure you have your app-id and api-key. AvatarKit authentication requires a server-side component, so you need to implement an authentication endpoint on your own server.

## Connection flow

1. The client sends an authentication request to your business server.
2. The business server asks SpatialWalk's server to generate a session token. The request body carries information such as the expiration time, and the request is authenticated with the api-key.
3. The SpatialWalk server returns the session token to your business server.
4. The business server returns the session token to the client.
5. The client initializes AvatarKit with the session token.
6. AvatarKit uses the session token to create a connection request to the SpatialWalk server.

```mermaid theme={null}
sequenceDiagram
    participant Client as Client
    participant BizServer as Business Server
    participant SW as SpatialWalk Server
    participant Kit as AvatarKit
    Client->>BizServer: Authentication request
    BizServer->>SW: Generate session token (with api-key, expiration)
    SW-->>BizServer: Return session token
    BizServer-->>Client: Return session token
    Client->>Kit: Initialize with session token
    Kit->>SW: Create connection request (session token)
    SW-->>Kit: Connection response
    Note over SW,Kit: After token expiration, new connections are rejected<br/>Existing connections are not affected
```

## Token expiration

A new connection attempted after the token's configured expiration time is rejected. Connections that are already established are not affected.

## Notes

* Avoid leaking your api-key; ensure it is only used on the server.
* The session token is designed to be single-use. Ensure a new token is used for each connection.

> For detailed authentication API docs, see the [API Reference](/api-reference/api-reference)

# Regions

Source: https://docs.spatialreal.ai/api-reference/regions

Available regions, how to configure them, and how to fall back to another region for higher availability.

## Globally distributed service

Our avatar service runs in multiple regions. You can choose the region that’s closest to your infrastructure to minimize latency.

## Available regions

Use these region names when selecting a deployment:

* **ap-northeast**
* **us-west**

## Console API hosts

When a doc example uses `<console-api-host>`, substitute the host for your region:

| Region         | `<console-api-host>`                     |
| -------------- | ---------------------------------------- |
| `ap-northeast` | `console.ap-northeast.spatialwalk.cloud` |
| `us-west`      | `console.us-west.spatialwalk.cloud`      |

## Console and Ingress endpoint URLs

When a doc example uses a console endpoint URL or an ingress endpoint URL, substitute the URL for your region:

| Region         | Console Endpoint URL                                        | Ingress Endpoint URL (v2)                                   |
| -------------- | ----------------------------------------------------------- | ----------------------------------------------------------- |
| `ap-northeast` | `https://console.ap-northeast.spatialwalk.cloud/v1/console` | `wss://api.ap-northeast.spatialwalk.cloud/v2/driveningress` |
| `us-west`      | `https://console.us-west.spatialwalk.cloud/v1/console`      | `wss://api.us-west.spatialwalk.cloud/v2/driveningress`      |

## Cross-region portability

The following identifiers/credentials are **not region-specific** and can be used across regions:

* **App ID**
* **API Key**
* **Avatar ID**
* **Session Token** (generated by the console API)

This means you can switch regions or run multi-region without recreating apps, rotating keys, or re-generating avatars.

## High availability with regional fallback

Services in different regions run on **separate infrastructure**, so you can improve availability by implementing fallback:

* **Primary + backup**: if requests to the primary region fail, retry against another region.
* **Data sync**: since the identifiers/credentials mentioned above are portable, you can keep using them when failing over to another region.

# Audio Specification

Source: https://docs.spatialreal.ai/concepts/audio-spec

Accepted PCM audio format for SpatialReal services.

## Supported Input Audio

SpatialReal services currently accept **mono 16-bit PCM (`s16le`)** audio as raw PCM bytes.

* Raw PCM bytes in `s16le` format
* `1` channel (mono)
* `16-bit` samples
* `8000` to `48000` Hz, from the supported set below

## Specification

| Property     | Value                                                               |
| ------------ | ------------------------------------------------------------------- |
| Sample Rate  | One of `8000`, `16000`, `22050`, `24000`, `32000`, `44100`, `48000` |
| Channels     | `1` (mono)                                                          |
| Bit Depth    | `16-bit`                                                            |
| Format       | Raw PCM bytes                                                       |
| PCM Encoding | `s16le`                                                             |

If your source audio uses stereo channels, floating point samples, compressed codecs, or an unsupported sample rate, convert it before sending it to SpatialReal.

## What `s16le` Means

* `s16` means each audio sample is a signed 16-bit integer.
* `le` means the bytes are stored in little-endian order.
* `mono` means the stream contains a single channel only.
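
The format described above can be illustrated with a short conversion sketch. This is a minimal example under stated assumptions, not part of any SpatialReal SDK: `isSupportedSampleRate` and `floatToMonoS16le` are hypothetical helper names, and the input is assumed to be interleaved floating-point samples in the range [-1, 1]. The sketch downmixes to mono and writes signed 16-bit little-endian bytes. It does not resample, so audio at an unsupported rate still needs a separate resampling step.

```typescript
// Sample rates copied from the specification table above.
const SUPPORTED_SAMPLE_RATES: readonly number[] = [
  8000, 16000, 22050, 24000, 32000, 44100, 48000,
];

// Hypothetical helper: true when the rate is in the supported set.
function isSupportedSampleRate(rate: number): boolean {
  return SUPPORTED_SAMPLE_RATES.indexOf(rate) !== -1;
}

// Hypothetical helper: downmix interleaved float samples (assumed range
// [-1, 1]) to mono and encode them as raw s16le bytes.
function floatToMonoS16le(interleaved: Float32Array, channels: number): Uint8Array {
  if (channels < 1 || interleaved.length % channels !== 0) {
    throw new Error("sample count must be a multiple of the channel count");
  }
  const frames = interleaved.length / channels;
  const out = new Uint8Array(frames * 2); // 2 bytes per 16-bit sample
  const view = new DataView(out.buffer);
  for (let i = 0; i < frames; i++) {
    // Average all channels of this frame to get one mono sample.
    let sum = 0;
    for (let c = 0; c < channels; c++) {
      sum += interleaved[i * channels + c];
    }
    // Clamp so out-of-range input cannot overflow the 16-bit range.
    const mono = Math.max(-1, Math.min(1, sum / channels));
    // Scale negatives by 32768 and positives by 32767 so both ends
    // land exactly on the int16 limits.
    const sample = mono < 0 ? Math.round(mono * 32768) : Math.round(mono * 32767);
    view.setInt16(i * 2, sample, true); // true = little-endian ("le")
  }
  return out;
}
```

The returned `Uint8Array` contains only sample bytes, with no WAV header, which matches the "raw PCM bytes" requirement.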
## Common Conversion Checklist

Before sending audio to the SDK or service, make sure your pipeline outputs:

* a supported sample rate
* mono audio
* 16-bit signed PCM samples
* raw PCM bytes, with no WAV header and no compressed audio frames

For a practical overview of common audio encodings and container formats, see the [FFmpeg audio types reference](https://trac.ffmpeg.org/wiki/audio%20types).

# Integration Modes

Source: https://docs.spatialreal.ai/concepts/integrations

Choose the right integration approach for your use case

## Choose Your Integration Mode

SpatialReal offers four distinct integration modes to suit different architectural needs, latency requirements, and development preferences.

* **SDK Mode:** Client-centric integration with minimal server-side changes
* **RTC Mode:** Ultra-low latency via LiveKit or Agora
* **Framework Plugin:** Seamless integration with LiveKit Agents or TEN Framework
* **Host Mode:** Full control with custom transport layer

***

## SDK Mode

In this mode, the client-side application manages the audio input. The developer passes the audio to the SpatialReal Client SDK, which handles the server interaction to retrieve animation data and render the avatar.

1. Developer passes audio to the SpatialReal SDK on the client.
2. SDK calls the inference service.
3. SDK receives drive parameters and plays the avatar.

**Best Suited For:**

* **Client-Centric Logic:** Scenarios where the voice agent logic resides primarily on the device.
* **Moderate Latency:** Projects where ultra-low latency is not the absolute priority.
* **Simplified Architecture:** Minimal server-side development required (only for authentication), allowing most logic to remain on the client.
```mermaid actions={false} theme={null}
---
config:
  "look": "handDrawn"
---
flowchart LR
    subgraph Client ["🖥️ Client"]
        App["Client App (Voice Logic)"]
        SDK["SpatialReal SDK"]
    end
    Cloud["☁️ SpatialReal Cloud"]
    App -- "Audio Data" --> SDK
    SDK -- "Send Audio" --> Cloud
    Cloud -- "Drive Params" --> SDK
    SDK -- "Render & Play" --> App
    style Client fill:#e8f4fd,stroke:#2196F3,stroke-width:2px,color:#000
    style Cloud fill:#fff3e0,stroke:#FF9800,stroke-width:2px,color:#000
```

***

## RTC Mode

This mode leverages Real-Time Communication (RTC) infrastructure (currently supporting LiveKit and Agora). The developer streams audio to SpatialReal, but instead of returning data to the developer, SpatialReal pushes the drive parameters directly into an RTC room/channel.

1. Developer streams audio via Server SDK.
2. SpatialReal Service pushes audio and avatar drive data to an RTC Room.
3. Client joins the room using the SpatialReal RTC Client to subscribe, parse, and play.

The stream contains **binary drive parameters and audio**, not a pre-rendered video feed. Therefore, it cannot be played by standard video players; it requires the SpatialReal Client to render.

**Best Suited For:**

* **Existing RTC Users:** Teams already using LiveKit or Agora for voice agents but *not* using their specific agent frameworks (e.g., LiveKit Agents or TEN Framework).
* **Server-Managed State:** Scenarios requiring server-side management of conversation state (e.g., handling interruptions).
* **Ultra-Low Latency:** Leveraging established RTC networks for minimal delay.
```mermaid actions={false} theme={null}
---
config:
  "look": "handDrawn"
---
flowchart BT
    subgraph Server ["🔧 Developer Backend"]
        DevServer["Server Logic"]
        ServerSDK["SpatialReal Server SDK"]
        DevServer --> ServerSDK
    end
    subgraph Cloud ["☁️ SpatialReal Cloud"]
        Service["Avatar Service"]
        RTCPub["RTC Egress"]
        Service --> RTCPub
    end
    subgraph RTC ["📡 RTC Provider"]
        Room["LiveKit Room / Agora Channel"]
    end
    subgraph Client ["🖥️ Client"]
        RTCClient["SpatialReal RTC Client"]
    end
    ServerSDK -->|Audio Stream| Service
    RTCPub -->|Drive Params + Audio| Room
    Room -->|Subscribe| RTCClient
    style Client fill:#e8f4fd,stroke:#2196F3,stroke-width:2px,color:#000
    style RTC fill:#f3e5f5,stroke:#9C27B0,stroke-width:2px,color:#000
    style Cloud fill:#fff3e0,stroke:#FF9800,stroke-width:2px,color:#000
    style Server fill:#e8eaf6,stroke:#3F51B5,stroke-width:2px,color:#000
```

***

## Framework Plugin

This is the most streamlined approach for modern voice agent frameworks. Developers use a provided plugin that sits inside the voice agent pipeline (e.g., LiveKit Agents, TEN Framework).

1. The Plugin intercepts audio from the agent pipeline.
2. It sends audio to SpatialReal and configures the RTC push parameters.
3. SpatialReal pushes data to the RTC room (Client side is identical to RTC Mode).

**Best Suited For:**

* **Framework Users:** Teams already using frameworks like LiveKit Agents or TEN Framework.
* **Rapid Integration:** Low implementation cost; the plugin automatically handles signal processing and conversation state (interruption logic).
* **Migration:** Easy to switch if you are currently using other avatar services within these frameworks.
```mermaid actions={false} theme={null}
---
config:
  "look": "handDrawn"
---
flowchart BT
    subgraph Agent ["🤖 Voice Agent"]
        Pipeline["Agent Pipeline (LiveKit / TEN)"]
        Plugin["SpatialReal Plugin"]
    end
    subgraph Cloud ["☁️ SpatialReal Cloud"]
        Service["Inference Service"]
        RTCPub["RTC Egress"]
    end
    subgraph RTC ["📡 RTC Provider"]
        Room["LiveKit Room / Agora Channel"]
    end
    subgraph Client ["🖥️ Client"]
        RTCClient["SpatialReal RTC Client"]
    end
    Pipeline -. Audio .-> Plugin
    Plugin -- Audio & Config --> Service
    Service --> RTCPub
    RTCPub -- RTC Push --> Room
    Room -- Subscribe --> RTCClient
    style Client fill:#e8f4fd,stroke:#2196F3,stroke-width:2px,color:#000
    style RTC fill:#f3e5f5,stroke:#9C27B0,stroke-width:2px,color:#000
    style Cloud fill:#fff3e0,stroke:#FF9800,stroke-width:2px,color:#000
    style Agent fill:#e8eaf6,stroke:#3F51B5,stroke-width:2px,color:#000
```

***

## Host Mode

In Host Mode, the developer acts as the bridge. You use the SpatialReal Server SDK to stream audio to the service and receive streaming drive parameters back. It is then your responsibility to transport this data to the client.

1. Developer sends streaming audio via Server SDK.
2. SpatialReal returns streaming drive parameters to the Developer's Server.
3. Developer transmits audio and parameters to the Client SDK via a custom transport layer.
4. Client SDK renders the avatar.

The custom transport layer must ensure data is delivered **without duplication, loss, or reordering**.

**Best Suited For:**

* **Custom Transport:** Teams that already have a reliable, controllable transport layer.
* **Deep Integration:** Developers willing to handle server-side adaptation for maximum control.
* **Strict Low-Latency Demands:** When you need to optimize the network path manually.
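
One common way to approach the delivery requirement above on the receiving side is to tag each message with a sequence number, drop repeats, and hold early arrivals until the gap before them is filled. The sketch below is illustrative only: the `seq` field, the `SequencedMessage` shape, and the `InOrderBuffer` class are hypothetical and not part of any SpatialReal SDK. It assumes your own transport adds the sequence number. It can only detect loss as a gap; recovering a lost message still requires a reliable transport or a retransmission scheme of your own.

```typescript
// Hypothetical envelope your transport would wrap around each encoded message.
interface SequencedMessage {
  seq: number; // assumed to start at 0 and increase by 1 per message
  payload: Uint8Array;
}

// Delivers payloads strictly in sequence order, exactly once each.
class InOrderBuffer {
  private nextSeq = 0;
  private pending = new Map<number, Uint8Array>();
  private deliver: (payload: Uint8Array) => void;

  constructor(deliver: (payload: Uint8Array) => void) {
    this.deliver = deliver;
  }

  // Sequence number the buffer is currently waiting for.
  get expectedSeq(): number {
    return this.nextSeq;
  }

  push(msg: SequencedMessage): void {
    // Already delivered or already buffered: treat as a duplicate and drop it.
    if (msg.seq < this.nextSeq || this.pending.has(msg.seq)) {
      return;
    }
    this.pending.set(msg.seq, msg.payload);
    // Flush every message that is now contiguous with what was delivered.
    while (this.pending.has(this.nextSeq)) {
      const payload = this.pending.get(this.nextSeq)!;
      this.pending.delete(this.nextSeq);
      this.deliver(payload);
      this.nextSeq++;
    }
  }
}
```

If `expectedSeq` stops advancing while messages keep arriving, a message was lost in transit, and the transport needs to resend it or restart the stream.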
```mermaid actions={false} theme={null}
---
config:
  "look": "handDrawn"
---
flowchart LR
    subgraph Client ["🖥️ Client"]
        ClientSDK["SpatialReal Client SDK"]
    end
    subgraph Server ["🔧 Developer Backend"]
        DevServer["Server Logic"]
        ServerSDK["SpatialReal Server SDK"]
    end
    Cloud["☁️ SpatialReal Cloud"]
    DevServer -- "Audio Stream" --> ServerSDK
    ServerSDK -- "Send Audio" --> Cloud
    Cloud -- "Drive Params" --> ServerSDK
    ServerSDK -- "Animation Stream" --> DevServer
    DevServer == "Custom Transport: Audio + Animation" ==> ClientSDK
    style Client fill:#e8f4fd,stroke:#2196F3,stroke-width:2px,color:#000
    style Server fill:#e8eaf6,stroke:#3F51B5,stroke-width:2px,color:#000
    style Cloud fill:#fff3e0,stroke:#FF9800,stroke-width:2px,color:#000
```

***

## Still Not Sure?

Check the integration overview for a side-by-side comparison and runnable example projects.

***

## Want a Quick Taste?

Check the quick start guide to get a simple demo up and running in minutes.

***

## Next Steps

* Get started with the simplest integration approach
* Learn how to implement custom transport
* Set up real-time communication with LiveKit or Agora
* Explore sample implementations

# Host Mode

Source: https://docs.spatialreal.ai/guide/host-mode

Self-managed networking with AvatarKit Server SDK

## What is Host Mode?

In Host Mode, **your application** manages the network connection to AvatarKit's server-side SDK. Your server sends encoded messages to your client, and the client SDK receives and **decodes them internally** for synchronized playback and rendering.

```mermaid theme={null}
flowchart LR
    A["🖥️ AvatarKit<br/>Server SDK"] -->|Encoded Messages| B["Your App<br/>(Network Layer)"]
    B -->|yieldAudioData| C["AvatarKit<br/>Client SDK"]
    B -->|yieldFramesData| C
    C -->|Decode & Render| D["🖥️ Avatar Rendering"]
```

Host Mode **requires** AvatarKit's server-side SDK to generate the encoded messages. The data passed to `yieldAudioData()` and `yieldFramesData()` are encoded messages from the server SDK, not raw audio or animation data you create yourself.

## When to Use

* **Custom network layer**: you manage the connection between your client and AvatarKit's server SDK yourself
* **RTC integration**: messages are relayed through a real-time communication server (LiveKit, Agora, etc.)
* **Proxy architecture**: your backend acts as a relay between the client and AvatarKit server

## Requirements

| Requirement              | Description                                                                  |
| ------------------------ | ---------------------------------------------------------------------------- |
| **App ID**               | Obtained from [SpatialReal Studio](https://app.spatialreal.ai)               |
| **Session Token**        | **Not required** on the client side                                          |
| **AvatarKit Server SDK** | Your backend must integrate with AvatarKit's server SDK to generate messages |

## SDK Mode vs Host Mode

| Aspect               | SDK Mode                                         | Host Mode                                          |
| -------------------- | ------------------------------------------------ | -------------------------------------------------- |
| **Network**          | Client SDK connects to AvatarKit server directly | Your app relays messages from AvatarKit Server SDK |
| **Message Decoding** | Handled internally                               | Handled internally (same)                          |
| **Session Token**    | Required (client-side)                           | Not required (client-side)                         |
| **Server SDK**       | Not needed                                       | **Required** on your backend                       |
| **Key Methods**      | `send()`, `start()`, `close()`                   | `yieldAudioData()`, `yieldFramesData()`            |
| **Use Case**         | Simplest integration                             | Custom networking / RTC relay                      |

## Key Concepts

### ConversationId Management

ConversationId links audio and animation messages for a single conversation session:

1. Call `yieldAudioData()`, which returns a `conversationId`
2. Use that `conversationId` when calling `yieldFramesData()`
3. Messages with a **mismatched** conversationId will be **discarded**
4. Use `getCurrentConversationId()` to retrieve the current active session ID

**Important:** Always use the conversationId returned by `yieldAudioData()` when sending animation messages. Mismatched IDs cause messages to be silently dropped.

### Fallback Mechanism

If you provide empty animation data (empty array or undefined), the SDK automatically enters **audio-only mode** for that session. Once in audio-only mode, any subsequent animation data for that session is ignored and only audio continues playing.

## Get Started

GitHub demo repository

# Introduction

Source: https://docs.spatialreal.ai/guide/introduction

Step-by-step guides for integrating SpatialReal avatars into your application

The **Guide** walks you through integrating SpatialReal avatars by **integration mode**. Choose the mode that matches your architecture, then follow the linked guides for your platform (Web, iOS, Android) or backend (LiveKit, Server SDK).

## Integration Modes

* Client sends audio; SDK handles networking and rendering. Best for simple, client-centric apps.
* Real-time voice via LiveKit. Use framework plugins or Server SDK with egress.
* You control the transport. Server SDK sends audio and relays animation to the client SDK.
* Fully managed voice agent (coming soon).

## Where to Start

* **New to SpatialReal?** Start with [Overview → Speech-to-Avatar Quickstart](/overview/speech-to-avatar), then open the guide for your chosen mode.
* **Already chose a mode?** Use the sidebar: pick **SDK Mode**, **RTC Mode**, or **Host Mode**, then follow the platform-specific quickstarts (Web, iOS, Android) or server guides (LiveKit).
* **Want working code?** Check out our [Demo Projects](/overview/demo-projects) for complete examples you can run immediately.

A full reference repository with implementation details, including different frontend UI options and multiple backend agent patterns.

## Quick Links

| Mode     | Client / Platform                                                                            | Server                                                                                                                    |
| -------- | -------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| **SDK**  | [Web](/guide/sdk-mode-web) · [iOS](/ios-sdk/quickstart) · [Android](/android-sdk/quickstart) | Session token from your backend only                                                                                      |
| **RTC**  | [LiveKit Client](/guide/rtc-livekit-client)                                                  | [LiveKit Server](/guide/rtc-livekit-server)                                                                               |
| **Host** | [Web](/web-sdk/host-mode) · [iOS](/ios-sdk/host-mode) · [Android](/android-sdk/host-mode)    | [Python](/server/python-sdk/python-host-mode) · [Golang](/server/go-sdk/go-host-mode) · [JS](/server/js-sdk/js-host-mode) |

# LiveKit Client

Source: https://docs.spatialreal.ai/guide/rtc-livekit-client

Real-time voice communication with avatars using LiveKit

```mermaid theme={null}
flowchart LR
    A["🎤 Microphone"] --> B["AvatarPlayer"]
    B <-->|WebRTC| C["LiveKit Server"]
    B --> D["AvatarKit SDK"]
    D --> E["🖥️ Avatar Rendering"]
```

**Web Only:** RTC Mode is currently available for Web applications only.

**Do not call `initializeAudioContext()`**; it is not needed in RTC Mode. Avatar audio is delivered as a native WebRTC audio track by the LiveKit client SDK, not through the AvatarKit SDK's internal audio player. The `AvatarPlayer` adapter only feeds **animation data** to the SDK for rendering; audio playback is handled entirely by LiveKit's WebRTC stack.

## Installation

**Critical:** The SDK is only compatible with `livekit-client` versions below `2.17`, because `2.17` made single PC the default. Support for `2.17`+ is planned for a future release; for now, install version `2.16.1` to avoid compatibility issues.

```bash theme={null}
pnpm add @spatialwalk/avatarkit @spatialwalk/avatarkit-rtc livekit-client@2.16.1
```

```bash theme={null}
npm install @spatialwalk/avatarkit @spatialwalk/avatarkit-rtc livekit-client@2.16.1
```

```bash theme={null}
yarn add @spatialwalk/avatarkit @spatialwalk/avatarkit-rtc livekit-client@2.16.1
```

You also need to configure your build tool for WASM files; see [Build Tool Configuration](/guide/sdk-mode-web#build-tool-configuration).

## Authentication

| Credential        | How to Obtain                                                 | Notes                             |
| ----------------- | ------------------------------------------------------------- | --------------------------------- |
| **App ID**        | [SpatialReal Studio](https://app.spatialreal.ai) → Create App | For SDK initialization            |
| **Session Token** | Your backend → AvatarKit Server                               | For avatar loading (max 24 hours) |
| **LiveKit Token** | Your backend → LiveKit Server                                 | For RTC room connection           |

## Quick Start

For a complete client + server implementation, see [AvatarKit Voice Agent Demo](https://github.com/spatialwalk/avatarkit-voice-agent-demo/tree/main/livekit-cascade-voice-agent).

```typescript theme={null}
import { AvatarSDK, AvatarManager, AvatarView, DrivingServiceMode, Environment } from '@spatialwalk/avatarkit'

await AvatarSDK.initialize('your-app-id', {
  environment: Environment.intl,
  drivingServiceMode: DrivingServiceMode.host, // MUST be host for RTC
})
AvatarSDK.setSessionToken('your-session-token')
```

```typescript theme={null}
const avatar = await AvatarManager.shared.load('avatar-id')
const container = document.getElementById('avatar-container')!
const avatarView = new AvatarView(avatar, container)
```

```typescript theme={null}
import { AvatarPlayer, LiveKitProvider } from '@spatialwalk/avatarkit-rtc'

const provider = new LiveKitProvider()
const player = new AvatarPlayer(provider, avatarView, {
  logLevel: 'warning',
})
```

```typescript theme={null}
await player.connect({
  url: 'wss://your-livekit-server.com',
  token: 'your-livekit-token',
  roomName: 'room-name',
})
```

```typescript theme={null}
// Start microphone publishing
await player.startPublishing()

// Stop microphone
await player.stopPublishing()

// Disconnect when done
await player.disconnect()
```

## AvatarPlayer API

### Constructor

```typescript theme={null}
new AvatarPlayer(provider: LiveKitProvider, avatarView: AvatarView, options?: AvatarPlayerOptions)
```

### AvatarPlayerOptions

```typescript theme={null}
interface AvatarPlayerOptions {
  /** Start speaking transition frames, default 5 (~200ms at 25fps) */
  transitionStartFrameCount?: number
  /** End speaking transition frames, default 40 (~1600ms at 25fps) */
  transitionEndFrameCount?: number
  /** Log level: 'info' | 'warning' | 'error' | 'none', default 'warning' */
  logLevel?: LogLevel
}
```

### Connection

```typescript theme={null}
// Connect to LiveKit server
await player.connect(config: LiveKitConnectionConfig)

// Disconnect and clean up
await player.disconnect()

// Reconnect using last config (useful after stalls)
await player.reconnect()

// Check connection status
player.isConnected // boolean
player.getConnectionState() // ConnectionState
```

#### LiveKitConnectionConfig

```typescript theme={null}
interface LiveKitConnectionConfig {
  url: string      // LiveKit server URL (wss://...)
  token: string    // Auth token from your backend
  roomName: string // Room name
}
```

### Microphone Control

```typescript theme={null}
// Start microphone (requests permission automatically)
await player.startPublishing()

// Stop microphone
await player.stopPublishing()
```

### Custom Audio Publishing

Use this for non-microphone audio sources, such as audio elements or the Web Audio API.

```typescript theme={null}
// Publish a custom audio track
await player.publishAudio(track: MediaStreamTrack)

// Stop custom audio
await player.unpublishAudio()
```

| Audio Source       | How to Obtain Track                                                      |
| ------------------ | ------------------------------------------------------------------------ |
| `