Core Classes

AvatarKit

The core management class of the SDK, responsible for initialization and global configuration.
/// SDK entry class
export class AvatarKit {
  /// Initialize SDK
  /// @param appId - Application ID
  /// @param configuration - SDK configuration
  static async initialize(
    appId: string,
    configuration: Configuration
  ): Promise<void>
  
  /// Check initialization status
  static readonly isInitialized: boolean
  
  /// Get the App ID provided during initialization
  static readonly appId: string | null
  
  /// Get the configuration object provided during initialization
  static readonly configuration: Configuration | null
  
  /// Set session token for WebSocket authentication
  /// @param token - Session token
  /// Note: sessionToken is not in the Configuration interface, it must be set separately using this method
  static setSessionToken(token: string): void
  
  /// Get the currently set session token
  static readonly sessionToken: string | null
  
  /// Set user ID for logging and analytics
  /// @param userId - User ID
  static setUserId(userId: string): void
  
  /// Get the currently set user ID
  static readonly userId: string | null
  
  /// Get SDK version
  static readonly version: string
  
  /// Clean up SDK global resources
  /// Call this when you no longer need the digital human SDK to release all held resources (WASM modules, memory, etc.)
  /// Note: Only call this when the entire application no longer needs digital human functionality.
  static cleanup(): void
}

AvatarManager

Character resource manager, responsible for downloading, caching, and loading character data. Use the singleton instance via AvatarManager.shared.
export class AvatarManager {
  /// Get singleton instance
  static readonly shared: AvatarManager
  
  /// Load character resources
  /// @param characterId - Character ID
  /// @param onProgress - Progress callback (optional)
  /// @returns Promise<Avatar> - Successfully loaded character object
  async load(
    characterId: string,
    onProgress?: (progress: LoadProgressInfo) => void
  ): Promise<Avatar>
  
  /// Clear cache
  clearCache(): void
}

AvatarView

3D rendering view, responsible for 3D rendering only. It automatically creates and manages an AvatarController internally.
export class AvatarView {
  /// Create view instance
  /// @param avatar - Loaded character object
  /// @param container - Container element (required)
  ///   - Canvas automatically uses the full size of the container (width and height)
  ///   - Canvas aspect ratio adapts to container size - set container size to control aspect ratio
  ///   - Canvas is automatically added to the container
  constructor(avatar: Avatar, container: HTMLElement)
  
  /// Real-time communication controller
  readonly avatarController: AvatarController
  
  /// Wait for first frame to render
  readonly ready: Promise<void>
  
  /// Get rendering Canvas element
  getCanvas(): HTMLCanvasElement
  
  /// Set background image
  /// @param imageUrl - Image URL
  setBackgroundImage(imageUrl: string): void
  
  /// Update camera configuration
  /// @param config - Camera configuration object
  updateCameraConfig(config: CameraConfig): void
  
  /// Clean up view resources
  dispose(): void
}

AvatarController

Real-time communication controller that handles WebSocket connections and audio/video data. Important Notes:
  • The sendText() method is not currently supported and will throw an error if called.
  • start() and close() are only available in SDK mode
  • yieldAudioData() and yieldFramesData() are only available in Host mode
  • interrupt(), clear(), getCurrentConversationId(), setVolume(), and getVolume() are available in both modes
  • Volume Control: Use setVolume(volume) to control audio playback volume (0.0 to 1.0). This only affects the avatar’s audio player, not system volume. Volume changes take effect immediately, including for currently playing audio.
export class AvatarController {
  /// Start WebSocket connection
  async start(): Promise<void>
  
  /// Send audio data
  /// @param audioData - Audio data (ArrayBuffer format, must be 16kHz mono PCM16)
  ///   - Sample rate: 16kHz (16000 Hz) - backend requirement
  ///   - Format: PCM16 (16-bit signed integer, little-endian)
  ///   - Channels: Mono (single channel)
  ///   - Example: 1 second = 16000 samples × 2 bytes = 32000 bytes
  /// @param end - Whether to end sending
  ///   - `false` (default) - Continue sending audio data for current conversation
  ///   - `true` - Mark the end of current conversation round. After end=true, sending new audio data will interrupt any ongoing playback from the previous conversation round
  /// @returns conversationId - Conversation ID (mainly used internally in SDK mode, not required for basic usage)
  /// Important Notes:
  /// - SDK will start returning animation data and playing once it collects enough audio data, no need to wait for `end=true`
  /// - After `end=true`, sending new audio data will interrupt any ongoing playback from the previous conversation round
  send(audioData: ArrayBuffer, end?: boolean): string
  
  /// Stream audio chunks (Host mode only)
  /// @param data - Audio chunk data (Uint8Array format, 16kHz mono PCM16)
  /// @param isLast - Whether this is the last chunk (default: false)
  /// @returns conversationId - Conversation ID for this audio session
  /// Important Notes:
  /// - Can be called directly to start a new session
  /// - If no conversationId exists, a new one will be automatically generated
  /// - Returns the conversationId that must be used when sending animation data via yieldFramesData()
  yieldAudioData(data: Uint8Array, isLast?: boolean): string
  
  /// Stream animation keyframes (Host mode only)
  /// @param keyframes - Animation keyframes (obtained from your service)
  /// @param conversationId - Conversation ID (required)
  ///   - Use getCurrentConversationId() or yieldAudioData() to get conversationId
  ///   - Animation data with mismatched conversationId will be discarded
  /// Important Notes:
  /// - Audio data must be sent first (via yieldAudioData()) to obtain the conversationId required by this method
  yieldFramesData(keyframes: any[], conversationId: string): void
  
  /// Interrupt current playback (stops and clears data)
  interrupt(): void
  
  /// Clear all data and resources
  clear(): void
  
  /// Close connection (SDK mode only)
  close(): void
  
  /// Get current conversation ID (for Host mode)
  /// Returns: Current conversationId for the active audio session, or null if no active session
  getCurrentConversationId(): string | null
  
  /// Volume control (affects only avatar audio player, not system volume)
  /// @param volume - Volume level from 0.0 to 1.0
  setVolume(volume: number): void
  
  /// Get current volume level
  /// @returns Current volume level from 0.0 to 1.0
  getVolume(): number
  
  /// Connection state callback (SDK mode only)
  onConnectionState?: (state: ConnectionState) => void
  
  /// Conversation state callback
  onConversationState?: (state: ConversationState) => void
  
  /// Error callback
  onError?: (error: Error) => void
}
Event Descriptions:
  • onConnectionState: Connection state change callback (SDK mode only)
  • onConversationState: Conversation state change callback
  • onError: Error callback
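In Host mode, the flow documented above — send audio via yieldAudioData(), then thread the returned conversationId into every yieldFramesData() call — can be sketched as follows. The HostController interface is a local stand-in that mirrors only the documented method signatures; driveHostMode and the framesFor callback are illustrative names, not SDK API.

```typescript
// Minimal stand-in for the documented Host-mode surface of AvatarController.
// This interface is a local assumption for illustration; in real code you
// would call avatarView.avatarController directly.
interface HostController {
  yieldAudioData(data: Uint8Array, isLast?: boolean): string
  yieldFramesData(keyframes: unknown[], conversationId: string): void
  getCurrentConversationId(): string | null
}

// Streams audio chunks and their matching animation keyframes, threading the
// conversationId returned by yieldAudioData() into every yieldFramesData()
// call, as the documentation requires.
function driveHostMode(
  controller: HostController,
  chunks: Uint8Array[],
  framesFor: (chunkIndex: number) => unknown[]
): string | null {
  let conversationId: string | null = null
  chunks.forEach((chunk, i) => {
    const isLast = i === chunks.length - 1
    // yieldAudioData() generates a conversationId on the first call and
    // returns the id of the active session for subsequent chunks.
    conversationId = controller.yieldAudioData(chunk, isLast)
    // Animation data with a mismatched conversationId is discarded,
    // so always pass the id returned by yieldAudioData().
    controller.yieldFramesData(framesFor(i), conversationId)
  })
  return conversationId
}
```

With a real controller, chunks would be 16kHz mono PCM16 audio and framesFor would return keyframes obtained from your own driving service.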

Configuration Types

Configuration

SDK configuration interface
interface Configuration {
  /// Environment configuration
  environment: Environment
  /// Driving service mode (optional, default is 'sdk')
  drivingServiceMode?: DrivingServiceMode
}

enum DrivingServiceMode {
  sdk = 'sdk'   // SDK mode: SDK handles WebSocket communication automatically
}

Environment

Environment enumeration
enum Environment {
  /// China region
  cn = 'cn'
  /// International region
  intl = 'intl'
  /// Test environment
  test = 'test'
}

CameraConfig

Camera configuration interface
interface CameraConfig {
  /// Camera position
  position: [number, number, number]
  /// Camera target point
  target: [number, number, number]
  /// Field of view angle (degrees)
  fov: number
  /// Near clipping plane
  near: number
  /// Far clipping plane
  far: number
  /// Up direction vector (optional)
  up?: [number, number, number]
  /// Aspect ratio (optional)
  aspect?: number
}
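As a sketch, a CameraConfig literal might look like the following. The numeric values are illustrative, not SDK defaults, and makeCameraConfig is a hypothetical helper that derives aspect from the container dimensions; the interface is copied locally so the example is self-contained.

```typescript
// Local copy of the documented CameraConfig interface, for illustration.
interface CameraConfig {
  position: [number, number, number]
  target: [number, number, number]
  fov: number
  near: number
  far: number
  up?: [number, number, number]
  aspect?: number
}

// Hypothetical helper: builds a config whose aspect ratio matches the
// container's current width/height. All values other than aspect are
// examples, not recommended defaults.
function makeCameraConfig(width: number, height: number): CameraConfig {
  return {
    position: [0, 1.5, 2.5], // example: slightly above and in front of the avatar
    target: [0, 1.4, 0],     // example: look at roughly head height
    fov: 35,
    near: 0.1,
    far: 100,
    up: [0, 1, 0],
    aspect: width / height,
  }
}

// With a real view:
// avatarView.updateCameraConfig(makeCameraConfig(container.clientWidth, container.clientHeight))
```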

State Types

ConnectionState

Connection state enumeration
enum ConnectionState {
  /// Not connected
  disconnected = 'disconnected'
  /// Connecting
  connecting = 'connecting'
  /// Connected
  connected = 'connected'
  /// Connection failed
  failed = 'failed'
}

ConversationState

Conversation state enumeration
enum ConversationState {
  /// Idle state, showing breathing animation
  idle = 'idle'
  /// Playing conversation content (including transition animation period)
  playing = 'playing'
}
State Descriptions:
  • idle: Avatar is in idle state, waiting for conversation to start
  • playing: Avatar is playing conversation content (including during transition animation period)
Note: Target state is notified immediately during transition animation:
  • When transitioning from idle to playing, playing state is notified immediately
  • When transitioning from playing to idle, idle state is notified immediately
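Because the target state is reported at the start of the transition animation, a UI can key directly off the callback. The sketch below is illustrative (the helper name and button wiring are mine): it enables an interrupt control only while conversation content is playing.

```typescript
// Mirrors the documented ConversationState enum values, as a local type.
type ConversationState = 'idle' | 'playing'

// Illustrative: interruption only makes sense while conversation content
// (including its transition animation) is playing.
function isInterruptible(state: ConversationState): boolean {
  return state === 'playing'
}

// With a real controller:
// avatarView.avatarController.onConversationState = (state) => {
//   interruptButton.disabled = !isInterruptible(state)
// }
```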

LoadProgress

Load progress enumeration
enum LoadProgress {
  /// Downloading
  downloading = 'downloading'
  /// Completed
  completed = 'completed'
  /// Failed
  failed = 'failed'
}

type LoadProgressInfo = {
  /// Load progress type
  type: LoadProgress
  /// Progress percentage 0-100
  progress?: number
  /// Error information
  error?: Error
}
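A progress callback typically switches on the type field. The describeProgress helper below is illustrative, with the types copied locally from the definitions above so the sketch is self-contained.

```typescript
// Local copies of the documented types, for a self-contained sketch.
type LoadProgress = 'downloading' | 'completed' | 'failed'
type LoadProgressInfo = {
  type: LoadProgress
  progress?: number // 0-100, present while downloading
  error?: Error     // present when type is 'failed'
}

// Illustrative helper: turns a progress event into a status-bar message.
function describeProgress(info: LoadProgressInfo): string {
  switch (info.type) {
    case 'downloading':
      return `Downloading… ${info.progress ?? 0}%`
    case 'completed':
      return 'Character ready'
    case 'failed':
      return `Load failed: ${info.error?.message ?? 'unknown error'}`
  }
}
```

With the real manager: `AvatarManager.shared.load(id, (info) => { statusBar.textContent = describeProgress(info) })`.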

Fallback Mechanism

The SDK includes a fallback mechanism to ensure audio playback continues even when animation data is unavailable:
  • SDK Mode Connection Failure: If WebSocket connection fails to establish within 15 seconds, the SDK automatically enters fallback mode. In this mode, audio data can still be sent and will play normally, even though no animation data will be received from the server. This ensures that audio playback is not interrupted even when the service connection fails.
  • SDK Mode Server Error: If the server returns an error after connection is established, the SDK automatically enters audio-only mode for that session and continues playing audio independently.
  • Once in audio-only mode, any subsequent animation data for that session will be ignored, and only audio will continue playing.
  • The fallback mode is interruptible, just like normal playback mode.
  • Connection state callbacks (onConnectionState) will notify you when connection fails or times out, allowing you to handle the fallback state appropriately.
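The fallback behavior above can be surfaced to users through the connection-state callback. The mapping and message strings below are illustrative; the state values mirror the documented ConnectionState enum.

```typescript
// Mirrors the documented ConnectionState enum values, as a local type.
type ConnectionState = 'disconnected' | 'connecting' | 'connected' | 'failed'

// Illustrative mapping from connection state to a user-facing status line.
// On 'failed' the SDK keeps playing audio in fallback mode, so the message
// signals degraded (audio-only) operation rather than a hard error.
function connectionStatusMessage(state: ConnectionState): string {
  switch (state) {
    case 'disconnected': return 'Not connected'
    case 'connecting':   return 'Connecting…'
    case 'connected':    return 'Connected'
    case 'failed':       return 'Connection failed: continuing in audio-only fallback mode'
  }
}

// With a real controller:
// avatarView.avatarController.onConnectionState = (state) => {
//   statusBar.textContent = connectionStatusMessage(state)
// }
```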

Usage Examples

Basic Initialization

Step 1: Create Configuration

import { AvatarKit, Environment, DrivingServiceMode } from '@spatialwalk/avatarkit'

await AvatarKit.initialize('your-app-id', {
  environment: Environment.cn,
  drivingServiceMode: DrivingServiceMode.sdk, // Optional, 'sdk' is default
})

Step 2: Load Character

import { AvatarManager } from '@spatialwalk/avatarkit'

const avatarManager = AvatarManager.shared
const avatar = await avatarManager.load('character-id', (progress) => {
  console.log(`Loading: ${progress.progress}%`)
})
Ensure the character ID is valid and the network connection is stable; checking network status before loading is recommended.

Create Digital Human View

Step 1: Create View Instance

import { AvatarView } from '@spatialwalk/avatarkit'

const container = document.getElementById('avatar-container')
const avatarView = new AvatarView(avatar, container)

// Wait for first frame to render
await avatarView.ready

Step 2: Start Real-time Communication

// Set up event listeners
avatarView.avatarController.onConnectionState = (state) => {
  console.log('Connection state:', state)
}

avatarView.avatarController.onConversationState = (state) => {
  console.log('Conversation state:', state)
}

// Start connection
await avatarView.avatarController.start()

Audio Processing

⚠️ Important: The SDK requires audio data to be in 16kHz mono PCM16 format:
  • Sample Rate: 16kHz (16000 Hz) - This is a backend requirement
  • Channels: Mono (single channel)
  • Format: PCM16 (16-bit signed integer, little-endian)
  • Byte Order: Little-endian
Audio Data Format:
  • Each sample is 2 bytes (16-bit)
  • Audio data must be provided as an ArrayBuffer. If your data is in a Uint8Array, it must be converted to ArrayBuffer (e.g., using slice().buffer)
  • For example: 1 second of audio = 16000 samples × 2 bytes = 32000 bytes
Resampling:
  • If your audio source is at a different sample rate (e.g., 24kHz, 48kHz), you must resample it to 16kHz before sending to the SDK
  • For high-quality resampling, we recommend using Web Audio API’s OfflineAudioContext with anti-aliasing filtering
  • See example projects for resampling implementation
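The documentation recommends OfflineAudioContext for high-quality resampling in the browser. As a self-contained illustration of the required format, the sketch below linearly interpolates Float32 samples down to 16kHz (note: no anti-aliasing filter, so prefer OfflineAudioContext in production) and encodes them as little-endian PCM16. Both function names are mine, not SDK API.

```typescript
// Illustrative linear-interpolation resampler (no anti-aliasing filter).
// For production, prefer the Web Audio API's OfflineAudioContext as the
// documentation recommends.
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  const outLength = Math.floor(input.length * toRate / fromRate)
  const out = new Float32Array(outLength)
  for (let i = 0; i < outLength; i++) {
    const pos = i * fromRate / toRate
    const i0 = Math.floor(pos)
    const i1 = Math.min(i0 + 1, input.length - 1)
    const frac = pos - i0
    out[i] = input[i0] * (1 - frac) + input[i1] * frac
  }
  return out
}

// Encodes [-1, 1] float samples as 16-bit signed little-endian PCM,
// the sample format the SDK requires (together with 16kHz mono).
function encodePcm16(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2) // 2 bytes per sample
  const view = new DataView(buffer)
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]))
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true) // true = little-endian
  }
  return buffer
}

// Example: one second of 48kHz audio becomes 16000 samples = 32000 bytes.
// avatarView.avatarController.send(encodePcm16(resampleLinear(floats, 48000, 16000)), false)
```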
Audio Application Example
// ⚠️ Important: Audio must be 16kHz mono PCM16 format
// If audio is Uint8Array, you can use slice().buffer to convert to ArrayBuffer
const audioUint8 = new Uint8Array(1024) // Example: 16kHz PCM16 audio data (512 samples = 1024 bytes)
const audioData = audioUint8.slice().buffer // Convert to ArrayBuffer

// Send audio data (end=false means continue sending for current conversation)
avatarView.avatarController.send(audioData, false) // Returns conversationId, but not required in SDK mode

// Note: SDK will start returning animation data and playing once it collects enough audio data, no need to wait for end=true
// Call when audio ends (end=true marks the end of current conversation round)
avatarView.avatarController.send(new ArrayBuffer(0), true) // end=true marks the end of current conversation round
// ⚠️ Note: After end=true, sending new audio data will interrupt any ongoing playback from the previous conversation round

// Interrupt conversation
avatarView.avatarController.interrupt()

Resource Management

Resource Cleanup Example
// Clean up view resources
avatarView.dispose()

// If you no longer need the digital human SDK, clean up global resources
AvatarKit.cleanup() // Clean up SDK global resources (only call if you no longer need the entire SDK)
You must call dispose() on AvatarView instances when they are no longer needed to properly clean up resources. Failing to do so may cause resource leaks and rendering errors.

Error Handling

Error Handling Example
import { SPAvatarError } from '@spatialwalk/avatarkit'

try {
  await avatarView.avatarController.start()
} catch (error) {
  if (error instanceof SPAvatarError) {
    console.error('SDK Error:', error.message, error.code)
  }
}

// Or use error callback
avatarView.avatarController.onError = (error) => {
  console.error('AvatarController error:', error)
}

Browser Compatibility

Browser          Minimum Version   Rendering Backend              Notes
Chrome           90+               WebGPU (recommended) / WebGL   Best support
Edge             90+               WebGPU (recommended) / WebGL   Best support
Firefox          90+               WebGL                          Limited WebGPU support
Safari           14+               WebGL                          Limited WebGPU support
iOS Safari       14+               WebGL                          Limited WebGPU support
Android Chrome   Android 8+        WebGL                          Limited WebGPU support
WebGPU requires newer browser versions; the SDK automatically detects support and falls back to WebGL when WebGPU is unavailable.

Performance Optimization Recommendations

  • SDK automatically selects the best rendering backend (WebGPU/WebGL)
  • Resources are automatically cached, no need to re-download on repeated loads
  • WASM memory is automatically managed by SDK, no manual release needed
  • Supports dynamic loading and unloading of resources
  • Recommended to use modern browsers for best performance