Use this quickstart to create a minimal Speech-to-Avatar web demo with SpatialReal SDK Mode. In a few steps, you will load your avatar, stream speech from a downloadable audio link, and render a realistic talking-avatar experience directly in the browser.
You can copy each code block as-is and replace the values in .env. Alternatively, paste this page into an LLM of your choice and ask it to implement the demo for you.

Prerequisites

  • SPATIALREAL_APP_ID
  • SPATIALREAL_AVATAR_ID
  • SESSION_TOKEN
  • Node.js 18+, pnpm
1. Create workspace

mkdir spatialreal-quickstart
cd spatialreal-quickstart
pnpm dlx create-vite@latest . --template vue-ts
2. Install dependencies

pnpm install
pnpm add @spatialwalk/avatarkit
3. Configure environment

In SpatialReal Studio, open Applications, then click Generate Temporary Token to obtain a session token. Then create .env:
.env
VITE_SPATIALREAL_APP_ID=your_app_id # SpatialReal Studio -> Applications
VITE_SPATIALREAL_AVATAR_ID=your_avatar_id # SpatialReal Studio -> Avatars
VITE_SPATIALREAL_SESSION_TOKEN=your_temporary_session_token
VITE_AUDIO_URL= # direct link to an audio file (e.g. WAV/MP3) that allows CORS
4. Update Vite config

vite.config.ts
import { defineConfig } from 'vite'
import vue from '@vitejs/plugin-vue'
import { avatarkitVitePlugin } from '@spatialwalk/avatarkit/vite'

export default defineConfig({
  plugins: [vue(), avatarkitVitePlugin()],
  server: { port: 3000 },
})
5. Create app page

Create src/App.vue and paste the following snippets in order. Start with the base <script setup> structure. This part imports Vue and the SpatialReal SDK APIs, reads values from .env, and defines the core UI state. It also includes small helpers used by the playback flow.
src/App.vue
<script setup lang="ts">
import { nextTick, onBeforeUnmount, ref } from 'vue'
import {
  AvatarManager,
  AvatarSDK,
  AvatarView,
  DrivingServiceMode,
  Environment,
} from '@spatialwalk/avatarkit'

// Keep SDK audio format aligned with conversion and streaming.
const SAMPLE_RATE = 16000

// UI state and avatar mount point.
const container = ref<HTMLDivElement | null>(null)
const status = ref('Click Play Demo Audio to start')
const busy = ref(false)

// Active avatar view instance (created after avatar is loaded).
let avatarView: AvatarView | null = null

// Runtime configuration from .env.
const appId = import.meta.env.VITE_SPATIALREAL_APP_ID
const avatarId = import.meta.env.VITE_SPATIALREAL_AVATAR_ID
const sessionToken = import.meta.env.VITE_SPATIALREAL_SESSION_TOKEN
const audioUrl = import.meta.env.VITE_AUDIO_URL

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms))
}

async function fetchSessionToken(): Promise<string> {
  if (!sessionToken) throw new Error('Missing VITE_SPATIALREAL_SESSION_TOKEN')
  return sessionToken
}
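In this quickstart, fetchSessionToken simply returns the temporary token from .env. In production you would instead mint a short-lived token on your own backend. A minimal sketch, assuming a hypothetical /api/session-token route and a { token } response shape (neither is part of the SDK):

```typescript
// Hypothetical production variant of fetchSessionToken: ask your own
// backend for a short-lived token instead of reading it from .env.
// The fetch function is injectable so the sketch is easy to unit-test.
type FetchLike = (url: string, init?: { method?: string }) => Promise<Response>

async function fetchSessionTokenFromBackend(
  doFetch: FetchLike = fetch,
): Promise<string> {
  // '/api/session-token' is a placeholder route; use your real endpoint.
  const response = await doFetch('/api/session-token', { method: 'POST' })
  if (!response.ok) throw new Error(`Token request failed: ${response.status}`)
  const { token } = await response.json()
  if (!token) throw new Error('Backend returned an empty token')
  return token
}
```

Keeping token minting on the server means your SpatialReal credentials never ship to the browser.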
Then add the audio pipeline functions. This part downloads the source audio, decodes it, converts it to mono PCM16, and streams it to the avatar controller in 100 ms chunks.
src/App.vue
// Download remote audio, decode it, mix to mono, then convert Float32 -> PCM16.
async function downloadAudioAsPcm16(url: string, targetSampleRate: number): Promise<ArrayBuffer> {
  const response = await fetch(url)
  if (!response.ok) throw new Error('Failed to load audio file. Check VITE_AUDIO_URL and CORS.')

  const fileBuffer = await response.arrayBuffer()
  const audioContext = new AudioContext({ sampleRate: targetSampleRate })
  const decoded = await audioContext.decodeAudioData(fileBuffer.slice(0))

  const length = decoded.length
  const channels = decoded.numberOfChannels
  const mono = new Float32Array(length)

  if (channels === 1) {
    mono.set(decoded.getChannelData(0))
  } else {
    for (let c = 0; c < channels; c++) {
      const data = decoded.getChannelData(c)
      for (let i = 0; i < length; i++) mono[i] += data[i] / channels
    }
  }

  const pcm16 = new ArrayBuffer(length * 2)
  const view = new DataView(pcm16)
  for (let i = 0; i < length; i++) {
    const s = Math.max(-1, Math.min(1, mono[i]))
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true)
  }

  await audioContext.close()
  return pcm16
}

// Send PCM16 in 100ms chunks to simulate real-time speech input.
async function streamPcm16(pcm16: ArrayBuffer, sampleRate: number): Promise<void> {
  if (!avatarView) return

  const chunkMs = 100
  const chunkSamples = Math.floor((sampleRate * chunkMs) / 1000)
  const chunkBytes = chunkSamples * 2

  for (let offset = 0; offset < pcm16.byteLength; offset += chunkBytes) {
    const end = Math.min(offset + chunkBytes, pcm16.byteLength)
    const chunk = pcm16.slice(offset, end)
    const isLast = end >= pcm16.byteLength

    avatarView.controller.send(chunk, isLast)
    await sleep(chunkMs)
  }
}
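The Float32 to PCM16 step is easy to get subtly wrong (clamping, the asymmetric int16 range, endianness), so it can help to sanity-check the conversion in isolation. A standalone sketch of the same conversion used inside downloadAudioAsPcm16 above:

```typescript
// Standalone version of the Float32 -> PCM16 conversion from
// downloadAudioAsPcm16, for sanity-checking outside the browser.
function float32ToPcm16(samples: Float32Array): ArrayBuffer {
  const pcm16 = new ArrayBuffer(samples.length * 2)
  const view = new DataView(pcm16)
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1]; int16 is asymmetric (-32768..32767), so negative
    // and positive samples use different scale factors.
    const s = Math.max(-1, Math.min(1, samples[i]))
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true) // little-endian
  }
  return pcm16
}
```

Silence maps to 0, full-scale positive to 32767, and full-scale negative to -32768; out-of-range input is clamped rather than wrapped.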
Next, add the avatar lifecycle and the main action logic. This is the core flow triggered by the button: initialize the SDK, load and mount the avatar, connect, then stream the converted audio. It also handles cleanup so repeated runs and page unmounts stay stable.
src/App.vue
// Release current connection and rendering resources before re-run/unmount.
async function disposeAvatar(): Promise<void> {
  avatarView?.controller.close()
  avatarView?.dispose()
  avatarView = null
}

// Main demo flow:
// 1) Read token from env
// 2) Initialize SDK
// 3) Load and mount avatar
// 4) Start SDK Mode connection
// 5) Download + convert + stream audio
async function playDemoAudio(): Promise<void> {
  if (busy.value) return
  busy.value = true

  try {
    status.value = 'Fetching session token...'
    const token = await fetchSessionToken()

    if (!AvatarSDK.isInitialized) {
      await AvatarSDK.initialize(appId, {
        environment: Environment.intl,
        drivingServiceMode: DrivingServiceMode.sdk,
      })
    }
    AvatarSDK.setSessionToken(token)

    await nextTick()
    const mountEl = container.value
    if (!mountEl) throw new Error('Avatar container is not ready')

    await disposeAvatar()
    status.value = 'Loading avatar...'
    const avatar = await AvatarManager.shared.load(avatarId)
    avatarView = new AvatarView(avatar, mountEl)

    status.value = 'Connecting to SpatialReal...'
    await avatarView.controller.initializeAudioContext()
    await avatarView.controller.start()

    status.value = 'Downloading and converting audio...'
    const pcm16 = await downloadAudioAsPcm16(audioUrl, SAMPLE_RATE)

    status.value = 'Streaming audio...'
    await streamPcm16(pcm16, SAMPLE_RATE)
    status.value = 'Playback finished'
  } catch (error) {
    status.value = error instanceof Error ? error.message : 'Failed to run demo'
  } finally {
    busy.value = false
  }
}

onBeforeUnmount(async () => {
  await disposeAvatar()
})
</script>
Finally, add the template section. It renders the avatar container, a single action button, and a status line. The status text shows current progress and surfaces errors while the demo runs.
src/App.vue
<template>
  <div style="min-height:100vh; display:flex; align-items:center; justify-content:center; padding:16px;">
    <div style="width:min(720px, 100%); display:flex; flex-direction:column; gap:10px;">
      <div ref="container" style="width:100%; aspect-ratio:16/10; min-height:320px; border-radius:12px; overflow:hidden; border:1px solid;" />

      <div style="display:flex; gap:8px; flex-wrap:wrap;">
        <button :disabled="busy" @click="playDemoAudio">
          {{ busy ? 'Running...' : 'Play Demo Audio' }}
        </button>
      </div>

      <div style="font-size:14px;">{{ status }}</div>
    </div>
  </div>
</template>
6. Verify project structure

Speech-to-Avatar quickstart
spatialreal-quickstart/
├── .env
├── vite.config.ts
└── src/
    └── App.vue
7. Run the app

From the project root, start the dev server:
cd spatialreal-quickstart
pnpm dev
Open http://localhost:3000 and click Play Demo Audio.

What happens

  • The frontend initializes AvatarKit in SDK Mode and sets the temporary session token copied from Studio
  • The frontend loads the avatar, mounts it, and starts the SDK Mode connection
  • The frontend downloads audio from VITE_AUDIO_URL and converts it to mono PCM16
  • PCM16 chunks are streamed to the avatar controller for animation and audio playback
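As a quick sanity check on the streaming numbers: at 16 kHz mono PCM16, each 100 ms chunk sent by streamPcm16 is 1600 samples, i.e. 3200 bytes. A small sketch of that arithmetic:

```typescript
// Chunk-size arithmetic used by streamPcm16: samples per chunk, then
// bytes per chunk (2 bytes per 16-bit sample).
function pcm16ChunkBytes(sampleRate: number, chunkMs: number): number {
  const chunkSamples = Math.floor((sampleRate * chunkMs) / 1000)
  return chunkSamples * 2
}

// With the quickstart's settings:
// pcm16ChunkBytes(16000, 100) -> 3200
```

Smaller chunks reduce latency but increase the number of controller.send calls; 100 ms is a reasonable middle ground for simulating real-time speech input.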