Use this quickstart to create a minimal Speech-to-Avatar web demo with SpatialReal SDK Mode.
In a few steps, you will load your avatar, stream speech from a downloadable audio link, and render a realistic talking-avatar experience directly in the browser.
You can copy each code block as-is and replace the placeholder values in .env. Alternatively, paste this page into an LLM of your choice and ask it to implement the demo for you.
Prerequisites
- SPATIALREAL_APP_ID (SpatialReal Studio -> Applications)
- SPATIALREAL_AVATAR_ID (SpatialReal Studio -> Avatars)
- SESSION_TOKEN (temporary token generated in Studio)
- Node.js 18+
- pnpm
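Before creating the workspace, you can optionally confirm the toolchain is ready. This assumes node and pnpm are already on your PATH:

```shell
# Check that Node.js 18+ and pnpm are available
node --version   # should print v18.x or newer
pnpm --version
```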
Create workspace
mkdir spatialreal-quickstart
cd spatialreal-quickstart
pnpm dlx create-vite@latest . --template vue-ts
Install dependencies
pnpm install
pnpm add @spatialwalk/avatarkit
Configure environment
In SpatialReal Studio, open Applications, then click Generate Temporary Token.

Create .env:

VITE_SPATIALREAL_APP_ID=your_app_id # SpatialReal Studio -> Applications
VITE_SPATIALREAL_AVATAR_ID=your_avatar_id # SpatialReal Studio -> Avatars
VITE_SPATIALREAL_SESSION_TOKEN=your_temporary_session_token
VITE_AUDIO_URL= # downloadable audio file URL (must allow CORS)
Update Vite config
import { defineConfig } from 'vite'
import vue from '@vitejs/plugin-vue'
import { avatarkitVitePlugin } from '@spatialwalk/avatarkit/vite'
export default defineConfig({
plugins: [vue(), avatarkitVitePlugin()],
server: { port: 3000 },
})
Create app page
Create src/App.vue and paste the following snippets in order.

Start with the base <script setup> structure. This part imports Vue and SpatialReal SDK APIs, reads values from .env, and defines core UI state. It also includes small helpers used by the playback flow.

<script setup lang="ts">
import { nextTick, onBeforeUnmount, ref } from 'vue'
import {
AvatarManager,
AvatarSDK,
AvatarView,
DrivingServiceMode,
Environment,
} from '@spatialwalk/avatarkit'
// Keep SDK audio format aligned with conversion and streaming.
const SAMPLE_RATE = 16000
// UI state and avatar mount point.
const container = ref<HTMLDivElement | null>(null)
const status = ref('Click Play Demo Audio to start')
const busy = ref(false)
// Active avatar view instance (created after avatar is loaded).
let avatarView: AvatarView | null = null
// Runtime configuration from .env.
const appId = import.meta.env.VITE_SPATIALREAL_APP_ID
const avatarId = import.meta.env.VITE_SPATIALREAL_AVATAR_ID
const sessionToken = import.meta.env.VITE_SPATIALREAL_SESSION_TOKEN
const audioUrl = import.meta.env.VITE_AUDIO_URL
function sleep(ms: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, ms))
}
async function fetchSessionToken(): Promise<string> {
if (!sessionToken) throw new Error('Missing VITE_SPATIALREAL_SESSION_TOKEN')
return sessionToken
}
Then add the audio pipeline functions. This part downloads the source audio, decodes it, converts it to mono PCM16, and streams it to the avatar controller in 100 ms chunks.

// Download remote audio, decode it, mix to mono, then convert Float32 -> PCM16.
async function downloadAudioAsPcm16(url: string, targetSampleRate: number): Promise<ArrayBuffer> {
const response = await fetch(url)
if (!response.ok) throw new Error('Failed to load audio file. Check VITE_AUDIO_URL and CORS.')
const fileBuffer = await response.arrayBuffer()
const audioContext = new AudioContext({ sampleRate: targetSampleRate })
const decoded = await audioContext.decodeAudioData(fileBuffer.slice(0))
const length = decoded.length
const channels = decoded.numberOfChannels
const mono = new Float32Array(length)
if (channels === 1) {
mono.set(decoded.getChannelData(0))
} else {
for (let c = 0; c < channels; c++) {
const data = decoded.getChannelData(c)
for (let i = 0; i < length; i++) mono[i] += data[i] / channels
}
}
const pcm16 = new ArrayBuffer(length * 2)
const view = new DataView(pcm16)
for (let i = 0; i < length; i++) {
const s = Math.max(-1, Math.min(1, mono[i]))
view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true)
}
await audioContext.close()
return pcm16
}
// Send PCM16 in 100ms chunks to simulate real-time speech input.
async function streamPcm16(pcm16: ArrayBuffer, sampleRate: number): Promise<void> {
if (!avatarView) return
const chunkMs = 100
const chunkSamples = Math.floor((sampleRate * chunkMs) / 1000)
const chunkBytes = chunkSamples * 2
for (let offset = 0; offset < pcm16.byteLength; offset += chunkBytes) {
const end = Math.min(offset + chunkBytes, pcm16.byteLength)
const chunk = pcm16.slice(offset, end)
const isLast = end >= pcm16.byteLength
avatarView.controller.send(chunk, isLast)
await sleep(chunkMs)
}
}
Next, add the avatar lifecycle and main action logic. This is the core flow triggered by the button: initialize the SDK, load and mount the avatar, connect, then stream the converted audio. It also handles cleanup so repeated runs and page unmounts stay stable.

// Release current connection and rendering resources before re-run/unmount.
async function disposeAvatar(): Promise<void> {
avatarView?.controller.close()
avatarView?.dispose()
avatarView = null
}
// Main demo flow:
// 1) Read token from env
// 2) Initialize SDK
// 3) Load and mount avatar
// 4) Start SDK Mode connection
// 5) Download + convert + stream audio
async function playDemoAudio(): Promise<void> {
if (busy.value) return
busy.value = true
try {
status.value = 'Fetching session token...'
const token = await fetchSessionToken()
if (!AvatarSDK.isInitialized) {
await AvatarSDK.initialize(appId, {
environment: Environment.intl,
drivingServiceMode: DrivingServiceMode.sdk,
})
}
AvatarSDK.setSessionToken(token)
await nextTick()
const mountEl = container.value
if (!mountEl) throw new Error('Avatar container is not ready')
await disposeAvatar()
status.value = 'Loading avatar...'
const avatar = await AvatarManager.shared.load(avatarId)
avatarView = new AvatarView(avatar, mountEl)
status.value = 'Connecting to SpatialReal...'
await avatarView.controller.initializeAudioContext()
await avatarView.controller.start()
status.value = 'Downloading and converting audio...'
const pcm16 = await downloadAudioAsPcm16(audioUrl, SAMPLE_RATE)
status.value = 'Streaming audio...'
await streamPcm16(pcm16, SAMPLE_RATE)
status.value = 'Playback finished'
} catch (error) {
status.value = error instanceof Error ? error.message : 'Failed to run demo'
} finally {
busy.value = false
}
}
onBeforeUnmount(async () => {
await disposeAvatar()
})
</script>
Finally, add the template section. It renders the avatar container, one action button, and a status line. The status text helps users track progress and errors while running the demo.

<template>
<div style="min-height:100vh; display:flex; align-items:center; justify-content:center; padding:16px;">
<div style="width:min(720px, 100%); display:flex; flex-direction:column; gap:10px;">
<div ref="container" style="width:100%; aspect-ratio:16/10; min-height:320px; border-radius:12px; overflow:hidden; border:1px solid;" />
<div style="display:flex; gap:8px; flex-wrap:wrap;">
<button :disabled="busy" @click="playDemoAudio">
{{ busy ? 'Running...' : 'Play Demo Audio' }}
</button>
</div>
<div style="font-size:14px;">{{ status }}</div>
</div>
</div>
</template>
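A note on the session token: this demo reads a temporary token from .env, which is fine locally but should not ship to production. Below is a minimal sketch of fetching a short-lived token from your own backend instead; the /api/session-token endpoint and its response shape are assumptions for illustration, not part of the SpatialReal API.

```typescript
// Hypothetical response shape from your own backend, which would exchange
// its server-side credentials for a short-lived SpatialReal session token.
type TokenResponse = { token: string }

// fetchImpl is injectable so the flow can be exercised without a live server.
export async function fetchSessionTokenFromBackend(
  fetchImpl: typeof fetch = fetch,
): Promise<string> {
  const response = await fetchImpl('/api/session-token', { method: 'POST' })
  if (!response.ok) throw new Error(`Token endpoint returned ${response.status}`)
  const body = (await response.json()) as TokenResponse
  if (!body.token) throw new Error('Token endpoint returned no token')
  return body.token
}
```

Once such an endpoint exists, you could swap this in for the fetchSessionToken helper in App.vue and drop VITE_SPATIALREAL_SESSION_TOKEN from .env.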
Verify project structure
Speech-to-Avatar quickstart
spatialreal-quickstart/
├── .env
├── vite.config.ts
└── src/
└── App.vue
Install and run
Run the app:

cd spatialreal-quickstart
pnpm dev
Open http://localhost:3000 and click Play Demo Audio.
What Happens
- Frontend initializes AvatarKit in SDK Mode and starts the connection
- Frontend sets the temporary session token copied from Studio
- Frontend downloads audio from VITE_AUDIO_URL and converts it to mono PCM16
- PCM16 chunks are streamed to the avatar controller for animation + audio playback
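The conversion and chunking steps above can be sketched in isolation. This mirrors the App.vue logic (Float32 samples clamped to [-1, 1], scaled to signed 16-bit little-endian, then split into 100 ms chunks); it is a self-contained sketch for understanding the math, not a separate SDK API:

```typescript
// Convert normalized Float32 samples to little-endian PCM16 bytes,
// mirroring the inner loop of downloadAudioAsPcm16.
function floatToPcm16(samples: Float32Array): ArrayBuffer {
  const pcm16 = new ArrayBuffer(samples.length * 2)
  const view = new DataView(pcm16)
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]))
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true)
  }
  return pcm16
}

// Size of one streaming chunk in bytes: samples per chunk * 2 bytes/sample.
function chunkBytes(sampleRate: number, chunkMs: number): number {
  return Math.floor((sampleRate * chunkMs) / 1000) * 2
}
```

At the demo's 16 kHz sample rate, each 100 ms chunk is 1600 samples, i.e. 3200 bytes of PCM16.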