Speak with timeline

POST /vrm/:entity_id/speech/timeline

Play audio with synchronized expression keyframes for lip-sync and facial animation. Because the endpoint takes WAV audio together with frame-synchronized keyframe data, audio from any TTS engine can be used. Each keyframe specifies a duration and a set of expression targets (e.g. mouth shapes, emotions) to apply during that interval.

The audio data must be base64-encoded WAV. The body size limit for this endpoint is 20 MB to accommodate audio payloads.
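As a rough illustration of the constraints above, the following Python sketch assembles a JSON body and checks it against the size limit before sending. The request schema is not shown on this page, so the field names `audio`, `keyframes`, `duration`, and `targets` are assumptions made for illustration; only `waitForCompletion` appears in the response description. The expression names and the unit of `duration` are likewise placeholders, and the helper names are hypothetical.

```python
import base64
import io
import json
import wave

# Assumes "20 MB" means 20 * 1024 * 1024 bytes; the page does not say which unit is meant.
MAX_BODY_BYTES = 20 * 1024 * 1024


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> bytes:
    """Return mono 16-bit silence as WAV bytes, standing in for real TTS output."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buf.getvalue()


def build_timeline_body(wav_bytes: bytes, keyframes: list, wait_for_completion: bool = True) -> bytes:
    """Serialize a request body and reject it locally if it exceeds the size limit."""
    payload = {
        "audio": base64.b64encode(wav_bytes).decode("ascii"),  # hypothetical field name
        "keyframes": keyframes,                                # hypothetical field name
        "waitForCompletion": wait_for_completion,              # named in the response description
    }
    body = json.dumps(payload).encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError(f"body is {len(body)} bytes, over the {MAX_BODY_BYTES}-byte limit")
    return body


# Illustrative keyframes: each has a duration and a set of expression targets.
# The key names, expression names, and the unit of "duration" are assumptions.
keyframes = [
    {"duration": 0.2, "targets": {"aa": 1.0}},
    {"duration": 0.3, "targets": {"oh": 0.8, "happy": 0.5}},
]

body = build_timeline_body(make_silent_wav(0.5), keyframes)
```

The resulting bytes would be sent as the body of a POST to /vrm/:entity_id/speech/timeline, with the host and entity ID taken from your own setup. Base64 encoding grows the data by about one third, so the raw WAV has to stay well under the limit, at roughly 15 MB.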

Request

Responses

Speech playback completed (when waitForCompletion is true or omitted).
