Speak with timeline
POST /vrm/:entity_id/speech/timeline
Play audio with synchronized expression keyframes for lip-sync and facial animation. This endpoint works with any TTS engine: the request carries WAV audio together with frame-synchronized keyframe data. Each keyframe specifies a duration and a set of expression targets (e.g. mouth shapes, emotions) to apply during that interval.
The audio data must be base64-encoded WAV. The body size limit for this endpoint is 20 MB to accommodate audio payloads.
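The constraints above (base64-encoded WAV, per-keyframe durations and expression targets, a 20 MB body limit) can be sketched as a small payload builder. This is an illustration under stated assumptions, not the endpoint's confirmed schema: the field names `audio`, `keyframes`, `duration`, and `targets` are hypothetical, as are the duration unit (seconds), the expression names, the reading of 20 MB as 20 * 1024 * 1024 bytes, and the helpers `make_silent_wav` and `build_timeline_body`. Only `waitForCompletion` is named in this page.

```python
import base64
import io
import json
import wave

# Assumption: "20 MB" is read as 20 * 1024 * 1024 bytes.
MAX_BODY_BYTES = 20 * 1024 * 1024


def make_silent_wav(seconds, sample_rate=16000):
    """Return a mono 16-bit PCM WAV of silence (a stand-in for real TTS output)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buf.getvalue()


def build_timeline_body(wav_bytes, keyframes, wait=True):
    """Serialize a JSON request body; all field names except waitForCompletion are assumed."""
    for frame in keyframes:
        # The endpoint rejects negative keyframe durations, so fail early on the client.
        if frame["duration"] < 0:
            raise ValueError("keyframe duration must not be negative")
    body = json.dumps({
        "audio": base64.b64encode(wav_bytes).decode("ascii"),  # hypothetical field name
        "keyframes": keyframes,                                 # hypothetical field name
        "waitForCompletion": wait,
    }).encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError("request body exceeds the 20 MB limit")
    return body


# Two illustrative keyframes: a mouth shape, then a mouth shape blended with an emotion.
keyframes = [
    {"duration": 0.25, "targets": {"aa": 1.0}},
    {"duration": 0.25, "targets": {"oh": 0.8, "happy": 0.3}},
]
body = build_timeline_body(make_silent_wav(0.5), keyframes)
```

The resulting `body` would be sent as the POST payload with a JSON content type. Validating durations and size on the client mirrors the 400 conditions the endpoint reports, so malformed requests fail before any audio is uploaded.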
Request
Responses
- 200: Speech playback completed (when waitForCompletion is true or omitted).
- 202: Speech queued for playback (when waitForCompletion is false).
- 400: Invalid input (bad base64, invalid WAV header, negative keyframe duration).
- 404: Entity not found.
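A client might map the four documented status codes to outcomes along these lines. This is a minimal sketch: the outcome labels and the function name `classify_response` are hypothetical, and any status not listed for this endpoint is simply reported as unexpected.

```python
# Outcome labels are illustrative; the status codes are the ones documented for this endpoint.
OUTCOMES = {
    200: "completed",         # playback finished (waitForCompletion true or omitted)
    202: "queued",            # playback queued (waitForCompletion false)
    400: "invalid_input",     # bad base64, invalid WAV header, negative keyframe duration
    404: "entity_not_found",  # no entity matches :entity_id
}


def classify_response(status):
    """Translate an HTTP status code into a short outcome label."""
    return OUTCOMES.get(status, "unexpected")
```

Separating 200 from 202 matters to callers that chain utterances: only a 200 indicates the audio has finished playing, whereas a 202 means playback is still pending.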