Recognize a single sentence from the microphone.
POST /stt/recognize
Starts a cpal capture -> VAD -> Whisper pipeline, returns the first recognized sentence, and then tears the pipeline down. The request long-polls: it blocks until speech is detected or the 60 s timeout elapses.
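Because the call can block for up to 60 s, a client needs a socket timeout longer than that window. The sketch below uses only the Python standard library and is illustrative, not an official client. The base URL (`http://localhost:8000`), the JSON content type, and the helper names `build_request` and `recognize` are hypothetical. The request schema is not shown on this page, so `payload` is a placeholder to fill in per the Request section. The response body is returned as raw text because its exact shape is not documented here.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000"  # hypothetical host/port; adjust to your deployment
CLIENT_TIMEOUT_S = 70  # socket timeout; must exceed the server's 60 s long-poll window


def build_request(base_url, payload):
    """Build the POST request. The JSON body format is an assumption."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/stt/recognize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def recognize(base_url=BASE_URL, payload=None, timeout=CLIENT_TIMEOUT_S):
    """Return (status_code, body_text); non-2xx statuses are returned, not raised."""
    req = build_request(base_url, payload or {})
    try:
        # Blocks while the server waits for speech (up to ~60 s).
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # 408 / 422 / 503 arrive here; hand them back to the caller.
        return err.code, err.read().decode("utf-8")
```

Usage: `status, body = recognize()` returns `(200, <recognized text>)` on success, or the error status and its body otherwise.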
Request
Responses
- 200: Recognized text
- 408: Timeout; no speech detected
- 422: Invalid language or model size
- 503: Model not available or microphone error
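A caller typically branches on these four statuses. The sketch below maps them to named outcomes. The enum, its member names, and the `is_retryable` policy are hypothetical client-side choices, not part of the API. Treating 408 (no speech yet) and 503 (model or microphone unavailable) as retryable while treating 422 (bad parameters) as final is only a suggested default.

```python
from enum import Enum


class RecognizeOutcome(Enum):
    """Hypothetical client-side names for the documented statuses."""
    RECOGNIZED = 200       # recognized text in the body
    NO_SPEECH = 408        # timeout, no speech detected
    INVALID_PARAMS = 422   # invalid language or model size
    UNAVAILABLE = 503      # model not available or microphone error


def classify(status):
    """Map an HTTP status from /stt/recognize to an outcome; reject anything undocumented."""
    try:
        return RecognizeOutcome(status)
    except ValueError:
        raise ValueError(f"unexpected status from /stt/recognize: {status}") from None


def is_retryable(outcome):
    """Suggested policy: retry on timeout or unavailability, never on bad parameters."""
    return outcome in (RecognizeOutcome.NO_SPEECH, RecognizeOutcome.UNAVAILABLE)
```

Keeping the mapping strict (raising on undocumented codes) surfaces server changes early instead of silently treating them as failures.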