A generation flow where the user speaks the app description out loud and the AI builder transcribes, plans, and ships the code. VULK pipes microphone audio through Whisper, then into the standard generation agent.

Voice-to-App

Voice-to-app is the workflow in which a user describes the app they want to build by speaking, and the platform transcribes, refines, and turns that speech into a working codebase. It removes the friction of typing long prompts, which helps on mobile and suits users who think out loud, and it pairs naturally with multimodal inputs (screenshot, photo, sketch) to capture intent.

In VULK, voice-to-app is exposed in the chat composer. The browser captures audio via MediaRecorder, the clip is sent to a Whisper-class model for transcription, and the resulting text is normalized (filler words trimmed, technical terms inferred). The cleaned prompt then enters the same generation pipeline as a typed request: Genome system → intent modeler → model router → 10-step quality gate. The whole loop targets a sub-3-second turnaround from "stop talking" to "preview is rendering".
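
To make the normalization step more concrete, here is a minimal TypeScript sketch of trimming filler words and rewriting mis-heard technical terms. It is an illustration only, not VULK's actual normalizer: the function name `normalizeTranscript`, the `FILLERS` list, and the `TERM_FIXES` table are all hypothetical, and a production system would likely infer terms with a model rather than a fixed lookup.

```typescript
// Hypothetical filler vocabulary (assumption; the real list is not documented here).
const FILLERS: string[] = ["um", "uh", "erm", "you know", "i mean"];

// Hypothetical lookup from common mis-transcriptions to the intended term (assumption).
const TERM_FIXES: Array<[string, string]> = [
  ["next js", "Next.js"],
  ["type script", "TypeScript"],
  ["post gres", "Postgres"],
];

// Escape characters that have special meaning inside a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Clean a raw transcript before it is handed to the generation pipeline.
function normalizeTranscript(raw: string): string {
  let text = raw;

  // Remove each filler as a whole word or phrase, plus an optional trailing comma.
  for (const filler of FILLERS) {
    const re = new RegExp(`\\b${escapeRegExp(filler)}\\b,?`, "gi");
    text = text.replace(re, " ");
  }

  // Rewrite known mis-heard technical terms to their canonical spelling.
  for (const [heard, intended] of TERM_FIXES) {
    const re = new RegExp(`\\b${escapeRegExp(heard)}\\b`, "gi");
    text = text.replace(re, intended);
  }

  // Collapse repeated whitespace and remove any space left before punctuation.
  return text.replace(/\s+/g, " ").replace(/\s+([,.!?])/g, "$1").trim();
}
```

Calling `normalizeTranscript` on the transcript returned by the speech model would yield the cleaned prompt that is passed on, unchanged in meaning, to the same pipeline a typed request uses. Word-boundary matching keeps the filler "um" from being stripped out of words such as "umbrella".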

See /docs/creating-projects/writing-prompts.
