Choosing the Right AI Model
How VULK routes across 16+ AI models — Gemini 3.1 Pro (default), Claude Opus 4.7, GPT-5.4, Grok 4.20, Llama, Mistral, DeepSeek — and how to override per-generation or BYOK with zero markup.
VULK ships with 16+ AI models out of the box. The router auto-picks the best one for each file based on file type and difficulty (Smart Model Routing — 87% cost savings vs a naive single-model setup), but you can override it per generation. Bring Your Own Key (BYOK) is supported for 9 providers with zero markup on the underlying API calls.
Default
Gemini 3.1 Pro is the default for all new generations. It was validated against an 11-gate quality probe in April 2026 and is currently the only model in our bench that passes all 11 gates on the first try. It has a 1M-token context window.
Provider Models (managed)
VULK includes credits for these models on every paid plan. The cost column shows credit multiplier vs the 1× baseline (Gemini 3.1 Pro).
| Model | Provider | Context | Strength | Cost |
|---|---|---|---|---|
| Gemini 3.1 Pro (default) | Google | 1M | Multimodal, full-stack reasoning, design fidelity | 1× |
| Gemini 3.1 Flash | Google | 1M | Fast iteration, simple components | 0.25× |
| Gemini 3.1 Flash Image | Google | 1M | Inline image generation | 0.5× |
| Claude Opus 4.7 | Anthropic | 1M | Architecture, complex business logic | 5× |
| Claude Sonnet 4.6 | Anthropic | 200k | Surgical edits, careful refactors | 3× |
| Claude Haiku 4.5 | Anthropic | 200k | Quick mutations, fast turn-around | 0.8× |
| GPT-5.4 | OpenAI | 400k | Strong reasoning, mathematical logic | 3× |
| GPT-5.4 Codex Max | OpenAI | 400k | Pure-code optimisation | 2× |
| Grok 4.20 | xAI | 2M | Long-context, multimodal | 0.4× |
| DeepSeek V3.2 | DeepSeek | 163k | Cost-efficient code generation | 0.13× |
| Mistral Codestral 2 | Mistral | 262k | Speed-optimised code | 0.5× |
| Llama 4 Maverick | Meta | 1M | Open-weights option, long context | 0.3× |
| Qwen 3.5 Coder | Alibaba | 256k | Multilingual code | 0.4× |
| MiniMax M2.1 | MiniMax | 196k | Asia-region inference | 0.6× |
| GLM 5.1 | Zhipu | 200k | Budget-tier | 0.25× |
| Amazon Nova 2 Pro | Amazon | 1M | AWS-region routing | 0.7× |
Need a model that's not listed? File a request at [email protected] — we add new models as they ship.
Bring Your Own Model (BYOK) — zero markup
On Pro and above, connect your own API keys at vulk.dev/settings/api-keys. VULK routes your generations through your key with zero margin — you pay the underlying provider exactly what they charge.
| Provider | Required key | Notes |
|---|---|---|
| OpenAI | OPENAI_API_KEY | All GPT-5 family + o-series |
| Anthropic | ANTHROPIC_API_KEY | Opus 4.7, Sonnet 4.6, Haiku 4.5 |
| Google AI Studio | GEMINI_API_KEY | All Gemini 3.x models |
| Mistral | MISTRAL_API_KEY | Codestral, Mistral Large |
| Groq | GROQ_API_KEY | Ultra-fast Llama / Mixtral inference |
| DeepSeek | DEEPSEEK_API_KEY | V3.2 + Reasoner |
| Together AI | TOGETHER_API_KEY | 100+ open-weight models |
| Fireworks | FIREWORKS_API_KEY | Production-tuned open models |
| Ollama (self-hosted) | URL of your Ollama server | For privacy-first workflows |
Any OpenAI-compatible endpoint also works — point VULK at your custom URL.
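An OpenAI-compatible endpoint just means it accepts the standard chat-completions request shape. A minimal sketch of building such a request — the endpoint URL, model name, and key placeholder below are illustrative, not VULK defaults:

```typescript
// Builds a standard OpenAI-compatible /v1/chat/completions request.
// URL, model name, and key placeholder are illustrative only.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

function buildChatRequest(model: string, messages: ChatMessage[]) {
  return {
    url: "https://your-endpoint.example.com/v1/chat/completions",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer $YOUR_API_KEY", // placeholder, never hard-code real keys
    },
    body: { model, messages },
  };
}
```

Any server that answers this shape (vLLM, LiteLLM proxies, self-hosted gateways) can sit behind the custom URL.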
Smart Model Routing (automatic)
When you don't override, VULK picks per file:
- Config files (`package.json`, `tsconfig.json`, `.env.example`) → Gemini 3.1 Flash (cheap, deterministic)
- UI components → Gemini 3.1 Pro (design fidelity)
- Complex business logic / auth / state-machines → Claude Opus 4.7 (premium reasoning)
- Pure code (utility functions, hooks) → GPT-5.4 Codex Max
- Schema / type files → Sonnet 4.6 (careful + cheap)
Result: 87% credit savings vs running everything through the premium tier.
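The routing rules above can be sketched as a simple lookup. The model identifiers and file-matching regexes here are illustrative — VULK's real router also weighs prompt difficulty, which this sketch omits:

```typescript
// Illustrative per-file model routing. Model IDs and regexes are
// hypothetical; the real router also factors in prompt difficulty.
type FileKind = "config" | "ui" | "logic" | "pure-code" | "schema";

const ROUTES: Record<FileKind, string> = {
  config: "gemini-3.1-flash",       // cheap, deterministic
  ui: "gemini-3.1-pro",             // design fidelity
  logic: "claude-opus-4.7",         // premium reasoning
  "pure-code": "gpt-5.4-codex-max", // pure-code optimisation
  schema: "claude-sonnet-4.6",      // careful + cheap
};

function classify(path: string): FileKind {
  if (/(package\.json|tsconfig\.json|\.env\.example)$/.test(path)) return "config";
  if (/\.(tsx|jsx|vue|svelte)$/.test(path)) return "ui";
  if (/(auth|billing|state)/.test(path)) return "logic";
  if (/\.(d\.ts|schema\.ts|prisma)$/.test(path)) return "schema";
  return "pure-code"; // utility functions, hooks, etc.
}

function routeModel(path: string): string {
  return ROUTES[classify(path)];
}
```

Routing cheap deterministic files to a 0.25× model while reserving the 5× tier for business logic is where the bulk of the savings comes from.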
Recommendations by use case
Landing pages & marketing sites
- Default: Gemini 3.1 Pro (best brand-consistent design)
- Cheap: Gemini 3.1 Flash
Full-stack SaaS dashboards
- Default: Gemini 3.1 Pro
- Premium: Claude Opus 4.7 (deep architecture)
3D / WebGL / shader-heavy projects
- Default: Gemini 3.1 Pro (excellent at GLSL + Three.js)
- Premium: Claude Opus 4.7 (when shader is mathematically complex)
Mobile apps (Flutter / React Native)
- Default: Gemini 3.1 Pro (multimodal, handles platform APIs)
- Surgical edits: Sonnet 4.6
Backend / auth / billing
- Default: Claude Opus 4.7 (highest reasoning for security-critical code)
- Cheap: GPT-5.4 Codex Max
Voice / video-to-app input modalities
- Required: Gemini 3.1 Pro (Whisper + multimodal vision pipeline)
Budget runs
- Best: DeepSeek V3.2 (0.13×) or GLM 5.1 (0.25×)
Provider-chain failover
Every generation goes through a circuit breaker:
Primary → OpenRouter fallback → Anthropic fallback

If the primary provider is degraded, VULK auto-fails over without dropping the generation.
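A minimal sketch of that failover chain — provider ordering from the doc, with the real circuit breaker's health tracking and retry/latency logic omitted:

```typescript
// Failover sketch: try each provider in order, falling through on
// failure so the generation is never dropped. Real circuit breakers
// also track health to skip known-degraded providers up front.
type Provider = (prompt: string) => Promise<string>;

async function generateWithFailover(
  prompt: string,
  chain: Provider[], // e.g. [primary, openRouterFallback, anthropicFallback]
): Promise<string> {
  let lastError: unknown;
  for (const provider of chain) {
    try {
      return await provider(prompt); // first healthy provider wins
    } catch (err) {
      lastError = err; // degraded: fall through to the next provider
    }
  }
  throw new Error(`All providers failed: ${String(lastError)}`);
}
```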
Changing models
- Per generation: Click the model name in the chat header → pick from the dropdown.
- Default: Settings → AI Models → set your preferred default.
- BYOK: Settings → API Keys → add your key, then it appears in the picker labelled (BYOK).
Credit multipliers
The cost column on each model shows credit usage relative to Gemini 3.1 Pro = 1×:
- 0.13× = uses 13% of standard credits
- 1× = standard
- 5× = premium tier
A typical full-stack landing page costs ~50 credits on Gemini 3.1 Pro. The same page on Claude Opus 4.7 costs ~250 credits. Smart Model Routing (default) keeps the cost-quality balance optimal automatically.
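The arithmetic is a straight multiplication of baseline credits by the model's multiplier. A sketch using multipliers from the table above (model IDs are illustrative):

```typescript
// Credit cost = baseline credits × model multiplier, where 1× is the
// Gemini 3.1 Pro baseline. Multipliers taken from the table above;
// model ID strings are illustrative.
const MULTIPLIERS: Record<string, number> = {
  "gemini-3.1-pro": 1,
  "claude-opus-4.7": 5,
  "gemini-3.1-flash": 0.25,
  "deepseek-v3.2": 0.13,
};

function creditCost(baselineCredits: number, model: string): number {
  return baselineCredits * (MULTIPLIERS[model] ?? 1); // unknown models at baseline
}
```

So the ~50-credit landing page from the example lands at 50 × 5 = 250 credits on Claude Opus 4.7.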
MCP integration
If you're using Claude Desktop, Cursor, Windsurf, or any other MCP-compatible client, install vulk-mcp-server to drive VULK from inside your AI workflow:
`npx vulk-mcp-server`

See MCP Server for full setup.
Questions? [email protected]