Choosing the Right AI Model
How VULK routes across 16+ AI models — Gemini 3.1 Pro (default), Claude Opus 4.7, GPT-5.4, Grok 4.20, Llama, Mistral, DeepSeek — and how to override per-generation or BYOK with zero markup.
VULK ships with 16+ AI models out of the box. The router auto-picks the best one for each file based on file type and difficulty (Smart Model Routing — 87% cost savings vs a naive single-model setup), but you can override it per generation. Bring Your Own Key (BYOK) is supported for 9 providers with zero markup on the underlying API calls.
Default
Gemini 3.1 Pro is the default for all new generations. It was validated against an 11-gate quality probe in April 2026 and is currently the only model in our bench that passes all 11 gates on the first try. It has a 1M-token context window.
Provider Models (managed)
VULK includes credits for these models on every paid plan. The cost column shows credit multiplier vs the 1× baseline (Gemini 3.1 Pro).
| Model | Provider | Context | Strength | Cost |
|---|---|---|---|---|
| Gemini 3.1 Pro (default) | Google | 1M | Multimodal, full-stack reasoning, design fidelity | 1× |
| Gemini 3.1 Flash | Google | 1M | Fast iteration, simple components | 0.25× |
| Gemini 3.1 Flash Image | Google | 1M | Inline image generation | 0.5× |
| Claude Opus 4.7 | Anthropic | 1M | Architecture, complex business logic | 5× |
| Claude Sonnet 4.6 | Anthropic | 200k | Surgical edits, careful refactors | 3× |
| Claude Haiku 4.5 | Anthropic | 200k | Quick mutations, fast turn-around | 0.8× |
| GPT-5.4 | OpenAI | 400k | Strong reasoning, mathematical logic | 3× |
| GPT-5.4 Codex Max | OpenAI | 400k | Pure-code optimisation | 2× |
| Grok 4.20 | xAI | 2M | Long-context, multimodal | 0.4× |
| DeepSeek V3.2 | DeepSeek | 163k | Cost-efficient code generation | 0.13× |
| Mistral Codestral 2 | Mistral | 262k | Speed-optimised code | 0.5× |
| Llama 4 Maverick | Meta | 1M | Open-weights option, long context | 0.3× |
| Qwen 3.5 Coder | Alibaba | 256k | Multilingual code | 0.4× |
| MiniMax M2.1 | MiniMax | 196k | Asia-region inference | 0.6× |
| GLM 5.1 | Zhipu | 200k | Budget-tier | 0.25× |
| Amazon Nova 2 Pro | Amazon | 1M | AWS-region routing | 0.7× |
Need a model that's not listed? File a request at [email protected] — we add new models as they ship.
Bring Your Own Model (BYOK) — zero markup
On Pro and above, connect your own API keys at vulk.dev/settings/api-keys. VULK routes your generations through your key with zero margin — you pay the underlying provider exactly what they charge.
| Provider | Required key | Notes |
|---|---|---|
| OpenAI | OPENAI_API_KEY | All GPT-5 family + o-series |
| Anthropic | ANTHROPIC_API_KEY | Opus 4.7, Sonnet 4.6, Haiku 4.5 |
| Google AI Studio | GEMINI_API_KEY | All Gemini 3.x models |
| Mistral | MISTRAL_API_KEY | Codestral, Mistral Large |
| Groq | GROQ_API_KEY | Ultra-fast Llama / Mixtral inference |
| DeepSeek | DEEPSEEK_API_KEY | V3.2 + Reasoner |
| Together AI | TOGETHER_API_KEY | 100+ open-weight models |
| Fireworks | FIREWORKS_API_KEY | Production-tuned open models |
| Ollama (self-hosted) | URL of your Ollama server | For privacy-first workflows |
Any OpenAI-compatible endpoint also works — point VULK at your custom URL.
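An OpenAI-compatible endpoint just means it accepts the standard chat-completions request shape. A minimal sketch of building such a request — the endpoint URL, model name, and key placeholder below are illustrative, not VULK defaults:

```typescript
// Builds a standard OpenAI-compatible /v1/chat/completions request.
// URL, model name, and key placeholder are illustrative only.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

function buildChatRequest(model: string, messages: ChatMessage[]) {
  return {
    url: "https://your-endpoint.example.com/v1/chat/completions",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer $YOUR_API_KEY", // placeholder, never hard-code real keys
    },
    body: { model, messages },
  };
}
```

Any server that answers this shape (vLLM, LiteLLM proxies, self-hosted gateways) can sit behind the custom URL.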
Smart Model Routing (automatic)
When you don't override, VULK picks per file:
- Config files (`package.json`, `tsconfig.json`, `.env.example`) → Gemini 3.1 Flash (cheap, deterministic)
- UI components → Gemini 3.1 Pro (design fidelity)
- Complex business logic / auth / state-machines → Claude Opus 4.7 (premium reasoning)
- Pure code (utility functions, hooks) → GPT-5.4 Codex Max
- Schema / type files → Sonnet 4.6 (careful + cheap)
Result: 87% credit savings vs running everything through the premium tier.
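The routing rules above can be sketched as a simple lookup. The model identifiers and file-matching regexes here are illustrative — VULK's real router also weighs prompt difficulty, which this sketch omits:

```typescript
// Illustrative per-file model routing. Model IDs and regexes are
// hypothetical; the real router also factors in prompt difficulty.
type FileKind = "config" | "ui" | "logic" | "pure-code" | "schema";

const ROUTES: Record<FileKind, string> = {
  config: "gemini-3.1-flash",       // cheap, deterministic
  ui: "gemini-3.1-pro",             // design fidelity
  logic: "claude-opus-4.7",         // premium reasoning
  "pure-code": "gpt-5.4-codex-max", // pure-code optimisation
  schema: "claude-sonnet-4.6",      // careful + cheap
};

function classify(path: string): FileKind {
  if (/(package\.json|tsconfig\.json|\.env\.example)$/.test(path)) return "config";
  if (/\.(tsx|jsx|vue|svelte)$/.test(path)) return "ui";
  if (/(auth|billing|state)/.test(path)) return "logic";
  if (/\.(d\.ts|schema\.ts|prisma)$/.test(path)) return "schema";
  return "pure-code"; // utility functions, hooks, etc.
}

function routeModel(path: string): string {
  return ROUTES[classify(path)];
}
```

Routing cheap deterministic files to a 0.25× model while reserving the 5× tier for business logic is where the bulk of the savings comes from.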
Recommendations by use case
Landing pages & marketing sites
- Default: Gemini 3.1 Pro (best brand-consistent design)
- Cheap: Gemini 3.1 Flash
Full-stack SaaS dashboards
- Default: Gemini 3.1 Pro
- Premium: Claude Opus 4.7 (deep architecture)
3D / WebGL / shader-heavy projects
- Default: Gemini 3.1 Pro (excellent at GLSL + Three.js)
- Premium: Claude Opus 4.7 (when shader is mathematically complex)
Mobile apps (Flutter / React Native)
- Default: Gemini 3.1 Pro (multimodal, handles platform APIs)
- Surgical edits: Sonnet 4.6
Backend / auth / billing
- Default: Claude Opus 4.7 (highest reasoning for security-critical code)
- Cheap: GPT-5.4 Codex Max
Voice / video-to-app input modalities
- Required: Gemini 3.1 Pro (Whisper + multimodal vision pipeline)
Budget runs
- Best: DeepSeek V3.2 (0.13×) or GLM 5.1 (0.25×)
Provider-chain failover
Every generation goes through a circuit breaker:
Primary → OpenRouter fallback → Anthropic fallback

If the primary provider is degraded, VULK auto-fails over without dropping the generation.
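A minimal sketch of that failover chain — provider ordering from the doc, with the real circuit breaker's health tracking and retry/latency logic omitted:

```typescript
// Failover sketch: try each provider in order, falling through on
// failure so the generation is never dropped. Real circuit breakers
// also track health to skip known-degraded providers up front.
type Provider = (prompt: string) => Promise<string>;

async function generateWithFailover(
  prompt: string,
  chain: Provider[], // e.g. [primary, openRouterFallback, anthropicFallback]
): Promise<string> {
  let lastError: unknown;
  for (const provider of chain) {
    try {
      return await provider(prompt); // first healthy provider wins
    } catch (err) {
      lastError = err; // degraded: fall through to the next provider
    }
  }
  throw new Error(`All providers failed: ${String(lastError)}`);
}
```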
Changing models
- Per generation: Click the model name in the chat header → pick from the dropdown.
- Default: Settings → AI Models → set your preferred default.
- BYOK: Settings → API Keys → add your key, then it appears in the picker labelled (BYOK).
Credit multipliers
The cost column on each model shows credit usage relative to Gemini 3.1 Pro = 1×:
- 0.13× = uses 13% of standard credits
- 1× = standard
- 5× = premium tier
A typical full-stack landing page costs ~50 credits on Gemini 3.1 Pro. The same page on Claude Opus 4.7 costs ~250 credits. Smart Model Routing (default) keeps the cost-quality balance optimal automatically.
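The arithmetic is a straight multiplication of baseline credits by the model's multiplier. A sketch using multipliers from the table above (model IDs are illustrative):

```typescript
// Credit cost = baseline credits × model multiplier, where 1× is the
// Gemini 3.1 Pro baseline. Multipliers taken from the table above;
// model ID strings are illustrative.
const MULTIPLIERS: Record<string, number> = {
  "gemini-3.1-pro": 1,
  "claude-opus-4.7": 5,
  "gemini-3.1-flash": 0.25,
  "deepseek-v3.2": 0.13,
};

function creditCost(baselineCredits: number, model: string): number {
  return baselineCredits * (MULTIPLIERS[model] ?? 1); // unknown models at baseline
}
```

So the ~50-credit landing page from the example lands at 50 × 5 = 250 credits on Claude Opus 4.7.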
MCP integration
If you're using Claude Desktop, Cursor, Windsurf, or any other MCP-compatible client, install vulk-mcp-server to drive VULK from inside your AI workflow:
`npx vulk-mcp-server`

See MCP Server for full setup.
Questions? [email protected]