Providers & model aliases

Supported providers

| Provider name | --provider value | API key env var | Default model | Needs key |
| --- | --- | --- | --- | --- |
| Anthropic | anthropic | ANTHROPIC_API_KEY | claude-sonnet-4-6 | yes |
| OpenAI | openai | OPENAI_API_KEY | gpt-4o | yes |
| Google Gemini | gemini | GEMINI_API_KEY | gemini-flash-latest | yes |
| Groq | groq | GROQ_API_KEY | llama-3.3-70b-versatile | yes |
| Grok / xAI | grok | XAI_API_KEY | grok-3 | yes |
| DeepSeek | deepseek | DEEPSEEK_API_KEY | deepseek-chat | yes |
| Mistral | mistral | MISTRAL_API_KEY | mistral-large-latest | yes |
| MiniMax | minimax | MINIMAX_API_KEY | minimax-text-01 | yes |
| OpenRouter | openrouter | OPENROUTER_API_KEY | anthropic/claude-3.5-sonnet | yes |
| Together AI | together | TOGETHER_API_KEY | Llama-3.3-70B-Instruct-Turbo | yes |
| Fireworks AI | fireworks | FIREWORKS_API_KEY | llama-v3p3-70b-instruct | yes |
| LM Studio | lm-studio | (none) | auto-detect | no |
| Ollama | ollama | (none) | auto-detect | no |
| vLLM | vllm | (none) | auto-detect | no |

Local providers (LM Studio, Ollama, vLLM) are auto-detected on first run and require no API key. The model is discovered from the running server.
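
As a quick way to read the table above, the following sketch checks whether the API key variable for a given provider is set before you launch koda. It is an illustration only, not koda's own startup logic: the missing_key helper is hypothetical, and the mapping simply mirrors the table.

```python
import os

# --provider value -> API key environment variable, copied from the table above.
API_KEY_ENV = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
    "grok": "XAI_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
    "mistral": "MISTRAL_API_KEY",
    "minimax": "MINIMAX_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "together": "TOGETHER_API_KEY",
    "fireworks": "FIREWORKS_API_KEY",
}

# Local providers need no API key.
LOCAL_PROVIDERS = {"lm-studio", "ollama", "vllm"}


def missing_key(provider, env=None):
    """Return the name of the unset API key variable, or None if nothing is missing.

    Hypothetical helper for illustration; koda may check keys differently.
    """
    env = os.environ if env is None else env
    if provider in LOCAL_PROVIDERS:
        return None
    var = API_KEY_ENV[provider]
    return None if env.get(var) else var


if __name__ == "__main__":
    for name in ("openai", "ollama"):
        print(name, "->", missing_key(name) or "ready")
```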

Model aliases

Aliases let you switch models without memorising exact IDs. They’re shown in the /model picker and accepted by --model and /model.

| Alias | Provider | Exact model ID |
| --- | --- | --- |
| gemini-flash-lite | Gemini | gemini-flash-lite-latest |
| gemini-flash | Gemini | gemini-flash-latest |
| gemini-pro | Gemini | gemini-pro-latest |
| claude-haiku | Anthropic | claude-haiku-4-5-20251001 |
| claude-sonnet | Anthropic | claude-sonnet-4-6 |
| claude-opus | Anthropic | claude-opus-4-6 |
| local | LM Studio | auto-detect at runtime |

You can also use any literal model ID your provider supports; aliases are just shortcuts. For example, koda --model gpt-4o-mini and /model o3 both work.
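
The alias-or-literal behaviour described above can be pictured as a dictionary lookup with a pass-through fallback. This is a minimal sketch, not koda's implementation: resolve_model is a hypothetical name, and the local alias is left out because its model is detected at runtime.

```python
# Alias -> exact model ID, copied from the alias table above.
# "local" is omitted: its model is auto-detected at runtime.
MODEL_ALIASES = {
    "gemini-flash-lite": "gemini-flash-lite-latest",
    "gemini-flash": "gemini-flash-latest",
    "gemini-pro": "gemini-pro-latest",
    "claude-haiku": "claude-haiku-4-5-20251001",
    "claude-sonnet": "claude-sonnet-4-6",
    "claude-opus": "claude-opus-4-6",
}


def resolve_model(name):
    """Return the exact model ID for an alias, or the name unchanged otherwise.

    Hypothetical helper: literal IDs pass straight through.
    """
    return MODEL_ALIASES.get(name, name)


if __name__ == "__main__":
    print(resolve_model("claude-sonnet"))
    print(resolve_model("gpt-4o-mini"))
```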

HTTP timeouts

All providers use a shared HTTP client with the following timeout defaults:

| Setting | Default | Env override | Description |
| --- | --- | --- | --- |
| Connect timeout | 30 s | KODA_CONNECT_TIMEOUT_SECS | Time allowed to establish the TCP/TLS connection |
| Read timeout | 300 s (5 min) | KODA_READ_TIMEOUT_SECS | Time allowed between bytes from the server (per-byte, not total) |

The read timeout is an idle timeout, not a cap on total response time. The timer resets each time data arrives, so a long streaming response is fine as long as bytes keep coming. Slow networks and chatty SSE streams won’t be cut off mid-turn, but a stalled connection (server hung after the last byte) fails once the read timeout elapses instead of hanging indefinitely.
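
To make the difference between an idle timeout and a total timeout concrete, here is a small simulation. It involves no networking and is not koda's code: stream_outcome is a hypothetical function that takes the silent gap before each chunk and reports whether the stream would survive.

```python
def stream_outcome(chunk_gaps_secs, read_timeout_secs=300.0):
    """Classify a stream from the gaps (in seconds) that precede each chunk.

    Models an idle timeout: the timer restarts whenever data arrives,
    so only a single gap longer than the timeout aborts the stream.
    Returns (status, seconds elapsed when the stream ended).
    """
    elapsed = 0.0
    for gap in chunk_gaps_secs:
        if gap > read_timeout_secs:
            # The timer fires read_timeout_secs after the last received chunk.
            return ("timed_out", elapsed + read_timeout_secs)
        elapsed += gap
    return ("completed", elapsed)


if __name__ == "__main__":
    # 100 chunks, 10 s apart: 1000 s in total, well past 300 s, yet no timeout.
    print(stream_outcome([10.0] * 100))
    # One chunk after 5 s, then the server stalls for 400 s.
    print(stream_outcome([5.0, 400.0]))
```

The first stream runs for far longer than the read timeout and still completes, because no single gap exceeds it. The second is aborted 300 s after its last chunk.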

When to tune these

  • Behind a slow corporate proxy? Bump KODA_CONNECT_TIMEOUT_SECS to 60 or 90. Connection-phase timeouts often manifest as “request timed out” with no usage data, which is the giveaway.
  • Long-running model on a flaky link? Bump KODA_READ_TIMEOUT_SECS to 600+. Read-phase timeouts manifest as a partial response cut short partway through generation. (Note: koda also auto-retries transient network errors up to 5 times with exponential backoff; see is_network_transient_error.)
  • Local provider (Ollama, LM Studio, vLLM) and you want fail-fast? Drop KODA_READ_TIMEOUT_SECS to 30 — local models that hang are usually truly hung, not slow.
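
The auto-retry mentioned above (up to 5 retries with exponential backoff) follows a common pattern, sketched below. This is not koda's code. The 1-second base delay, the doubling factor, and reading "up to 5 times" as five retries after the first attempt are all assumptions, and is_transient merely stands in for a check such as is_network_transient_error.

```python
import time


def retry_with_backoff(operation, is_transient, max_retries=5,
                       base_delay_secs=1.0, sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff.

    Assumed schedule: base_delay_secs, then doubling after every failure.
    Non-transient errors, and the failure after the last retry, propagate.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except Exception as exc:
            if attempt >= max_retries or not is_transient(exc):
                raise
            sleep(base_delay_secs * (2 ** attempt))
            attempt += 1


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky():
        # Fails twice with a transient error, then succeeds.
        calls["n"] += 1
        if calls["n"] <= 2:
            raise ConnectionError("connection reset")
        return "ok"

    waits = []  # record delays instead of actually sleeping
    result = retry_with_backoff(
        flaky, lambda e: isinstance(e, ConnectionError), sleep=waits.append
    )
    print(result, waits)
```

Passing sleep as a parameter keeps the sketch easy to exercise without real waiting.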

Example

KODA_CONNECT_TIMEOUT_SECS=60 KODA_READ_TIMEOUT_SECS=300 koda
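
For readers wondering how such overrides combine with the defaults, this sketch reads the two variables and falls back to the documented defaults. It is an assumption-laden illustration: timeout_secs is a hypothetical helper, and how koda actually treats an unset, non-numeric, or non-positive value is not stated here, so the fallback below is a guess.

```python
import os

# Defaults taken from the timeout table above.
DEFAULT_TIMEOUT_SECS = {
    "KODA_CONNECT_TIMEOUT_SECS": 30,
    "KODA_READ_TIMEOUT_SECS": 300,
}


def timeout_secs(name, env=None):
    """Return the timeout in seconds for the given environment variable.

    Assumption: unset, non-numeric, or non-positive values use the default.
    """
    env = os.environ if env is None else env
    default = DEFAULT_TIMEOUT_SECS[name]
    raw = env.get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


if __name__ == "__main__":
    env = {"KODA_CONNECT_TIMEOUT_SECS": "60"}
    print(timeout_secs("KODA_CONNECT_TIMEOUT_SECS", env))
    print(timeout_secs("KODA_READ_TIMEOUT_SECS", env))
```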