Error Handling
depends on: conversation, safety
Errors are inevitable. How an agent handles them determines whether users feel helped or abandoned.
Principles
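Errors are conversations, not dead ends. Be specific, not vague. Separate user errors (input the user can fix) from system errors (failures on our side); they need different messages and different next steps.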
A good error message has three parts:
Your file couldn't be uploaded. ← what happened
The image is 15MB, but the limit is 5MB. ← why
Try compressing it or choosing a smaller image. ← what to do
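One way to keep every message to this shape is to make the three parts required fields, so a message missing its "why" or "what to do" cannot be built. The sketch below assumes that approach. `UserFacingError` and `upload_size_error` are hypothetical names, and the usage reproduces the upload message above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserFacingError:
    """An error message split into the three required parts."""
    what: str    # what happened
    why: str     # why it happened
    action: str  # what the user can do about it

    def __post_init__(self) -> None:
        # A message missing any part is vague by construction; reject it early.
        for name in ("what", "why", "action"):
            if not getattr(self, name).strip():
                raise ValueError(f"error message is missing its '{name}' part")

    def render(self) -> str:
        # One sentence per part, in the order the user needs them.
        return " ".join([self.what, self.why, self.action])


def upload_size_error(size_mb: float, limit_mb: float) -> UserFacingError:
    # Hypothetical helper: builds the upload example from concrete numbers.
    return UserFacingError(
        what="Your file couldn't be uploaded.",
        why=f"The image is {size_mb:g}MB, but the limit is {limit_mb:g}MB.",
        action="Try compressing it or choosing a smaller image.",
    )


print(upload_size_error(15, 5).render())
# Your file couldn't be uploaded. The image is 15MB, but the limit is 5MB. Try compressing it or choosing a smaller image.
```

Putting the concrete numbers in the "why" part is what makes the message specific rather than a generic "upload failed".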
Graceful degradation
When a feature fails, the rest continues:
- External LLM or embedding database down → use local inference server (Ollama, llama.cpp)
- Local inference server unavailable → downgrade to in-browser inference (WebLLM)
- External execution environment down → use local execution or Edge-Containers
- API down → show cached data with a staleness indicator
- Non-critical feature fails → hide it, don't break the page
- Partial success → save what worked, report what didn't
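The inference fallbacks in the list above form a chain: try the preferred backend and, on failure, move to the next tier. The sketch below assumes each tier is wrapped as a zero-argument callable. `with_fallbacks`, `AllBackendsFailed`, and the three stub backends are hypothetical stand-ins, not the Ollama, llama.cpp, or WebLLM APIs.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllBackendsFailed(Exception):
    """Raised when every tier in the fallback chain has failed."""

    def __init__(self, failures: list[tuple[str, Exception]]) -> None:
        self.failures = failures
        summary = "; ".join(f"{name}: {err}" for name, err in failures)
        super().__init__(f"all backends failed ({summary})")


def with_fallbacks(
    tiers: Sequence[tuple[str, Callable[[], T]]],
) -> tuple[str, T, list[tuple[str, Exception]]]:
    """Try each (name, call) tier in order and return the first success.

    Also returns the failures that happened before it, so the caller can
    tell the user the answer came from a degraded tier.
    """
    failures: list[tuple[str, Exception]] = []
    for name, call in tiers:
        try:
            return name, call(), failures
        except Exception as err:
            failures.append((name, err))  # record it, then degrade to the next tier
    raise AllBackendsFailed(failures)


# Hypothetical stub backends standing in for real clients.
def external_llm() -> str:
    raise ConnectionError("external API unreachable")


def local_server() -> str:
    raise ConnectionError("no local inference server running")


def in_browser() -> str:
    return "answer from in-browser model"


tier, answer, failed = with_fallbacks([
    ("external", external_llm),
    ("local", local_server),
    ("in-browser", in_browser),
])
print(tier, [name for name, _ in failed])
# in-browser ['external', 'local']
```

Returning the earlier failures instead of discarding them keeps the degradation visible, so the agent can mention that a smaller model answered.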
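For the "show cached data with a staleness indicator" case in the list above, the cache has to remember when each response was fetched. The sketch below assumes an in-memory dict keyed by request and an injectable clock. `StaleCache` and `Fetched` are hypothetical names, and the fake clock only makes the usage deterministic.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Fetched:
    data: Any
    stale: bool         # True when served from cache because the live fetch failed
    age_seconds: float  # age of the data; 0.0 for a fresh fetch


class StaleCache:
    """Serve the last good response when the live API call fails."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}  # key -> (fetched_at, data)

    def get(self, key: str, fetch: Callable[[], Any]) -> Fetched:
        try:
            data = fetch()
        except Exception:
            if key not in self._entries:
                raise  # nothing cached yet: surface the original error
            fetched_at, cached = self._entries[key]
            return Fetched(cached, stale=True, age_seconds=self._clock() - fetched_at)
        self._entries[key] = (self._clock(), data)
        return Fetched(data, stale=False, age_seconds=0.0)


# Usage with a fake clock so the example is deterministic.
now = [1000.0]
cache = StaleCache(clock=lambda: now[0])
cache.get("prices", lambda: {"widget": 9.99})  # live fetch succeeds and is cached


def api_down() -> Any:
    raise ConnectionError("API unreachable")


now[0] = 1090.0
shown = cache.get("prices", api_down)
if shown.stale:
    print(f"Showing prices from {shown.age_seconds:.0f}s ago; live data is unavailable.")
# Showing prices from 90s ago; live data is unavailable.
```

When nothing is cached, the error is re-raised rather than returning an empty result, so the failure is never hidden.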
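"Save what worked, report what didn't" means handling each item independently instead of aborting the whole batch on the first error. The sketch below assumes items can be saved one at a time. `save_each`, `BatchReport`, and `save_file` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable


@dataclass
class BatchReport:
    saved: list = field(default_factory=list)
    failed: list = field(default_factory=list)  # (item, reason) pairs

    def summary(self) -> str:
        total = len(self.saved) + len(self.failed)
        if not self.failed:
            return f"Saved all {total} items."
        problems = ", ".join(f"{item} ({reason})" for item, reason in self.failed)
        return f"Saved {len(self.saved)} of {total} items. Couldn't save: {problems}."


def save_each(items: Iterable[Any], save: Callable[[Any], None]) -> BatchReport:
    """Save every item independently so one failure doesn't discard the rest."""
    report = BatchReport()
    for item in items:
        try:
            save(item)
        except Exception as err:
            report.failed.append((item, str(err)))  # record it, keep going
        else:
            report.saved.append(item)
    return report


def save_file(name: str) -> None:
    # Hypothetical storage call that rejects some inputs.
    if name.endswith(".exe"):
        raise ValueError("file type not allowed")


report = save_each(["a.png", "b.exe", "c.png"], save_file)
print(report.summary())
# Saved 2 of 3 items. Couldn't save: b.exe (file type not allowed).
```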
Retry strategies
- Exponential backoff for server errors (5xx): wait 1s, 2s, 4s, 8s between attempts
- Immediate retry for timeouts (once)
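- No retry for client errors (4xx), except 429 Too Many Requests, which should be retried after a delay (honor the Retry-After header if present)
- User-initiated retry for ambiguous failures (e.g., a payment or write that may already have gone through)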
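The retry rules above can be combined into a single wrapper that classifies each failure before deciding whether to retry. This is a sketch under stated assumptions: `HttpError`, `NeedsUserRetry`, and `call_with_retry` are hypothetical names, and treating every unclassified exception as "ambiguous" is an assumption, not a rule from the list.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

BACKOFF_DELAYS = (1, 2, 4, 8)  # seconds to wait before each server-error retry


class HttpError(Exception):
    """Hypothetical transport error carrying an HTTP status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


class NeedsUserRetry(Exception):
    """Ambiguous failure: offer the user a Retry button instead of retrying."""


def call_with_retry(
    call: Callable[[], T], sleep: Callable[[float], None] = time.sleep
) -> T:
    server_retries = 0
    timeout_retried = False
    while True:
        try:
            return call()
        except HttpError as err:
            if 500 <= err.status < 600 and server_retries < len(BACKOFF_DELAYS):
                sleep(BACKOFF_DELAYS[server_retries])  # exponential backoff
                server_retries += 1
                continue
            # 4xx, or 5xx after the last backoff step: give up and report it.
            # (A 429 rate-limit response would need its own delayed retry.)
            raise
        except TimeoutError:
            if not timeout_retried:
                timeout_retried = True
                continue  # retry once, immediately
            raise
        except Exception as err:
            # Assumption: anything unclassified is ambiguous (the request may
            # have taken effect), so the user decides whether to try again.
            raise NeedsUserRetry(str(err)) from err
```

Injecting `sleep` keeps tests fast: passing `waits.append` records the delays instead of actually waiting. For a call that returns 503 three times and then succeeds, `waits` ends up as `[1, 2, 4]`.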
For agents
- Acknowledge the error immediately
- Explain what you tried and what failed
- Suggest alternatives
- If recoverable, attempt recovery openly, telling the user what you changed
- If not, preserve the user's work (drafts, inputs, partial results) before reporting the failure
- Never silently swallow errors
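The checklist above maps onto a small wrapper around each agent step: log the failure, try a recovery if one exists, and otherwise save the user's work and return a message that says what was tried, what failed, and what to do next. This is a minimal sketch; `run_step`, `StepOutcome`, and the `save_draft`/`recover` callbacks are hypothetical names, not part of any agent framework, and the wording is illustrative.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

log = logging.getLogger("agent")


@dataclass
class StepOutcome:
    ok: bool
    message: str                  # what the agent says to the user
    result: Optional[str] = None


def run_step(
    description: str,
    action: Callable[[], str],
    save_draft: Callable[[], None],
    recover: Optional[Callable[[], str]] = None,
    alternative: str = "try again in a moment",
) -> StepOutcome:
    try:
        return StepOutcome(True, f"Done: {description}.", action())
    except Exception as err:
        # Never swallow silently: keep the full traceback for debugging.
        log.exception("step failed: %s", description)
        failure = f"{type(err).__name__}: {err}"

    if recover is not None:
        try:
            result = recover()
        except Exception:
            log.exception("recovery failed: %s", description)
        else:
            # Recovered, but still tell the user what happened.
            return StepOutcome(
                True, f"I couldn't {description} ({failure}), so I used a fallback instead.", result
            )

    save_draft()  # preserve the user's work before reporting the failure
    return StepOutcome(
        False,
        f"I tried to {description}, but it failed ({failure}). "
        f"Your work is saved; you can {alternative}.",
    )


def search_docs() -> str:
    # Hypothetical tool call that fails.
    raise ConnectionError("search service unreachable")


outcome = run_step(
    "search the docs", search_docs,
    save_draft=lambda: None,
    alternative="retry, or paste the relevant page here",
)
print(outcome.message)
```

With no logging configured, `log.exception` still reaches stderr through Python's last-resort handler, so the traceback is kept even though the user only sees the short message.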