---
title: Testing
description: Agent behavior testing, conversation testing, and edge case coverage
tags: [testing, quality, reliability]
dependencies: [safety, error-handling, performance, threat-model, mcp]
---

# Testing

Agents are non-deterministic. Testing them requires different strategies than traditional software.

## Principles

- **Test behavior, not implementation**: Test what the agent does, not how.
- **Test boundaries**: Empty input, long input, unexpected characters, concurrent requests, network failures.
- **Test safety**: Verify the agent refuses harmful requests, doesn't leak data, stays in scope.

## Testing levels

Unit tests for individual components. Integration tests for components together. Conversation tests for multi-turn flows:

```yaml
- user: "What contexts are available?"
  expect:
    contains: ["darkmode", "accessibility", "privacy"]

- user: "Tell me about dark mode"
  expect:
    contains: "eye comfort"
    not_contains: "ERROR"
```

Regression tests: reproduce the bug before fixing it.

## Agent-specific patterns

- **Prompt testing**: Same question phrased differently should yield consistent results
- **Tool use testing**: Correct tool selected, parameters passed, errors handled
- **Guardrail testing**: Explicitly test safety boundaries
- **Data-flow testing**: Verify no user data is transmitted to unexpected endpoints — if architecture is proof, it must be testable
- **Load testing**: Agents under pressure behave differently

## For agents

1. Adhere to strict TDD principples
2. Cover happy path, error paths, and edges
3. Keep tests fast — slow tests don't run
4. Test guardrails as rigorously as features
5. Treat flaky tests as bugs