
Safety

Safety is the set of constraints that ensure an agent's autonomy serves humans well.

Principles

Content safety

Guard the input side: prevent injection, validate uploaded files, and rate-limit requests. Before presenting output, check it for leaked PII and for known vulnerabilities.
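
The file validation and PII check described above might look roughly like the sketch below. Everything named here is an assumption made for illustration: the allow-list, the size limit, the two regular expressions, and the function names `validate_file` and `redact_pii`. Real PII detection needs far broader coverage than two patterns.

```python
import os
import re

# Illustrative patterns only (assumption): an email address and a US-style SSN.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

ALLOWED_EXTENSIONS = {".txt", ".md", ".csv"}  # hypothetical allow-list
MAX_FILE_BYTES = 1_000_000                    # hypothetical size limit


def validate_file(name: str, size: int) -> bool:
    """Accept a file only if its extension is allow-listed and its size is sane."""
    ext = os.path.splitext(name)[1].lower()
    return ext in ALLOWED_EXTENSIONS and 0 < size <= MAX_FILE_BYTES


def redact_pii(text: str) -> tuple[str, int]:
    """Replace matches of the patterns above; return the text and the match count."""
    text, n_email = EMAIL.subn("[REDACTED EMAIL]", text)
    text, n_ssn = SSN.subn("[REDACTED SSN]", text)
    return text, n_email + n_ssn
```

A caller could treat a non-zero count from `redact_pii` as a signal to log the event or hold the output for review rather than silently passing the redacted text along.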

When refusing a request, be clear about why and offer alternatives.
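
One way to keep refusals consistent is to make the reason and the alternatives explicit fields, so neither can be forgotten. The `Refusal` class and its wording below are hypothetical, a minimal sketch and not a prescribed format.

```python
from dataclasses import dataclass, field


@dataclass
class Refusal:
    """A refusal that always carries a reason and, where possible, alternatives."""
    reason: str
    alternatives: list[str] = field(default_factory=list)

    def render(self) -> str:
        lines = [f"I can't do that: {self.reason}"]
        if self.alternatives:
            lines.append("What I can do instead:")
            lines.extend(f"- {alt}" for alt in self.alternatives)
        return "\n".join(lines)
```

For example, `Refusal("this would delete data I cannot restore", ["export a backup first"]).render()` yields a three-line message stating the reason followed by the alternative.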

Operational safety

Failure modes

Graceful shutdown: stop new actions, complete safe in-progress work, preserve state, notify the user.
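
The four shutdown steps can be sketched as a single method. The `Task` and `Agent` classes, the `safe_to_finish` flag, and the `save_state` and `notify` callbacks are all assumptions introduced for this example; how an agent decides which in-progress work is safe to complete depends on the system.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    name: str
    run: Callable[[], None]
    safe_to_finish: bool  # hypothetical flag: may this task complete during shutdown?


@dataclass
class Agent:
    save_state: Callable[[dict], None]      # hypothetical persistence hook
    notify: Callable[[str], None] = print   # hypothetical user notification hook
    accepting: bool = True
    in_progress: list = field(default_factory=list)

    def start(self, task: Task) -> bool:
        """Queue a task unless shutdown has begun."""
        if not self.accepting:
            return False
        self.in_progress.append(task)
        return True

    def shutdown(self) -> dict:
        self.accepting = False                 # 1. stop new actions
        finished, abandoned = [], []
        for task in self.in_progress:          # 2. complete only safe in-progress work
            if task.safe_to_finish:
                task.run()
                finished.append(task.name)
            else:
                abandoned.append(task.name)
        self.in_progress = []
        state = {"finished": finished, "abandoned": abandoned}
        self.save_state(state)                 # 3. preserve state
        self.notify(                           # 4. notify the user
            f"Shut down: {len(finished)} finished, {len(abandoned)} abandoned."
        )
        return state
```

Recording abandoned tasks in the saved state, rather than dropping them, gives whoever restarts the agent a list of work to review.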

Circuit breakers: after N consecutive errors, pause and alert. Don't retry indefinitely.
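
A minimal circuit breaker along these lines is sketched below. The class and parameter names (`CircuitBreaker`, `max_consecutive_errors`, `alert`) are illustrative assumptions, and this version stays open until someone calls `reset()`; a timed half-open state is a common variation that is not shown.

```python
class CircuitOpen(Exception):
    """Raised when the breaker has paused further calls."""


class CircuitBreaker:
    def __init__(self, max_consecutive_errors: int, alert=print):
        self.max_consecutive_errors = max_consecutive_errors  # the "N" above
        self.alert = alert          # hypothetical alert hook
        self.errors = 0
        self.open = False

    def call(self, fn, *args, **kwargs):
        if self.open:
            # Paused: refuse to run the action at all instead of retrying.
            raise CircuitOpen("paused after repeated errors; reset required")
        try:
            result = fn(*args, **kwargs)
        except Exception as exc:
            self.errors += 1
            if self.errors >= self.max_consecutive_errors:
                self.open = True
                self.alert(f"Paused after {self.errors} consecutive errors: {exc!r}")
            raise
        self.errors = 0  # any success breaks the streak
        return result

    def reset(self) -> None:
        """Resume after a human (or supervisor) has looked at the alert."""
        self.errors = 0
        self.open = False
```

Wrapping each tool invocation as `breaker.call(tool, ...)` means the counter tracks consecutive failures only, since a single success sets it back to zero.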

For agents

  1. Start with a threat model
  2. Apply least privilege from day one
  3. Require confirmation for destructive actions
  4. Test failure modes, not just happy paths
  5. Have a kill switch
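
Items 2, 3, and 5 of the list above can be combined in a single gate that every tool call passes through. This is a sketch under stated assumptions: the tool names, the two sets, the `authorize` function, and the `confirm` callback are hypothetical, and a process-local `threading.Event` stands in for whatever kill-switch mechanism a deployment actually uses.

```python
import threading

KILL_SWITCH = threading.Event()                # set() halts every further action
ALLOWED_TOOLS = {"read_file", "delete_file"}   # hypothetical least-privilege allow-list
DESTRUCTIVE_TOOLS = {"delete_file"}            # hypothetical set needing confirmation


def authorize(tool: str, confirm) -> bool:
    """Decide whether a tool call may proceed.

    `confirm` is a callable that takes a prompt string and returns True
    only if a human approved the action.
    """
    if KILL_SWITCH.is_set():
        return False                # kill switch overrides everything else
    if tool not in ALLOWED_TOOLS:
        return False                # least privilege: deny by default
    if tool in DESTRUCTIVE_TOOLS:
        return bool(confirm(f"Allow destructive action '{tool}'?"))
    return True
```

Checking the kill switch first keeps the stop path independent of the allow-list and the confirmation prompt, so it still works if either of those misbehaves.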
