Safety classes

Every action an agent can take through Autonomy carries a risk classification, and that classification — not the tool author’s judgment, not the agent’s own discretion — decides whether the action runs immediately, waits for a yes, or is refused outright. This page explains that model and the philosophy behind it: consent before high-risk action, enforced in one place for every caller.

Why one gate, not per-tool judgment

An agent that reads screen text and an agent that submits a payment form are doing very different things, but a system that leaves each tool to decide its own risk invites drift — one tool author’s “should probably ask” is another’s “seems fine to auto-run.” Autonomy fixes the risk tier at the protocol level instead. Every actuating call carries a SafetyClass:

| Class | Meaning | Example |
|---|---|---|
| read_only | No approval needed | reading the accessibility tree, finding elements |
| standard | Logged, auto-approved when the user is idle | clicking a button, selecting a menu item |
| elevated | Requires explicit approval | typing into a field, submitting a form, toggling a setting |
| critical | Always requires confirmation and an audit trail | delete actions, file operations, system-settings changes |
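The four tiers can be sketched as a small enum plus a lookup from tier to gate behavior. This is an illustrative sketch, not Autonomy's actual API: the `Gate` enum, the `gate_for` function, and the `user_idle` parameter are hypothetical names, and the behavior of `standard` when the user is not idle is an assumption (the table only says what happens when the user is idle).

```python
from enum import Enum


class SafetyClass(Enum):
    """The four tiers from the table above."""
    READ_ONLY = "read_only"
    STANDARD = "standard"
    ELEVATED = "elevated"
    CRITICAL = "critical"


class Gate(Enum):
    """Hypothetical outcomes of the gate (names are illustrative)."""
    RUN = "run"                      # no approval needed
    RUN_LOGGED = "run_logged"        # auto-approved, but logged
    ASK = "ask"                      # explicit approval required
    ASK_AND_AUDIT = "ask_and_audit"  # confirmation plus audit trail


def gate_for(safety_class: SafetyClass, user_idle: bool) -> Gate:
    """Map a tier to gate behavior, following the table."""
    if safety_class is SafetyClass.READ_ONLY:
        return Gate.RUN
    if safety_class is SafetyClass.STANDARD:
        # Assumption: when the user is NOT idle, fall back to asking.
        # The source only specifies the idle case.
        return Gate.RUN_LOGGED if user_idle else Gate.ASK
    if safety_class is SafetyClass.ELEVATED:
        return Gate.ASK
    # CRITICAL: always confirm and record an audit trail.
    return Gate.ASK_AND_AUDIT
```

The point of the sketch is that the caller supplies only the tier; the decision about what that tier means lives in one function rather than in each tool.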

The SafetyClass tiers belong to the accessibility-backend layer, which gates the actual click, type, or AX action. Above it sits a second, more specific classification for agent-initiated actions: an action class (such as reading state, inspecting a page, scrolling, drafting text, changing settings, sending externally, or handling credentials) that a consent evaluator checks against the user’s current trust posture before anything runs. Both layers exist for the same reason: risk should be a property of the action, checked in exactly one place, not something each integration reinvents.

The philosophy: some things are never automatic

The most important property of the model isn’t the four-tier scale — it’s the short list of actions that sit outside it entirely. Entering credentials, completing account recovery, making a payment, taking a destructive action, or changing a permission are unconditionally denied. No trust level, no “power user” setting, no configuration override changes that outcome. This is checked first, before anything else, on every request.

Everything else is negotiated through two more layers: whether the user’s saved access configuration allows that category of action at all, and — for screen capture specifically — whether the relevant macOS permission has actually been granted. Only after both of those pass does the user’s trust posture matter.
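The ordering described above (unconditional deny first, then the saved access configuration, then the macOS permission for screen capture, and only then trust posture) can be sketched as a chain of early returns. Every name here is hypothetical: the action strings, the `Context` fields, and the `evaluate` function are illustrative, and treating a disallowed category as a denial is an assumption. The posture step is reduced to a placeholder that always asks.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"
    DENY = "deny"


# Hypothetical identifiers for the always-denied actions named in the text.
ALWAYS_DENIED = frozenset({
    "enter_credentials",
    "account_recovery",
    "payment",
    "destructive_action",
    "change_permission",
})


@dataclass(frozen=True)
class Context:
    allowed_categories: frozenset  # from the saved access configuration
    screen_capture_granted: bool   # whether the macOS permission is granted


def evaluate(action: str, ctx: Context) -> tuple[Decision, str]:
    """Return a decision and a reason, checking layers in order."""
    # 1. Unconditional deny set: checked first, nothing overrides it.
    if action in ALWAYS_DENIED:
        return Decision.DENY, "unconditional deny set"
    # 2. Saved access configuration must allow the category at all.
    #    Assumption: a disallowed category resolves to deny.
    if action not in ctx.allowed_categories:
        return Decision.DENY, "not allowed by access configuration"
    # 3. Screen capture additionally needs the macOS permission.
    if action == "screen_capture" and not ctx.screen_capture_granted:
        return Decision.DENY, "screen capture permission not granted"
    # 4. Only now does trust posture matter (placeholder: always ask).
    return Decision.ASK, "deferred to trust posture"
```

Because the deny-set check comes first, adding an always-denied action to `allowed_categories` has no effect on the result.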

Trust posture: power, guided, training

Autonomy recognizes three operating postures (power, guided, and training) that determine which approval steps a trusted setup gets to skip:

  • Guided or training posture always inserts an accessible review step, regardless of how low-risk the action looks.
  • Power posture allows a short allowlist of genuinely low-risk actions — reading accessibility state, reading the AX tree, inspecting a page or app, scrolling, local recap, drafting text — to proceed without asking, but only when the saved access configuration also allows that category. Power posture does not touch the unconditional deny set, and it still asks before everything outside that narrow allowlist.
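The posture rules can be sketched as a single function. This is a minimal sketch under stated assumptions: the `Posture` and `Decision` enums, the action strings in `POWER_ALLOWLIST`, and the `posture_decision` function are hypothetical names chosen to mirror the prose, and the function assumes the unconditional deny set has already been checked by an earlier step.

```python
from enum import Enum


class Posture(Enum):
    POWER = "power"
    GUIDED = "guided"
    TRAINING = "training"


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"


# Hypothetical identifiers for the low-risk actions listed in the text.
POWER_ALLOWLIST = frozenset({
    "read_accessibility_state",
    "read_ax_tree",
    "inspect_page_or_app",
    "scroll",
    "local_recap",
    "draft_text",
})


def posture_decision(action: str, posture: Posture,
                     allowed_categories: frozenset) -> Decision:
    """Decide between allow and ask once earlier checks have passed."""
    # Guided and training postures always insert a review step.
    if posture is not Posture.POWER:
        return Decision.ASK
    # Power posture skips the prompt only for allowlisted actions,
    # and only if the saved access configuration also allows them.
    if action in POWER_ALLOWLIST and action in allowed_categories:
        return Decision.ALLOW
    # Everything outside the narrow allowlist still asks.
    return Decision.ASK
```

For example, `posture_decision("scroll", Posture.GUIDED, frozenset({"scroll"}))` returns `Decision.ASK`, because guided posture never skips the review step.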

Mental model

Any input the evaluator can’t parse — an unknown action class, a malformed request — resolves to deny, never allow. Fail-safe is the default, not an edge case someone forgot to handle.
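One way to get this fail-safe behavior is to make deny the only exit for anything that fails to parse. The sketch below is illustrative only: the `ActionClass` members, the request shape (a dict with an `action_class` key), and the function names are assumptions, and the final `"ask"` stands in for whatever the real evaluation would return.

```python
from enum import Enum
from typing import Optional


class ActionClass(Enum):
    """A hypothetical subset of action classes."""
    READ_STATE = "read_state"
    SCROLL = "scroll"
    DRAFT_TEXT = "draft_text"


def parse_action_class(raw: object) -> Optional[ActionClass]:
    """Return the matching ActionClass, or None if it is not recognized."""
    if not isinstance(raw, str):
        return None
    try:
        return ActionClass(raw)
    except ValueError:
        # Unknown action class: signal failure instead of guessing.
        return None


def evaluate_request(request: object) -> str:
    """Resolve anything unparseable to 'deny', never 'allow'."""
    # Malformed request: not even the expected container type.
    if not isinstance(request, dict):
        return "deny"
    action_class = parse_action_class(request.get("action_class"))
    if action_class is None:
        return "deny"
    # Placeholder for the real evaluation of a well-formed request.
    return "ask"
```

The design choice is that no code path turns a parse failure into permission: the function returns `"deny"` for a missing key, a wrong type, or an unrecognized value alike.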

A concrete example

An agent is asked to “clean up these old files” during a session. Deleting files is a destructive action, so it lands in the unconditional deny set. No amount of trust configuration changes that answer; the agent gets a denial with a reason, not a prompt it could talk its way past. Compare that to typing a reply into a drafted email, which is an elevated, “ask” action. Even under a trusted, power-posture setup with the right access configuration, it might still require a single approval rather than running silently, because typing crosses into content the user didn’t dictate.

Common misconceptions

  • “Power mode means the agent can do anything without asking.” It unlocks a specific, short allowlist of read-and-navigate actions — nothing in the always-denied set is ever included, and everything else still asks.
  • “A tool succeeding means the user approved of it.” Consent is a separate, explicit decision recorded before the action runs — not inferred from the fact that nothing broke.
  • “Consent is a UI feature.” It’s enforced at the daemon, the same boundary every caller passes through — the app’s own UI, a bring-your-own agent, and the default launcher all hit the identical gate. See MCP-native architecture for why that boundary is the daemon and not any one client.

For how this gate shows up from an agent’s side of the connection — the exact tools and error shapes involved — see consent and safety. For how this model interacts with mode selection, see the accessibility model.
