Consent and safety

Every agent connected to Autonomy — the default claude launcher or a fully custom bring-your-own agent — can read screen state, click UI, type into fields, and potentially trigger something destructive or expensive. Autonomy doesn’t leave that risk to each agent’s own judgment. It classifies actions at the protocol level and routes anything risky through an explicit consent gate before it happens, so the person on the other end of the Mac — who may not be watching the screen at all — stays the one deciding, not just the one theoretically able to object afterward.

This page covers the two-layer model behind that gate: SafetyClass, the protocol-wide four-tier classification every action carries, and ConsentCategory, the much more specific category attached to an actual consent request. Then it walks through the consent_request / consent_resolve call pair and the action ledger that records what happened.

Why a protocol-level model instead of per-tool checks

The alternative — burying an approval check inside each individual tool — was explicitly rejected. ADR-007 ties consent and trust policy to a shared classification instead, so the same risk model applies consistently whether the action came through relay traffic, an MCP tool call, a playbook step, or a future remote-control path. Mutating actions can be blocked, prompted, or audited without rewriting each tool’s logic individually, and contributors have to treat the classification as a protocol contract — not a UI nicety a new tool can quietly skip.

SafetyClass: four tiers

Every action-performing method in the protocol carries one of four values:

| Class | Approval required | Examples |
| --- | --- | --- |
| `read_only` | None | `read_state`, `read_cache`, `read_tree`, `find_element` |
| `standard` | Logged; auto-approved if the user is idle | Click a button, select a menu item |
| `elevated` | Explicit user approval required | Type text into fields, submit forms, toggle settings |
| `critical` | Always confirmed, plus an audit trail | Delete actions, file operations, system settings changes |

elevated and critical are the two classes that trigger the consent flow below; standard can be auto-approved while the user is idle, and read_only never blocks.
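As a minimal sketch of how an agent-side helper might encode these tiers, the snippet below models the four values as an enum and answers the one question the tiers settle: does this action have to wait for a user decision? The enum values come from the table; the helper names (`CONSENT_GATED`, `triggers_consent_flow`, `never_blocks`) are hypothetical and not part of the Autonomy protocol.

```python
from enum import Enum


class SafetyClass(Enum):
    """The four protocol-wide tiers, keyed by the values in the table above."""

    READ_ONLY = "read_only"
    STANDARD = "standard"
    ELEVATED = "elevated"
    CRITICAL = "critical"


# Classes that route through the consent flow, per the text above.
# (Hypothetical helper constant, not a protocol identifier.)
CONSENT_GATED = frozenset({SafetyClass.ELEVATED, SafetyClass.CRITICAL})


def triggers_consent_flow(safety_class: SafetyClass) -> bool:
    """Return True when the action must wait for an explicit user decision."""
    return safety_class in CONSENT_GATED


def never_blocks(safety_class: SafetyClass) -> bool:
    """Only read_only actions are described as never blocking."""
    return safety_class is SafetyClass.READ_ONLY
```

The sketch deliberately says nothing about how `standard` behaves while the user is active, since this page only describes the idle case.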

ConsentCategory: what the request is actually about

SafetyClass says how risky; ConsentCategory says what kind of risky. When an agent calls consent_request, it names one of 33 typed categories — far more specific than the four safety tiers, because “ask before this” reads very differently for a purchase than for a permission prompt. Grouped by theme:

Money and identity — purchase, credential_or_payment, two_factor_authentication, credential_context_present
Destructive or account-level change — delete_or_destructive_change, account_or_security_change, privacy_setting_change, setting_change, permission_open_settings
Leaving the user’s control — send_message, send_email, external_send, external_handoff, submit_form
Capture and sharing — capture_screen, capture_sensitive_app, screen_share_external, camera_share_external, microphone_share_external, support_packet_share
Presence of sensitive context — workplace_data_present, bystander_or_third_party_data_present
Input and attention — text_entry, microphone_start, microphone_or_speech, speech_recognition_start, foreground_takeover, focus_change, pointer_movement
Low-risk or local — local_read_only, local_visual_fallback, support_packet_create, generic_driver_fallback

That last group matters as much as the risky ones: not everything routes through a blocking approval, and treating every category as equally scary would train users to rubber-stamp prompts instead of reading them.
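One way to keep the grouping above close to hand in agent code is a small lookup table. The sketch below is illustrative only: the category names are copied from the list above, but the theme keys (such as `money_and_identity`) are hypothetical labels for this page's grouping, not protocol identifiers, and `theme_of` is an assumed helper.

```python
# Category names are taken from the grouped list above.
# Theme keys are hypothetical labels for this page's grouping.
CONSENT_CATEGORIES = {
    "money_and_identity": (
        "purchase", "credential_or_payment",
        "two_factor_authentication", "credential_context_present",
    ),
    "destructive_or_account_change": (
        "delete_or_destructive_change", "account_or_security_change",
        "privacy_setting_change", "setting_change", "permission_open_settings",
    ),
    "leaving_user_control": (
        "send_message", "send_email", "external_send",
        "external_handoff", "submit_form",
    ),
    "capture_and_sharing": (
        "capture_screen", "capture_sensitive_app", "screen_share_external",
        "camera_share_external", "microphone_share_external",
        "support_packet_share",
    ),
    "sensitive_context_present": (
        "workplace_data_present", "bystander_or_third_party_data_present",
    ),
    "input_and_attention": (
        "text_entry", "microphone_start", "microphone_or_speech",
        "speech_recognition_start", "foreground_takeover",
        "focus_change", "pointer_movement",
    ),
    "low_risk_or_local": (
        "local_read_only", "local_visual_fallback",
        "support_packet_create", "generic_driver_fallback",
    ),
}

ALL_CATEGORIES = frozenset(
    name for names in CONSENT_CATEGORIES.values() for name in names
)


def theme_of(category: str) -> str:
    """Return the theme a category is grouped under, or raise if unknown."""
    for theme, names in CONSENT_CATEGORIES.items():
        if category in names:
            return theme
    raise ValueError(f"unknown consent category: {category!r}")
```

Rejecting unknown names outright, rather than mapping them to a default, keeps a typo from silently landing in the low-risk group.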

Don’t picture consent as a checkbox an agent gets once and keeps forever. Picture an airlock: the agent describes exactly what’s about to happen and what happens if the user says no, the user opens or doesn’t open the inner door, and the record of that specific decision stays attached to that specific action — it doesn’t carry over to the next one automatically.

The request/resolve call pair

A concrete consent_request for, say, a purchase:


{
  "runId": "run-2026-07-04-001",
  "consentCategory": "purchase",
  "assistiveMode": "voiceover_aware",
  "prompt": "This will complete a $42.00 checkout on the current page. Continue?",
  "defaultIfDeclined": "Leave the cart as-is and stop before payment.",
  "surfacesUsed": ["browser_dom"]
}

The daemon holds this as a pending, task-scoped approval. The agent then blocks on consent_resolve:


{
  "approvalId": "appr-9f21",
  "status": "granted",
  "decisionSummary": "User confirmed the $42.00 checkout by voice."
}

status is one of exactly three values: granted, declined, or cancelled. Anything other than a clear granted (a missing, stale, ambiguous, declined, or cancelled response) means stop. Do not retry with a smaller ask, and do not fall back to a lower-consent path to get the same result.
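The stop rule can be sketched as a small fail-closed check. This is an illustration under stated assumptions, not the daemon's logic: it assumes the agent already knows the `approvalId` of its own pending request (how that ID is obtained is not shown on this page), and `may_proceed` is a hypothetical helper name. The field names `approvalId` and `status` are the ones in the example above.

```python
from typing import Mapping, Optional

# The only three values status may take, per the text above.
VALID_STATUSES = frozenset({"granted", "declined", "cancelled"})


def may_proceed(
    resolution: Optional[Mapping[str, object]],
    expected_approval_id: str,
) -> bool:
    """Return True only for a clear, matching 'granted' resolution.

    Everything else (missing, stale, ambiguous, declined, cancelled)
    fails closed, so the caller stops instead of retrying.
    """
    if resolution is None:
        return False  # missing response
    if resolution.get("approvalId") != expected_approval_id:
        return False  # stale, or answers a different request
    status = resolution.get("status")
    if status not in VALID_STATUSES:
        return False  # ambiguous or malformed
    return status == "granted"
```

For the resolution shown above, `may_proceed(resolution, "appr-9f21")` returns `True`; the same payload checked against any other ID returns `False`.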

After acting (or not), the agent calls action_ledger_record with redacted values describing what was asked, decided, and observed. This is the “last 50 events” a screen-off user can ask about later without depending on a cloud trace — consent and action history live locally and are meant to be inspectable, not just theoretically auditable.
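To make the idea of a bounded, local, redacted history concrete, here is a minimal in-memory sketch. It is not the `action_ledger_record` implementation: the entry shape (`asked`, `decided`, `observed`), the `SENSITIVE_KEYS` set, and the `ActionLedger` class are all assumptions for illustration. Only the 50-event window comes from the text above.

```python
from collections import deque
from typing import Any, Deque, Dict, List

LEDGER_SIZE = 50  # the "last 50 events" window described above

# Hypothetical field names whose values should never be stored in the clear.
SENSITIVE_KEYS = frozenset({"typedText", "cardNumber"})


def redact(details: Dict[str, Any]) -> Dict[str, Any]:
    """Copy details, replacing the values of sensitive keys with a marker."""
    return {
        key: "[redacted]" if key in SENSITIVE_KEYS else value
        for key, value in details.items()
    }


class ActionLedger:
    """Keeps only the most recent events; older ones fall off the front."""

    def __init__(self, size: int = LEDGER_SIZE) -> None:
        self._events: Deque[Dict[str, Any]] = deque(maxlen=size)

    def record(self, asked: Dict[str, Any], decided: str, observed: str) -> None:
        """Store what was asked (redacted), what was decided, and what was seen."""
        self._events.append(
            {"asked": redact(asked), "decided": decided, "observed": observed}
        )

    def recent(self) -> List[Dict[str, Any]]:
        """Return the retained events, oldest first."""
        return list(self._events)
```

Redacting at write time, rather than at read time, means a secret never reaches the stored history at all.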

Delivery evidence is not user confirmation

The same discipline applies to how an agent reports what it did. A successful voiceover_transport_announce call proves only that a delivery channel was used, reported as one of voiceover_direct_requested, tts_fallback_used, speech_queue_entered, or user_heard_unverified. It does not prove that the user actually heard anything. Autonomy distinguishes observed, inferred, user-declared, unknown, and not-measured state throughout, so that an agent (or this documentation) never overclaims what happened.
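A short sketch of that distinction in code: delivery outcomes describe the channel, and none of them upgrades the claim about whether the user heard. The four outcome strings and the five state names come from the text above. The enum spelling, the `user_heard_evidence` helper, and the choice to report `unknown` when the user has said nothing are assumptions made for illustration.

```python
from enum import Enum


class Evidence(Enum):
    """The five evidence states named above (identifier spelling is assumed)."""

    OBSERVED = "observed"
    INFERRED = "inferred"
    USER_DECLARED = "user_declared"
    UNKNOWN = "unknown"
    NOT_MEASURED = "not_measured"


# Delivery outcomes listed above; each describes the channel, not the listener.
DELIVERY_OUTCOMES = frozenset({
    "voiceover_direct_requested",
    "tts_fallback_used",
    "speech_queue_entered",
    "user_heard_unverified",
})


def user_heard_evidence(delivery_outcome: str, user_said_heard: bool = False) -> Evidence:
    """Return what an agent may claim about whether the user heard an announcement.

    No delivery outcome counts as confirmation. Only an explicit statement
    from the user changes the claim, and then only to USER_DECLARED.
    """
    if delivery_outcome not in DELIVERY_OUTCOMES:
        raise ValueError(f"unknown delivery outcome: {delivery_outcome!r}")
    return Evidence.USER_DECLARED if user_said_heard else Evidence.UNKNOWN
```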

If you’re building a specialist rather than trusting a general-purpose agent to gate its own risky actions correctly, Autonomy ships a dedicated subagent for exactly this: autonomy:high-consent-guard reviews configured access before external sends, credential or payment flows, 2FA, destructive changes, and support-packet sharing, instead of folding that judgment call into every other specialist’s prompt.

Pitfalls

Don’t bundle multiple high-risk actions into one approval. A single consent_request should describe one action’s category, destination, and reversibility — not “approve these five things at once,” which erodes the user’s ability to say no to just one of them.
Don’t ask for secrets in chat when the user can enter them directly. Categories like credential_or_payment and two_factor_authentication exist to gate the action, not to become a channel for collecting the credential itself.
A declined or cancelled resolution is not a retry signal. Falling back to a different, lower-friction surface to get the same outcome defeats the point of asking.
Don’t conflate creating a support packet with sharing it. support_packet_create covers building a redacted packet and is local-only by default; sharing that packet externally is a separate category, support_packet_share, and requires its own consent.

Bring your own agent — where consent fits in the wider connection flow
MCP tool catalog — the consent and ledger tools in context with the rest of the catalog
Safety classes — the protocol-level model in more depth
Connect your first agent — seeing a consent prompt for the first time