Consent and safety
Every agent connected to Autonomy — the default claude launcher or a fully
custom bring-your-own agent — can read screen state, click UI, type into
fields, and potentially trigger something destructive or expensive. Autonomy
doesn’t leave that risk to each agent’s own judgment. It classifies actions at
the protocol level and routes anything risky through an explicit consent gate
before it happens, so the person on the other end of the Mac — who may not be
watching the screen at all — stays the one deciding, not just the one
theoretically able to object afterward.
This page covers the two-layer model behind that gate: SafetyClass, the
protocol-wide four-tier classification every action carries, and
ConsentCategory, the much more specific category attached to an actual
consent request. Then it walks through the consent_request /
consent_resolve call pair and the action ledger that records what happened.
Why a protocol-level model instead of per-tool checks
The alternative — burying an approval check inside each individual tool — was explicitly rejected. ADR-007 ties consent and trust policy to a shared classification instead, so the same risk model applies consistently whether the action came through relay traffic, an MCP tool call, a playbook step, or a future remote-control path. Mutating actions can be blocked, prompted, or audited without rewriting each tool’s logic individually, and contributors have to treat the classification as a protocol contract — not a UI nicety a new tool can quietly skip.
SafetyClass: four tiers
Every action-performing method in the protocol carries one of four values:
| Class | Approval required | Examples |
|---|---|---|
| read_only | None | read_state, read_cache, read_tree, find_element |
| standard | Logged; auto-approved if the user is idle | Click a button, select a menu item |
| elevated | Explicit user approval required | Type text into fields, submit forms, toggle settings |
| critical | Always confirmed, plus an audit trail | Delete actions, file operations, system settings changes |
elevated and critical are the two classes that trigger the consent flow
below; standard can be auto-approved while the user is idle, and read_only
never blocks.
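The tier rules above can be sketched in a few lines. This is an illustration only: the enum values mirror the table, but the helper names `triggers_consent_flow` and `never_blocks` are hypothetical and not part of Autonomy's API, and the idle-based auto-approval for standard actions is deliberately left out because its exact rules are not described here.

```python
from enum import Enum


class SafetyClass(Enum):
    """The four protocol-wide tiers, using the identifiers from the table."""

    READ_ONLY = "read_only"
    STANDARD = "standard"
    ELEVATED = "elevated"
    CRITICAL = "critical"


def triggers_consent_flow(safety_class: SafetyClass) -> bool:
    """Hypothetical helper: only elevated and critical route through consent."""
    return safety_class in (SafetyClass.ELEVATED, SafetyClass.CRITICAL)


def never_blocks(safety_class: SafetyClass) -> bool:
    """Hypothetical helper: read_only actions are never held for approval."""
    return safety_class is SafetyClass.READ_ONLY
```

Keeping the check keyed on the class rather than on the tool name is the point of the protocol-level model: a new tool only has to declare its class to inherit the same gate.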
ConsentCategory: what the request is actually about
SafetyClass says how risky; ConsentCategory says what kind of risky.
When an agent calls consent_request, it names one of 33 typed categories —
far more specific than the four safety tiers, because “ask before this” reads
very differently for a purchase than for a permission prompt. Grouped by
theme:
- Money and identity: purchase, credential_or_payment, two_factor_authentication, credential_context_present
- Destructive or account-level change: delete_or_destructive_change, account_or_security_change, privacy_setting_change, setting_change, permission_open_settings
- Leaving the user’s control: send_message, send_email, external_send, external_handoff, submit_form
- Capture and sharing: capture_screen, capture_sensitive_app, screen_share_external, camera_share_external, microphone_share_external, support_packet_share
- Presence of sensitive context: workplace_data_present, bystander_or_third_party_data_present
- Input and attention: text_entry, microphone_start, microphone_or_speech, speech_recognition_start, foreground_takeover, focus_change, pointer_movement
- Low-risk or local: local_read_only, local_visual_fallback, support_packet_create, generic_driver_fallback
That last group matters as much as the risky ones: not everything routes through a blocking approval, and treating every category as equally scary would train users to rubber-stamp prompts instead of reading them.
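For agent authors who want to reason about categories by theme, the grouping above can be captured as a lookup table. This is a sketch: the category strings come from the list on this page, while the group keys and the `group_of` helper are hypothetical names chosen for illustration, and nothing here says which categories actually block.

```python
# Group keys are hypothetical labels; the category strings come from this page.
CATEGORY_GROUPS: dict[str, tuple[str, ...]] = {
    "money_and_identity": (
        "purchase", "credential_or_payment",
        "two_factor_authentication", "credential_context_present",
    ),
    "destructive_or_account_change": (
        "delete_or_destructive_change", "account_or_security_change",
        "privacy_setting_change", "setting_change", "permission_open_settings",
    ),
    "leaving_user_control": (
        "send_message", "send_email", "external_send",
        "external_handoff", "submit_form",
    ),
    "capture_and_sharing": (
        "capture_screen", "capture_sensitive_app", "screen_share_external",
        "camera_share_external", "microphone_share_external",
        "support_packet_share",
    ),
    "sensitive_context_present": (
        "workplace_data_present", "bystander_or_third_party_data_present",
    ),
    "input_and_attention": (
        "text_entry", "microphone_start", "microphone_or_speech",
        "speech_recognition_start", "foreground_takeover",
        "focus_change", "pointer_movement",
    ),
    "low_risk_or_local": (
        "local_read_only", "local_visual_fallback",
        "support_packet_create", "generic_driver_fallback",
    ),
}

# Reverse index: category name -> group key.
GROUP_OF = {
    category: group
    for group, categories in CATEGORY_GROUPS.items()
    for category in categories
}


def group_of(category: str) -> str:
    """Return the thematic group for a category, rejecting unknown names."""
    try:
        return GROUP_OF[category]
    except KeyError:
        raise ValueError(f"unknown consent category: {category!r}") from None
```

Rejecting unknown names, rather than defaulting them to a low-risk group, keeps a typo from silently downgrading a request.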
Mental model: consent as an airlock, not a signature line
Don’t picture consent as a checkbox an agent gets once and keeps forever. Picture an airlock: the agent describes exactly what’s about to happen and what happens if the user says no, the user opens or doesn’t open the inner door, and the record of that specific decision stays attached to that specific action — it doesn’t carry over to the next one automatically.
The request/resolve call pair
A concrete consent_request for, say, a purchase:
{
"runId": "run-2026-07-04-001",
"consentCategory": "purchase",
"assistiveMode": "voiceover_aware",
"prompt": "This will complete a $42.00 checkout on the current page. Continue?",
"defaultIfDeclined": "Leave the cart as-is and stop before payment.",
"surfacesUsed": ["browser_dom"]
}
The daemon holds this as a pending, task-scoped approval. The agent then
blocks on consent_resolve:
{
"approvalId": "appr-9f21",
"status": "granted",
"decisionSummary": "User confirmed the $42.00 checkout by voice."
}
status is one of exactly three values: granted, declined, or
cancelled. A missing, stale, ambiguous, or declined response means stop:
do not retry with a smaller ask, and do not fall back to a lower-consent path
to get the same result.
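The stop rule is easy to get subtly wrong on the agent side, so here is a minimal sketch of a check that fails closed. The field names approvalId and status come from the example above; the `may_proceed` helper is hypothetical, and treating a mismatched approvalId as "stale" is an assumption made for illustration, not documented daemon behavior.

```python
from typing import Optional

# The three documented status values.
VALID_STATUSES = {"granted", "declined", "cancelled"}


def may_proceed(resolution: Optional[dict], expected_approval_id: str) -> bool:
    """Return True only for an unambiguous grant of the approval we asked for.

    Everything else (no response, a response for a different approval, an
    unrecognized status, declined, cancelled) means stop.
    """
    if resolution is None:
        return False  # missing response
    if resolution.get("approvalId") != expected_approval_id:
        return False  # assumption: a different id is stale or mismatched
    status = resolution.get("status")
    if status not in VALID_STATUSES:
        return False  # ambiguous or malformed
    return status == "granted"
```

For example, `may_proceed({"approvalId": "appr-9f21", "status": "granted"}, "appr-9f21")` is the only shape that returns True here. The function has no retry branch by design: a False result ends the action.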
After acting (or not), the agent calls action_ledger_record with redacted
values describing what was asked, decided, and observed. This is the “last 50
events” a screen-off user can ask about later without depending on a cloud
trace — consent and action history live locally and are meant to be
inspectable, not just theoretically auditable.
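To make the "bounded, local, redacted" idea concrete, here is a toy in-memory ledger. It is a sketch under stated assumptions: the `LocalLedger` class, the `redact` function, and the asked/decided/observed field names are all hypothetical, the digit-masking redaction is far weaker than anything a real ledger should use, and the default size of 50 is simply borrowed from the "last 50 events" phrasing rather than from a documented retention setting.

```python
import re
from collections import deque


def redact(text: str) -> str:
    """Toy redaction: mask every digit. A real redactor would be far stricter."""
    return re.sub(r"\d", "#", text)


class LocalLedger:
    """Bounded, in-memory stand-in for a local action ledger (hypothetical)."""

    def __init__(self, max_events: int = 50) -> None:
        # deque(maxlen=...) drops the oldest entry automatically once full.
        self._events = deque(maxlen=max_events)

    def record(self, asked: str, decided: str, observed: str) -> None:
        """Store one event, redacting the free-text fields before they land."""
        self._events.append(
            {
                "asked": redact(asked),
                "decided": decided,
                "observed": redact(observed),
            }
        )

    def recent(self) -> list:
        """Return the retained events, oldest first."""
        return list(self._events)
```

Redacting at write time, rather than at read time, means the unredacted value never sits in the ledger at all.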
Delivery evidence is not user confirmation
The same discipline applies to how an agent reports what it did. A
successful voiceover_transport_announce call proves a channel was used —
voiceover_direct_requested, tts_fallback_used, speech_queue_entered, or
user_heard_unverified — not that the user actually heard it. Autonomy
distinguishes observed, inferred, user-declared, unknown, and not-measured
state throughout, specifically so an agent (or this documentation) never
overclaims what actually happened.
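One way to keep those two claims apart in an agent's own reporting is to tag each with its evidence state. The sketch below is illustrative: the five states come from the paragraph above, but their string spellings, the `describe_announcement` helper, and the report field names are assumptions, not Autonomy's wire format.

```python
from enum import Enum


class EvidenceState(Enum):
    """The five evidence states named above; string spellings are assumed."""

    OBSERVED = "observed"
    INFERRED = "inferred"
    USER_DECLARED = "user_declared"
    UNKNOWN = "unknown"
    NOT_MEASURED = "not_measured"


def describe_announcement(channel_evidence: str, user_confirmed: bool = False) -> dict:
    """Build a report that separates channel delivery from user reception.

    channel_evidence is a value such as "tts_fallback_used". Whether the user
    heard the announcement stays unknown unless the user says so.
    """
    heard_state = (
        EvidenceState.USER_DECLARED if user_confirmed else EvidenceState.UNKNOWN
    )
    return {
        "delivery": {
            "value": channel_evidence,
            "state": EvidenceState.OBSERVED.value,
        },
        "user_heard": {"state": heard_state.value},
    }
```

A successful announce call therefore only ever upgrades the delivery field; the user_heard field changes only when the user explicitly confirms.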
If you’re building a specialist rather than trusting a general-purpose agent
to gate its own risky actions correctly, Autonomy ships a dedicated
subagent for exactly this: autonomy:high-consent-guard reviews configured
access before external sends, credential or payment flows, 2FA, destructive
changes, and support-packet sharing, instead of folding that judgment call
into every other specialist’s prompt.
Pitfalls
- Don’t bundle multiple high-risk actions into one approval. A single consent_request should describe one action’s category, destination, and reversibility, not “approve these five things at once,” which erodes the user’s ability to say no to just one of them.
- Don’t ask for secrets in chat when the user can enter them directly. Categories like credential_or_payment and two_factor_authentication exist to gate the action, not to become a channel for collecting the credential itself.
- A declined or cancelled resolution is not a retry signal. Falling back to a different, lower-friction surface to get the same outcome defeats the point of asking.
- support_packet_create is local-only by default. Creating a redacted support packet is its own category (support_packet_create); sharing it externally is a separate category (support_packet_share) requiring its own consent. Don’t conflate the two.
Related
- Bring your own agent — where consent fits in the wider connection flow
- MCP tool catalog — the consent and ledger tools in context with the rest of the catalog
- Safety classes — the protocol-level model in more depth
- Connect your first agent — seeing a consent prompt for the first time