MCP tool catalog
Once an agent is connected (see Overview), a
tools/list call returns Autonomy’s tool catalog — on the order of 170 tools
spanning screen reading, input, browser automation, speech, VoiceOver,
consent, and workflow. That number is the whole reason this page exists: no
agent should try to hold 170 tool schemas in context at once, and no person
should have to read 170 rows to understand what Autonomy can do.
This page groups the catalog by domain — the same domain taxonomy
Autonomy uses internally to scope tool access for specialist subagents — and
gives a few representative real tools per group. It is not the exhaustive
reference; for that, ask a connected agent to run a7y-cli tools list, or
call tools/list directly against the daemon.
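For an agent calling the daemon directly, the request is the standard MCP JSON-RPC `tools/list` method. The sketch below builds that request and groups a response's tool names by their name prefix. Treating the prefix as a domain is an illustrative assumption, and the sample response is hand-written rather than real daemon output; Autonomy's actual domain assignments may differ.

```python
import json
from collections import defaultdict


def tools_list_request(request_id: int = 1) -> str:
    """Serialize a standard MCP JSON-RPC 2.0 tools/list request."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def group_by_prefix(response: dict) -> dict:
    """Group tool names by the text before the first underscore.

    Assumption: the name prefix approximates a domain. The real
    taxonomy may assign tools differently.
    """
    groups = defaultdict(list)
    for tool in response["result"]["tools"]:
        prefix = tool["name"].split("_", 1)[0]
        groups[prefix].append(tool["name"])
    return dict(groups)


# Hand-written sample in the shape of an MCP tools/list result.
sample_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {"name": "screen_read_tree"},
            {"name": "screen_find_element"},
            {"name": "dom_extract"},
            {"name": "consent_request"},
        ]
    },
}

grouped = group_by_prefix(sample_response)
```

Grouping on the client side like this only helps a person skim the catalog; it does not reduce what the connection itself loads.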
Why domains, not one flat list
A one-shot Claude connection was observed, live, deferring every tool’s schema the moment the catalog crossed roughly 160 tools — the classic “tool overload” failure mode where selection quality drops and context bloats. The fix wasn’t fewer tools; some tasks genuinely need all of them somewhere in the system. The fix was scoping: a connecting agent (or a domain-specialist subagent) can request just one domain’s tools plus a small cross-cutting set, instead of the full surface.
full catalog (~170 tools)
│
├─ a7y-cli daemon proxy → everything (default, back-compatible)
└─ a7y-cli daemon proxy --domain X → domain X's tools ∪ cross-cutting set
   (consent, action ledger, readiness)

The domain groups below are exactly the ones this filtering understands.
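The scoping rule is a set union: one domain's tools plus the cross-cutting set. A minimal sketch of that rule follows, assuming a hypothetical tool-to-domain mapping (`CATALOG`) and hypothetical domain labels; it models the idea, not the daemon's actual filter.

```python
from typing import Optional

# Hypothetical domain assignments for illustration only.
CATALOG = {
    "screen_read_tree": "screen",
    "screen_find_element": "screen",
    "dom_extract": "browser",
    "speak_text": "speech",
    "consent_request": "consent",
    "action_ledger_record": "ledger",
    "doctor_check_permissions": "readiness",
}

# The cross-cutting set named in the text: consent, action ledger, readiness.
CROSS_CUTTING = {"consent", "ledger", "readiness"}


def scoped_tools(domain: Optional[str] = None) -> set:
    """Return the tool names visible to a connection.

    No domain means the full catalog (the back-compatible default);
    otherwise the domain's tools plus the cross-cutting set.
    """
    if domain is None:
        return set(CATALOG)
    return {
        name
        for name, tool_domain in CATALOG.items()
        if tool_domain == domain or tool_domain in CROSS_CUTTING
    }


screen_scope = scoped_tools("screen")
```

In this toy catalog a screen-scoped connection sees five of the seven tools, and the browser and speech tools never reach its context.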
Screen — reading and acting on the accessibility tree
The largest domain. Prefers the native accessibility tree over pixels: read what’s on screen, find elements semantically, act on them, and verify the result.
| Tool | Purpose |
|---|---|
| screen_read_tree | Read the accessibility tree for the focused or a named window. |
| screen_find_element | Locate an element by role, label, or query instead of coordinates. |
| screen_click_element_by_query | Click an element resolved semantically, not by x/y. |
| screen_type_into_element | Type into a resolved element with built-in waits. |
| screen_wait_for_element / screen_wait_for_idle | Block until a target element or app state is ready before acting. |
| screen_macro_focus_click_type_verify | A bundled focus → click → type → verify sequence for common form flows. |
| screen_capture | Take a screenshot when semantic reading isn’t enough (a capture_screen-consent action; see Consent & safety). |
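Each of these tools is invoked through the standard MCP `tools/call` method. The sketch below serializes a wait-then-type pair as JSON-RPC requests. The envelope shape follows the MCP specification, but the argument keys (`query`, `text`) are hypothetical; the real input schemas come from `tools/list`.

```python
import json
from itertools import count

_request_ids = count(1)  # monotonically increasing JSON-RPC ids


def tools_call_request(name: str, arguments: dict) -> str:
    """Serialize an MCP JSON-RPC 2.0 tools/call request."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": next(_request_ids),
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )


# Wait for the target, then act on it semantically rather than by coordinates.
# The "query" and "text" keys are assumptions, not Autonomy's real schema.
requests = [
    tools_call_request("screen_wait_for_element", {"query": "Email text field"}),
    tools_call_request(
        "screen_type_into_element",
        {"query": "Email text field", "text": "user@example.com"},
    ),
]
```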
Browser and DOM — pages, tabs, and page content
Connects browser automation, tab state, and DOM reads to the same evidence model as the screen domain.
| Tool | Purpose |
|---|---|
| browser_open_url / browser_session_open_url | Navigate to a URL in a managed tab. |
| browser_session_list_tabs / browser_session_switch_to_tab | Enumerate and switch between open tabs. |
| dom_extract | Pull structured content out of the current page. |
| dom_select_dropdown / dom_scroll_to_text | Semantic page interaction beyond raw clicks. |
| browser_cookies | Read cookie state for the current session. |
Guided page exploration (copilot overlay)
A narrower lane for screen-reader-safe, step-at-a-time page exploration — the runtime behind README’s “Guided Live Page Exploration” workflow. Distinct from raw DOM tools because it’s built for narrated, bounded loops rather than one-shot extraction.
| Tool | Purpose |
|---|---|
| copilot_attach | Attach to the active guided-overlay UI session. |
| copilot_status | Read current overlay connection status. |
| copilot_highlight / copilot_cursor | Visually mark the element being discussed. |
| copilot_dom_read_text / copilot_dom_find | Read or locate page content within the guided session. |
Speech and voice
Assistive speech recognition (listening) and speech synthesis (speaking), kept separate from the VoiceOver-specific transport below.
| Tool | Purpose |
|---|---|
| speak_text | Speak an utterance through the assistive speech output lane. |
| speech_list_voices / speech_set_voice | Enumerate and choose a synthesis voice. |
| speech_listen_start / speech_listen_stop | Start and stop assistive speech recognition. |
| speech_status | Combined recognition + synthesis runtime status. |
VoiceOver transport
A dedicated, deterministic lane for VoiceOver-aware output and navigation,
with delivery evidence and AX focus verification built in — this is what
voiceover_transport_announce uses for the screen-off status updates
described in Bring your own agent.
| Tool | Purpose |
|---|---|
| voiceover_transport_announce | Post a concise announcement, preferring direct VoiceOver output, with delivery evidence (not a claim the user heard it). |
| voiceover_transport_key / voiceover_transport_scroll | Native keyboard navigation with AX focus capture for verification. |
| voiceover_transport_snapshot | Read the focused app/element without claiming spoken output. |
| voiceover_transport_session_policy | Get or set session-level narration mode (e.g. frequent spoken updates for a screen-off user). |
Accessibility and readiness
Checks what’s actually available before an agent commits to a plan — the “say what’s granted, missing, blocked, or unknown” principle from the README.
| Tool | Purpose |
|---|---|
| accessibility_state | Read the seven-surface accessibility MVP state (including keyboard, voice, switch, speech, audio, and display). |
| doctor_check_permissions | Report macOS permission states (Accessibility, Screen Recording, …). |
| assistive_mode_select | Choose the assistive mode to operate under, with consent gates and rationale. |
| surface_classify | Classify a target surface (native AX, browser DOM, PDF, 2FA, payment, …) before choosing a tool path. |
| driver_policy_check | Check Autonomy policy before falling back to generic driver-style control. |
Consent and the action ledger
The gate for anything in the elevated/critical safety band, and the
durable, screen-reader-friendly record of what happened. Covered in full in
Consent & safety.
| Tool | Purpose |
|---|---|
| consent_request | Create a typed, task-scoped consent request before acting. |
| consent_resolve | Resolve a pending request as granted, declined, or cancelled. |
| action_ledger_record / action_ledger_list | Record and read back what the agent did, asked, and observed. |
| barrier_report_record | Record an accessibility barrier as first-class history instead of a generic failure. |
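One way to picture how these tools combine: an action runs only when its consent request resolved as granted, and the outcome is written to the ledger either way. The sketch below models that gate against a stub caller. `call_tool` stands in for an MCP client's `tools/call` wrapper, and the argument keys (`event`, `detail`) are hypothetical; Consent & safety describes the real model.

```python
from typing import Callable

# The three resolutions named for consent_resolve.
RESOLUTIONS = {"granted", "declined", "cancelled"}


def gated_action(
    call_tool: Callable[[str, dict], dict],
    resolution: str,
    action: Callable[[], str],
) -> bool:
    """Run `action` only if consent was granted; record the outcome either way.

    The ledger argument keys are assumptions, not Autonomy's real schema.
    """
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unknown consent resolution: {resolution!r}")
    if resolution != "granted":
        call_tool("action_ledger_record", {"event": "consent_" + resolution, "detail": ""})
        return False
    observed = action()
    call_tool("action_ledger_record", {"event": "action_completed", "detail": observed})
    return True


# Stub caller that logs calls instead of contacting a daemon.
calls = []


def fake_call_tool(name: str, arguments: dict) -> dict:
    calls.append((name, arguments))
    return {}


ran_declined = gated_action(fake_call_tool, "declined", lambda: "clicked Submit")
ran_granted = gated_action(fake_call_tool, "granted", lambda: "clicked Submit")
```

Recording the declined path as well as the granted one keeps the ledger a complete account of what was asked, not only of what was done.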
Session and conversation
Claims, narrates in, and releases the in-app Agent Conversation panel — the tool family a bring-your-own agent uses to participate in that surface. Full walkthrough in Bring your own agent.
| Tool | Purpose |
|---|---|
| agent_conversation_session_claim | Claim the panel as an external MCP-connected agent. |
| agent_conversation_append | Append user-visible transcript content (user/agent/status/failure). |
| agent_conversation_session_release | Release the claim without closing the panel. |
| agent_conversation_session_get | Read current session state. |
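Because a claim should be released even when the agent's work fails partway, the claim, append, and release calls fit naturally into a context manager. This is a sketch under assumptions: `call_tool` is a stand-in for an MCP client's `tools/call` wrapper, and the `kind` and `text` argument names are hypothetical.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def conversation_session(
    call_tool: Callable[[str, dict], dict],
) -> Iterator[Callable[[str, str], None]]:
    """Claim the panel, yield an append helper, and always release the claim."""
    call_tool("agent_conversation_session_claim", {})

    def append(kind: str, text: str) -> None:
        # The four kinds (user/agent/status/failure) come from the table above;
        # the argument names are assumptions.
        call_tool("agent_conversation_append", {"kind": kind, "text": text})

    try:
        yield append
    finally:
        # Runs on normal exit and on exceptions alike.
        call_tool("agent_conversation_session_release", {})


# Stub caller that records tool names instead of contacting a daemon.
log = []


def fake_call_tool(name: str, arguments: dict) -> dict:
    log.append(name)
    return {}


with conversation_session(fake_call_tool) as append:
    append("status", "Reading the form")
```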
Playbooks
Bounded, auditable workflow execution — a step further than one-off tool calls when a task has explicit state and needs a resumable audit trail.
| Tool | Purpose |
|---|---|
| playbook_list | List available playbooks with metadata summaries. |
| playbook_start / playbook_status | Start a playbook and read back step state and its audit log. |
| playbook_cancel | Cancel a running playbook. |
| playbook_export_audit | Export a run’s audit trail as newline-delimited JSON. |
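Newline-delimited JSON carries one JSON object per line, so an exported audit trail can be read back with a few lines of code. The sample below is hand-written and its `step` and `status` fields are hypothetical; the real export's fields are not specified on this page.

```python
import json


def parse_ndjson(text: str) -> list:
    """Parse newline-delimited JSON: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hand-written sample, not real playbook_export_audit output.
sample_audit = '{"step": 1, "status": "ok"}\n{"step": 2, "status": "failed"}\n'

entries = parse_ndjson(sample_audit)
failed_steps = [entry["step"] for entry in entries if entry["status"] != "ok"]
```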
Routing, profiles, and subagents
Supports the ADR-013 pattern of dispatching a task to a domain-scoped specialist instead of one agent holding everything.
| Tool | Purpose |
|---|---|
| task_route | Route a task goal to a portable Autonomy agent profile. |
| agent_profile_get / agent_profile_list | Read available assistive working-style profiles. |
| subagent_task_claim / subagent_task_ack | Claim and acknowledge queued specialist tasks. |
Everything else
A handful of smaller, cross-cutting groups round out the catalog: clipboard_read/clipboard_write
for clipboard access, host_read_status and host_lima_* for host/VM
environment status, preferences_get/preferences_set and
access_config_* for stored user preferences, run_checkpoint/run_resume/replay_export
for durable workflow recovery, and support_packet_create for local, redacted
support handoffs. Every domain-scoped subagent gets consent, the action
ledger, and readiness checks in addition to its own domain — so any acting
specialist can still gate risk and record what it did.
Pitfalls
- Don’t assume one agent should hold the whole catalog. If you’re building a specialist rather than a general-purpose connection, scope it to one domain (a7y-cli daemon proxy --domain screen, for example) plus the cross-cutting set, the same way Autonomy’s own shipped subagents (autonomy:high-consent-guard, autonomy:voiceover-form-review, and others under plugins/autonomy/agents/) are scoped.
- Tool counts drift. Treat “~170 tools” as approximate, not a contract; the surface grows as capabilities are added. a7y-cli tools list is the live source of truth, not this page.
- A subagent’s tool allowlist alone doesn’t shrink its context. In Claude Code specifically, restricting which tools a subagent may call doesn’t stop every tool’s schema from landing in its context; only a domain-scoped MCP connection does that. See Bring your own agent for the detail.
Related
- Overview — how an agent gets connected in the first place
- Bring your own agent — domain-scoped connections and the conversation tools
- Consent & safety — the model behind the consent and ledger tools
- What Autonomy gives agents — the product-level shape of these capabilities