
MCP tool catalog

Once an agent is connected (see Overview), a tools/list call returns Autonomy’s tool catalog — on the order of 170 tools spanning screen reading, input, browser automation, speech, VoiceOver, consent, and workflow. That number is the whole reason this page exists: no agent should try to hold 170 tool schemas in context at once, and no person should have to read 170 rows to understand what Autonomy can do.

This page groups the catalog by domain — the same domain taxonomy Autonomy uses internally to scope tool access for specialist subagents — and gives a few representative real tools per group. It is not the exhaustive reference; for that, ask a connected agent to run a7y-cli tools list, or call tools/list directly against the daemon.
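As a rough illustration of calling tools/list directly, the sketch below builds the JSON-RPC 2.0 request that MCP defines for tools/list and buckets the returned tool names by prefix. The transport to the daemon is not shown, pagination is ignored, the sample response is a hypothetical four-tool excerpt, and grouping by name prefix is only an approximation of Autonomy's real domain taxonomy.

```python
import json
from collections import defaultdict


def build_tools_list_request(request_id: int = 1) -> str:
    """Serialize an MCP tools/list request as a JSON-RPC 2.0 message."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def group_by_prefix(response: dict) -> dict:
    """Group tool names by the text before the first underscore.

    Assumption: the prefix is a rough stand-in for a domain; the actual
    tool-to-domain mapping lives inside Autonomy and is not shown here.
    """
    groups = defaultdict(list)
    for tool in response["result"]["tools"]:
        prefix = tool["name"].split("_", 1)[0]
        groups[prefix].append(tool["name"])
    return dict(groups)


# Hypothetical, heavily truncated tools/list response for illustration.
sample_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {"name": "screen_read_tree"},
            {"name": "screen_find_element"},
            {"name": "dom_extract"},
            {"name": "consent_request"},
        ]
    },
}

groups = group_by_prefix(sample_response)
```

Grouping on the client side like this only helps a person skim the catalog; it does not reduce how many schemas reach an agent's context.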

Why domains, not one flat list

A one-shot Claude connection was observed, live, deferring every tool’s schema the moment the catalog crossed roughly 160 tools — the classic “tool overload” failure mode where selection quality drops and context bloats. The fix wasn’t fewer tools; some tasks genuinely need all of them somewhere in the system. The fix was scoping: a connecting agent (or a domain-specialist subagent) can request just one domain’s tools plus a small cross-cutting set, instead of the full surface.

full catalog (~170 tools)
├─ a7y-cli daemon proxy → everything (default, back-compatible)
└─ a7y-cli daemon proxy --domain X → domain X's tools ∪ cross-cutting set (consent, action ledger, readiness)

The domain groups below are exactly the ones this filtering understands.
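The scoping rule above is a set union, which a minimal sketch can make concrete. This is not Autonomy's implementation: the domain labels, the `CROSS_CUTTING` set, and the tool-to-domain assignments in `CATALOG` are all assumptions chosen for illustration.

```python
from typing import Dict, Optional, Set

# Hypothetical labels for the cross-cutting set (consent, action ledger, readiness).
CROSS_CUTTING = {"consent", "ledger", "readiness"}

# Hypothetical excerpt of the catalog: tool name -> domain label.
CATALOG: Dict[str, str] = {
    "screen_read_tree": "screen",
    "dom_extract": "browser",
    "speak_text": "speech",
    "consent_request": "consent",
    "action_ledger_record": "ledger",
    "doctor_check_permissions": "readiness",
}


def scope_catalog(catalog: Dict[str, str], domain: Optional[str] = None) -> Set[str]:
    """Return the tool names visible to a connection.

    With no domain, everything is visible (the back-compatible default).
    With a domain, the result is that domain's tools plus the cross-cutting set.
    """
    if domain is None:
        return set(catalog)
    return {
        name
        for name, label in catalog.items()
        if label == domain or label in CROSS_CUTTING
    }


screen_tools = scope_catalog(CATALOG, "screen")
```

In this toy catalog a screen-scoped connection sees four of the six tools: its own domain tool plus the three cross-cutting ones.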

Screen — reading and acting on the accessibility tree

The largest domain. Prefers the native accessibility tree over pixels: read what’s on screen, find elements semantically, act on them, and verify the result.

| Tool | Purpose |
| --- | --- |
| screen_read_tree | Read the accessibility tree for the focused or a named window. |
| screen_find_element | Locate an element by role, label, or query instead of coordinates. |
| screen_click_element_by_query | Click an element resolved semantically, not by x/y. |
| screen_type_into_element | Type into a resolved element with built-in waits. |
| screen_wait_for_element / screen_wait_for_idle | Block until a target element or app state is ready before acting. |
| screen_macro_focus_click_type_verify | A bundled focus → click → type → verify sequence for common form flows. |
| screen_capture | Take a screenshot when semantic reading isn't enough (a capture_screen-consent action; see Consent & safety). |
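To show how the wait, act, and verify pattern might look on the wire, here is a sketch that builds MCP tools/call requests for three of the tools above. The tools/call envelope follows the MCP specification; the argument keys (`query`, `text`) and the example values are assumptions, since this page does not document each tool's input schema. Check the schema returned by tools/list before relying on any of them.

```python
import itertools
import json

_request_ids = itertools.count(1)


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": next(_request_ids),
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )


# Hypothetical argument shape: a semantic query instead of x/y coordinates.
target = {"query": "Email address"}

plan = [
    # 1. Block until the field exists before touching it.
    tool_call("screen_wait_for_element", target),
    # 2. Act on the semantically resolved element.
    tool_call("screen_type_into_element", {**target, "text": "user@example.com"}),
    # 3. Re-read the tree so the result can be verified rather than assumed.
    tool_call("screen_read_tree", {}),
]
```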

Browser and DOM — pages, tabs, and page content

Connects browser automation, tab state, and DOM reads to the same evidence model as the screen domain.

| Tool | Purpose |
| --- | --- |
| browser_open_url / browser_session_open_url | Navigate to a URL in a managed tab. |
| browser_session_list_tabs / browser_session_switch_to_tab | Enumerate and switch between open tabs. |
| dom_extract | Pull structured content out of the current page. |
| dom_select_dropdown / dom_scroll_to_text | Semantic page interaction beyond raw clicks. |
| browser_cookies | Read cookie state for the current session. |

Guided page exploration (copilot overlay)

A narrower lane for screen-reader-safe, step-at-a-time page exploration — the runtime behind README’s “Guided Live Page Exploration” workflow. Distinct from raw DOM tools because it’s built for narrated, bounded loops rather than one-shot extraction.

| Tool | Purpose |
| --- | --- |
| copilot_attach | Attach to the active guided-overlay UI session. |
| copilot_status | Read current overlay connection status. |
| copilot_highlight / copilot_cursor | Visually mark the element being discussed. |
| copilot_dom_read_text / copilot_dom_find | Read or locate page content within the guided session. |

Speech and voice

Assistive speech recognition (listening) and speech synthesis (speaking), kept separate from the VoiceOver-specific transport below.

| Tool | Purpose |
| --- | --- |
| speak_text | Speak an utterance through the assistive speech output lane. |
| speech_list_voices / speech_set_voice | Enumerate and choose a synthesis voice. |
| speech_listen_start / speech_listen_stop | Start and stop assistive speech recognition. |
| speech_status | Combined recognition + synthesis runtime status. |

VoiceOver transport

A dedicated, deterministic lane for VoiceOver-aware output and navigation, with delivery evidence and AX focus verification built in — this is what voiceover_transport_announce uses for the screen-off status updates described in Bring your own agent.

| Tool | Purpose |
| --- | --- |
| voiceover_transport_announce | Post a concise announcement, preferring direct VoiceOver output, with delivery evidence (not a claim the user heard it). |
| voiceover_transport_key / voiceover_transport_scroll | Native keyboard navigation with AX focus capture for verification. |
| voiceover_transport_snapshot | Read the focused app/element without claiming spoken output. |
| voiceover_transport_session_policy | Get or set session-level narration mode (e.g. frequent spoken updates for a screen-off user). |

Accessibility and readiness

Checks what’s actually available before an agent commits to a plan — the “say what’s granted, missing, blocked, or unknown” principle from the README.

| Tool | Purpose |
| --- | --- |
| accessibility_state | Read the seven-surface accessibility MVP state (surfaces include keyboard, voice, switch, speech, audio, and display). |
| doctor_check_permissions | Report macOS permission states (Accessibility, Screen Recording, …). |
| assistive_mode_select | Choose the assistive mode to operate under, with consent gates and rationale. |
| surface_classify | Classify a target surface (native AX, browser DOM, PDF, 2FA, payment, …) before choosing a tool path. |
| driver_policy_check | Check Autonomy policy before falling back to generic driver-style control. |

Consent and action ledger

The gate for anything in the elevated/critical safety band, and the durable, screen-reader-friendly record of what happened. Covered in full in Consent & safety.

| Tool | Purpose |
| --- | --- |
| consent_request | Create a typed, task-scoped consent request before acting. |
| consent_resolve | Resolve a pending request as granted, declined, or cancelled. |
| action_ledger_record / action_ledger_list | Record and read back what the agent did, asked, and observed. |
| barrier_report_record | Record an accessibility barrier as first-class history instead of a generic failure. |
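The request-then-resolve lifecycle described for consent_request and consent_resolve can be modeled as a small state machine. The sketch below is a client-side mental model only, not Autonomy's data model: the class names, field names, and the rule that a request resolves exactly once are assumptions drawn from the table's wording.

```python
from dataclasses import dataclass
from enum import Enum


class ConsentState(Enum):
    PENDING = "pending"
    GRANTED = "granted"
    DECLINED = "declined"
    CANCELLED = "cancelled"


@dataclass
class ConsentRequest:
    """Hypothetical local mirror of a typed, task-scoped consent request."""

    action: str  # e.g. "capture_screen"
    task: str  # the task this consent is scoped to
    state: ConsentState = ConsentState.PENDING

    def resolve(self, outcome: ConsentState) -> None:
        """Move a pending request to granted, declined, or cancelled."""
        if self.state is not ConsentState.PENDING:
            raise ValueError("request is already resolved")
        if outcome is ConsentState.PENDING:
            raise ValueError("outcome must be granted, declined, or cancelled")
        self.state = outcome

    @property
    def may_act(self) -> bool:
        """Only an explicit grant permits the gated action."""
        return self.state is ConsentState.GRANTED


request = ConsentRequest(action="capture_screen", task="read the invoice total")
allowed_before = request.may_act
request.resolve(ConsentState.GRANTED)
allowed_after = request.may_act
```

Treating anything other than an explicit grant as "do not act" keeps declined, cancelled, and still-pending requests on the safe side.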

Session and conversation

Claims, narrates in, and releases the in-app Agent Conversation panel — the tool family a bring-your-own agent uses to participate in that surface. Full walkthrough in Bring your own agent.

| Tool | Purpose |
| --- | --- |
| agent_conversation_session_claim | Claim the panel as an external MCP-connected agent. |
| agent_conversation_append | Append user-visible transcript content (user/agent/status/failure). |
| agent_conversation_session_release | Release the claim without closing the panel. |
| agent_conversation_session_get | Read current session state. |
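Because a claim should be paired with a release, one way a bring-your-own agent might structure the claim, append, release sequence is a context manager that releases even when the work in between fails. This is a sketch under assumptions: `call_tool` stands in for whatever MCP client function the agent already has, and the `role` and `text` argument keys are hypothetical.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

# Hypothetical signature for an MCP client's "call this tool" function.
ToolCaller = Callable[[str, dict], dict]


@contextmanager
def conversation_session(call_tool: ToolCaller) -> Iterator[None]:
    """Claim the Agent Conversation panel and always release the claim."""
    call_tool("agent_conversation_session_claim", {})
    try:
        yield
    finally:
        # Runs on success and on error, so the claim is not left dangling.
        call_tool("agent_conversation_session_release", {})


# A fake client that records call order, used here in place of a real connection.
calls = []


def fake_call_tool(name: str, arguments: dict) -> dict:
    calls.append(name)
    return {}


with conversation_session(fake_call_tool):
    fake_call_tool(
        "agent_conversation_append",
        {"role": "status", "text": "Reading the form"},  # hypothetical keys
    )
```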

Playbooks

Bounded, auditable workflow execution — a step further than one-off tool calls when a task has explicit state and needs a resumable audit trail.

| Tool | Purpose |
| --- | --- |
| playbook_list | List available playbooks with metadata summaries. |
| playbook_start / playbook_status | Start a playbook and read back step state and its audit log. |
| playbook_cancel | Cancel a running playbook. |
| playbook_export_audit | Export a run's audit trail as newline-delimited JSON. |
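Newline-delimited JSON means one JSON object per line, so an exported audit trail can be read with the standard library alone. In the sketch below the `step` and `status` fields and the sample records are hypothetical; the real audit schema is not documented on this page.

```python
import json


def parse_audit(ndjson_text: str) -> list:
    """Parse newline-delimited JSON into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]


# Hypothetical two-event export for illustration.
sample_export = (
    '{"step": 1, "status": "completed"}\n'
    '{"step": 2, "status": "failed"}\n'
)

events = parse_audit(sample_export)
failed_steps = [event["step"] for event in events if event["status"] == "failed"]
```

Parsing line by line also means a large export can be streamed from a file instead of loaded whole.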

Routing, profiles, and subagents

Supports the ADR-013 pattern of dispatching a task to a domain-scoped specialist instead of one agent holding everything.

| Tool | Purpose |
| --- | --- |
| task_route | Route a task goal to a portable Autonomy agent profile. |
| agent_profile_get / agent_profile_list | Read available assistive working-style profiles. |
| subagent_task_claim / subagent_task_ack | Claim and acknowledge queued specialist tasks. |

Everything else

A handful of smaller, cross-cutting groups round out the catalog: clipboard_read/clipboard_write for clipboard access, host_read_status and host_lima_* for host/VM environment status, preferences_get/preferences_set and access_config_* for stored user preferences, run_checkpoint/run_resume/replay_export for durable workflow recovery, and support_packet_create for local, redacted support handoffs. Every domain-scoped subagent gets consent, the action ledger, and readiness checks in addition to its own domain — so any acting specialist can still gate risk and record what it did.

Pitfalls

  • Don’t assume one agent should hold the whole catalog. If you’re building a specialist rather than a general-purpose connection, scope it to one domain (a7y-cli daemon proxy --domain screen, for example) plus the cross-cutting set, the same way Autonomy’s own shipped subagents (autonomy:high-consent-guard, autonomy:voiceover-form-review, and others under plugins/autonomy/agents/) are scoped.
  • Tool counts drift. Treat “~170 tools” as approximate, not a contract — the surface grows as capabilities are added. a7y-cli tools list is the live source of truth, not this page.
  • A subagent’s tool allowlist alone doesn’t shrink its context. In Claude Code specifically, restricting which tools a subagent may call doesn’t stop every tool’s schema from landing in its context — only a domain-scoped MCP connection does that. See Bring your own agent for the detail.