The runtime

This page is the mental model for what actually runs when an agent calls an Autonomy tool: one daemon, composed of a handful of Rust subsystems wired together directly, sitting between the agent and macOS’s accessibility APIs. Understanding this shape helps explain both why the runtime feels fast and why its behavior is consistent no matter which agent is driving it.

One process, composed in-process

Autonomy’s core subsystems — the accessibility backend, the relay hub, the playbook engine, and MCP tool dispatch — all live in Rust and are wired together directly inside one daemon process, rather than each being a separate service reached over its own RPC hop. That’s a deliberate choice: fewer moving parts, simpler error handling, and no coordination overhead between subsystems that all belong to the same runtime anyway.

This does not mean everything everywhere is one process. Agent hosts still connect over MCP as real external clients, and browser integrations still run inside the browser’s own process, reached through the relay. The in-process decision is specifically about how the daemon composes its own internals, not a claim that the whole system is a single process end to end.

The layers

a7y-ax is not one implementation — it’s a trait (AccessibilityBackend) with a macOS implementation behind it today, and Linux/Windows implementations behind platform gates at different levels of validation. Higher layers depend on the trait, never on a specific platform’s API, so capability differences show up as explicit, discoverable capability flags instead of silent behavioral forks.

The relay hub (a7y-relay) is intentionally a plain, inspectable structure — a shared lock-protected state map plus one outbound queue per connected client — rather than a more abstract actor framework. Session snapshots, active-tab tracking, and backpressure all live in that one visible place, which matters when something needs debugging: there’s one hub to look at, not a scattered set of actor mailboxes.

Why latency-critical paths stay fast

A screen-off user experiences latency directly, as silence — every extra round trip an agent needs is time spent not knowing what’s happening. The runtime’s answer is to own more of the polling, verification, and fallback logic itself rather than pushing it back to the agent as extra turns:

The agent discovers a narrow tool set instead of loading the entire catalog.
It reads accessibility state once, before acting.
It calls a tool with an explicit expected outcome.
The daemon performs the action AX-first, then verifies the postcondition — polling accessibility state, falling back to a visual marker or OCR only when AX can’t confirm it, and giving up cleanly on a timeout rather than guessing.
The result comes back already classified — observed success, a classified failure, or an explicit “uncertain” — so the agent spends its next turn deciding what to do, not re-checking whether the last thing worked.

The goal stated plainly: fewer agent turns should be needed because the runtime owns polling, rebinding, and failure classification, not because the agent got lucky.

A concrete example: verifying a click actually landed

An agent asks the runtime to click a “Save” button and expects a confirmation dialog to appear. The daemon performs the click through the accessibility backend, then — instead of returning success immediately — polls the focused window and tree state until the expected dialog shows up, a timeout elapses, or the state comes back ambiguous. Only an accessibility read that can’t resolve the question falls through to a screenshot-and-OCR check. The agent gets back one of three honest answers: verified success with evidence, a classified failure (blocked, mismatch, timeout), or an explicit “couldn’t verify” — never a silent assumption that the click worked.

Common misconceptions

“In-process composition means no real network boundaries exist.” Browser clients and any MCP-connected agent are still real external processes crossing a real boundary; only the Rust subsystems inside the daemon are composed directly.
“The relay hub is a message queue like any other.” It’s explicitly a session-aware router with backpressure and safety-ceiling enforcement built in, not a generic pub/sub layer.
“Verification is optional polish.” It’s the load-bearing reason the runtime can claim an action worked at all — see safety classes for how that same discipline extends to the consent decision itself.

For why the daemon is reached over MCP rather than app-side code, see MCP-native architecture. For the full tool surface these subsystems expose, see the MCP tool catalog.