What is Autonomy
Autonomy is a local MCP/CLI runtime that gives AI agents accessibility-native computer use on macOS. It runs as a daemon on the user’s machine, exposes tools over the Model Context Protocol, and lets an agent — Codex, Claude Code, or any MCP client — see the screen, act on it, speak, and browse through the user’s real assistive technology setup instead of guessing from pixels.
The people this exists for are not an edge case. The runtime’s first, most-tested scenario is a blind or screen-off user who works with a coding agent entirely by voice: no monitor glance, no chat window to find, no assumption that someone is watching the screen.
The problem it solves
Most computer-use and coding-agent stacks treat accessibility as a finishing touch, if they touch it at all. They click screen coordinates, infer intent from screenshots, retry until something looks right, and leave a blind or motor-impaired user to bridge the gap with their own screen reader or switch device — fighting the agent’s assumptions instead of being served by them.
That is backwards. Disabled users should not have to adapt to agents; agents should adapt to the user’s assistive technology, permissions, preferences, and pace. Autonomy is the layer that makes that possible: it reads accessibility state before acting, prefers semantic actions over pointer control, asks before high-risk steps, and leaves redacted evidence explaining what actually happened.
Autonomy is not trying to become another agent. It is the accessibility layer other agents use. The Codex and Claude Code plugins are proof integrations, not the product.
How it differs from generic “computer use”
| Generic computer-use agent | Autonomy |
|---|---|
| Clicks pixels, infers state from screenshots | Reads the accessibility tree first; prefers AX/keyboard actions |
| Treats accessibility as a fallback | Accessibility state gates what the agent does next |
| Assumes a sighted user is watching | Built for screen-off, VoiceOver-driven sessions |
| Fires actions and hopes | Verifies postconditions (AX read, visual marker, or explicit failure) |
| One risk model per tool author | One consent gate, uniform across every tool and every agent |
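The "verifies postconditions" row can be pictured with a minimal sketch. It assumes that an accessibility read, when one is available, is trusted over a visual check. Every name here (`Evidence`, `ActionResult`, `verify_postcondition`, and the two callables) is hypothetical and chosen for illustration; none of it is Autonomy's actual API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Evidence(Enum):
    AX_READ = "ax_read"              # value re-read from the accessibility tree
    VISUAL_MARKER = "visual_marker"  # fallback when no AX value is readable
    FAILED = "failed"                # nothing confirmed the change


@dataclass
class ActionResult:
    verified: bool
    evidence: Evidence


def verify_postcondition(
    expected: str,
    read_ax_value: Callable[[], Optional[str]],   # hypothetical AX reader
    visual_marker_present: Callable[[], bool],    # hypothetical visual check
) -> ActionResult:
    """Confirm an action took effect, or report an explicit failure."""
    ax_value = read_ax_value()
    if ax_value is not None:
        # Assumption: a readable AX value is authoritative, match or mismatch.
        if ax_value == expected:
            return ActionResult(True, Evidence.AX_READ)
        return ActionResult(False, Evidence.FAILED)
    if visual_marker_present():
        return ActionResult(True, Evidence.VISUAL_MARKER)
    return ActionResult(False, Evidence.FAILED)
```

The point of the shape is that there is no silent success: a caller always gets back which kind of evidence confirmed the change, or an explicit failure.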
A mental model
Think of Autonomy less like “another chatbot” and more like a permission-aware translator that sits between an agent’s intent and the operating system.
The agent never touches the operating system directly. Every read and every action passes through the same daemon, so the same rules apply whether the agent is Codex, Claude Code, or something brought by the user.
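As a rough illustration of that single choke point, the sketch below routes every request through one gate object regardless of which agent sent it. The `Gate` class, the `RISK_BY_ACTION` table, the action names, and the choice to treat unknown actions as high risk are all assumptions made for the example, not a description of the daemon's real internals.

```python
from enum import Enum
from typing import Callable, Dict, List


class Risk(Enum):
    LOW = 1
    HIGH = 2


# Hypothetical risk table; the real classifications are not given here.
RISK_BY_ACTION: Dict[str, Risk] = {
    "read_tree": Risk.LOW,
    "press_button": Risk.LOW,
    "submit_form": Risk.HIGH,
}


class Gate:
    """One gate shared by every agent; the rules do not depend on the caller."""

    def __init__(self, ask_user: Callable[[str], bool]) -> None:
        self._ask_user = ask_user
        self.log: List[str] = []

    def request(self, agent: str, action: str, perform: Callable[[], str]) -> str:
        # Assumption: actions missing from the table are treated as high risk.
        risk = RISK_BY_ACTION.get(action, Risk.HIGH)
        if risk is Risk.HIGH and not self._ask_user(f"Allow {action}?"):
            self.log.append(f"{agent}:{action}:denied")
            return "denied"
        result = perform()
        self.log.append(f"{agent}:{action}:done")
        return result
```

Because the agent name is only recorded and never consulted when deciding, two different agents asking for the same action get the same answer.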
A concrete example
A blind user asks their coding agent to review a job-application form before submitting it. A generic agent might screenshot the page, guess at field purpose, and click “submit.” Through Autonomy, the agent instead reads accessibility_state, learns VoiceOver is running, selects a VoiceOver-aware mode, reads labels/roles/values/required-state from the accessibility tree, announces what it found, and stops before the irreversible submit action to ask the user directly, because form submission is exactly the kind of elevated action the consent model is built to gate.
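The flow in this example could be sketched as follows. Only the name accessibility_state comes from the text above. The `voiceover_running` key, the `Field` type, the mode labels, and the `announce`, `confirm`, and `submit` callables are hypothetical stand-ins for whatever tools an agent would actually call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Field:
    label: str
    role: str
    value: str
    required: bool


def review_form(
    accessibility_state: Dict[str, bool],   # assumed shape of the state read
    fields: List[Field],                    # as read from the accessibility tree
    announce: Callable[[str], None],        # hypothetical spoken-output hook
    confirm: Callable[[str], bool],         # hypothetical consent prompt
    submit: Callable[[], None],             # the irreversible action
) -> str:
    # Pick a mode from accessibility state before doing anything else.
    if accessibility_state.get("voiceover_running"):
        mode = "voiceover-aware"
    else:
        mode = "default"
    announce(f"Reviewing form in {mode} mode")

    # Report label, role, required-state, and value for each field.
    for f in fields:
        status = "required" if f.required else "optional"
        shown = f.value if f.value else "empty"
        announce(f"{f.label}, {f.role}, {status}, value: {shown}")

    # Stop before the irreversible step and ask the user directly.
    if not confirm("Submit this form?"):
        return "stopped_before_submit"
    submit()
    return "submitted"
```

Note that `submit` is only reachable after `confirm` returns true; declining leaves the form untouched.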
Common misconceptions
- “It’s a new agent.” It isn’t. Autonomy has no opinion about which model or agent framework you use — see MCP-native architecture for how any agent connects.
- “Accessibility support means labels exist.” Having accessible labels is necessary but not sufficient; see the accessibility model for what “accessibility-native” actually requires at runtime.
- “A successful tool call means the user perceived it.” Command success and user awareness are tracked separately throughout the runtime — a distinction that matters most for spoken output.
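The last point, that command success and user awareness are separate facts, can be made concrete with a small sketch. The `Awareness` states, the `SpeechOutcome` type, and the `speak` helper with its two callables are hypothetical names for illustration. How the runtime actually records awareness is not specified here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Awareness(Enum):
    NOT_DELIVERED = "not_delivered"  # the speech command itself failed
    UNKNOWN = "unknown"              # speech was issued; nothing confirms it was heard
    CONFIRMED = "confirmed"          # the user acknowledged the message


@dataclass
class SpeechOutcome:
    command_succeeded: bool
    awareness: Awareness


def speak(
    text: str,
    synthesize: Callable[[str], bool],       # hypothetical: True if speech was issued
    user_acknowledged: Callable[[], bool],   # hypothetical: True if the user confirmed
) -> SpeechOutcome:
    """Record command success and user awareness as two separate fields."""
    if not synthesize(text):
        return SpeechOutcome(False, Awareness.NOT_DELIVERED)
    if user_acknowledged():
        return SpeechOutcome(True, Awareness.CONFIRMED)
    # The command worked, but that alone says nothing about perception.
    return SpeechOutcome(True, Awareness.UNKNOWN)
```

An agent reading this outcome can tell "the speech command ran" apart from "the user heard it", and decide whether to repeat or ask.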
For the install and first-session path, start at Getting Started. To see the exact tool surface an agent gets, see the MCP tool catalog.