Your first screen-off task
This is the payoff for everything so far: a task small enough to trust on the
first try, real enough to prove the chain actually works, and verified at
every step rather than taken on faith. If you’ve followed this section in
order, the app is installed, Accessibility and Screen Recording are granted,
a7y-cli doctor is green, and Claude Code is connected. Nothing below needs a
monitor.
The task: ask your agent to open an app, tell you what’s on screen, and confirm it out loud. That’s the smallest complete loop this product offers — see, narrate, and reach the user without them looking at anything — and it’s the same loop every larger screen-off workflow is built from.
Do it
In Claude Code, with Autonomy connected, ask in plain language:
Open TextEdit, then tell me out loud what's on screen.
A well-behaved agent following Autonomy’s rules will do roughly this, in order. Ask it to narrate each step if you want to watch the reasoning rather than just the result:
1. Open the app
It calls the screen_open_app tool with {"app": "TextEdit"}. A successful
call returns a plain confirmation (opened app: TextEdit), not a guess
based on how long it waited.
2. Read the screen, not a screenshot
It calls screen_read_tree (for example {"app": "TextEdit"}) to get back
a structured snapshot of the window: its accessible elements, not a
picture of them. This is the accessibility-tree-first behavior the
accessibility model exists to enforce:
pixels are a fallback, not the default.
3. Say what it found, through Autonomy’s speech transport
Not a raw system voice: it should route the update through
voiceover_transport_announce with a real message, not call say or
osascript directly.
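The first two steps can also be driven by hand, which helps when you want to check the chain without an agent in the loop. The sketch below assumes that the tools call subcommand, shown in this section only for voiceover_transport_announce, accepts screen_open_app and screen_read_tree in the same way. call_tool and open_and_read are hypothetical helper names, not part of a7y-cli.

```shell
# Path to the CLI, copied from the announce command in this section.
A7Y_CLI="/Applications/autonomy (a7y).app/Contents/Resources/bin/a7y-cli"

# call_tool TOOL JSON_ARGS
# Hypothetical helper. Assumes `tools call` takes any tool name with
# --args and --json, as it does for voiceover_transport_announce.
call_tool() {
  "$A7Y_CLI" tools call "$1" --args "$2" --json
}

# open_and_read APP
# Step 1 opens the app; step 2 reads its accessibility tree.
# The read only runs if the open call exits successfully.
# Assumes APP contains no quotes or backslashes, since it is
# pasted into the JSON arguments without escaping.
open_and_read() {
  call_tool screen_open_app "{\"app\":\"$1\"}" &&
    call_tool screen_read_tree "{\"app\":\"$1\"}"
}
```

After sourcing the snippet, open_and_read TextEdit runs both calls in order and stops if the first one fails. The shape of the JSON each call prints is not specified here, so the sketch leaves it unparsed.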
Verify after each step — don’t just trust the transcript
After step 1: ask “did TextEdit actually open?” and have it call
screen_read_tree again to confirm — a plain assertion in chat isn’t
evidence, a structural read is.
After step 3, and this is the important one: a successful tool call only proves an announcement was requested through a channel — not that you heard it. If you’re testing the speech transport directly rather than through a conversation, you can call it yourself and inspect the result:
"/Applications/autonomy (a7y).app/Contents/Resources/bin/a7y-cli" \
tools call voiceover_transport_announce \
--args '{"message":"TextEdit is open. The window is empty and ready for text.","priority":"polite","agentId":"claude","deliveryMode":"direct_voiceover_preferred"}' \
--json
The response carries delivery evidence, and the honest ones look like this:
| Evidence value | What it actually means |
|---|---|
| voiceover_direct_requested | Direct VoiceOver output was attempted. |
| tts_fallback_used | Autonomy used its text-to-speech fallback lane instead. |
| speech_queue_entered | The message entered the serialized speech queue behind others. |
| user_heard_unverified | The runtime requested delivery. Whether you heard it is still unconfirmed. |
None of these mean “the user heard it” — only you can confirm that. If you didn’t hear anything, that’s a real signal to check, not something to ignore because the tool call itself “succeeded.” See Spoken updates not heard if nothing comes through.
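One way to keep "requested" and "heard" apart in a script is to treat the evidence value as a description only and take the confirmation from the person. This is a minimal sketch, assuming you have already pulled the evidence value out of the response by some other means; the response's field layout is not documented in this section, so no parsing is shown. describe_evidence and confirm_heard are hypothetical names.

```shell
# describe_evidence VALUE
# Prints the meaning of one evidence value from the table in this section.
# Returns non-zero for a value the table does not list.
describe_evidence() {
  case "$1" in
    voiceover_direct_requested)
      echo "Direct VoiceOver output was attempted." ;;
    tts_fallback_used)
      echo "Autonomy used its text-to-speech fallback lane instead." ;;
    speech_queue_entered)
      echo "The message entered the serialized speech queue behind others." ;;
    user_heard_unverified)
      echo "The runtime requested delivery. Whether you heard it is still unconfirmed." ;;
    *)
      echo "Unknown evidence value: $1"
      return 1 ;;
  esac
}

# confirm_heard
# No evidence value proves the message was heard, so ask the person.
# Reads one line from stdin and succeeds only on an explicit yes.
confirm_heard() {
  printf 'Did you hear the announcement? [y/N] ' >&2
  read -r answer || return 1
  case "$answer" in
    y|Y|yes|Yes|YES) return 0 ;;
    *) return 1 ;;
  esac
}
```

A script might call describe_evidence on whatever value came back, then gate the next step on confirm_heard rather than on the tool call's exit status.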
Why this task, specifically
It’s deliberately unremarkable. Opening an app and reading its structure is a
read_only/standard-tier action under Autonomy’s
safety-class consent model — nothing here should
trigger a consent prompt. That’s the point: your first task should tell you
the chain works, not test how the product handles risk. Once this works, you
already know how a harder task will look when it needs your consent — it’ll
stop and ask, explicitly, instead of guessing.
What you just proved
If you heard the confirmation, you’ve exercised every link in the map at once: the app was installed with a stable identity, Accessibility let the agent read the real window structure instead of a screenshot, the doctor’s earlier green run meant none of this needed debugging, and the MCP connection carried a real tool call from Claude Code to your Mac and a spoken result back to you — with the evidence to tell requested from heard.
From here, Bring your own agent covers connecting a different MCP client to the same daemon, and the MCP tool catalog covers everything else that’s reachable beyond opening one app and reading one window.