Unified tool surface
All interaction tools below accept a
parameter and auto-dispatch iOS vs Android based on its shape (UUID → iOS simulator, anything else → Android adb serial). You use the same tool names on both platforms.
For platform-specific caveats (Metro
, locked-screen describe errors, etc.), see § 9 Platform-specific notes at the bottom.
1. Before You Start
If you delegate simulator tasks to sub-agents, make sure they have MCP permissions.
Use
to get a target id. Results are tagged with
(
or
); booted/ready devices come first. Pick the first entry that matches the platform you need — if none are ready, call
with
(iOS) or
(Android). See
argent-ios-simulator-setup
/
argent-android-emulator-setup
for full setup flow.
Load tool schemas before first use. Gesture tools (
,
,
,
,
) may be deferred — their parameter schemas are not loaded until fetched. Always use ToolSearch to load the schemas of all gesture tools you plan to use
before calling any of them. If you skip this step, parameters may be coerced to strings instead of numbers, causing validation errors.
2. Best Practices
- Always refer to tapping_rule from your argent.md rule before tapping.
- Before performing interactions, consider whether they can be dispatched sequentially - more on that in .
- Use for lists/scrolling, not , unless you need non-linear movement. Consider whether you need multiple swipes, if yes - use .
- Tap a text field before typing — on iOS try first then fall back to ; on Android use directly ( is iOS-only).
- Coordinates are normalized — always 0.0–1.0, not pixels.
- For app navigation, prefer first. It works on any screen without app restart. Do not navigate from screenshots on regular in-app screens unless failed to expose a reliable target. Use only when you need app-scoped UIKit properties.
3. Opening Apps
Never navigate to an app by tapping home-screen icons. Use
or
— they are instant and reliable.
launch-app — by bundle ID
json
{ "udid": "<UDID>", "bundleId": "com.apple.MobileSMS" }
Common IDs:
(Messages),
(Safari),
(Settings),
,
,
,
,
com.apple.MobileAddressBook
(Contacts)
open-url — by URL scheme
json
{ "udid": "<UDID>", "url": "messages://" }
Common schemes:
,
,
,
,
,
(Safari)
4. Choosing the Right Tool
| Action | Tool | Notes |
|---|
| Multiple actions | | Batch steps in one call (no intermediate screenshots) |
| Open an app | | Always — never tap home-screen icons |
| Restart an app | | Terminate and relaunch by bundle ID |
| Open URL/scheme | | Web pages, deep links, URL schemes |
| Single tap | | Buttons, links, checkboxes |
| Scroll/swipe | | Straight-line scroll or swipe |
| Long press | | Context menus, drag start |
| Drag & drop | | Complex drag interactions |
| Pinch/zoom | | Two-finger pinch with auto-interpolation |
| Rotation | | Two-finger rotation with auto-interpolation |
| Custom gesture | | Arbitrary touch sequences, optional interpolation |
| Hardware key | | Home, back, power, volume, appSwitch, actionButton |
| Type text (fast) | | iOS only. Form fields — uses clipboard |
| Type text | | iOS+Android. Fallback when paste fails; supports Enter, Escape, arrows |
| Rotate device | | Orientation changes |
5. Finding Tap Targets
IMPORTANT. When moved to a different screen after an action or do not know the coordinates of component, always perform proper discovery first.
| App type | Discovery tool | What it returns |
|---|
| Target app discovery | | Accessibility element tree for the current device screen (iOS AX-service or Android uiautomator) with normalized frame coordinates. Works on any app, system dialogs, and Home screen — no app restart or required |
| React Native | | React component tree with names, text, testID, and (tap: x,y) |
| App-scoped native | | Low-level app-scoped accessibility elements with normalized and raw coordinates; requires |
| Permission / system modal overlay | | detects system dialogs automatically and returns dialog buttons with tap coordinates. Fall back to only if does not expose the controls |
| Final visual fallback | | Use only when discovery tools cannot inspect the current UI reliably. Do not derive routine in-app navigation targets from screenshots |
Point follow-up native diagnostics after you already have a candidate point:
native-user-interactable-view-at-point
: deepest native view that would receive touch at a known raw iOS point; requires
- : deepest visible native view at a known raw iOS point; requires
If Fails
Read the exact error and choose the action that matches it:
- Error mentions not available or daemon startup failure:
the ax-service daemon could not start. Check that the simulator is booted. Use as a temporary fallback, or use with an explicit if the app has native devtools injected.
- returns an empty element list:
the screen may be blank, loading, or showing content without accessibility labels. Use to see what is visible, then retry after the content has loaded.
- succeeds but is not detailed enough for a React Native app:
use next.
- You need app-scoped inspection with full UIKit properties (, ):
use with an explicit . This requires native devtools (dylib) injection — call first if needed.
- You already have a candidate point and want to confirm what would actually receive touch:
use
native-user-interactable-view-at-point
. Use when you want the visually deepest view instead of the hit-test target.
6. Tool Usage
gesture-tap — Single tap at a point
json
{ "udid": "<UDID>", "x": 0.5, "y": 0.5 }
Coordinates:
= left/top,
= right/bottom.
Before tapping near the bottom of the screen in React Native apps, check that "Open Debugger to View Warnings" banners are not visible — tapping them breaks the debugger connection. Close them with the X icon if present.
gesture-swipe — Straight-line gesture
json
{ "udid": "<UDID>", "fromX": 0.5, "fromY": 0.7, "toX": 0.5, "toY": 0.3 }
Swipe
up (
) = scroll content
down. Default duration: 300ms. Optional:
for slower swipe.
gesture-pinch — Two-finger pinch
json
{ "udid": "<UDID>", "centerX": 0.5, "centerY": 0.5, "startDistance": 0.2, "endDistance": 0.6 }
All values are normalized 0.0–1.0 (fractions of screen, not pixels) — same as all other gesture tools.
means fingers start 20% of the screen apart;
means they end 60% apart.
startDistance < endDistance
= pinch out (zoom in).
startDistance > endDistance
= pinch in (zoom out). Defaults:
(horizontal),
. Optional:
for vertical axis,
for slower pinch.
gesture-rotate — Two-finger rotation
json
{
"udid": "<UDID>",
"centerX": 0.5,
"centerY": 0.5,
"radius": 0.15,
"startAngle": 0,
"endAngle": 90
}
All positions and radius are normalized 0.0–1.0 (fractions of screen, not pixels).
means each finger is 15% of the screen away from center.
= clockwise. Default duration: 300ms. Optional:
for slower rotation.
gesture-custom — Custom touch sequence
For long-press, drag-and-drop, and other complex sequences, see
references/gesture-examples.md
. Set
to auto-generate smooth intermediate Move events between keyframes.
button — Hardware button press
json
{ "udid": "<UDID>", "button": "home" }
paste — Type text into focused field (iOS only)
json
{ "udid": "<UDID>", "text": "Hello, world!" }
Tap the field first, then paste. Fall back to
if it doesn't work. On Android the call is rejected by the capability gate ("Tool 'paste' is not supported on android") — use
directly.
keyboard — Type text or press special keys
json
{ "udid": "<UDID>", "text": "search query", "key": "enter" }
Special keys:
,
,
,
,
,
,
,
,
,
–
. Optional:
between keystrokes (default 50ms).
rotate — Change orientation
json
{ "udid": "<UDID>", "orientation": "LandscapeLeft" }
7. Screenshots
Use the explicit
tool only when:
- You need the initial screen state before any action.
- The auto-attached screenshot shows a transitional or loading frame.
- You require extra context.
- You want to check state after a delay (e.g. waiting for a network response).
- A permission dialog, system alert, or native modal overlay is visible and did not expose reliable targets.
When using
for permission or native modal navigation:
- Do not switch to screenshot-driven navigation just because a modal is visible. On regular app screens and in-app modals, keep using .
- Prefer obvious, centered alert buttons such as , , , , or .
- Tap one control at a time and inspect the returned auto-screenshot before doing anything else.
- After the modal is dismissed, return to normal discovery with , , or .
Optional rotation parameter:
{ "udid": "<UDID>", "rotation": "LandscapeLeft" }
— rotates the capture without changing simulator orientation.
Screenshots are downscaled by default (30% of original resolution) to reduce context size.
accepts values from 0.01 to 1.0. If UI elements are hard to read or you need to inspect fine detail, pass
to get full resolution:
{ "udid": "<UDID>", "scale": 1.0 }
.
Troubleshooting
| Problem | Solution |
|---|
| Screenshot times out | Restart the simulator-server via tool |
| No booted iOS simulator | Call with the iOS |
| No ready Android device | Call with |
8. Action Sequencing with
Use
to batch multiple interaction steps into
a single tool call. Only one screenshot is returned — after all steps complete. Use cases:
scrolling multiple times, typing and submitting automatically, known sequence of multiple taps, rotating device back and forth.
Do
not use
when any step depends on observing the result of a previous step
Use cases
Use the sequencing when:
- Knowing that some action needs multiple steps without necessarily immediate insight of screenshot
- "scroll to bottom", "scroll to top", "scroll to do X" -> sequence scroll 3-5 times
- form interactions, "clear and retype field" -> you may use triple-tap to select all, type new value
- "submit form" → fill all fields in sequence, tap submit
- "go back to X" → defined tap sequence for the navigation
Allowed tools inside
The
is shared — do
not include it in each step's
. Optional
per step (default 100ms).
Examples
Scroll down three times:
json
{
"udid": "<UDID>",
"steps": [
{ "tool": "gesture-swipe", "args": { "fromX": 0.5, "fromY": 0.7, "toX": 0.5, "toY": 0.3 } },
{ "tool": "gesture-swipe", "args": { "fromX": 0.5, "fromY": 0.7, "toX": 0.5, "toY": 0.3 } },
{ "tool": "gesture-swipe", "args": { "fromX": 0.5, "fromY": 0.7, "toX": 0.5, "toY": 0.3 } }
]
}
Type into a focused field and submit:
json
{
"udid": "<UDID>",
"steps": [
{ "tool": "keyboard", "args": { "text": "hello world" } },
{ "tool": "keyboard", "args": { "key": "enter" } }
]
}
Tap a known button, then scroll down:
json
{
"udid": "<UDID>",
"steps": [
{ "tool": "gesture-tap", "args": { "x": 0.5, "y": 0.15 } },
{
"tool": "gesture-swipe",
"args": { "fromX": 0.5, "fromY": 0.7, "toX": 0.5, "toY": 0.3 },
"delayMs": 300
}
]
}
Stops on the first error and returns partial results.
9. Platform-specific notes
Android
- Metro reachability: run
adb reverse tcp:8081 tcp:8081
on the device before the RN app starts, or Metro won't be reachable from the device. See for the full workflow. Re-run if the device restarts.
- First-launch permission prompts: on Android always installs with so runtime permissions are pre-granted on first launch — no flag to pass.
- Locked screen / secure surfaces: throws a clear error if it can't capture (keyguard, DRM, Play Integrity). Unlock the device or fall back to .
- APK vs .app in : pass absolute path on Android; directory on iOS.
iOS
(no iOS-only gotchas collected here yet — add them as they come up)