A scoring scale for evaluating how well a CLI is designed for AI agents, based on the principles in "You Need to Rewrite Your CLI for AI Agents".
`npx skill4agent add jpoehnelt/skills agent-dx-cli-scale`

> Human DX optimizes for discoverability and forgiveness. Agent DX optimizes for predictability and defense-in-depth.
> (from "You Need to Rewrite Your CLI for AI Agents")

| Score | Criteria: machine-readable output |
|---|---|
| 0 | Human-only output (tables, color codes, prose). No structured format available. |
| 1 | |
| 2 | Consistent JSON output across all commands. Errors also return structured JSON. |
| 3 | NDJSON streaming for paginated results. Structured output is the default in non-TTY (piped) contexts. |
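
The higher output scores (structured errors, NDJSON per page, structured output by default when piped) could look roughly like the sketch below. It is a minimal illustration under assumptions, not any particular CLI's implementation; the function names `choose_format`, `emit_ndjson`, and `emit_error` are hypothetical.

```python
import json


def choose_format(stream, explicit=None):
    """Pick an output format: an explicit choice wins, otherwise
    non-TTY (piped) output defaults to JSON and a terminal gets a table."""
    if explicit is not None:
        return explicit
    return "table" if stream.isatty() else "json"


def emit_ndjson(pages, stream):
    """Write one compact JSON object per line, one line per page,
    flushing after each page so a consumer can process results as they arrive."""
    for page in pages:
        stream.write(json.dumps(page, separators=(",", ":")) + "\n")
        stream.flush()


def emit_error(code, message, stream):
    """Report errors as structured JSON too (the error shape is an assumption)."""
    stream.write(json.dumps({"error": {"code": code, "message": message}}) + "\n")
```

An agent reading the pipe can then parse each line independently instead of buffering a whole paginated response.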

| Score | Criteria: raw payload input |
|---|---|
| 0 | Only bespoke flags. No way to pass structured input. |
| 1 | Accepts |
| 2 | All mutating commands accept a raw JSON payload that maps directly to the underlying API schema. |
| 3 | Raw payload is first-class alongside convenience flags. The agent can use the API schema as documentation with zero translation loss. |
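
One way to make a raw payload first-class alongside convenience flags is sketched below. The flag names `--json` and `--title`, the program name, and the rule that a flag overrides the same key in the payload are all assumptions made for illustration.

```python
import argparse
import json


def build_request(argv):
    """Build a request body from a raw JSON payload plus optional convenience flags."""
    parser = argparse.ArgumentParser(prog="example-cli")  # hypothetical CLI
    parser.add_argument("--json", dest="payload", default=None,
                        help="raw JSON request body (hypothetical flag)")
    parser.add_argument("--title", default=None,
                        help="convenience flag for one field (hypothetical)")
    args = parser.parse_args(argv)

    body = {}
    if args.payload is not None:
        try:
            body = json.loads(args.payload)
        except json.JSONDecodeError as exc:
            parser.error(f"--json is not valid JSON: {exc}")
        if not isinstance(body, dict):
            parser.error("--json must be a JSON object")

    # Assumed precedence: an explicit convenience flag wins over the payload.
    if args.title is not None:
        body["title"] = args.title
    return body
```

Because the payload is passed through untouched, any field in the underlying API schema is reachable even when no dedicated flag exists for it.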

| Score | Criteria: schema introspection |
|---|---|
| 0 | Only |
| 1 | |
| 2 | Full schema introspection for all commands — params, types, required fields — as JSON. |
| 3 | Live, runtime-resolved schemas (e.g., from a discovery document) that always reflect the current API version. Includes scopes, enums, and nested types. |
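
As a rough picture of score 2 (params, types, and required fields as JSON), the sketch below serves a schema from a static table. The command name `items.create`, the `COMMANDS` registry, and the output shape are hypothetical; a score-3 CLI would resolve this at runtime from a discovery document instead of hard-coding it.

```python
import json

# Hypothetical registry; a real CLI would derive this from its API description.
COMMANDS = {
    "items.create": {
        "params": {
            "title": {"type": "string", "required": True},
            "priority": {"type": "string", "required": False,
                         "enum": ["low", "high"]},
        }
    }
}


def describe(command):
    """Return the schema for one command as a JSON string.
    Unknown commands produce a structured error, not prose."""
    spec = COMMANDS.get(command)
    if spec is None:
        return json.dumps({"error": {"code": "unknown_command",
                                     "command": command}})
    return json.dumps({"command": command, **spec}, sort_keys=True)
```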

| Score | Criteria: field masks and pagination |
|---|---|
| 0 | Returns full API responses with no way to limit fields or paginate. |
| 1 | Supports |
| 2 | Field masks on all read commands. Pagination with |
| 3 | Streaming pagination (NDJSON per page). Explicit guidance in context/skill files on field mask usage. The CLI actively protects the agent from token waste. |
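
A field mask can be applied client-side as in the sketch below, which keeps only the requested dotted paths of a response. The function name `apply_field_mask` and the dotted-path syntax are assumptions; a real API may apply the mask on the server and use a different syntax.

```python
def apply_field_mask(record, fields):
    """Return a copy of `record` containing only the dotted paths in `fields`.
    Paths that do not exist in the record are skipped silently."""
    out = {}
    for path in fields:
        keys = path.split(".")
        value = record
        for key in keys:
            if not isinstance(value, dict) or key not in value:
                break  # path is missing; ignore it
            value = value[key]
        else:
            # The whole path resolved: rebuild the nesting in the output.
            target = out
            for key in keys[:-1]:
                target = target.setdefault(key, {})
            target[keys[-1]] = value
    return out
```

Dropping a large body field this way means the agent never spends tokens on data it did not ask for.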

| Score | Criteria: input hardening |
|---|---|
| 0 | No input validation beyond basic type checks. |
| 1 | Validates some inputs, but does not cover agent-specific hallucination patterns (path traversals, embedded query params, double encoding). |
| 2 | Rejects control characters, path traversals ( |
| 3 | Comprehensive hardening: all of the above, plus output path sandboxing to CWD, HTTP-layer percent-encoding, and an explicit security posture — "The agent is not a trusted operator." |
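
The checks named in this table (control characters, path traversals, embedded query params, double encoding, output sandboxing to CWD, percent-encoding) might be sketched as follows. The function names and the exact rejection rules are assumptions chosen for illustration, not a complete hardening layer.

```python
import os
import re
from urllib.parse import quote

_CONTROL = re.compile(r"[\x00-\x1f\x7f]")
_PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def validate_resource_id(value):
    """Reject suspicious identifiers, then percent-encode for use in a URL path."""
    if _CONTROL.search(value):
        raise ValueError("control character in identifier")
    if ".." in value.replace("\\", "/").split("/"):
        raise ValueError("path traversal in identifier")
    if any(ch in value for ch in "?#&"):
        raise ValueError("embedded query or fragment in identifier")
    if _PERCENT_ESCAPE.search(value):
        # Already-encoded input would be encoded a second time below.
        raise ValueError("pre-encoded identifier (double-encoding risk)")
    return quote(value, safe="")


def sandbox_output_path(path, cwd=None):
    """Resolve `path` and refuse anything outside the working directory."""
    base = os.path.realpath(cwd or os.getcwd())
    target = os.path.realpath(os.path.join(base, path))
    if os.path.commonpath([base, target]) != base:
        raise ValueError("output path escapes the working directory")
    return target
```

Both functions fail closed with `ValueError`, so a caller can convert every rejection into one structured error.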

| Score | Criteria: safety rails |
|---|---|
| 0 | No dry-run mode. No response sanitization. |
| 1 | |
| 2 | |
| 3 | Dry-run plus response sanitization (e.g., via Model Armor) to defend against prompt injection embedded in API data. The full request→response loop is defended. |
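
A dry-run mode can be as simple as building the request and returning it instead of sending it, as in the sketch below. The `execute` function, its parameters, and the result shape are hypothetical, and response sanitization is left out because it depends on an external service.

```python
def execute(method, url, body, dry_run, send):
    """Build a request; with dry_run=True, report it without sending anything.
    `send` is any callable that performs the real HTTP call (assumed interface)."""
    request = {"method": method, "url": url, "body": body}
    if dry_run:
        # Nothing leaves the process: the agent can inspect what would happen.
        return {"dry_run": True, "request": request}
    return {"dry_run": False, "response": send(request)}
```

An agent can call a mutating command with the dry-run option first, check the echoed request, and only then repeat the call for real.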

| Score | Criteria: packaged agent knowledge |
|---|---|
| 0 | Only |
| 1 | A |
| 2 | Structured skill files (YAML frontmatter + Markdown) covering per-command or per-API-surface workflows and invariants. |
| 3 | Comprehensive skill library encoding agent-specific guardrails ("always use --dry-run", "always use --fields"). Skills are versioned, discoverable, and follow a standard like OpenClaw. |

| Range | Rating | Description |
|---|---|---|
| 0–5 | Human-only | Built for humans. Agents will struggle with parsing, hallucinate inputs, and lack safety rails. |
| 6–10 | Agent-tolerant | Agents can use it, but they'll waste tokens, make avoidable errors, and require heavy prompt engineering to compensate. |
| 11–15 | Agent-ready | Solid agent support. Structured I/O, input validation, and some introspection. A few gaps remain. |
| 16–21 | Agent-first | Purpose-built for agents. Full schema introspection, comprehensive input hardening, safety rails, and packaged agent knowledge. |
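
The rating bands can be applied mechanically. The sketch below assumes the total is the sum of the seven 0 to 3 scores from the tables above, which is consistent with the 0–21 range; the function name `rate` is hypothetical.

```python
# Upper bound of each band, taken from the rating table.
BANDS = [
    (5, "Human-only"),
    (10, "Agent-tolerant"),
    (15, "Agent-ready"),
    (21, "Agent-first"),
]


def rate(scores):
    """Sum seven per-axis scores (each 0-3) and return (total, rating)."""
    if len(scores) != 7:
        raise ValueError("expected exactly seven scores")
    if any(s not in (0, 1, 2, 3) for s in scores):
        raise ValueError("each score must be 0, 1, 2, or 3")
    total = sum(scores)
    for upper, label in BANDS:
        if total <= upper:
            return total, label
    raise AssertionError("unreachable: total cannot exceed 21")
```

For example, a CLI scoring 2 on four axes and 1 on the other three totals 11 and lands in the Agent-ready band.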