agent-native-architecture

<why_now>
<why_now>

Why Now

Software agents work reliably now. Claude Code demonstrated that an LLM with access to bash and file tools, operating in a loop until an objective is achieved, can accomplish complex multi-step tasks autonomously.
The surprising discovery: a really good coding agent is actually a really good general-purpose agent. The same architecture that lets Claude Code refactor a codebase can let an agent organize your files, manage your reading list, or automate your workflows.
The Claude Code SDK makes this accessible. You can build applications where features aren't code you write—they're outcomes you describe, achieved by an agent with tools, operating in a loop until the outcome is reached.
This opens up a new field: software that works the way Claude Code works, applied to categories far beyond coding. </why_now>
<core_principles>
</why_now>
<core_principles>

Core Principles

1. Parity

Whatever the user can do through the UI, the agent should be able to achieve through tools.
This is the foundational principle. Without it, nothing else matters.
Imagine you build a notes app with a beautiful interface for creating, organizing, and tagging notes. A user asks the agent: "Create a note summarizing my meeting and tag it as urgent."
If you built UI for creating notes but no agent capability to do the same, the agent is stuck. It might apologize or ask clarifying questions, but it can't help—even though the action is trivial for a human using the interface.
The fix: Ensure the agent has tools (or combinations of tools) that can accomplish anything the UI can do.
This isn't about creating a 1:1 mapping of UI buttons to tools. It's about ensuring the agent can achieve the same outcomes. Sometimes that's a single tool (`create_note`). Sometimes it's composing primitives (`write_file` to a notes directory with proper formatting).
The discipline: When adding any UI capability, ask: can the agent achieve this outcome? If not, add the necessary tools or primitives.
A capability map helps:
| User Action | How Agent Achieves It |
|-------------|-----------------------|
| Create a note | `write_file` to notes directory, or `create_note` tool |
| Tag a note as urgent | `update_file` metadata, or `tag_note` tool |
| Search notes | `search_files` or `search_notes` tool |
| Delete a note | `delete_file` or `delete_note` tool |

The test: Pick any action a user can take in your UI. Describe it to the agent. Can it accomplish the outcome?
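
The capability map can also be checked mechanically. The following is a minimal sketch, assuming the map is kept as plain data; `findOrphanActions` and the `registered` set are hypothetical names introduced for illustration, not part of any SDK.

```typescript
// Hypothetical capability map: each UI action lists the tools that can achieve it.
type CapabilityMap = Record<string, string[]>;

const capabilityMap: CapabilityMap = {
  "Create a note": ["write_file", "create_note"],
  "Tag a note as urgent": ["update_file", "tag_note"],
  "Search notes": ["search_files", "search_notes"],
  "Delete a note": ["delete_file", "delete_note"],
};

// Returns the UI actions that none of the registered tools can achieve.
function findOrphanActions(map: CapabilityMap, registeredTools: Set<string>): string[] {
  return Object.entries(map)
    .filter(([, tools]) => !tools.some((t) => registeredTools.has(t)))
    .map(([action]) => action);
}

// Assumed tool set for the example: no delete capability has been registered.
const registered = new Set(["write_file", "update_file", "search_files"]);
const orphans = findOrphanActions(capabilityMap, registered);
console.log(orphans); // → ["Delete a note"]
```

A check like this only proves that a tool exists for each action. Whether the agent can actually reach the outcome still has to be tested by asking it.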

2. Granularity

Prefer atomic primitives. Features are outcomes achieved by an agent operating in a loop.
A tool is a primitive capability: read a file, write a file, run a bash command, store a record, send a notification.
A feature is not a function you write. It's an outcome you describe in a prompt, achieved by an agent that has tools and operates in a loop until the outcome is reached.
Less granular (limits the agent):
Tool: classify_and_organize_files(files)
→ You wrote the decision logic
→ Agent executes your code
→ To change behavior, you refactor
More granular (empowers the agent):
Tools: read_file, write_file, move_file, list_directory, bash
Prompt: "Organize the user's downloads folder. Analyze each file,
        determine appropriate locations based on content and recency,
        and move them there."
Agent: Operates in a loop—reads files, makes judgments, moves things,
       checks results—until the folder is organized.
→ Agent makes the decisions
→ To change behavior, you edit the prompt
The key shift: The agent is pursuing an outcome with judgment, not executing a choreographed sequence. It might encounter unexpected file types, adjust its approach, or ask clarifying questions. The loop continues until the outcome is achieved.
The more atomic your tools, the more flexibly the agent can use them. If you bundle decision logic into tools, you've moved judgment back into code.
The test: To change how a feature behaves, do you edit prose or refactor code?
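
To make the contrast concrete, here is a small sketch of what atomic primitives might look like. It assumes an in-memory map in place of a real file system, and the function names (`listDirectory`, `readFile`, `moveFile`) and the `agentMoves` list are hypothetical. The point is that none of the primitives contains any policy about where files belong.

```typescript
// In-memory stand-in for a file system so the sketch runs without I/O.
const fileStore = new Map<string, string>([
  ["downloads/invoice.txt", "Invoice #12"],
  ["downloads/notes.txt", "Meeting notes"],
]);

// Atomic primitives: each does one thing and encodes no decision logic.
function listDirectory(dir: string): string[] {
  return Array.from(fileStore.keys()).filter((p) => p.startsWith(dir + "/"));
}

function readFile(path: string): string {
  const content = fileStore.get(path);
  if (content === undefined) throw new Error(`No such file: ${path}`);
  return content;
}

function moveFile(from: string, to: string): void {
  const content = readFile(from);
  fileStore.delete(from);
  fileStore.set(to, content);
}

// Stand-in for the agent's judgment. In a real system these moves would come
// from the model's tool calls, guided by the prompt, not from code you wrote.
const agentMoves = [{ from: "downloads/invoice.txt", to: "finance/invoice.txt" }];
for (const move of agentMoves) moveFile(move.from, move.to);

console.log(listDirectory("downloads")); // → ["downloads/notes.txt"]
```

Changing how files get organized would mean changing the prompt that produces the moves. The three primitives stay the same.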

3. Composability

With atomic tools and parity, you can create new features just by writing new prompts.
This is the payoff of the first two principles. When your tools are atomic and the agent can do anything users can do, new features are just new prompts.
Want a "weekly review" feature that summarizes activity and suggests priorities? That's a prompt:
"Review files modified this week. Summarize key changes. Based on
incomplete items and approaching deadlines, suggest three priorities
for next week."
The agent uses `list_files`, `read_file`, and its judgment to accomplish this. You didn't write weekly-review code. You described an outcome, and the agent operates in a loop until it's achieved.
This works for developers and users. You can ship new features by adding prompts. Users can customize behavior by modifying prompts or creating their own. "When I say 'file this,' always move it to my Action folder and tag it urgent" becomes a user-level prompt that extends the application.
The constraint: This only works if tools are atomic enough to be composed in ways you didn't anticipate, and if the agent has parity with users. If tools encode too much logic, or the agent can't access key capabilities, composition breaks down.
The test: Can you add a new feature by writing a new prompt section, without adding new code?
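
One way to picture "features as prompts" is a registry of prompt sections that gets assembled into the system prompt. This is only a sketch under that assumption; `featurePrompts`, `buildSystemPrompt`, and the section layout are hypothetical.

```typescript
// Hypothetical feature registry: a feature is a named prompt section, not code.
const featurePrompts: Record<string, string> = {
  organize: "When asked to organize content, read existing files first, then move them.",
};

// Assembles the base prompt, developer-level features, and user-level rules.
function buildSystemPrompt(
  base: string,
  features: Record<string, string>,
  userRules: string[],
): string {
  const featureText = Object.entries(features).map(([name, text]) => `## ${name}\n${text}`);
  const ruleBlock =
    userRules.length > 0
      ? ["## User rules\n" + userRules.map((r) => `- ${r}`).join("\n")]
      : [];
  return [base, ...featureText, ...ruleBlock].join("\n\n");
}

// Shipping "weekly review" means adding a prompt section; the tools are unchanged.
featurePrompts["weekly_review"] =
  "Review files modified this week. Summarize key changes. Based on incomplete " +
  "items and approaching deadlines, suggest three priorities for next week.";

const systemPromptText = buildSystemPrompt("You manage the user's notes.", featurePrompts, [
  "When I say 'file this,' always move it to my Action folder and tag it urgent",
]);
console.log(systemPromptText);
```

The same function serves both levels described above: developers add entries to the registry, and users contribute rules.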

4. Emergent Capability

The agent can accomplish things you didn't explicitly design for.
When tools are atomic, parity is maintained, and prompts are composable, users will ask the agent for things you never anticipated. And often, the agent can figure it out.
"Cross-reference my meeting notes with my task list and tell me what I've committed to but haven't scheduled."
You didn't build a "commitment tracker" feature. But if the agent can read notes, read tasks, and reason about them—operating in a loop until it has an answer—it can accomplish this.
This reveals latent demand. Instead of guessing what features users want, you observe what they're asking the agent to do. When patterns emerge, you can optimize them with domain-specific tools or dedicated prompts. But you didn't have to anticipate them—you discovered them.
The flywheel:
  1. Build with atomic tools and parity
  2. Users ask for things you didn't anticipate
  3. Agent composes tools to accomplish them (or fails, revealing a gap)
  4. You observe patterns in what's being requested
  5. Add domain tools or prompts to make common patterns efficient
  6. Repeat
This changes how you build products. You're not trying to imagine every feature upfront. You're creating a capable foundation and learning from what emerges.
The test: Give the agent an open-ended request relevant to your domain. Can it figure out a reasonable approach, operating in a loop until it succeeds? If it just says "I don't have a feature for that," your architecture is too constrained.
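
Steps 3 and 4 of the flywheel depend on recording what users ask for. A rough sketch of that observation step follows, assuming each request is logged with the tools the agent used and whether it succeeded. The `RequestLog` shape, `summarize`, and the tool names `read_notes` and `read_tasks` are all hypothetical.

```typescript
interface RequestLog {
  request: string;
  toolsUsed: string[];
  succeeded: boolean;
}

// Groups successful requests by the tool combination used (recurring patterns)
// and collects failed requests separately (capability gaps).
function summarize(logs: RequestLog[]) {
  const patterns = new Map<string, number>();
  const gaps: string[] = [];
  for (const log of logs) {
    if (!log.succeeded) {
      gaps.push(log.request);
      continue;
    }
    const key = [...log.toolsUsed].sort().join("+");
    patterns.set(key, (patterns.get(key) ?? 0) + 1);
  }
  return { patterns, gaps };
}

const logs: RequestLog[] = [
  { request: "What have I committed to but not scheduled?", toolsUsed: ["read_tasks", "read_notes"], succeeded: true },
  { request: "Which meeting promises are still open?", toolsUsed: ["read_notes", "read_tasks"], succeeded: true },
  { request: "Email my commitments to my manager", toolsUsed: [], succeeded: false },
];

const report = summarize(logs);
console.log(report.patterns.get("read_notes+read_tasks")); // → 2
console.log(report.gaps); // → ["Email my commitments to my manager"]
```

A repeated tool combination is a candidate for a dedicated prompt or domain tool. A gap points to a missing primitive.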

5. Improvement Over Time

Agent-native applications get better through accumulated context and prompt refinement.
Unlike traditional software, agent-native applications can improve without shipping code:
Accumulated context: The agent can maintain state across sessions: what exists, what the user has done, what worked, what didn't. A `context.md` file the agent reads and updates is layer one. More sophisticated approaches involve structured memory and learned preferences.
Prompt refinement at multiple levels:
  • Developer level: You ship updated prompts that change agent behavior for all users
  • User level: Users customize prompts for their workflow
  • Agent level: The agent modifies its own prompts based on feedback (advanced)
Self-modification (advanced): Agents that can edit their own prompts or even their own code. For production use cases, consider adding safety rails—approval gates, automatic checkpoints for rollback, health checks. This is where things are heading.
The improvement mechanisms are still being discovered. Context and prompt refinement are proven. Self-modification is emerging. What's clear: the architecture supports getting better in ways traditional software doesn't.
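
As an illustration of the `context.md` layer, the sketch below keeps the file in an in-memory map and appends learnings under headings. The section names and the `appendLearning` helper are assumptions made for the example. A real agent would do the same through its file tools.

```typescript
// Stand-in for the shared workspace; a real agent would use its file tools.
const workspace = new Map<string, string>();

function readContext(): string {
  return workspace.get("context.md") ?? "";
}

// Adds a note under a section heading, creating the section if needed.
// New notes are inserted directly under the heading, so the newest comes first.
function appendLearning(section: string, note: string): void {
  const current = readContext();
  const heading = `## ${section}`;
  const updated = current.includes(heading)
    ? current.replace(heading, () => `${heading}\n- ${note}`)
    : `${current}${current ? "\n\n" : ""}${heading}\n- ${note}`;
  workspace.set("context.md", updated);
}

appendLearning("Preferences", "Prefers short summaries");
appendLearning("Preferences", "Files go in Action folder");
console.log(readContext());
```

Because the file is plain text in the shared workspace, the user can read and correct what the agent has accumulated.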
The test: Does the application work better after a month of use than on day one, even without code changes? </core_principles>
<intake>
</core_principles>
<intake>

What aspect of agent-native architecture do you need help with?

  1. Design architecture - Plan a new agent-native system from scratch
  2. Files & workspace - Use files as the universal interface, shared workspace patterns
  3. Tool design - Build primitive tools, dynamic capability discovery, CRUD completeness
  4. Domain tools - Know when to add domain tools vs stay with primitives
  5. Execution patterns - Completion signals, partial completion, context limits
  6. System prompts - Define agent behavior in prompts, judgment criteria
  7. Context injection - Inject runtime app state into agent prompts
  8. Action parity - Ensure agents can do everything users can do
  9. Self-modification - Enable agents to safely evolve themselves
  10. Product design - Progressive disclosure, latent demand, approval patterns
  11. Mobile patterns - iOS storage, background execution, checkpoint/resume
  12. Testing - Test agent-native apps for capability and parity
  13. Refactoring - Make existing code more agent-native
Wait for response before proceeding. </intake>
<routing>
| Response | Action |
|----------|--------|
| 1, "design", "architecture", "plan" | Read [architecture-patterns.md](./references/architecture-patterns.md), then apply Architecture Checklist below |
| 2, "files", "workspace", "filesystem" | Read [files-universal-interface.md](./references/files-universal-interface.md) and [shared-workspace-architecture.md](./references/shared-workspace-architecture.md) |
| 3, "tool", "mcp", "primitive", "crud" | Read [mcp-tool-design.md](./references/mcp-tool-design.md) |
| 4, "domain tool", "when to add" | Read [from-primitives-to-domain-tools.md](./references/from-primitives-to-domain-tools.md) |
| 5, "execution", "completion", "loop" | Read [agent-execution-patterns.md](./references/agent-execution-patterns.md) |
| 6, "prompt", "system prompt", "behavior" | Read [system-prompt-design.md](./references/system-prompt-design.md) |
| 7, "context", "inject", "runtime", "dynamic" | Read [dynamic-context-injection.md](./references/dynamic-context-injection.md) |
| 8, "parity", "ui action", "capability map" | Read [action-parity-discipline.md](./references/action-parity-discipline.md) |
| 9, "self-modify", "evolve", "git" | Read [self-modification.md](./references/self-modification.md) |
| 10, "product", "progressive", "approval", "latent demand" | Read [product-implications.md](./references/product-implications.md) |
| 11, "mobile", "ios", "android", "background", "checkpoint" | Read [mobile-patterns.md](./references/mobile-patterns.md) |
| 12, "test", "testing", "verify", "validate" | Read [agent-native-testing.md](./references/agent-native-testing.md) |
| 13, "review", "refactor", "existing" | Read [refactoring-to-prompt-native.md](./references/refactoring-to-prompt-native.md) |

After reading the reference, apply those patterns to the user's specific context. </routing>
<architecture_checklist>
</intake>
<architecture_checklist>

Architecture Review Checklist

When designing an agent-native system, verify these before implementation:

Core Principles

  • Parity: Every UI action has a corresponding agent capability
  • Granularity: Tools are primitives; features are prompt-defined outcomes
  • Composability: New features can be added via prompts alone
  • Emergent Capability: Agent can handle open-ended requests in your domain

Tool Design

  • Dynamic vs Static: For external APIs where agent should have full access, use Dynamic Capability Discovery
  • CRUD Completeness: Every entity has create, read, update, AND delete
  • Primitives not Workflows: Tools enable capability, don't encode business logic
  • API as Validator: Use `z.string()` inputs when the API validates, not `z.enum()`

Files & Workspace

  • Shared Workspace: Agent and user work in same data space
  • context.md Pattern: Agent reads/updates context file for accumulated knowledge
  • File Organization: Entity-scoped directories with consistent naming

Agent Execution

  • Completion Signals: Agent has explicit `complete_task` tool (not heuristic detection)
  • Partial Completion: Multi-step tasks track progress for resume
  • Context Limits: Designed for bounded context from the start
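
The execution items above can be sketched as a bounded loop that stops only when the agent calls `complete_task`. This is a simplified illustration, not the Claude Code SDK's loop: the `Model` type is a stand-in for an LLM call, the `scriptedModel` replays a fixed script, and all names besides `complete_task` are hypothetical.

```typescript
type ToolCall = { name: string; args: Record<string, string> };
type Model = (history: string[]) => ToolCall; // stand-in for an LLM call
type ToolTable = Record<string, (args: Record<string, string>) => string>;

interface RunResult {
  completed: boolean;
  summary?: string;
  steps: number;
  history: string[];
}

// Runs until the model explicitly calls complete_task or the step budget is spent.
// The history doubles as a checkpoint: pass it back in to resume a partial run.
function runAgent(model: Model, tools: ToolTable, maxSteps: number, history: string[] = []): RunResult {
  for (let step = 1; step <= maxSteps; step++) {
    const call = model(history);
    if (call.name === "complete_task") {
      return { completed: true, summary: call.args.summary, steps: step, history };
    }
    const tool = tools[call.name];
    const result = tool ? tool(call.args) : `Unknown tool: ${call.name}`;
    history.push(`${call.name} -> ${result}`);
  }
  return { completed: false, steps: maxSteps, history };
}

// Scripted model for the example: it picks the next call from the progress so far.
const script: ToolCall[] = [
  { name: "list_files", args: { path: "notes" } },
  { name: "read_file", args: { path: "notes/a.md" } },
  { name: "complete_task", args: { summary: "Reviewed 1 note" } },
];
const scriptedModel: Model = (history) => script[Math.min(history.length, script.length - 1)];
const demoTools: ToolTable = { list_files: () => "a.md", read_file: () => "hello" };

const first = runAgent(scriptedModel, demoTools, 2); // budget runs out before completion
const resumed = runAgent(scriptedModel, demoTools, 5, first.history); // resumes from the checkpoint
console.log(first.completed, resumed.completed, resumed.summary);
```

Running out of budget is reported as `completed: false`, not inferred as success, which is the difference from heuristic detection.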

Context Injection

  • Available Resources: System prompt includes what exists (files, data, types)
  • Available Capabilities: System prompt documents tools with user vocabulary
  • Dynamic Context: Context refreshes for long sessions (or provide `refresh_context` tool)
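
A possible shape for the injected context is sketched below. The `AppState` and `ToolDoc` types, the section headings, and the `publish_to_feed` tool are assumptions for illustration. What matters is that the prompt names what exists and describes tools in the user's vocabulary.

```typescript
interface AppState {
  files: string[];
  feeds: string[];
}

interface ToolDoc {
  name: string;
  userPhrase: string; // how a user would describe the action
}

// Renders the runtime state and tool vocabulary as a system prompt section.
function buildContextSection(state: AppState, tools: ToolDoc[]): string {
  return [
    "## Available resources",
    `Files: ${state.files.join(", ") || "(none)"}`,
    `Feeds: ${state.feeds.join(", ") || "(none)"}`,
    "## What you can do",
    ...tools.map((t) => `- ${t.userPhrase}: use ${t.name}`),
  ].join("\n");
}

const contextSection = buildContextSection(
  { files: ["notes/meeting.md"], feeds: ["main feed"] },
  [{ name: "publish_to_feed", userPhrase: "write to my feed" }],
);
console.log(contextSection);
```

Calling the builder again before each turn, or exposing it as a `refresh_context` tool, would keep long sessions from working with stale state.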

UI Integration

  • Agent → UI: Agent changes reflect in UI (shared service, file watching, or event bus)
  • No Silent Actions: Agent writes trigger UI updates immediately
  • Capability Discovery: Users can learn what agent can do
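
The "shared service" option can be sketched as a store that both the UI and the agent's write tool go through, so every write notifies subscribers. `WorkspaceStore` is a hypothetical class written for this example. File watching or an event bus would serve the same purpose.

```typescript
type Listener = (path: string) => void;

class WorkspaceStore {
  private files = new Map<string, string>();
  private listeners = new Set<Listener>();

  // Registers a listener and returns a function that removes it.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  // Both the UI and the agent's write tool call this method,
  // so every write notifies subscribers synchronously.
  write(path: string, content: string): void {
    this.files.set(path, content);
    this.listeners.forEach((listener) => listener(path));
  }

  read(path: string): string | undefined {
    return this.files.get(path);
  }
}

const store = new WorkspaceStore();
const uiRefreshes: string[] = [];
const unsubscribe = store.subscribe((path) => {
  uiRefreshes.push(path); // a real UI would re-render the affected view here
});

store.write("notes/meeting.md", "Summary of the meeting"); // e.g. called by the agent
unsubscribe();
store.write("notes/other.md", "Not observed"); // no listener left to notify
console.log(uiRefreshes); // → ["notes/meeting.md"]
```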

Mobile (if applicable)

  • Checkpoint/Resume: Handle iOS app suspension gracefully
  • iCloud Storage: iCloud-first with local fallback for multi-device sync
  • Cost Awareness: Model tier selection (Haiku/Sonnet/Opus)
When designing architecture, explicitly address each checkbox in your plan. </architecture_checklist>
<quick_start>
</architecture_checklist>
<quick_start>

Quick Start: Build an Agent-Native Feature

**Step 1: Define atomic tools**
typescript
const tools = [
  tool("read_file", "Read any file", { path: z.string() }, ...),
  tool("write_file", "Write any file", { path: z.string(), content: z.string() }, ...),
  tool("list_files", "List directory", { path: z.string() }, ...),
  tool("complete_task", "Signal task completion", { summary: z.string() }, ...),
];
**Step 2: Write behavior in the system prompt**
markdown

Your Responsibilities

When asked to organize content, you should:
  1. Read existing files to understand the structure
  2. Analyze what organization makes sense
  3. Create/move files using your tools
  4. Use your judgment about layout and formatting
  5. Call complete_task when you're done
You decide the structure. Make it good.

**Step 3: Let the agent work in a loop**
```typescript
const result = await agent.run({
  prompt: userMessage,
  tools: tools,
  systemPrompt: systemPrompt,
  // Agent loops until it calls complete_task
});
```
</quick_start>
<reference_index>
</quick_start>
<reference_index>

Reference Files

All references in `references/`:
Core Patterns:
  • architecture-patterns.md - Event-driven, unified orchestrator, agent-to-UI
  • files-universal-interface.md - Why files, organization patterns, context.md
  • mcp-tool-design.md - Tool design, dynamic capability discovery, CRUD
  • from-primitives-to-domain-tools.md - When to add domain tools, graduating to code
  • agent-execution-patterns.md - Completion signals, partial completion, context limits
  • system-prompt-design.md - Features as prompts, judgment criteria
Agent-Native Disciplines:
  • dynamic-context-injection.md - Runtime context, what to inject
  • action-parity-discipline.md - Capability mapping, parity workflow
  • shared-workspace-architecture.md - Shared data space, UI integration
  • product-implications.md - Progressive disclosure, latent demand, approval
  • agent-native-testing.md - Testing outcomes, parity tests
Platform-Specific:
  • mobile-patterns.md - iOS storage, checkpoint/resume, cost awareness
  • self-modification.md - Git-based evolution, guardrails
  • refactoring-to-prompt-native.md - Migrating existing code </reference_index>
<anti_patterns>
</reference_index>
<anti_patterns>

Anti-Patterns

Common Approaches That Aren't Fully Agent-Native

These aren't necessarily wrong—they may be appropriate for your use case. But they're worth recognizing as different from the architecture this document describes.
Agent as router — The agent figures out what the user wants, then calls the right function. The agent's intelligence is used to route, not to act. This can work, but you're using a fraction of what agents can do.
Build the app, then add agent — You build features the traditional way (as code), then expose them to an agent. The agent can only do what your features already do. You won't get emergent capability.
Request/response thinking — Agent gets input, does one thing, returns output. This misses the loop: agent gets an outcome to achieve, operates until it's done, handles unexpected situations along the way.
Defensive tool design — You over-constrain tool inputs because you're used to defensive programming. Strict enums, validation at every layer. This is safe, but it prevents the agent from doing things you didn't anticipate.
Happy path in code, agent just executes — Traditional software handles edge cases in code—you write the logic for what happens when X goes wrong. Agent-native lets the agent handle edge cases with judgment. If your code handles all the edge cases, the agent is just a caller.

Specific Anti-Patterns

THE CARDINAL SIN: Agent executes your code instead of figuring things out
typescript
// WRONG - You wrote the workflow, agent just executes it
tool("process_feedback", async ({ message }) => {
  const category = categorize(message);      // Your code decides
  const priority = calculatePriority(message); // Your code decides
  await store(message, category, priority);   // Your code orchestrates
  if (priority > 3) await notify();           // Your code decides
});

// RIGHT - Agent figures out how to process feedback
tools: store_item, send_message  // Primitives
prompt: "Rate importance 1-5 based on actionability, store feedback, notify if >= 4"
Workflow-shaped tools: `analyze_and_organize` bundles judgment into the tool. Break it into primitives and let the agent compose them.
Context starvation — Agent doesn't know what resources exist in the app.
User: "Write something about Catherine the Great in my feed"
Agent: "What feed? I don't understand what system you're referring to."
Fix: Inject available resources, capabilities, and vocabulary into system prompt.
Orphan UI actions — User can do something through the UI that the agent can't achieve. Fix: maintain parity.
Silent actions — Agent changes state but UI doesn't update. Fix: Use shared data stores with reactive binding, or file system observation.
Heuristic completion detection: Detecting agent completion through heuristics (consecutive iterations without tool calls, checking for expected output files). This is fragile. Fix: Require agents to explicitly signal completion through a `complete_task` tool.
Static tool mapping for dynamic APIs: Building 50 tools for 50 API endpoints when a `discover` + `access` pattern would give more flexibility.
typescript
// WRONG - Every API type needs a hardcoded tool
tool("read_steps", ...)
tool("read_heart_rate", ...)
tool("read_sleep", ...)
// When glucose tracking is added... code change required

// RIGHT - Dynamic capability discovery
tool("list_available_types", ...)  // Discover what's available
tool("read_health_data", { dataType: z.string() }, ...)  // Access any type
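
A runnable sketch of the discover + access pattern follows, assuming an in-memory map stands in for the external health API. The function names mirror the tool names above but are otherwise hypothetical. The tool accepts any string and lets the data source decide what is valid.

```typescript
// Stand-in for an external health API whose set of data types can grow over time.
const healthStore = new Map<string, number[]>([
  ["steps", [8200, 10400]],
  ["heart_rate", [62, 71]],
]);

type ReadResult = { ok: true; values: number[] } | { ok: false; error: string };

// Discover: report whatever the source currently offers.
function listAvailableTypes(): string[] {
  return Array.from(healthStore.keys());
}

// Access: accept any string. The data source, not the tool schema, validates it,
// and the error message tells the agent what it could ask for instead.
function readHealthData(dataType: string): ReadResult {
  const values = healthStore.get(dataType);
  if (values === undefined) {
    const available = listAvailableTypes().join(", ");
    return { ok: false, error: `Unknown type "${dataType}". Available: ${available}` };
  }
  return { ok: true, values };
}

// Glucose tracking is added at the source; neither tool needs a code change.
healthStore.set("glucose", [5.4]);

console.log(listAvailableTypes()); // → ["steps", "heart_rate", "glucose"]
console.log(readHealthData("glucose").ok); // → true
console.log(readHealthData("sleep").ok); // → false
```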
Incomplete CRUD — Agent can create but not update or delete.
typescript
// User: "Delete that journal entry"
// Agent: "I don't have a tool for that"
tool("create_journal_entry", ...)  // Missing: update, delete
Fix: Every entity needs full CRUD.
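
CRUD gaps are easy to detect if tool names follow a convention. The sketch below assumes a hypothetical `<verb>_<entity>` naming scheme, which the original does not prescribe. `missingCrud` is a helper invented for this example.

```typescript
const CRUD_VERBS = ["create", "read", "update", "delete"] as const;

// Returns the CRUD verbs that have no matching `<verb>_<entity>` tool.
function missingCrud(toolNames: string[], entity: string): string[] {
  const names = new Set(toolNames);
  return CRUD_VERBS.filter((verb) => !names.has(`${verb}_${entity}`));
}

const journalTools = ["create_journal_entry", "read_journal_entry"];
console.log(missingCrud(journalTools, "journal_entry")); // → ["update", "delete"]
```

If entities are handled through file primitives instead of domain tools, the same question applies to the primitives: can the agent write, read, modify, and remove the underlying file?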
Sandbox isolation — Agent works in separate data space from user.
Documents/
├── user_files/        ← User's space
└── agent_output/      ← Agent's space (isolated)
Fix: Use shared workspace where both operate on same files.
Gates without reason — Domain tool is the only way to do something, and you didn't intend to restrict access. The default is open. Keep primitives available unless there's a specific reason to gate.
Artificial capability limits — Restricting what the agent can do out of vague safety concerns rather than specific risks. Be thoughtful about restricting capabilities. The agent should generally be able to do what users can do. </anti_patterns>
<success_criteria>
</anti_patterns>
<success_criteria>

Success Criteria

You've built an agent-native application when:

Architecture

  • The agent can achieve anything users can achieve through the UI (parity)
  • Tools are atomic primitives; domain tools are shortcuts, not gates (granularity)
  • New features can be added by writing new prompts (composability)
  • The agent can accomplish tasks you didn't explicitly design for (emergent capability)
  • Changing behavior means editing prompts, not refactoring code

Implementation

  • System prompt includes dynamic context about app state
  • Every UI action has a corresponding agent tool (action parity)
  • Agent tools are documented in system prompt with user vocabulary
  • Agent and user work in the same data space (shared workspace)
  • Agent actions are immediately reflected in the UI
  • Every entity has full CRUD (Create, Read, Update, Delete)
  • Agents explicitly signal completion (no heuristic detection)
  • context.md or equivalent for accumulated knowledge

Product

  • Simple requests work immediately with no learning curve
  • Power users can push the system in unexpected directions
  • You're learning what users want by observing what they ask the agent to do
  • Approval requirements match stakes and reversibility
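
The last item can be made concrete with a small policy function. The three approval levels and the way they combine stakes and reversibility are assumptions chosen for illustration. The original states only that approval should match the two.

```typescript
type Stakes = "low" | "high";
type Approval = "auto" | "quick_confirm" | "explicit_approval";

interface ActionProfile {
  stakes: Stakes;
  reversible: boolean;
}

// Illustrative policy: friction grows as stakes rise and reversibility falls.
function approvalFor(action: ActionProfile): Approval {
  if (action.stakes === "high" && !action.reversible) return "explicit_approval";
  if (action.stakes === "high" || !action.reversible) return "quick_confirm";
  return "auto";
}

console.log(approvalFor({ stakes: "low", reversible: true })); // → "auto"
console.log(approvalFor({ stakes: "high", reversible: false })); // → "explicit_approval"
```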

Mobile (if applicable)

  • Checkpoint/resume handles app interruption
  • iCloud-first storage with local fallback
  • Background execution uses available time wisely
  • Model tier matched to task complexity

The Ultimate Test

Describe an outcome to the agent that's within your application's domain but that you didn't build a specific feature for.
Can it figure out how to accomplish it, operating in a loop until it succeeds?
If yes, you've built something agent-native.
If it says "I don't have a feature for that"—your architecture is still too constrained. </success_criteria>
</success_criteria>