project-development

Project Development Methodology

This skill covers the principles for identifying tasks suited to LLM processing, designing effective project architectures, and iterating rapidly using agent-assisted development. The methodology applies whether building a batch processing pipeline, a multi-agent research system, or an interactive agent application.

When to Activate

Activate this skill when:
  • Starting a new project that might benefit from LLM processing
  • Evaluating whether a task is well-suited for agents versus traditional code
  • Designing the architecture for an LLM-powered application
  • Planning a batch processing pipeline with structured outputs
  • Choosing between single-agent and multi-agent approaches
  • Estimating costs and timelines for LLM-heavy projects

Core Concepts

Task-Model Fit Recognition

Evaluate task-model fit before writing any code, because building automation on a fundamentally mismatched task wastes days of effort. Run every proposed task through these two tables to decide proceed-or-stop.
Proceed when the task has these characteristics:
Characteristic | Rationale
Synthesis across sources | LLMs combine information from multiple inputs better than rule-based alternatives
Subjective judgment with rubrics | Grading, evaluation, and classification with criteria map naturally to language reasoning
Natural language output | When the goal is human-readable text, LLMs deliver it natively
Error tolerance | Individual failures do not break the overall system, so LLM non-determinism is acceptable
Batch processing | No conversational state required between items, which keeps context clean
Domain knowledge in training | The model already has relevant context, reducing prompt engineering overhead
Stop when the task has these characteristics:
Characteristic | Rationale
Precise computation | Math, counting, and exact algorithms are unreliable in language models
Real-time requirements | LLM latency is too high for sub-second responses
Perfect accuracy requirements | Hallucination risk makes 100% accuracy impossible
Proprietary data dependence | The model lacks necessary context and cannot acquire it from prompts alone
Sequential dependencies | Each step depends heavily on the previous result, compounding errors
Deterministic output requirements | Same input must produce identical output, which LLMs cannot guarantee

The Manual Prototype Step

Always validate task-model fit with a manual test before investing in automation. Copy one representative input into the model interface, evaluate the output quality, and use the result to answer these questions:
  • Does the model have the knowledge required for this task?
  • Can the model produce output in the format needed?
  • What level of quality should be expected at scale?
  • Are there obvious failure modes to address?
Do this because a failed manual prototype predicts a failed automated system, while a successful one provides both a quality baseline and a prompt-design template. The test takes minutes and prevents hours of wasted development.

Pipeline Architecture

Structure LLM projects as staged pipelines because separation of deterministic and non-deterministic stages enables fast iteration and cost control. Design each stage to be:
  • Discrete: Clear boundaries between stages so each can be debugged independently
  • Idempotent: Re-running produces the same result, preventing duplicate work
  • Cacheable: Intermediate results persist to disk, avoiding expensive re-computation
  • Independent: Each stage can run separately, enabling selective re-execution
Use this canonical pipeline structure:
acquire -> prepare -> process -> parse -> render
  1. Acquire: Fetch raw data from sources (APIs, files, databases)
  2. Prepare: Transform data into prompt format
  3. Process: Execute LLM calls (the expensive, non-deterministic step)
  4. Parse: Extract structured data from LLM outputs
  5. Render: Generate final outputs (reports, files, visualizations)
Stages 1, 2, 4, and 5 are deterministic. Stage 3 is non-deterministic and expensive. Maintain this separation because it allows re-running the expensive LLM stage only when necessary, while iterating quickly on parsing and rendering.
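The separation described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the stage bodies, the `fake_llm` stand-in, and the in-memory `stored` dictionary are all hypothetical, and the acquire stage is omitted because the item is passed in already fetched. The point is only that the model is called once, while the deterministic stages can be re-run freely.

```python
from typing import Callable, Dict


def prepare(raw: dict) -> str:
    """Deterministic: turn raw data into a prompt."""
    return f"Summarize in one line: {raw['text']}"


def parse(response: str) -> dict:
    """Deterministic: extract structured data from the model's text."""
    return {"summary": response.strip()}


def render(parsed: dict) -> str:
    """Deterministic: produce the final output."""
    return f"<p>{parsed['summary']}</p>"


def run_item(raw: dict, call_llm: Callable[[str], str], responses: Dict[str, str]) -> str:
    """Run prepare -> process -> parse -> render for one already-acquired item."""
    prompt = prepare(raw)
    if prompt not in responses:  # process: only call the model when no response is stored
        responses[prompt] = call_llm(prompt)
    return render(parse(responses[prompt]))


calls = []


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; records each invocation."""
    calls.append(prompt)
    return "  A short summary.  "


stored: Dict[str, str] = {}
first = run_item({"text": "long article"}, fake_llm, stored)
second = run_item({"text": "long article"}, fake_llm, stored)  # reuses the stored response
```

In a real pipeline the stored responses would live on disk rather than in a dictionary, so that changes to `parse` or `render` can be tested without paying for the process stage again.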

File System as State Machine

Use the file system to track pipeline state rather than databases or in-memory structures, because file existence provides natural idempotency and human-readable debugging.
data/{id}/
  raw.json         # acquire stage complete
  prompt.md        # prepare stage complete
  response.md      # process stage complete
  parsed.json      # parse stage complete
Check if an item needs processing by checking whether the output file exists. Re-run a stage by deleting its output file and downstream files. Debug by reading the intermediate files directly. This pattern works because each directory is independent, enabling simple parallelization and trivial caching.
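As a rough sketch of the file-existence check and the delete-to-re-run rule, assuming the four output file names from the layout above: the helper names `next_stage`, `invalidate`, and `demo` are illustrative and do not come from any library, and the file contents are placeholders.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional

# Stage order and output file names follow the directory layout above.
STAGE_FILES = ["raw.json", "prompt.md", "response.md", "parsed.json"]


def next_stage(item_dir: Path) -> Optional[str]:
    """Return the first stage output that is missing, or None if all exist."""
    for name in STAGE_FILES:
        if not (item_dir / name).exists():
            return name
    return None


def invalidate(item_dir: Path, stage_file: str) -> None:
    """Delete a stage's output and every downstream output so they re-run."""
    start = STAGE_FILES.index(stage_file)
    for name in STAGE_FILES[start:]:
        (item_dir / name).unlink(missing_ok=True)


def demo() -> List[Optional[str]]:
    """Walk one hypothetical item through the state checks."""
    seen = []
    with tempfile.TemporaryDirectory() as root:
        item_dir = Path(root) / "data" / "item-1"
        item_dir.mkdir(parents=True)
        seen.append(next_stage(item_dir))        # nothing has run yet
        (item_dir / "raw.json").write_text(json.dumps({"id": "item-1"}))
        (item_dir / "prompt.md").write_text("Summarize item-1")
        seen.append(next_stage(item_dir))        # process stage is next
        (item_dir / "response.md").write_text("stub response")
        (item_dir / "parsed.json").write_text("{}")
        seen.append(next_stage(item_dir))        # all stages complete
        invalidate(item_dir, "response.md")      # force the LLM stage to re-run
        seen.append(next_stage(item_dir))
    return seen
```

Because the state lives entirely in each item's directory, a worker can decide what to do for an item without consulting any shared store.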

Structured Output Design

Design prompts for structured, parseable outputs because prompt design directly determines parsing reliability. Include these elements in every structured prompt:
  1. Section markers: Explicit headers or prefixes that parsers can match on
  2. Format examples: Show exactly what output should look like
  3. Rationale disclosure: State "I will be parsing this programmatically" so the model prioritizes format compliance
  4. Constrained values: Enumerated options, score ranges, and fixed formats
Build parsers that handle LLM output variations gracefully, because LLMs do not follow instructions perfectly. Use regex patterns flexible enough for minor formatting variations, provide sensible defaults when sections are missing, and log parsing failures for review rather than crashing.
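One way a tolerant parser might look is sketched below. The `Score` and `Verdict` section names, the 0-10 range, and the allowed verdict values are invented for illustration; a real prompt would define its own markers and constraints. The patterns accept optional bold markers, either `:` or `-` as the separator, and any letter case, and the function falls back to defaults and logs a warning instead of raising.

```python
import logging
import re

logger = logging.getLogger("parse_stage")

# Hypothetical section markers; tolerant of "**Score:** 7", "score - 7", "Score: 7/10".
SCORE_RE = re.compile(
    r"^\s*\**\s*score\s*\**\s*[:\-]\s*\**\s*(\d+)", re.IGNORECASE | re.MULTILINE
)
VERDICT_RE = re.compile(
    r"^\s*\**\s*verdict\s*\**\s*[:\-]\s*\**\s*(\w+)", re.IGNORECASE | re.MULTILINE
)
ALLOWED_VERDICTS = {"pass", "fail", "unclear"}


def parse_response(text: str) -> dict:
    """Extract constrained fields, using defaults when a section is missing."""
    result = {"score": None, "verdict": "unclear"}  # sensible defaults

    match = SCORE_RE.search(text)
    if match and 0 <= int(match.group(1)) <= 10:
        result["score"] = int(match.group(1))
    else:
        logger.warning("score missing or out of range")

    match = VERDICT_RE.search(text)
    if match and match.group(1).lower() in ALLOWED_VERDICTS:
        result["verdict"] = match.group(1).lower()
    else:
        logger.warning("verdict missing or not in the allowed set")

    return result
```

Logged warnings can then be reviewed in bulk to decide whether the prompt or the patterns need adjusting.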

Agent-Assisted Development

Use agent-capable models to accelerate development through rapid iteration: describe the project goal and constraints, let the agent generate initial implementation, test and iterate on specific failures, then refine prompts and architecture based on results.
Adopt these practices because they keep agent output focused and high-quality:
  • Provide clear, specific requirements upfront to reduce revision cycles
  • Break large projects into discrete components so each can be validated independently
  • Test each component before moving to the next to catch failures early
  • Keep the agent focused on one task at a time to prevent context degradation

Cost and Scale Estimation

Estimate LLM processing costs before starting, because token costs compound quickly at scale and late discovery of budget overruns forces costly rework. Use this formula:
Total cost = (items x tokens_per_item x price_per_token) + API overhead
For batch processing, estimate input tokens per item (prompt + context), estimate output tokens per item (typical response length), multiply by item count, and add 20-30% buffer for retries and failures.
Track actual costs during development. If costs exceed estimates significantly, reduce context length through truncation, use smaller models for simpler items, cache and reuse partial results, or add parallel processing to reduce wall-clock time.
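The estimate can be written down as a small function. This sketch makes two assumptions that go beyond the formula above: input and output tokens are priced separately (many providers do this, but check the actual price sheet), and prices are quoted per million tokens. The item count, token counts, and prices in the example call are made-up numbers, not real rates.

```python
def estimate_cost(
    items: int,
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    retry_buffer: float = 0.25,   # middle of the 20-30% buffer for retries and failures
    api_overhead: float = 0.0,    # any fixed cost outside per-token pricing
) -> float:
    """Return the projected cost for a batch, including the retry buffer."""
    cost_per_item = (
        input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    ) / 1_000_000
    return items * cost_per_item * (1 + retry_buffer) + api_overhead


# Hypothetical batch: 1,000 items, 2,000 input and 500 output tokens each.
estimate = estimate_cost(
    items=1_000,
    input_tokens=2_000,
    output_tokens=500,
    input_price_per_mtok=3.0,
    output_price_per_mtok=15.0,
)
print(f"Estimated cost: ${estimate:.2f}")
```

With these illustrative numbers the per-item cost is $0.0135, so the batch comes to $13.50 before the buffer and about $16.88 with it.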

Detailed Topics

Choosing Single vs Multi-Agent Architecture

Default to single-agent pipelines for batch processing with independent items, because they are simpler to manage, cheaper to run, and easier to debug. Escalate to multi-agent architectures only when one of these conditions holds:
  • Parallel exploration of different aspects is required
  • The task exceeds single context window capacity
  • Specialized sub-agents demonstrably improve quality on benchmarks
Choose multi-agent for context isolation, not role anthropomorphization. Sub-agents get fresh context windows for focused subtasks, which prevents context degradation on long-running tasks.
See the multi-agent-patterns skill for detailed architecture guidance.

Architectural Reduction

Start with minimal architecture and add complexity only when production evidence proves it necessary, because over-engineered scaffolding often constrains rather than enables model performance.
Vercel's d0 agent achieved 100% success rate (up from 80%) by reducing from 17 specialized tools to 2 primitives: bash command execution and SQL. The file system agent pattern uses standard Unix utilities (grep, cat, find, ls) instead of custom exploration tools.
Reduce when:
  • The data layer is well-documented and consistently structured
  • The model has sufficient reasoning capability
  • Specialized tools are constraining rather than enabling
  • More time is spent maintaining scaffolding than improving outcomes
Add complexity when:
  • The underlying data is messy, inconsistent, or poorly documented
  • The domain requires specialized knowledge the model lacks
  • Safety constraints require limiting agent capabilities
  • Operations are truly complex and benefit from structured workflows
See the tool-design skill for detailed tool architecture guidance.

Iteration and Refactoring

Plan for multiple architectural iterations from the start, because production agent systems at scale always require refactoring. Manus has refactored its agent framework five times since launch. The Bitter Lesson suggests that structures added to compensate for current model limitations become constraints as models improve.
Build for change by following these practices:
  • Keep architecture simple and unopinionated so refactoring is cheap
  • Test across model generations to verify the harness is not limiting performance
  • Design systems that benefit from model improvements rather than locking in limitations

Practical Guidance

Project Planning Template

Follow this template in order, because each step validates assumptions before the next step invests effort.
  1. Task Analysis
    • Define the input and desired output explicitly
    • Classify: synthesis, generation, classification, or analysis
    • Set an acceptable error rate based on business impact
    • Estimate the value per successful completion to justify costs
  2. Manual Validation
    • Test one representative example with the target model
    • Evaluate output quality and format against requirements
    • Identify failure modes that need parser hardening or prompt revision
    • Estimate tokens per item for cost projection
  3. Architecture Selection
    • Choose single pipeline vs multi-agent based on the criteria above
    • Identify required tools and data sources
    • Design storage and caching strategy using file-system state
    • Plan parallelization approach for the process stage
  4. Cost Estimation
    • Calculate items x tokens x price with a 20-30% buffer
    • Estimate development time for each pipeline stage
    • Identify infrastructure requirements (API keys, storage, compute)
    • Project ongoing operational costs for production runs
  5. Development Plan
    • Implement stage-by-stage, testing each before proceeding
    • Define a testing strategy per stage with expected outputs
    • Set iteration milestones tied to quality metrics
    • Plan deployment approach with rollback capability

Examples

Example 1: Batch Analysis Pipeline (Karpathy's HN Time Capsule)
Task: Analyze 930 HN discussions from 10 years ago with hindsight grading.
Architecture:
  • 5-stage pipeline: fetch -> prompt -> analyze -> parse -> render
  • File system state: data/{date}/{item_id}/ with stage output files
  • Structured output: 6 sections with explicit format requirements
  • Parallel execution: 15 workers for LLM calls
Results: $58 total cost, ~1 hour execution, static HTML output.
Example 2: Architectural Reduction (Vercel d0)
Task: Text-to-SQL agent for internal analytics.
Before: 17 specialized tools, 80% success rate, 274s average execution.
After: 2 tools (bash + SQL), 100% success rate, 77s average execution.
Key insight: The semantic layer was already good documentation. Claude just needed access to read files directly.
See Case Studies for detailed analysis.

Guidelines

  1. Validate task-model fit with manual prototyping before building automation
  2. Structure pipelines as discrete, idempotent, cacheable stages
  3. Use the file system for state management and debugging
  4. Design prompts for structured, parseable outputs with explicit format examples
  5. Start with minimal architecture; add complexity only when proven necessary
  6. Estimate costs early and track throughout development
  7. Build robust parsers that handle LLM output variations
  8. Expect and plan for multiple architectural iterations
  9. Test whether scaffolding helps or constrains model performance
  10. Use agent-assisted development for rapid iteration on implementation

Gotchas

  1. Skipping manual validation: Building automation before verifying the model can do the task wastes significant time when the approach is fundamentally flawed. Always run one representative example through the model interface first.
  2. Monolithic pipelines: Combining all stages into one script makes debugging and iteration difficult. Separate stages with persistent intermediate outputs so each can be re-run independently.
  3. Over-constraining the model: Adding guardrails, pre-filtering, and validation logic that the model could handle on its own reduces performance. Test whether scaffolding helps or hurts before keeping it.
  4. Ignoring costs until production: Token costs compound quickly at scale. Estimate and track from the beginning to avoid budget surprises that force architectural rework.
  5. Perfect parsing requirements: Expecting LLMs to follow format instructions perfectly leads to brittle systems. Build robust parsers that handle variations and log failures for review.
  6. Premature optimization: Adding caching, parallelization, and optimization before the basic pipeline works correctly wastes effort on code that may be discarded during iteration.
  7. Model version lock-in: Building pipelines that only work with one specific model version creates fragile systems. Test across model generations and abstract the LLM call layer so models can be swapped without rewriting pipeline logic.
  8. Evaluation-less deployment: Shipping agent pipelines without measuring output quality means regressions go undetected. Define quality metrics during development and run evaluation checks before and after every model or prompt change.
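For the model version lock-in gotcha, abstracting the LLM call layer could look something like the sketch below. `LLMClient`, `EchoClient`, and `process_stage` are hypothetical names, and `EchoClient` is a canned stand-in rather than a real provider SDK; an actual adapter would wrap whichever client library the project uses behind the same `complete` method.

```python
from typing import List, Protocol


class LLMClient(Protocol):
    """The only surface the pipeline depends on (an assumed, minimal interface)."""

    def complete(self, prompt: str) -> str: ...


class EchoClient:
    """Stand-in for a real provider adapter; returns a canned response."""

    def __init__(self, model: str) -> None:
        self.model = model

    def complete(self, prompt: str) -> str:
        return f"[{self.model}] {prompt.upper()}"


def process_stage(prompts: List[str], client: LLMClient) -> List[str]:
    """Run the process stage against whichever client is passed in."""
    return [client.complete(prompt) for prompt in prompts]


# Swapping models means constructing a different client, not editing the stage.
old_outputs = process_stage(["summarize item 1"], EchoClient("model-a"))
new_outputs = process_stage(["summarize item 1"], EchoClient("model-b"))
```

Running the same prompts through two clients like this is also a convenient place to attach the before-and-after evaluation checks described in the last gotcha.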

Integration

This skill connects to:
  • context-fundamentals - Understanding context constraints for prompt design
  • tool-design - Designing tools for agent systems within pipelines
  • multi-agent-patterns - When to use multi-agent versus single pipelines
  • evaluation - Evaluating pipeline outputs and agent performance
  • context-compression - Managing context when pipelines exceed limits

References

Internal references:
  • Case Studies - Read when: evaluating architecture tradeoffs or reviewing real-world pipeline implementations (Karpathy HN Capsule, Vercel d0, Manus patterns)
  • Pipeline Patterns - Read when: designing a new pipeline stage layout, choosing caching strategies, or debugging stage boundaries
Related skills in this collection:
  • tool-design - Tool architecture and reduction patterns
  • multi-agent-patterns - When to use multi-agent architectures
  • evaluation - Output evaluation frameworks
External resources:

Skill Metadata

Created: 2025-12-25
Last Updated: 2026-03-17
Author: Agent Skills for Context Engineering Contributors
Version: 1.1.0