building-with-llms
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseBuilding with LLMs
基于LLM的应用开发
Scope
适用范围
Covers
- Building and shipping LLM-powered features/apps (assistant, copilot, light agent workflows)
- Prompt + tool contract design (instructions, schemas, examples, guardrails)
- Data quality + evaluation (test sets, rubrics, red teaming, iteration loop)
- Production readiness (latency/cost budgets, logging, fallbacks, safety/security checks)
- Using coding agents (Codex/Claude Code) to accelerate engineering safely
When to use
- “Turn this LLM feature idea into a build plan with prompts, evals, and launch checks.”
- “We need a system prompt + tool definitions + output schema for our LLM workflow.”
- “Our LLM is flaky—design an eval plan and iteration loop to stabilize quality.”
- “Design a RAG/tool-using agent approach with safety and monitoring.”
- “We want to use an AI coding agent to implement this—set constraints and review gates.”
When NOT to use
- You need product/portfolio strategy and positioning (use ).
ai-product-strategy - You need a full PRD/spec set for cross-functional alignment (use /
writing-prds).writing-specs-designs - You need primary user research (use /
conducting-user-interviews).usability-testing - You are doing model training/research, infra architecture, or bespoke model tuning (delegate to ML/eng; this skill assumes API models).
- You only want “which model/provider should we pick?” (treat as an input; if it dominates, do a separate evaluation doc).
涵盖内容
- 构建并交付基于LLM的功能/应用(助手、Copilot、轻量Agent工作流)
- 提示词+工具契约设计(指令、Schema、示例、防护规则)
- 数据质量与评估(测试集、评分标准、红队测试、迭代循环)
- 生产就绪性(延迟/成本预算、日志记录、降级方案、安全/合规检查)
- 安全使用代码Agent(Codex/Claude Code)加速工程开发
适用场景
- “将这个LLM功能想法转化为包含提示词、评估方案和上线检查项的开发计划。”
- “我们需要为LLM工作流设计系统提示词+工具定义+输出Schema。”
- “我们的LLM应用不稳定——设计一套评估方案和迭代循环来稳定质量。”
- “设计一套带有安全机制和监控的RAG/工具调用型Agent方案。”
- “我们希望使用AI代码Agent来实现这个功能——设定约束条件和审核关卡。”
不适用场景
- 你需要产品/组合战略与定位(请使用)。
ai-product-strategy - 你需要用于跨职能对齐的完整PRD/规格文档(请使用/
writing-prds)。writing-specs-designs - 你需要开展初级用户研究(请使用/
conducting-user-interviews)。usability-testing - 你正在进行模型训练/研究、基础设施架构设计或定制模型调优(请委托给ML/工程团队;本技能基于API调用型模型)。
- 你仅想了解“我们应该选择哪个模型/服务商?”(将其视为输入;若该问题占主导,请单独撰写评估文档)。
Inputs
输入要求
Minimum required
- Use case + target user + what “good” looks like (success metrics + failure modes)
- The LLM’s job: generate text, transform data, classify, extract, plan, or take actions via tools
- Constraints: privacy/compliance, data sensitivity, latency, cost, reliability, supported regions
- Integration surface: UI/workflow, downstream systems/APIs/tools, and any required output schema
Missing-info strategy
- Ask up to 5 questions from references/INTAKE.md (3–5 at a time).
- If details remain missing, proceed with explicit assumptions and provide 2–3 options (prompting vs RAG vs tool use; autonomy level).
- If asked to write code or run commands, request confirmation and use least privilege (no secrets; avoid destructive changes).
最低必填信息
- 用例+目标用户+“合格”的定义(成功指标+失败场景)
- LLM的具体任务:生成文本、转换数据、分类、提取信息、规划,或通过工具执行操作
- 约束条件:隐私/合规要求、数据敏感度、延迟、成本、可靠性、支持区域
- 集成场景:UI/工作流、下游系统/API/工具,以及任何要求的输出Schema
缺失信息处理策略
- 从references/INTAKE.md中提出最多5个问题(每次3–5个)。
- 若仍有信息缺失,基于明确假设推进,并提供2–3种可选方案(提示词工程 vs RAG vs 工具调用;自主权限等级)。
- 若被要求编写代码或执行命令,需先请求确认,并遵循最小权限原则(不处理密钥;避免破坏性操作)。
Outputs (deliverables)
输出成果(交付物)
Produce an LLM Build Pack (in chat; or as files if requested), in this order:
- Feature brief (goal, users, non-goals, constraints, success + guardrails)
- System design sketch (pattern + architecture, context strategy, budgets, failure handling)
- Prompt + tool contract (system prompt, tool schemas, output schema, examples, refusal/guardrails)
- Data + evaluation plan (test set, rubrics, automated checks, red-team suite, acceptance thresholds)
- Build + iteration plan (prototype slice, instrumentation, debugging loop, how to use coding agents safely)
- Launch + monitoring plan (logging, dashboards/alerts, fallback/rollback, incident playbook hooks)
- Risks / Open questions / Next steps (always included)
Templates: references/TEMPLATES.md
生成一份LLM Build Pack(聊天形式;或按要求生成文件),顺序如下:
- 功能简介(目标、用户群体、非目标、约束条件、成功指标+防护规则)
- 系统设计草图(模式+架构、上下文策略、预算、故障处理)
- 提示词+工具契约(系统提示词、工具Schema、输出Schema、示例、拒绝规则/防护机制)
- 数据+评估方案(测试集、评分标准、自动化检查、红队测试套件、验收阈值)
- 开发+迭代计划(原型切片、监控埋点、调试循环、如何安全使用代码Agent)
- 上线+监控方案(日志记录、仪表盘/告警、降级/回滚方案、事件响应手册钩子)
- 风险/待解决问题/下一步行动(必须包含)
模板:references/TEMPLATES.md
Workflow (8 steps)
工作流程(8个步骤)
1) Frame the job, boundary, and “good”
1) 明确任务、边界与“合格”标准
- Inputs: Use case, target user, constraints.
- Actions: Write a crisp job statement (“The LLM must…”) + 3–5 non-goals. Define success metrics and guardrails (quality, safety, cost, latency).
- Outputs: Draft Feature brief.
- Checks: A stakeholder can restate what the LLM does and does not do, and how success is measured.
- 输入: 用例、目标用户、约束条件。
- 行动: 撰写清晰的任务说明(“LLM必须……”)+3–5条非目标。定义成功指标和防护规则(质量、安全、成本、延迟)。
- 输出: 草稿版功能简介。
- 校验: 利益相关者能够重述LLM的能做与不能做,以及如何衡量成功。
2) Choose the minimum viable autonomy pattern
2) 选择最小可行自主权限模式
- Inputs: Workflow + risk tolerance.
- Actions: Decide assistant vs copilot vs agent-like tool use. Identify “human control points” (review/approve moments) and what the model is never allowed to do.
- Outputs: Autonomy decisions captured in Feature brief.
- Checks: Any action-taking behavior has explicit permissions, confirmations, and an undo/rollback story.
- 输入: 工作流+风险容忍度。
- 行动: 决定采用助手、Copilot还是类Agent的工具调用模式。确定“人工控制点”(审核/批准环节)以及模型绝对禁止执行的操作。
- 输出: 自主权限决策记录在功能简介中。
- 校验: 任何执行操作的行为都有明确的权限、确认机制,以及撤销/回滚方案。
3) Design the context strategy (prompting → RAG → tools)
3) 设计上下文策略(提示词工程 → RAG → 工具调用)
- Inputs: Data sources, integration points, constraints.
- Actions: Decide how the model gets reliable context: instruction hierarchy, retrieval strategy, tool calls, structured inputs. Define the “source of truth” and how conflicts are handled.
- Outputs: Draft System design sketch.
- Checks: You can explain (a) what data is used, (b) where it comes from, (c) how freshness/authority is enforced.
- 输入: 数据源、集成点、约束条件。
- 行动: 确定模型获取可靠上下文的方式:指令层级、检索策略、工具调用、结构化输入。定义“事实来源”以及冲突处理规则。
- 输出: 草稿版系统设计草图。
- 校验: 能够解释(a)使用哪些数据,(b)数据来源,(c)如何保证数据的新鲜度/权威性。
4) Draft the prompt + tool contract (make the system legible)
4) 起草提示词+工具契约(提升系统可解释性)
- Inputs: Job statement + context strategy + output schema needs.
- Actions: Write the system prompt, tool descriptions, and output schema. Add examples and explicit DO/DO NOT rules. Include safe failure behavior (ask clarifying questions, abstain, cite sources).
- Outputs: Prompt + tool contract.
- Checks: A reviewer can predict behavior for 5–10 representative inputs; contract includes at least 3 hard constraints and examples.
- 输入: 任务说明+上下文策略+输出Schema需求。
- 行动: 编写系统提示词、工具描述和输出Schema。添加示例和明确的允许/禁止规则。包含安全故障处理逻辑(询问澄清问题、拒绝执行、引用来源)。
- 输出: 提示词+工具契约。
- 校验: 审核人员能够预测5–10个典型输入对应的模型行为;契约包含至少3条硬性约束和示例。
5) Build the eval set + rubric (debug like software)
5) 构建测试集+评分标准(像调试软件一样调试LLM)
- Inputs: Expected behaviors + failure modes + edge cases.
- Actions: Create a test set covering normal cases, tricky cases, and red-team cases. Define a scoring rubric and acceptance thresholds. Add automated checks where possible (schema validity, citation presence, forbidden content).
- Outputs: Data + evaluation plan.
- Checks: You can run the same prompts repeatedly and measure improvement/regression; evals cover the top failure modes.
- 输入: 预期行为+失败场景+边缘案例。
- 行动: 创建覆盖常规场景、复杂场景和红队测试场景的测试集。定义评分标准和验收阈值。尽可能添加自动化检查(Schema有效性、引用存在性、禁止内容检测)。
- 输出: 数据+评估方案。
- 校验: 能够重复运行相同提示词并衡量改进/退化情况;评估覆盖主要失败场景。
6) Prototype a thin slice, using coding agents safely
6) 构建最小可行原型,安全使用代码Agent
- Inputs: System sketch + prompt contract + eval plan.
- Actions: Implement the smallest end-to-end slice. Use coding agents for “lower hanging fruit” tasks, but keep tight constraints: small diffs, tests, code review, no secret handling.
- Outputs: Build + iteration plan (and optionally a prototype plan/checklist).
- Checks: You can explain what the agent changed, why, and how it was validated (tests, evals, manual review).
- 输入: 系统设计草图+提示词契约+评估方案。
- 行动: 实现最小端到端切片。使用代码Agent完成“低难度”任务,但需严格约束:小范围代码变更、测试、代码审核、不处理密钥。
- 输出: 开发+迭代计划(可选包含原型计划/检查清单)。
- 校验: 能够解释Agent做了哪些变更、原因,以及如何验证(测试、评估、人工审核)。
7) Production readiness: budgets, monitoring, and failure handling
7) 生产就绪:预算、监控与故障处理
- Inputs: Prototype learnings + constraints.
- Actions: Define cost/latency budgets, fallbacks, rate limits, logging fields, and alert thresholds. Address prompt injection/tool misuse risks; add safeguards and review processes.
- Outputs: Launch + monitoring plan.
- Checks: There is a clear path to detect regressions, cap cost, and safely degrade when the model misbehaves.
- 输入: 原型学习成果+约束条件。
- 行动: 定义成本/延迟预算、降级方案、速率限制、日志字段和告警阈值。解决提示词注入/工具滥用风险;添加防护措施和审核流程。
- 输出: 上线+监控方案。
- 校验: 存在清晰的路径来检测退化情况、控制成本,并在模型行为异常时安全降级。
8) Quality gate + finalize
8) 质量校验+最终定稿
- Inputs: Full draft pack.
- Actions: Run references/CHECKLISTS.md and score with references/RUBRIC.md. Tighten unclear contracts, add missing tests, and always include Risks / Open questions / Next steps.
- Outputs: Final LLM Build Pack.
- Checks: A team can execute the plan without a meeting; unknowns are explicit and owned.
- 输入: 完整的Build Pack草稿。
- 行动: 运行references/CHECKLISTS.md并使用references/RUBRIC.md评分。优化模糊的契约,补充缺失的测试,并且必须包含风险/待解决问题/下一步行动。
- 输出: 最终版LLM Build Pack。
- 校验: 团队无需开会即可执行该计划;未知事项明确且有负责人。
Quality gate (required)
质量校验(必填)
- Use references/CHECKLISTS.md and references/RUBRIC.md.
- Always include: Risks, Open questions, Next steps.
- 使用references/CHECKLISTS.md和references/RUBRIC.md。
- 必须包含:风险、待解决问题、下一步行动。
Examples
示例
Example 1 (RAG copilot): “Use to plan a support-response copilot that drafts replies using our internal KB. Constraints: no PII leakage; must cite sources; p95 latency < 3s; cost < $0.10/ticket.”
Expected: LLM Build Pack with prompt/tool contract, eval set (including privacy red-team cases), and monitoring/rollback plan.
building-with-llmsExpected: LLM Build Pack with prompt/tool contract, eval set (including privacy red-team cases), and monitoring/rollback plan.
Example 2 (tool-using workflow): “Use to design an LLM workflow that turns meeting notes into action items and Jira tickets (human review required). Output must be valid JSON.”
Expected: output schema + tool contract + eval plan for structured extraction + guardrails against over-creation.
building-with-llmsExpected: output schema + tool contract + eval plan for structured extraction + guardrails against over-creation.
Boundary example: “Fine-tune/train a new LLM from scratch.”
Response: out of scope; propose an API-model approach and highlight what ML/infra work is required if training is truly needed.
Response: out of scope; propose an API-model approach and highlight what ML/infra work is required if training is truly needed.
示例1(RAG辅助助手): “使用来规划一个支持回复助手,该助手利用我们的内部知识库起草回复。约束条件:不得泄露PII;必须引用来源;p95延迟<3秒;单工单成本<$0.10。”
预期产出:包含提示词/工具契约、测试集(含隐私红队测试案例)以及监控/回滚方案的LLM Build Pack。
building-with-llms预期产出:包含提示词/工具契约、测试集(含隐私红队测试案例)以及监控/回滚方案的LLM Build Pack。
示例2(工具调用工作流): “使用设计一个LLM工作流,将会议纪要转化为行动项和Jira工单(需人工审核)。输出必须为合法JSON。”
预期产出:输出Schema + 工具契约 + 结构化提取评估方案 + 防止过度创建的防护规则。
building-with-llms预期产出:输出Schema + 工具契约 + 结构化提取评估方案 + 防止过度创建的防护规则。
边界示例: “从头开始微调/训练一个新的LLM。”
回复:超出本技能范围;建议采用API调用型模型方案,若确实需要训练,则说明所需的ML/基础设施工作内容。
回复:超出本技能范围;建议采用API调用型模型方案,若确实需要训练,则说明所需的ML/基础设施工作内容。