evaluating-new-technology


Evaluating New Technology

Scope

Covers
  • Evaluating a new tool/platform/vendor (including AI products) for adoption
  • Emerging tech “should we use this?” decisions
  • Build vs buy decisions and tech stack changes
  • Running a proof-of-value pilot and capturing evidence
  • First-pass risk review (security/privacy/compliance, vendor claims, operational readiness)
When to use
  • “Evaluate this new AI tool/vendor for our team.”
  • “Should we build this in-house or buy a vendor?”
  • “We’re considering changing our analytics/experimentation stack—make a recommendation.”
  • “Create a technology evaluation doc with a pilot plan, risks, and decision memo.”
When NOT to use
  • You don’t have a real problem/job to solve yet (use problem-definition first).
  • You need a full product strategy/roadmap (use ai-product-strategy).
  • You’re designing how to build an LLM system (use building-with-llms).
  • You need a formal security assessment / penetration testing (engage security; this skill produces a structured first pass).

Inputs

Minimum required
  • Candidate technology (what it is, vendor/build option, links if available)
  • Problem/workflow to improve + who it’s for
  • Current approach/stack and what’s not working
  • Constraints: data sensitivity, privacy/compliance, budget, timeline, regions, deployment model (SaaS/on-prem)
  • Decision context: who decides, adoption scope, risk tolerance
Missing-info strategy
  • Ask up to 5 questions from references/INTAKE.md (3–5 at a time).
  • If still missing, proceed with explicit assumptions and present 2–3 options (e.g., buy vs build vs defer).
  • Do not request secrets. If asked to run tools, change production systems, or sign up for vendors, require explicit confirmation.

Outputs (deliverables)

Produce a Technology Evaluation Pack (in chat; or as files if requested), in this order:
  1. Evaluation brief (problem, stakeholders, decision, constraints, non-goals, assumptions)
  2. Options & criteria matrix (status quo + alternatives, criteria, scoring, notes)
  3. Build vs buy analysis (bandwidth/TCO, core competency, opportunity cost, lock-in)
  4. Pilot (proof-of-value) plan (hypotheses, scope, metrics, timeline, exit criteria)
  5. Risk & guardrails review (security/privacy/compliance, vendor claims, mitigations)
  6. Decision memo (recommendation, rationale, trade-offs, adoption/rollback plan)
  7. Risks / Open questions / Next steps (always included)
Templates: references/TEMPLATES.md

Workflow (8 steps)

1) Start with the problem (avoid tool bias)

  • Inputs: Candidate tech, target workflow/users, current pain.
  • Actions: Write a one-sentence problem statement and “who feels it.” List 3–5 symptoms and 3–5 non-goals.
  • Outputs: Draft Evaluation brief (problem + non-goals).
  • Checks: You can explain the decision without naming the tool.

2) Define “good” and hard constraints

  • Inputs: Success metrics, constraints, risk tolerance, decision deadline.
  • Actions: Define success metrics (leading + lagging) and must-have constraints (privacy, compliance, security, uptime, latency/cost if relevant). Capture “deal breakers.”
  • Outputs: Evaluation brief (success + constraints + deal breakers).
  • Checks: A stakeholder can say what would make this a clear “yes” or “no.”

3) Map options and evaluation criteria (workflows → ROI)

  • Inputs: Current stack, alternatives, stakeholders.
  • Actions: List options: status quo, 1–3 vendors, build, hybrid. Define criteria anchored to workflows enabled and ROI (time saved, revenue impact, risk reduction), not feature checklists.
  • Outputs: Options & criteria matrix.
  • Checks: Every criterion is measurable or at least falsifiable in a pilot.
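As a sketch of what the options & criteria matrix can reduce to once criteria are anchored to ROI, a minimal weighted-scoring pass is shown below. The criteria names, weights, and 1–5 scores are all illustrative assumptions, not data from any real evaluation:

```python
# Hypothetical options matrix: criterion -> weight (weights sum to 1.0).
CRITERIA = {
    "time_saved_per_week": 0.4,
    "integration_effort": 0.3,   # scored inversely: higher = easier to integrate
    "risk_reduction": 0.3,
}

# Hypothetical option -> per-criterion scores on a 1-5 scale.
OPTIONS = {
    "status_quo": {"time_saved_per_week": 1, "integration_effort": 5, "risk_reduction": 2},
    "vendor_a":   {"time_saved_per_week": 4, "integration_effort": 3, "risk_reduction": 4},
    "build":      {"time_saved_per_week": 4, "integration_effort": 2, "risk_reduction": 3},
}

def weighted_score(scores: dict) -> float:
    """Weighted sum of an option's per-criterion scores."""
    return sum(CRITERIA[c] * s for c, s in scores.items())

# Rank options, best first; ties and close calls are what the pilot resolves.
ranked = sorted(OPTIONS, key=lambda o: weighted_score(OPTIONS[o]), reverse=True)
for option in ranked:
    print(f"{option}: {weighted_score(OPTIONS[option]):.2f}")
```

The point of the sketch is the shape, not the numbers: each row must trace back to a workflow or ROI claim that the pilot can falsify.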

4) Fast reality check: integration + data fit

  • Inputs: Architecture constraints, data sources, integration points.
  • Actions: Identify required integrations (SSO, data pipelines, APIs, logs). Note migration complexity, data ownership, and export/exit path. For PLG/growth tools, sanity-check the stack layers (data hub → analytics → lifecycle).
  • Outputs: Notes added to Options & criteria matrix (integration complexity + stack fit).
  • Checks: You can describe the end-to-end data/control flow in 5–10 bullets.

5) Build vs buy with “bandwidth” as a first-class cost

  • Inputs: Engineering capacity, core competencies, opportunity cost.
  • Actions: Compare build vs buy using a bandwidth/TCO ledger (build time, maintenance, on-call, upgrades, vendor management). Prefer building only when it’s a core differentiator or the vendor market is immature/unacceptable.
  • Outputs: Build vs buy analysis.
  • Checks: The analysis includes opportunity cost and who would maintain the system 12 months from now.
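The bandwidth/TCO ledger can be sketched as plain first-year arithmetic. Every figure below is an illustrative assumption (not a vendor quote or a real cost model), and a real ledger would extend this past year one:

```python
# Assumed fully loaded cost of one engineer-week (illustrative).
ENGINEER_WEEK_COST = 5_000

# Hypothetical build-side ledger, in engineer-weeks.
build = {
    "initial_build_weeks": 16,
    "maintenance_weeks_per_year": 8,  # upgrades, on-call, bug fixes
    "opportunity_cost_weeks": 6,      # roadmap work displaced by the build
}

# Hypothetical buy-side ledger.
buy = {
    "license_per_year": 50_000,
    "integration_weeks": 3,
    "vendor_mgmt_weeks_per_year": 2,
}

build_total = ENGINEER_WEEK_COST * sum(build.values())
buy_total = buy["license_per_year"] + ENGINEER_WEEK_COST * (
    buy["integration_weeks"] + buy["vendor_mgmt_weeks_per_year"]
)

print(f"build year-1 cost: ${build_total:,}")
print(f"buy   year-1 cost: ${buy_total:,}")
```

Note that opportunity cost appears as a first-class line item, matching the check above: the ledger is wrong if it only counts build time.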

6) Risk & guardrails review (be skeptical of “100% safe” claims)

  • Inputs: Data sensitivity, threat model, vendor posture, deployment model.
  • Actions: Identify key risks (security, privacy, compliance, reliability, lock-in). For AI vendors: treat “guardrails catch everything” claims as marketing; assume determined attackers exist and design defense-in-depth (permissions, logging, human approval points, eval/red-team).
  • Outputs: Risk & guardrails review.
  • Checks: Each top risk has an owner and a mitigation or a “blocker” label.
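The check above (every top risk has an owner plus a mitigation or a "blocker" label) can be made mechanical. The risks, owners, and mitigations below are hypothetical examples, not a real register:

```python
# Hypothetical risk register entries for an AI vendor evaluation.
risks = [
    {"risk": "PII sent to vendor API", "owner": "security-lead",
     "mitigation": "field-level redaction before egress", "blocker": False},
    {"risk": "guardrails bypassed via prompt injection", "owner": "ml-lead",
     "mitigation": "human approval on high-impact actions + red-team evals", "blocker": False},
    {"risk": "no data export path at contract end", "owner": "eng-lead",
     "mitigation": None, "blocker": True},  # unresolved, so explicitly labelled a blocker
]

def gate_passes(register: list) -> bool:
    """Each risk needs an owner, plus either a mitigation or an explicit blocker flag."""
    return all(r["owner"] and (r["mitigation"] or r["blocker"]) for r in register)

print("risk gate passes:", gate_passes(risks))
```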

7) Plan a proof-of-value pilot (or document why you can skip it)

  • Inputs: Criteria, risks, timeline, stakeholders.
  • Actions: Define pilot hypotheses, scope, success metrics, test dataset, and evaluation method. Specify timeline, resourcing, and exit criteria (adopt / iterate / reject). Include rollback and data deletion requirements.
  • Outputs: Pilot plan.
  • Checks: A team can run the pilot without extra meetings; success/failure is unambiguous.
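Exit criteria are unambiguous when they compile down to a single decision function. The metric names and thresholds below are illustrative assumptions for a support-agent pilot, not criteria from the skill itself:

```python
# Hypothetical exit criteria for a proof-of-value pilot.
EXIT_CRITERIA = {
    "deflection_rate": 0.30,   # adopt only if >= 30% of tickets resolved by the tool
    "p95_latency_s": 3.0,      # and p95 latency stays under 3 seconds
    "critical_incidents": 0,   # and zero critical security/privacy incidents
}

def pilot_decision(results: dict) -> str:
    """Map pilot measurements to a single adopt / iterate / reject outcome."""
    if results["critical_incidents"] > EXIT_CRITERIA["critical_incidents"]:
        return "reject"        # deal breaker: trigger rollback and pilot data deletion
    hit_targets = (results["deflection_rate"] >= EXIT_CRITERIA["deflection_rate"]
                   and results["p95_latency_s"] <= EXIT_CRITERIA["p95_latency_s"])
    return "adopt" if hit_targets else "iterate"

print(pilot_decision({"deflection_rate": 0.34, "p95_latency_s": 2.1, "critical_incidents": 0}))
```

If the team cannot write the pilot down at this level of precision, the success/failure judgment is not yet unambiguous.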

8) Decide, communicate, and quality-gate

  • Inputs: Completed pack drafts.
  • Actions: Write the Decision memo with recommendation, trade-offs, and adoption plan. Run references/CHECKLISTS.md and score with references/RUBRIC.md. Always include Risks / Open questions / Next steps.
  • Outputs: Final Technology Evaluation Pack.
  • Checks: Decision is actionable (owner, date, next actions) and reversible where possible.

Quality gate (required)

  • Use references/CHECKLISTS.md and references/RUBRIC.md.
  • Always include: Risks, Open questions, Next steps.

Examples

Example 1 (AI vendor): “Use evaluating-new-technology to evaluate an AI ‘prompt guardrails’ vendor for our support agent. Constraints: SOC2 required, PII present, must support SSO, budget $50k/yr, decision in 3 weeks.”
Expected: evaluation pack that treats guardrail claims skeptically and proposes defense-in-depth + a measurable pilot.
Example 2 (analytics stack): “Use evaluating-new-technology to choose between PostHog and Amplitude for our PLG product. Current stack: Segment + data warehouse; goal is faster iteration on onboarding and activation.”
Expected: options matrix + pilot plan tied to workflows (experiments, funnels, lifecycle triggers) and migration effort.
Boundary example: “What’s the best new AI tool we should adopt?”
Response: out of scope without a problem/workflow; ask intake questions and/or propose running problem-definition first.