argumentation
Construct Arguments
Build rigorous arguments that run from a hypothesis, through reasoning, to concrete evidence. Persuasive technical claims share the same triad: a clear hypothesis states what you believe, an argument explains why it holds, and examples show that it does hold. This skill teaches you to apply that structure to code reviews, design decisions, research writing, and any other context where claims need justification.
When to Use
- Writing or reviewing a PR description that proposes a technical change
- Justifying a design decision in an ADR (Architecture Decision Record)
- Constructing feedback in a code review that goes beyond "I don't like this"
- Writing a research argument or technical proposal
- Challenging or defending an approach in a technical discussion
Inputs
- Required: A claim or position that needs justification
- Required: Context (code review, design decision, research, documentation)
- Optional: Audience (peer developers, reviewers, stakeholders, researchers)
- Optional: Counterarguments or alternative positions to address
- Optional: Evidence or data available to support the claim
Procedure
Step 1: Formulate the Hypothesis
State your claim as a clear, falsifiable hypothesis. A hypothesis is not an opinion or a preference -- it is a specific assertion that can be tested against evidence.
- Write the claim in one sentence
- Apply the falsifiability test: can someone prove this wrong with evidence?
- Scope it narrowly: constrain to a specific context, codebase, or domain
- Distinguish from opinions by checking for testable criteria
Falsifiable vs. unfalsifiable:
| Unfalsifiable (opinion) | Falsifiable (hypothesis) |
|---|---|
| "This code is bad" | "This function has O(n^2) complexity where O(n) is achievable" |
| "We should use TypeScript" | "TypeScript's type system will catch the class of null-reference bugs that caused 4 of our last 6 production incidents" |
| "The API design is cleaner" | "Replacing the 5 endpoint variants with a single parameterized endpoint reduces the public API surface by 60%" |
| "This research approach is better" | "Method A achieves higher precision than Method B on dataset X at the 95% confidence level" |
Expected: A one-sentence hypothesis that is specific, scoped, and falsifiable. Someone reading it can immediately imagine what evidence would confirm or refute it.
On failure: If the hypothesis feels vague, apply the "how would I disprove this?" test. If you cannot imagine counter-evidence, the claim is an opinion, not a hypothesis. Narrow the scope or add measurable criteria until it becomes testable.
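The first table row can be made concrete with a small sketch. The TypeScript below is hypothetical: the function names and the comparison counter are illustrative assumptions, not code from any real codebase. It instruments two duplicate checks with a counter. That turns "O(n^2) where O(n) is achievable" into a claim a reviewer can check, and possibly refute, on concrete inputs.

```typescript
// Hypothetical example: two ways to check an array for duplicates,
// instrumented with a counter so the complexity claim can be measured.

interface CheckResult {
  hasDuplicate: boolean;
  comparisons: number; // equality checks (quadratic) or set lookups (linear)
}

// Nested loops compare every pair: up to n(n-1)/2 comparisons, i.e. O(n^2).
function hasDuplicateQuadratic(items: readonly string[]): CheckResult {
  let comparisons = 0;
  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      comparisons++;
      if (items[i] === items[j]) {
        return { hasDuplicate: true, comparisons };
      }
    }
  }
  return { hasDuplicate: false, comparisons };
}

// A single pass with a Set does one lookup per element: expected O(n).
function hasDuplicateLinear(items: readonly string[]): CheckResult {
  const seen = new Set<string>();
  let comparisons = 0;
  for (const item of items) {
    comparisons++;
    if (seen.has(item)) {
      return { hasDuplicate: true, comparisons };
    }
    seen.add(item);
  }
  return { hasDuplicate: false, comparisons };
}

// 1000 distinct ids: the worst case for both functions (no early exit).
const input = Array.from({ length: 1000 }, (_, i) => `id-${i}`);
console.log(hasDuplicateQuadratic(input).comparisons); // 499500
console.log(hasDuplicateLinear(input).comparisons); // 1000
```

Suppose both counts grew at the same rate as the input size increased. That measurement would falsify the claim, and this is exactly the kind of counter-evidence that an opinion like "this code is bad" never admits.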
Step 2: Identify the Argument Type
Select the logical structure that best supports your hypothesis. Different claims call for different reasoning strategies.
- Review the four argument types:
| Type | Structure | Best for |
|---|---|---|
| Deductive | If A then B; A is true; therefore B | Formal proofs, type safety claims |
| Inductive | Observed pattern across N cases; therefore likely in general | Performance data, test results |
| Analogical | X is similar to Y in relevant ways; Y has property P; therefore X likely has P | Design decisions, technology choices |
| Evidential | Evidence E is more likely under hypothesis H1 than H2; therefore H1 is supported | Research findings, A/B test results |
- Match your hypothesis to the strongest argument type:
  - Claiming something must be true? Use deductive
  - Claiming something tends to be true based on observations? Use inductive
  - Claiming something will likely work based on similar prior cases? Use analogical
  - Claiming one explanation fits the data better than alternatives? Use evidential
- Consider combining types for stronger arguments (e.g., analogical reasoning backed by inductive evidence)
Expected: A chosen argument type (or combination) with a clear rationale for why it fits the hypothesis.
On failure: If no single type fits cleanly, the hypothesis may need splitting into sub-claims. Break it into parts that each have a natural argument structure.
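The evidential row can be made quantitative with a likelihood ratio. Bayes' rule in odds form says that observing evidence E multiplies the prior odds of H1 versus H2 by P(E | H1) / P(E | H2). The sketch below assumes you can put rough probabilities on the evidence under each hypothesis. The numbers in it are hypothetical placeholders, not measurements.

```typescript
// Evidential argument in odds form:
//   posterior odds (H1 : H2) = likelihood ratio * prior odds
//   likelihood ratio         = P(E | H1) / P(E | H2)

function likelihoodRatio(pEgivenH1: number, pEgivenH2: number): number {
  if (pEgivenH2 <= 0) {
    throw new Error("P(E | H2) must be positive to form a ratio");
  }
  return pEgivenH1 / pEgivenH2;
}

// Updates the probability of H1 (with H2 as the only alternative) after observing E.
function posteriorProbability(priorH1: number, pEgivenH1: number, pEgivenH2: number): number {
  if (priorH1 <= 0 || priorH1 >= 1) {
    throw new Error("prior must be strictly between 0 and 1");
  }
  const priorOdds = priorH1 / (1 - priorH1);
  const posteriorOdds = likelihoodRatio(pEgivenH1, pEgivenH2) * priorOdds;
  return posteriorOdds / (1 + posteriorOdds);
}

// Hypothetical numbers: E is four times as likely under H1 as under H2,
// and we start undecided (prior 0.5).
const lr = likelihoodRatio(0.8, 0.2);
const posterior = posteriorProbability(0.5, 0.8, 0.2);
console.log(lr, posterior.toFixed(2)); // prints: 4 0.80
```

A ratio near 1 means the evidence does not discriminate between the hypotheses. In that case, gathering more of the same evidence will not help; the argument needs different evidence.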
Step 3: Construct the Argument
Build the logical chain that connects your hypothesis to its justification.
- State the premises (the facts or assumptions you start from)
- Show the logical connection (how the premises lead to the conclusion)
- Steelman the strongest counterargument: state the best opposing case before refuting it
- Address the counterargument directly with evidence or reasoning
Worked example -- Code Review (deductive + inductive):
Hypothesis: "Extracting the validation logic into a shared module will reduce bug duplication across the three API handlers."

Premises:
- The three handlers (`createUser`, `updateUser`, `deleteUser`) each implement the same input validation with slight variations (observed in `src/handlers/`)
- In the last 6 months, 3 of 5 validation bugs were fixed in one handler but not propagated to the others (see issues #42, #57, #61)
- Shared modules enforce a single source of truth for logic (deductive: if one implementation, then one place to fix)

Logical chain: Because the three handlers duplicate the same validation (premise 1), bugs fixed in one are missed in the others (premise 2, inductive from 3/5 cases). A shared module means fixes apply once to all callers (deductive from shared-module semantics). Therefore, extraction will reduce bug duplication.

Counterargument (steelmanned): "Shared modules introduce coupling: a change to validation for one handler could break the others."

Rebuttal: The handlers already share identical validation intent; the coupling is implicit and harder to maintain. Making it explicit via a shared module with parameterized options (e.g., `validate(input, { requireEmail: true })`) makes the coupling visible and testable. The current implicit duplication is riskier because it hides the dependency.
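Here is a minimal sketch of the parameterized module proposed in the rebuttal, assuming TypeScript handlers. Only the `validate(input, { requireEmail: true })` call comes from the example. Everything else is a hypothetical placeholder, not the actual contents of `src/handlers/`: the `UserInput` shape, the `requireId` option, the email pattern, and the handler bodies.

```typescript
// Hypothetical shared validation module: one implementation, parameterized
// per caller, so a fix made here reaches every handler at once.

interface UserInput {
  id?: string;
  email?: string;
  name?: string;
}

interface ValidationOptions {
  requireEmail?: boolean; // e.g. createUser needs an email
  requireId?: boolean; // assumption: update/delete need an id
}

type ValidationResult = { ok: true } | { ok: false; errors: string[] };

// Deliberately simple placeholder pattern, not a full email grammar.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function validate(input: UserInput, options: ValidationOptions = {}): ValidationResult {
  const errors: string[] = [];
  if (options.requireId && !input.id) {
    errors.push("id is required");
  }
  if (options.requireEmail && !input.email) {
    errors.push("email is required");
  }
  // The format check applies to every caller whenever an email is present.
  if (input.email !== undefined && !EMAIL_PATTERN.test(input.email)) {
    errors.push("email is malformed");
  }
  if (errors.length === 0) {
    return { ok: true };
  }
  return { ok: false, errors };
}

// Each handler states its requirements instead of copying the logic.
function createUser(input: UserInput): ValidationResult {
  return validate(input, { requireEmail: true });
}

function updateUser(input: UserInput): ValidationResult {
  return validate(input, { requireId: true });
}

function deleteUser(input: UserInput): ValidationResult {
  return validate(input, { requireId: true });
}

console.log(createUser({ name: "Ada" })); // → ok: false, errors: ["email is required"]
console.log(updateUser({ id: "42", email: "ada@example.com" })); // → ok: true
```

Because the format check lives in one function, a fix to `EMAIL_PATTERN` reaches all three handlers at once; this is the deductive premise. The per-handler options keep the remaining differences explicit, which is the point of the rebuttal.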
Worked example -- Research (evidential):
Hypothesis: "Pre-training on domain-specific corpora improves downstream task performance more than increasing general corpus size for biomedical NER."

Premises:
- BioBERT, which starts from BERT-Base and continues pre-training on PubMed abstracts (4.5B words), outperforms BERT-Base pre-trained only on general English (BooksCorpus and English Wikipedia, about 3.3B words) on biomedical NER benchmarks (Lee et al., 2020)
- SciBERT, pre-trained on a Semantic Scholar corpus of about 3.1B tokens (comparable in size to BERT's general-domain corpus), outperforms BERT-Base on SciERC and JNLPBA
- General-domain scaling (BERT-Base to BERT-Large, 3x parameters) yields smaller gains on biomedical NER than domain adaptation (BERT-Base to BioBERT, same parameters)
Logical chain: The evidence consistently shows that domain corpus selection outweighs corpus scale for biomedical NER (evidential: these results are more likely if domain specificity matters more than scale). Three independent comparisons point the same direction, strengthening the inductive case.

Counterargument (steelmanned): "These results may not generalize beyond biomedical NER: biomedicine has unusually specialized vocabulary that inflates the domain-adaptation advantage."

Rebuttal: Valid limitation. The hypothesis is scoped to biomedical NER specifically. However, similar domain-adaptation gains appear in legal NLP (Legal-BERT) and financial NLP (FinBERT), suggesting the pattern may generalize to other specialized domains, though that is a separate claim requiring its own evidence.
Expected: A complete argument chain with premises, logical connection, a steelmanned counterargument, and a rebuttal. The reader can follow the reasoning step by step.
On failure: If the argument feels weak, check the premises. Weak arguments usually stem from unsupported premises, not faulty logic. Find evidence for each premise or acknowledge it as an assumption. If the counterargument is stronger than the rebuttal, the hypothesis may need revision.
Step 4: Provide Concrete Examples
Support the argument with independently verifiable evidence. Examples are not illustrations -- they are the empirical foundation that makes the argument testable.
- Provide at least one positive example that confirms the hypothesis
- Provide at least one edge case or boundary example that tests limits
- Ensure each example is independently verifiable: another person can reproduce or check it without relying on your interpretation
- For code claims, reference specific files, line numbers, or commits
- For research claims, cite specific papers, datasets, or experimental results
Example selection criteria:
| Criterion | Good example | Bad example |
|---|---|---|
| Independently verifiable | "Issue #42 shows the bug was fixed in handler A but not B" | "We've seen this kind of bug before" |
| Specific | "… on line 47 …" | "There's duplication in the codebase" |
| Representative | "3 of 5 validation bugs in the last 6 months followed this pattern" | "I once saw a bug like this" |
| Includes edge cases | "This pattern holds for string inputs but not for file upload validation, which has handler-specific constraints" | (no limitations mentioned) |
Expected: Concrete examples that a reader can verify independently. At least one positive and one edge case. Each references a specific artifact (file, line, issue, paper, dataset).
On failure: If examples are hard to find, the hypothesis may be too broad or not grounded in observable reality. Narrow the scope to what you can actually point to. Absence of examples is a signal, not a gap to paper over with vague references.
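The Step 4 criteria are mechanical enough to check: at least one positive example, at least one edge case, and every example pointing at a concrete artifact. The sketch below is a hypothetical helper, not part of any existing tool. The `Evidence` shape and the artifact kinds are assumptions chosen to mirror the list above.

```typescript
// Hypothetical shape for one piece of evidence attached to an argument.
type ArtifactKind = "file" | "line" | "commit" | "issue" | "paper" | "dataset";

interface Evidence {
  role: "positive" | "edge-case";
  kind: ArtifactKind;
  ref: string; // e.g. "#42" or a file path; empty means nobody can verify it
  description: string;
}

// Returns the list of problems; an empty list means the Step 4 criteria are met.
function checkEvidence(evidence: readonly Evidence[]): string[] {
  const problems: string[] = [];
  if (!evidence.some((e) => e.role === "positive")) {
    problems.push("no positive example");
  }
  if (!evidence.some((e) => e.role === "edge-case")) {
    problems.push("no edge case or boundary example");
  }
  for (const e of evidence) {
    if (e.ref.trim() === "") {
      problems.push(`not independently verifiable: "${e.description}"`);
    }
  }
  return problems;
}

const evidence: Evidence[] = [
  { role: "positive", kind: "issue", ref: "#42", description: "Bug fixed in handler A but not B" },
  { role: "edge-case", kind: "file", ref: "", description: "File upload validation differs" },
];

console.log(checkEvidence(evidence));
// → one problem: the edge case has no artifact reference
```

Returning a list of problems rather than a boolean mirrors the "On failure" advice. A missing reference is reported as a signal to act on, not silently accepted.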
Step 5: Assemble the Complete Argument
Combine hypothesis, argument, and examples into the appropriate format for the context.
- For code reviews, structure the comment as:

  ```
  [S] <one-line summary of the suggestion>

  **Hypothesis**: <what you believe should change and why>
  **Argument**: <the logical case, with premises>
  **Evidence**: <specific files, lines, issues, or metrics>
  **Suggestion**: <concrete code change or approach>
  ```

- For PR descriptions, structure the body as:

  ```markdown
  ## Why
  <Hypothesis: what problem this solves and the specific improvement claim>

  ## Approach
  <Argument: why this approach was chosen over alternatives>

  ## Evidence
  <Examples: benchmarks, bug references, before/after comparisons>
  ```

- For ADRs (Architecture Decision Records), use the standard ADR format with the triad mapped to Context (hypothesis), Decision (argument), and Consequences (examples/evidence of expected outcomes)
- For research writing, map to the standard structure: the Introduction states the hypothesis, Methods/Results provide the argument and examples, and the Discussion addresses counterarguments
- Review the assembled argument for:
  - Logical gaps (does the conclusion actually follow from the premises?)
  - Missing evidence (are there unsupported premises?)
  - Unaddressed counterarguments (is the strongest objection answered?)
  - Scope creep (does the argument stay within the hypothesis bounds?)
Expected: A complete, formatted argument appropriate for its context. The reader can evaluate the hypothesis, follow the reasoning, check the evidence, and consider counterarguments -- all in one coherent structure.
On failure: If the assembled argument feels disjointed, the hypothesis may be too broad. Split it into focused sub-arguments, each with its own hypothesis-argument-example triad. Two tight arguments are stronger than one sprawling one.
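When the triad is already captured as data, the code-review format can be assembled mechanically. The `ReviewArgument` type and `formatReviewComment` function below are hypothetical. They render the review-comment template field by field, so a missing part of the argument surfaces as an error instead of a silently incomplete comment. The sample content reuses the shared-validation example from Step 3.

```typescript
// Hypothetical record holding the parts of a review argument.
interface ReviewArgument {
  summary: string;
  hypothesis: string;
  argument: string;
  evidence: string[]; // concrete references: files, lines, issues, metrics
  suggestion: string;
}

// Renders the code-review template; throws if any required part is blank.
function formatReviewComment(a: ReviewArgument): string {
  const required: Array<[string, string]> = [
    ["summary", a.summary],
    ["hypothesis", a.hypothesis],
    ["argument", a.argument],
    ["suggestion", a.suggestion],
  ];
  for (const [field, value] of required) {
    if (value.trim() === "") {
      throw new Error(`missing ${field}`);
    }
  }
  if (a.evidence.length === 0) {
    throw new Error("missing evidence: cite at least one file, line, issue, or metric");
  }
  return [
    `[S] ${a.summary}`,
    "",
    `**Hypothesis**: ${a.hypothesis}`,
    `**Argument**: ${a.argument}`,
    `**Evidence**: ${a.evidence.join("; ")}`,
    `**Suggestion**: ${a.suggestion}`,
  ].join("\n");
}

const comment = formatReviewComment({
  summary: "Extract shared input validation",
  hypothesis: "A shared validator will stop fixes from landing in only one handler.",
  argument: "The three handlers duplicate the same checks; one implementation means one place to fix.",
  evidence: ["issues #42, #57, #61", "src/handlers/"],
  suggestion: "Call validate(input, { requireEmail: true }) from createUser.",
});
console.log(comment);
```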
Validation
- Hypothesis is falsifiable (someone could disprove it with evidence)
- Hypothesis is scoped to a specific context, not a universal claim
- Argument type is identified and appropriate for the claim
- Premises are stated explicitly, not assumed as shared knowledge
- Logical chain connects premises to conclusion without gaps
- Strongest counterargument is steelmanned and addressed
- At least one positive example supports the hypothesis
- At least one edge case or limitation is acknowledged
- All examples are independently verifiable (references provided)
- Output format matches the context (code review, PR, ADR, research)
- No logical fallacies (appeal to authority, false dichotomy, strawman)
Common Pitfalls
- Stating opinions as hypotheses: "This code is messy" is a preference, not a hypothesis. Rewrite as a testable claim: "This module has 4 responsibilities that should be separated per the single-responsibility principle, as evidenced by its 6 public methods spanning 3 unrelated domains."
- Skipping the counterargument: Unaddressed objections weaken the argument even if the reader never voices them. Always steelman -- state the strongest opposing case in its best form before rebutting it.
- Vague examples: "We've seen this pattern before" is not evidence. Point to specific issues, commits, lines, papers, or datasets. If you cannot find a concrete example, your hypothesis may not be well-grounded.
- Argument from authority: "The senior engineer said so" or "Google does it this way" is not a logical argument. Authority can motivate investigation, but the argument must stand on its own evidence and reasoning.
- Scope creep in conclusions: Drawing conclusions broader than what the evidence supports. If your examples cover 3 API handlers, don't conclude about the entire codebase. Match conclusion scope to evidence scope.
- Conflating argument types: Using inductive language ("tends to") for deductive claims ("must be") or vice versa. Be precise about the strength of your conclusion -- deductive arguments give certainty, inductive arguments give probability.
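An interval around the observed rate shows how much probability an inductive premise actually carries. The sketch below applies a Wilson score interval to the "3 of 5 validation bugs" figure from the code-review example. The Wilson interval is a standard binomial confidence interval, chosen here for illustration; this skill does not prescribe a particular method.

```typescript
// Wilson score interval for a binomial proportion (successes out of trials).
// z = 1.96 corresponds to an approximate 95% confidence level.
function wilsonInterval(successes: number, trials: number, z = 1.96): [number, number] {
  if (trials <= 0 || successes < 0 || successes > trials) {
    throw new Error("need trials > 0 and 0 <= successes <= trials");
  }
  const p = successes / trials;
  const z2 = z * z;
  const denominator = 1 + z2 / trials;
  const center = (p + z2 / (2 * trials)) / denominator;
  const margin =
    (z * Math.sqrt((p * (1 - p)) / trials + z2 / (4 * trials * trials))) / denominator;
  return [center - margin, center + margin];
}

// "3 of 5 validation bugs followed this pattern": observed rate 0.6.
const [low, high] = wilsonInterval(3, 5);
console.log(low.toFixed(2), high.toFixed(2)); // 0.23 0.88
```

The interval is wide. Five cases are consistent with a true rate anywhere from roughly a quarter to nearly nine in ten. That is why such a conclusion should be phrased as "likely" and scoped to the handlers actually observed.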
Related Skills
- `review-pull-request`: applying argumentation to structured code review feedback
- `review-research`: constructing evidence-based arguments in research contexts
- `review-software-architecture`: justifying architectural decisions with the hypothesis-argument-example triad
- `create-skill`: skills themselves are structured arguments for how to accomplish a task
- `write-claude-md`: documenting conventions and decisions that benefit from clear justification