argumentation

Construct Arguments

Build rigorous arguments from hypothesis through reasoning to concrete evidence. Every persuasive technical claim follows the same triad: a clear hypothesis states what you believe, an argument explains why it holds, and examples prove that it holds. This skill teaches you to apply that structure to code reviews, design decisions, research writing, and any context where claims need justification.

When to Use

- Writing or reviewing a PR description that proposes a technical change
- Justifying a design decision in an ADR (Architecture Decision Record)
- Constructing feedback in a code review that goes beyond "I don't like this"
- Writing a research argument or technical proposal
- Challenging or defending an approach in a technical discussion

Inputs

- Required: A claim or position that needs justification
- Required: Context (code review, design decision, research, documentation)
- Optional: Audience (peer developers, reviewers, stakeholders, researchers)
- Optional: Counterarguments or alternative positions to address
- Optional: Evidence or data available to support the claim

Procedure

Step 1: Formulate the Hypothesis

State your claim as a clear, falsifiable hypothesis. A hypothesis is not an opinion or a preference -- it is a specific assertion that can be tested against evidence.

- Write the claim in one sentence
- Apply the falsifiability test: can someone prove this wrong with evidence?
- Scope it narrowly: constrain to a specific context, codebase, or domain
- Distinguish from opinions by checking for testable criteria

Falsifiable vs. unfalsifiable:

| Unfalsifiable (opinion) | Falsifiable (hypothesis) |
| --- | --- |
| "This code is bad" | "This function has O(n^2) complexity where O(n) is achievable" |
| "We should use TypeScript" | "TypeScript's type system will catch the class of null-reference bugs that caused 4 of our last 6 production incidents" |
| "The API design is cleaner" | "Replacing the 5 endpoint variants with a single parameterized endpoint reduces the public API surface by 60%" |
| "This research approach is better" | "Method A achieves higher precision than Method B on dataset X at the 95% confidence level" |

Expected: A one-sentence hypothesis that is specific, scoped, and falsifiable. Someone reading it can immediately imagine what evidence would confirm or refute it.

On failure: If the hypothesis feels vague, apply the "how would I disprove this?" test. If you cannot imagine counter-evidence, the claim is an opinion, not a hypothesis. Narrow the scope or add measurable criteria until it becomes testable.
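
As a hedged illustration of the first falsifiable claim in the table above, the sketch below pairs a quadratic duplicate check with a linear one. Both function names are hypothetical and not taken from any codebase discussed here. The point is only that a complexity claim names a property a reviewer can confirm or refute by reading or benchmarking the code.

```typescript
// Hypothetical example: two implementations of the same check.
// The hypothesis "this function is O(n^2) where O(n) is achievable"
// can be tested by comparing them.

// O(n^2): compares every pair of elements.
function hasDuplicateQuadratic(items: string[]): boolean {
  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      if (items[i] === items[j]) return true;
    }
  }
  return false;
}

// O(n) expected time: a single pass, using a Set for membership checks.
function hasDuplicateLinear(items: string[]): boolean {
  const seen = new Set<string>();
  for (const item of items) {
    if (seen.has(item)) return true;
    seen.add(item);
  }
  return false;
}
```

If a reviewer could show that no linear version preserves the required behavior, the hypothesis would be refuted, which is exactly what makes it a hypothesis rather than an opinion.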

Step 2: Identify the Argument Type

Select the logical structure that best supports your hypothesis. Different claims call for different reasoning strategies.

Review the four argument types:

| Type | Structure | Best for |
| --- | --- | --- |
| Deductive | If A then B; A is true; therefore B | Formal proofs, type safety claims |
| Inductive | Observed pattern across N cases; therefore likely in general | Performance data, test results |
| Analogical | X is similar to Y in relevant ways; Y has property P; therefore X likely has P | Design decisions, technology choices |
| Evidential | Evidence E is more likely under hypothesis H1 than H2; therefore H1 is supported | Research findings, A/B test results |

Match your hypothesis to the strongest argument type:
- Claiming something must be true? Use deductive
- Claiming something tends to be true based on observations? Use inductive
- Claiming something will likely work based on similar prior cases? Use analogical
- Claiming one explanation fits the data better than alternatives? Use evidential
Consider combining types for stronger arguments (e.g., analogical reasoning backed by inductive evidence)

Expected: A chosen argument type (or combination) with a clear rationale for why it fits the hypothesis.

On failure: If no single type fits cleanly, the hypothesis may need splitting into sub-claims. Break it into parts that each have a natural argument structure.
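
The evidential structure in the table above can be written as a ratio. This is a standard statement of Bayes' rule in odds form, offered as a sketch rather than as something the procedure requires you to compute; the symbols E, H1, and H2 are the ones used in the table.

```latex
\frac{P(H_1 \mid E)}{P(H_2 \mid E)}
  = \frac{P(E \mid H_1)}{P(E \mid H_2)} \times \frac{P(H_1)}{P(H_2)}
```

Evidence E supports H1 over H2 when the likelihood ratio P(E | H1) / P(E | H2) is greater than 1. How far it shifts the conclusion also depends on the prior odds, which is why a single surprising result rarely settles a question on its own.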

Step 3: Construct the Argument

Build the logical chain that connects your hypothesis to its justification.

- State the premises (the facts or assumptions you start from)
- Show the logical connection (how the premises lead to the conclusion)
- Steelman the strongest counterargument: state the best opposing case before refuting it
- Address the counterargument directly with evidence or reasoning

Worked example -- Code Review (deductive + inductive):

Hypothesis: "Extracting the validation logic into a shared module will reduce bug duplication across the three API handlers."

Premises:

1. The three handlers (`createUser`, `updateUser`, `deleteUser`) each implement the same input validation with slight variations (observed in `src/handlers/`)
2. In the last 6 months, 3 of 5 validation bugs were fixed in one handler but not propagated to the others (see issues #42, #57, #61)
3. Shared modules enforce a single source of truth for logic (deductive: if one implementation, then one place to fix)

Logical chain: Because the three handlers duplicate the same validation (premise 1), bugs fixed in one are missed in others (premise 2, inductive from 3/5 cases). A shared module means fixes apply once to all callers (premise 3, deductive from shared-module semantics). Therefore, extraction will reduce bug duplication.

Counterargument (steelmanned): "Shared modules introduce coupling -- a change to validation for one handler could break the others."

Rebuttal: The handlers already share identical validation intent; the coupling is implicit and harder to maintain. Making it explicit via a shared module with parameterized options (e.g., `validate(input, { requireEmail: true })`) makes the coupling visible and testable. The current implicit duplication is riskier because it hides the dependency.
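
To make the rebuttal concrete, here is a minimal sketch of what such a shared module might look like. Only the call shape `validate(input, { requireEmail: true })` comes from the example above. The `requireId` option, the `UserInput` fields, the error messages, and the email pattern are all assumptions added for illustration.

```typescript
// Hypothetical shared validation module. Field names, options, and messages
// are illustrative assumptions, not taken from a real codebase.

interface ValidateOptions {
  requireEmail?: boolean; // e.g. true for createUser
  requireId?: boolean;    // e.g. true for updateUser and deleteUser
}

interface UserInput {
  id?: string;
  email?: string;
}

// Deliberately simple pattern for illustration; not a full email validator.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Returns a list of error messages; an empty list means the input is valid.
function validate(input: UserInput, options: ValidateOptions = {}): string[] {
  const errors: string[] = [];
  if (options.requireId && !input.id) {
    errors.push("id is required");
  }
  if (input.email === undefined || input.email === "") {
    if (options.requireEmail) errors.push("email is required");
  } else if (!EMAIL_PATTERN.test(input.email)) {
    // The format rule lives in one place, so a fix reaches every handler.
    errors.push("email is malformed");
  }
  return errors;
}

// Each handler states its own requirements explicitly at the call site.
const createErrors = validate({ email: "ada@example.com" }, { requireEmail: true });
const deleteErrors = validate({}, { requireId: true });
```

The per-handler differences now appear as named options instead of as drifted copies of the same logic, which is the sense in which the coupling becomes visible and testable.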

Worked example -- Research (evidential):

Hypothesis: "Pre-training on domain-specific corpora improves downstream task performance more than increasing general corpus size for biomedical NER."

Premises:

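BioBERT, initialized from BERT-Base and further pre-trained on PubMed abstracts (4.5B words), outperforms the general-domain BERT baseline (pre-trained on 3.3B words of English Wikipedia and BooksCorpus) on the biomedical NER datasets reported by Lee et al. (2020)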

SciBERT pre-trained on Semantic Scholar (3.1B words) outperforms BERT-Base on SciERC and JNLPBA despite a smaller pre-training corpus

General-domain scaling (BERT-Base to BERT-Large, 3x parameters) yields smaller gains on biomedical NER than domain adaptation (BERT-Base to BioBERT, same parameters)

Logical chain: The evidence consistently shows that domain corpus selection outweighs corpus scale for biomedical NER (evidential: these results are more likely if domain specificity matters more than scale). Three independent comparisons point the same direction, strengthening the inductive case.

Counterargument (steelmanned): "These results may not generalize beyond biomedical NER -- biomedicine has unusually specialized vocabulary that inflates the domain-adaptation advantage."

Rebuttal: Valid limitation. The hypothesis is scoped to biomedical NER specifically. However, similar domain-adaptation gains appear in legal NLP (Legal-BERT) and financial NLP (FinBERT), suggesting the pattern may generalize to other specialized domains, though that is a separate claim requiring its own evidence.

Expected: A complete argument chain with premises, logical connection, a steelmanned counterargument, and a rebuttal. The reader can follow the reasoning step by step.

On failure: If the argument feels weak, check the premises. Weak arguments usually stem from unsupported premises, not faulty logic. Find evidence for each premise or acknowledge it as an assumption. If the counterargument is stronger than the rebuttal, the hypothesis may need revision.

Step 4: Provide Concrete Examples

Support the argument with independently verifiable evidence. Examples are not illustrations -- they are the empirical foundation that makes the argument testable.

- Provide at least one positive example that confirms the hypothesis
- Provide at least one edge case or boundary example that tests limits
- Ensure each example is independently verifiable: another person can reproduce or check it without relying on your interpretation
- For code claims, reference specific files, line numbers, or commits
- For research claims, cite specific papers, datasets, or experimental results

Example selection criteria:

| Criterion | Good example | Bad example |
| --- | --- | --- |
| Independently verifiable | "Issue #42 shows the bug was fixed in handler A but not B" | "We've seen this kind of bug before" |
| Specific | "`createUser` at line 47 re-implements the same regex as `updateUser` at line 23" | "There's duplication in the codebase" |
| Representative | "3 of 5 validation bugs in the last 6 months followed this pattern" | "I once saw a bug like this" |
| Includes edge cases | "This pattern holds for string inputs but not for file upload validation, which has handler-specific constraints" | (no limitations mentioned) |

Expected: Concrete examples that a reader can verify independently. At least one positive and one edge case. Each references a specific artifact (file, line, issue, paper, dataset).

On failure: If examples are hard to find, the hypothesis may be too broad or not grounded in observable reality. Narrow the scope to what you can actually point to. Absence of examples is a signal, not a gap to paper over with vague references.
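
The expected outcome of this step can be read as a small checklist. The sketch below models it as data, assuming hypothetical names (`EvidenceExample`, `checkExamples`) that are not part of this skill. It can only confirm that an artifact reference is present; whether the artifact actually supports the claim still needs a human reader.

```typescript
// Hypothetical model of the Step 4 checklist. All names are illustrative.

type ExampleKind = "positive" | "edge-case";

interface EvidenceExample {
  kind: ExampleKind;
  claim: string;
  // A specific artifact a reader can check: file and line, issue, commit,
  // paper, or dataset.
  artifact?: string;
}

// Returns the problems found; an empty list means the checklist passes.
function checkExamples(examples: EvidenceExample[]): string[] {
  const problems: string[] = [];
  if (!examples.some((e) => e.kind === "positive")) {
    problems.push("no positive example");
  }
  if (!examples.some((e) => e.kind === "edge-case")) {
    problems.push("no edge case");
  }
  for (const e of examples) {
    if (!e.artifact || e.artifact.trim() === "") {
      problems.push(`no artifact cited for: ${e.claim}`);
    }
  }
  return problems;
}

const problems = checkExamples([
  { kind: "positive", claim: "Bug fixed in handler A but not B", artifact: "issue #42" },
  { kind: "positive", claim: "We've seen this kind of bug before" },
]);
// problems: ["no edge case", "no artifact cited for: We've seen this kind of bug before"]
```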

Step 5: Assemble the Complete Argument

Combine hypothesis, argument, and examples into the appropriate format for the context.

For code reviews -- structure the comment as:

[S] <one-line summary of the suggestion>

**Hypothesis**: <what you believe should change and why>

**Argument**: <the logical case, with premises>

**Evidence**: <specific files, lines, issues, or metrics>

**Suggestion**: <concrete code change or approach>

For PR descriptions -- structure the body as:

## Why

<Hypothesis: what problem this solves and the specific improvement claim>

## Approach

<Argument: why this approach was chosen over alternatives>

## Evidence

<Examples: benchmarks, bug references, before/after comparisons>

For ADRs (Architecture Decision Records) -- use the standard ADR format with the triad mapped to Context (hypothesis), Decision (argument), and Consequences (examples/evidence of expected outcomes)
For research writing -- map to the standard structure: Introduction states the hypothesis, Methods/Results provide argument and examples, Discussion addresses counterarguments
Review the assembled argument for:
- Logical gaps (does the conclusion actually follow from the premises?)
- Missing evidence (are there unsupported premises?)
- Unaddressed counterarguments (is the strongest objection answered?)
- Scope creep (does the argument stay within the hypothesis bounds?)

Expected: A complete, formatted argument appropriate for its context. The reader can evaluate the hypothesis, follow the reasoning, check the evidence, and consider counterarguments -- all in one coherent structure.

On failure: If the assembled argument feels disjointed, the hypothesis may be too broad. Split it into focused sub-arguments, each with its own hypothesis-argument-example triad. Two tight arguments are stronger than one sprawling one.
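
For teams that generate review comments from structured notes, the code review template above could be filled in by a small helper. This is a sketch under assumptions: `ReviewArgument` and `formatReviewComment` are hypothetical names, and refusing to render without evidence is a design choice made here to mirror Step 4, not a rule stated by the template.

```typescript
// Hypothetical helper that renders the code review template from Step 5.

interface ReviewArgument {
  summary: string;
  hypothesis: string;
  argument: string;
  evidence: string[];
  suggestion: string;
}

function formatReviewComment(a: ReviewArgument): string {
  // Assumption: an argument without evidence is not ready to post.
  if (a.evidence.length === 0) {
    throw new Error("cannot format a review comment without evidence");
  }
  return [
    `[S] ${a.summary}`,
    `**Hypothesis**: ${a.hypothesis}`,
    `**Argument**: ${a.argument}`,
    `**Evidence**: ${a.evidence.join("; ")}`,
    `**Suggestion**: ${a.suggestion}`,
  ].join("\n\n");
}

const comment = formatReviewComment({
  summary: "Extract shared validation",
  hypothesis: "A shared module will reduce bug duplication across the three handlers",
  argument: "The handlers duplicate validation, so fixes in one are missed in the others",
  evidence: ["issue #42", "issue #57", "issue #61"],
  suggestion: "Move validation into one module with per-handler options",
});
```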
