
Statistics Verifier

Structured frameworks for verifying statistical claims, validating research methodology, and detecting analytical errors and biases.

Statistical Claim Verification Checklist

Rapid Claim Assessment

CLAIM VERIFICATION PROTOCOL:

1. SOURCE CHECK
   - Who made the claim?
   - What is their expertise and incentive?
   - Where was it published (peer-reviewed, preprint, press release)?
   - Is the original data or study accessible?

2. METHODOLOGY CHECK
   - What type of study (RCT, observational, survey, meta-analysis)?
   - What was the sample size and population?
   - What was the measurement method?
   - Is the statistical test appropriate for the data type?

3. NUMBER SENSE CHECK
   - Does the claim pass a basic plausibility test?
   - Are units and denominators clearly stated?
   - Absolute vs relative numbers — which is being used?
   - Is the base rate provided for context?

4. REPLICATION CHECK
   - Have other studies found similar results?
   - Are the findings consistent across populations?
   - Has anyone attempted and failed to replicate?

5. CONCLUSION CHECK
   - Does the conclusion follow from the data?
   - Are alternative explanations addressed?
   - Is the scope of the claim proportional to the evidence?

Claim Red Flags

| Red Flag | What It Means | Action |
| --- | --- | --- |
| No sample size given | Cannot assess reliability | Request or estimate N |
| Only relative risk reported | May hide small absolute effect | Calculate absolute difference |
| "Up to X%" framing | Cherry-picked best case | Ask for median or mean |
| No confidence interval | Precision unknown | Treat with skepticism |
| Correlation stated as causation | Confounders likely ignored | Check study design |
| Self-selected sample | Selection bias likely | Note limitation |
| Composite endpoint | May mask weak individual results | Decompose the endpoint |
| Subgroup analysis highlighted | Likely post-hoc fishing | Require pre-registration |

Common Statistical Errors

Error Detection Framework

CATEGORY 1: DESIGN ERRORS
- Sampling bias (convenience, voluntary response, survivorship)
- Confounding variables not controlled
- Insufficient sample size (underpowered study)
- No control group or inappropriate comparator
- Measurement instrument not validated

CATEGORY 2: ANALYSIS ERRORS
- Multiple comparisons without correction (p-hacking)
- Treating ordinal data as interval
- Assuming normality without checking
- Ignoring missing data patterns (MCAR vs MNAR)
- Using parametric tests on non-parametric data

CATEGORY 3: INTERPRETATION ERRORS
- Confusing statistical significance with practical significance
- Interpreting non-significant result as "no effect"
- Ecological fallacy (group-level applied to individuals)
- Simpson's paradox not checked
- Ignoring effect size and confidence intervals

CATEGORY 4: REPORTING ERRORS
- Selective reporting of favorable results
- Omitting negative or null findings
- Misleading axis scales in visualizations
- Presenting percentages without base numbers
- Switching between absolute and relative metrics
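
Category 3 lists Simpson's paradox; a minimal Python sketch using the well-known kidney-stone data (Charig et al., 1986) shows how subgroup rates can reverse when pooled:

```python
# Simpson's paradox check: compare subgroup success rates with the
# pooled rate. Data: the classic kidney-stone study (Charig et al., 1986).
def rate(successes, total):
    return successes / total

groups = {
    # (treatment A successes/total, treatment B successes/total)
    "small stones": ((81, 87), (234, 270)),
    "large stones": ((192, 263), (55, 80)),
}

# Treatment A wins in every subgroup...
for name, ((sa, na), (sb, nb)) in groups.items():
    assert rate(sa, na) > rate(sb, nb), name

# ...yet loses on the pooled data, because the case mix differs.
a_total = rate(81 + 192, 87 + 263)   # ≈ 0.780
b_total = rate(234 + 55, 270 + 80)   # ≈ 0.826
print(f"A overall {a_total:.3f} vs B overall {b_total:.3f}")
```

Always check whether an aggregate comparison survives stratification by the obvious grouping variables before trusting it.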

Error Severity Assessment

| Error Type | Severity | Impact on Conclusion |
| --- | --- | --- |
| P-hacking / HARKing | Critical | Invalidates findings |
| Selection bias | Critical | Fundamentally flawed sample |
| Confounding not addressed | High | Alternative explanations remain |
| Wrong statistical test | High | Results may be artifactual |
| Multiple comparisons uncorrected | High | Inflated false positive rate |
| Small sample without power analysis | Medium | May miss real effects |
| Missing confidence intervals | Medium | Cannot judge precision |
| Misleading visualization | Medium | Misrepresents magnitude |
| Minor rounding errors | Low | Minimal impact |

Significance Testing Framework

Test Selection Guide

CHOOSING THE RIGHT TEST:

DATA TYPE → COMPARISON → TEST

Continuous + 2 groups + independent → Independent t-test (or Mann-Whitney)
Continuous + 2 groups + paired     → Paired t-test (or Wilcoxon signed-rank)
Continuous + 3+ groups + independent → One-way ANOVA (or Kruskal-Wallis)
Continuous + 2+ factors            → Two-way / factorial ANOVA
Continuous + repeated measures (3+) → Repeated-measures ANOVA (or Friedman)
Continuous + continuous             → Pearson correlation (or Spearman)

Categorical + 2 groups             → Chi-square test (or Fisher's exact)
Categorical + ordered              → Cochran-Armitage trend test
Binary outcome + predictors        → Logistic regression

Time-to-event + groups             → Log-rank test / Cox regression
Count data                          → Poisson regression
Proportion + large sample           → Z-test for proportions
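
The last row of the guide, the z-test for proportions, is simple enough to sketch with the standard library alone; `two_proportion_z_test` is an illustrative helper name, not an established API:

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference of two proportions.

    x1/n1 and x2/n2 are successes/trials per group. Uses the pooled
    standard error; the normal approximation needs large samples.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF via erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# 10.0% vs 15.0% conversion, 1,000 trials each: z ≈ -3.38, p ≈ 0.0007
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

For small counts, prefer Fisher's exact test as the table above suggests.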

P-Value Interpretation Guide

P-VALUE CONTEXT:

p-value = P(data at least this extreme | null hypothesis is true)

COMMON MISINTERPRETATIONS:
  p = 0.03 does NOT mean:
  - "There is a 3% chance the result is due to chance"
  - "There is a 97% probability the hypothesis is true"
  - "The effect is large or important"
  - "The study will replicate"

  p = 0.03 DOES mean:
  - If the null hypothesis were true, data at least this
    extreme would occur about 3% of the time by chance alone.

THRESHOLDS (conventional, not absolute):
  p < 0.001  — strong evidence against null
  p < 0.01   — moderate evidence against null
  p < 0.05   — conventional threshold (context-dependent)
  p > 0.05   — insufficient evidence to reject null
                (NOT evidence of no effect)

ALWAYS COMPLEMENT WITH:
  - Effect size (Cohen's d, odds ratio, etc.)
  - Confidence interval (range of plausible values)
  - Practical significance (is the effect meaningful?)
  - Study power (could it have detected a real effect?)
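A sketch of the first two "complement with" items: Cohen's d from a pooled standard deviation, plus a rough confidence interval for the mean difference. The CI uses a normal approximation (1.96), which is only adequate for larger samples; a t-based interval is better for small ones. Function names are illustrative:

```python
import math
from statistics import mean, stdev

def cohens_d(a, b):
    """Cohen's d for two independent groups, pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_sd = math.sqrt(
        ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    )
    return (mean(a) - mean(b)) / pooled_sd

def diff_ci_95(a, b):
    """Approximate 95% CI for the mean difference (normal approximation)."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    d = mean(a) - mean(b)
    return d - 1.96 * se, d + 1.96 * se

a = [2, 4, 4, 4, 5, 5, 7, 9]
b = [x + 1 for x in a]           # same spread, mean shifted by 1
print(round(cohens_d(a, b), 2))  # → -0.47, a small-to-medium effect
```

Here the CI for the difference spans zero, illustrating the point above: "not significant" in a small sample is not evidence of no effect.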

Multiple Comparisons Correction

| Method | When to Use | Conservativeness |
| --- | --- | --- |
| Bonferroni | Few comparisons, need strong control | Very conservative |
| Holm-Bonferroni | Moderate comparisons, step-down | Less conservative |
| Benjamini-Hochberg | Many comparisons (FDR control) | Liberal |
| Tukey's HSD | All pairwise comparisons after ANOVA | Moderate |
| Dunnett's | Multiple treatments vs one control | Moderate |
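
Holm-Bonferroni and Benjamini-Hochberg are easy to implement directly; a plain-Python sketch (function names are illustrative):

```python
def holm(p_values, alpha=0.05):
    """Holm-Bonferroni step-down: True = reject at family-wise level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

def benjamini_hochberg(p_values, alpha=0.05):
    """BH FDR: reject all tests up to the largest rank k with
    p_(k) <= k/m * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff_rank = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            reject[i] = True
    return reject

ps = [0.001, 0.012, 0.025, 0.04, 0.20]
print(holm(ps))                 # rejects 2 of 5 (family-wise control)
print(benjamini_hochberg(ps))   # rejects 4 of 5 (the more liberal FDR)
```

The same five p-values show the table's "conservativeness" column concretely: Holm keeps two findings, BH keeps four.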

Sample Size Validation

Quick Reference Table

MINIMUM SAMPLE SIZE GUIDELINES:

Survey (population estimate):
  ±3% margin, 95% CI → n ≈ 1,067
  ±5% margin, 95% CI → n ≈ 385
  ±10% margin, 95% CI → n ≈ 97

A/B Test (detecting 5% relative lift):
  Baseline 10% conversion → n ≈ 3,200 per group
  Baseline 5% conversion  → n ≈ 6,400 per group
  Baseline 2% conversion  → n ≈ 16,000 per group

Clinical trial (medium effect d=0.5):
  Two-group comparison, 80% power → n ≈ 64 per group
  Two-group comparison, 90% power → n ≈ 86 per group

Correlation (detecting r=0.3):
  80% power, alpha=0.05 → n ≈ 85
  90% power, alpha=0.05 → n ≈ 113
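
The survey rows follow from the standard proportion formula n = z² · p(1−p) / E² with worst-case p = 0.5; a sketch (note the ±3% entry lands at 1,067 or 1,068 depending on rounding convention):

```python
import math

def survey_n(margin, p=0.5, z=1.96):
    """Minimum n to estimate a proportion within ±margin at ~95%
    confidence (z = 1.96), assuming worst-case p = 0.5 and a large
    population (no finite-population correction)."""
    return math.ceil(z * z * p * (1 - p) / margin ** 2)

print(survey_n(0.05))   # → 385
print(survey_n(0.10))   # → 97
```

The power-based rows (A/B tests, trials, correlations) come from different formulas; use a dedicated power calculator for those rather than this proportion shortcut.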

Power Analysis Checklist

| Parameter | Must Specify | Source |
| --- | --- | --- |
| Alpha (Type I error rate) | Yes | Convention (usually 0.05) |
| Power (1 − Type II error) | Yes | Usually 0.80 or 0.90 |
| Effect size | Yes | Prior research or MCID |
| Variance / SD | Yes | Pilot data or literature |
| Sample size | Calculated | Output of power analysis |
| Attrition rate | Recommended | Inflate N by expected dropout |

Correlation vs Causation Checklist

Bradford Hill Criteria for Causation

DOES CORRELATION IMPLY CAUSATION? CHECK:

1. STRENGTH           Is the association large?
                      Larger effects harder to explain away.

2. CONSISTENCY        Replicated across settings, populations?
                      Multiple studies, same finding.

3. SPECIFICITY        Is X linked specifically to Y (not everything)?
                      Less useful for multifactorial diseases.

4. TEMPORALITY        Does X precede Y in time?
                      REQUIRED — cause must come before effect.

5. BIOLOGICAL GRADIENT  Does more X produce more Y (dose-response)?
                        Strong support for causation.

6. PLAUSIBILITY       Is there a credible mechanism?
                      Based on current knowledge.

7. COHERENCE          Consistent with known biology/theory?
                      No conflict with established facts.

8. EXPERIMENT         Does removing X reduce Y?
                      Strongest evidence (RCT).

9. ANALOGY            Similar exposures cause similar effects?
                      Weakest criterion, supporting only.

VERDICT:
  Criteria 1-3 met + Temporality → Suggestive of causation
  Criteria 1-6 met + Experiment  → Strong evidence of causation
  Only correlation observed      → Association only, cannot infer cause

Common Third-Variable Confounders

| Observed Association | Likely Confounder |
| --- | --- |
| Ice cream sales and drowning | Warm weather (season) |
| Shoe size and reading ability | Age |
| Hospital visits and death rate | Illness severity |
| Organic food and health | Socioeconomic status |
| Screen time and depression | Social isolation, sleep |
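
One quick test for a suspected third variable is a partial correlation: correlate the residuals of x and y after regressing each on the confounder z. A plain-Python sketch on toy data where z drives both variables (helper names are illustrative):

```python
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def residualize(y, z):
    """Residuals of a simple linear regression of y on z."""
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    return [b - (my + slope * (a - mz)) for a, b in zip(z, y)]

def partial_corr(x, y, z):
    """Correlation of x and y with the confounder z regressed out."""
    return pearson(residualize(x, z), residualize(y, z))

# Toy data: z drives both x and y, so the raw correlation is spurious.
z = list(range(20))
x = [zi + (i % 2) for i, zi in enumerate(z)]
y = [zi + (i // 2 % 2) for i, zi in enumerate(z)]
print(round(pearson(x, y), 3), round(partial_corr(x, y, z), 3))
```

The raw correlation is near 1 while the partial correlation is near 0: the association lives entirely in the shared driver z.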

Survey Methodology Review

Survey Quality Assessment

SURVEY METHODOLOGY CHECKLIST:

SAMPLING:
- [ ] Probability sampling method described?
- [ ] Sampling frame defined and appropriate?
- [ ] Response rate reported (acceptable: >60% mail, >80% in-person)?
- [ ] Non-response bias assessed?

QUESTIONNAIRE:
- [ ] Questions validated or adapted from validated instruments?
- [ ] Leading or double-barreled questions absent?
- [ ] Response options balanced and exhaustive?
- [ ] Pilot tested with target population?

ADMINISTRATION:
- [ ] Mode (online, phone, in-person) appropriate?
- [ ] Anonymity/confidentiality assured?
- [ ] Informed consent obtained?
- [ ] Social desirability bias mitigated?

ANALYSIS:
- [ ] Weighting applied for non-response or oversampling?
- [ ] Margin of error and confidence level reported?
- [ ] Subgroup analyses pre-specified (not exploratory)?
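
To sanity-check a reported margin of error against the reported sample size, invert the usual proportion formula; a small sketch (worst case p = 0.5, 95% confidence, simple random sampling assumed):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from a
    simple random sample of size n (widest at p = 0.5)."""
    return z * math.sqrt(p * (1 - p) / n)

# A poll of 385 respondents should quote roughly ±5 points:
print(f"±{margin_of_error(385):.1%}")
```

If a survey reports a tighter margin than this formula allows for its n, either the sampling design is more complex (e.g. stratified with weighting) or the claim is wrong; ask which.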

Data Visualization Integrity Checks

Chart Audit Checklist

| Check | What to Look For | Fail Condition |
| --- | --- | --- |
| Y-axis starts at zero (bar charts) | Truncated axis exaggerates differences | Axis starts above zero without clear label |
| Consistent scale | Both axes have proportional increments | Non-linear scale without explanation |
| Area proportional to data | Bubble/icon size matches values | Area misrepresents magnitude |
| Time axis evenly spaced | Equal intervals between data points | Uneven spacing compresses/expands trends |
| Appropriate chart type | Data type matches visualization | Pie chart with 20+ categories |
| Context provided | Benchmarks, comparisons, baselines | Single data point with no reference |
| Source cited | Data origin traceable | No source attribution |
| Dual axes used responsibly | Two Y-axes can create false correlations | Arbitrary scaling implies relationship |

Misleading Visualization Patterns

WATCH FOR THESE TRICKS:

1. TRUNCATED AXIS
   Small differences look dramatic when baseline removed.
   FIX: Always check if y-axis starts at zero for bar charts.

2. CHERRY-PICKED TIME WINDOW
   Start/end dates chosen to show desired trend.
   FIX: Ask for longer time series with consistent intervals.

3. 3D EFFECTS
   Perspective distortion makes sizes unequal.
   FIX: Use flat 2D charts for accurate comparison.

4. DUAL AXIS MANIPULATION
   Two y-axes scaled to create apparent correlation.
   FIX: Normalize data or use separate panels.

5. CUMULATIVE VS DAILY
   Cumulative charts always go up — hides declining rates.
   FIX: Show rate of change alongside cumulative.
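
Trick 1 can be quantified with a Tufte-style "lie factor": the relative difference as drawn divided by the relative difference in the data. A sketch (the helper name is illustrative):

```python
def lie_factor(v1, v2, axis_start):
    """Distortion of a bar chart whose y-axis starts above zero:
    (relative difference as drawn) / (relative difference in the data).
    A value of 1.0 means the chart is honest."""
    shown = (v2 - axis_start) / (v1 - axis_start) - 1
    actual = v2 / v1 - 1
    return shown / actual

# 100 vs 105 drawn on an axis starting at 95: a 5% difference is drawn
# as if the second bar were twice as tall, a 20x exaggeration.
print(round(lie_factor(100, 105, 95), 1))
```

Reading the axis origin off a suspicious chart and running this check takes seconds and turns "looks exaggerated" into a number.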

Bias Detection Framework

Cognitive Biases in Data Analysis

BIAS DETECTION CHECKLIST:

CONFIRMATION BIAS
- Are they only presenting data that supports their hypothesis?
- Were negative results reported?
- Was the analysis plan pre-registered?

ANCHORING BIAS
- Is the first number presented influencing interpretation of later data?
- Are comparisons made to appropriate benchmarks?

SURVIVORSHIP BIAS
- Are only successful cases included (ignoring failures)?
- Is the denominator complete (not just survivors)?

AVAILABILITY BIAS
- Are dramatic or recent events overweighted?
- Is systematic data used rather than anecdotal evidence?

PUBLICATION BIAS
- Is there a funnel plot asymmetry (meta-analyses)?
- Are null results published or only significant ones?

TEXAS SHARPSHOOTER FALLACY
- Were clusters or patterns found after looking at data?
- Was the hypothesis formed before or after seeing results?

Bias Severity Matrix

| Bias | Detection Method | Mitigation |
| --- | --- | --- |
| Selection bias | Compare sample to population demographics | Probability sampling, weighting |
| Measurement bias | Check instrument validity and calibration | Validated instruments, blinding |
| Reporting bias | Look for asymmetric funnel plots | Pre-registration, open data |
| Recall bias | Compare to objective records | Prospective data collection |
| Observer bias | Check if assessors were blinded | Double-blind design |
| Attrition bias | Compare completers vs dropouts | Intention-to-treat analysis |

Reproducibility Checklist

Study Reproducibility Assessment

REPRODUCIBILITY REQUIREMENTS:

DATA AVAILABILITY:
- [ ] Raw data accessible (repository, supplement, on request)?
- [ ] Data dictionary / codebook provided?
- [ ] Data collection protocol documented?

CODE / ANALYSIS:
- [ ] Analysis code shared (GitHub, OSF, supplement)?
- [ ] Software versions and packages specified?
- [ ] Random seeds set for reproducible computation?
- [ ] Pipeline documented end-to-end?

METHODOLOGY:
- [ ] Study pre-registered (OSF, ClinicalTrials.gov)?
- [ ] Deviations from protocol documented?
- [ ] All outcome measures reported (not just significant ones)?
- [ ] Sensitivity analyses included?

REPORTING:
- [ ] Follows reporting guidelines (CONSORT, STROBE, PRISMA)?
- [ ] Effect sizes and confidence intervals reported?
- [ ] Power analysis or sample size justification provided?
- [ ] Limitations section thorough and honest?
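
The "random seeds set" item in practice: use a local, seeded RNG so any resampling analysis gives identical results on every run. A sketch with a hypothetical bootstrap helper:

```python
import random

def bootstrap_mean_ci(data, n_boot=2000, seed=42):
    """Bootstrap percentile 95% CI for the mean. The fixed seed makes
    the resampling reproducible run-to-run, as the checklist asks."""
    rng = random.Random(seed)  # local RNG: no global state touched
    means = sorted(
        sum(rng.choices(data, k=len(data))) / len(data)
        for _ in range(n_boot)
    )
    return means[int(0.025 * n_boot)], means[int(0.975 * n_boot)]

data = [2, 4, 4, 4, 5, 5, 7, 9]
assert bootstrap_mean_ci(data) == bootstrap_mean_ci(data)  # same seed, same CI
```

Using `random.Random(seed)` rather than seeding the global module keeps the analysis deterministic even if other code also draws random numbers.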

Reporting Standards by Study Type

| Study Type | Guideline | Key Elements |
| --- | --- | --- |
| Randomized trial | CONSORT | Flow diagram, ITT analysis, blinding |
| Observational study | STROBE | Selection criteria, confounders, missing data |
| Systematic review | PRISMA | Search strategy, inclusion criteria, risk of bias |
| Diagnostic accuracy | STARD | Index test, reference standard, flow diagram |
| Qualitative research | COREQ | Research team, study design, data analysis |
| Prediction model | TRIPOD | Model development, validation, performance |

Quick Verification Workflow

FAST VERIFICATION (5 minutes):

1. Read the claim carefully — what exactly is being stated?
2. Check: source, sample size, study type
3. Ask: absolute or relative? What is the base rate?
4. Check: confidence interval or margin of error given?
5. Search: has this been replicated independently?

VERDICT CATEGORIES:
  VERIFIED    — multiple strong sources, robust methodology
  PLAUSIBLE   — reasonable evidence, some limitations
  UNCERTAIN   — mixed evidence, methodology concerns
  MISLEADING  — technically true but presented deceptively
  FALSE       — contradicted by strong evidence
  UNVERIFIABLE — cannot assess with available information
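
One possible way to make the verdict categories machine-checkable is a small enum plus a rule function; the decision thresholds below are hypothetical, not part of the protocol:

```python
from enum import Enum

class Verdict(Enum):
    VERIFIED = "multiple strong sources, robust methodology"
    PLAUSIBLE = "reasonable evidence, some limitations"
    UNCERTAIN = "mixed evidence, methodology concerns"
    MISLEADING = "technically true but presented deceptively"
    FALSE = "contradicted by strong evidence"
    UNVERIFIABLE = "cannot assess with available information"

def fast_verdict(has_source, replicated, methodology_sound, framing_honest):
    """Illustrative mapping from the 5-minute checks to a verdict.
    The ordering (unverifiable first, then framing, then evidence)
    is one reasonable choice, not the only one."""
    if not has_source:
        return Verdict.UNVERIFIABLE
    if not framing_honest:
        return Verdict.MISLEADING
    if replicated and methodology_sound:
        return Verdict.VERIFIED
    if methodology_sound:
        return Verdict.PLAUSIBLE
    return Verdict.UNCERTAIN

print(fast_verdict(True, False, True, True).name)   # → PLAUSIBLE
```

Encoding the rubric this way forces each verdict to be justified by explicit answers to the checklist rather than by impression.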

See Also

  • Data Science
  • Research Presenter
  • Product Analytics