Statistics Verifier
Structured frameworks for verifying statistical claims, validating research methodology, and detecting analytical errors and biases.
Statistical Claim Verification Checklist
Rapid Claim Assessment
CLAIM VERIFICATION PROTOCOL:
1. SOURCE CHECK
- Who made the claim?
- What is their expertise and incentive?
- Where was it published (peer-reviewed, preprint, press release)?
- Is the original data or study accessible?
2. METHODOLOGY CHECK
- What type of study (RCT, observational, survey, meta-analysis)?
- What was the sample size and population?
- What was the measurement method?
- Is the statistical test appropriate for the data type?
3. NUMBER SENSE CHECK
- Does the claim pass a basic plausibility test?
- Are units and denominators clearly stated?
- Absolute vs relative numbers — which is being used?
- Is the base rate provided for context?
4. REPLICATION CHECK
- Have other studies found similar results?
- Are the findings consistent across populations?
- Has anyone attempted and failed to replicate?
5. CONCLUSION CHECK
- Does the conclusion follow from the data?
- Are alternative explanations addressed?
- Is the scope of the claim proportional to the evidence?
Claim Red Flags
| Red Flag | What It Means | Action |
|---|---|---|
| No sample size given | Cannot assess reliability | Request or estimate N |
| Only relative risk reported | May hide small absolute effect | Calculate absolute difference |
| "Up to X%" framing | Cherry-picked best case | Ask for median or mean |
| No confidence interval | Precision unknown | Treat with skepticism |
| Correlation stated as causation | Confounders likely ignored | Check study design |
| Self-selected sample | Selection bias likely | Note limitation |
| Composite endpoint | May mask weak individual results | Decompose the endpoint |
| Subgroup analysis highlighted | Likely post-hoc fishing | Require pre-registration |
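The "only relative risk reported" red flag can be checked with simple arithmetic. A minimal sketch in Python (the function name and example counts are illustrative):

```python
def risk_summary(events_treated, n_treated, events_control, n_control):
    """Convert raw counts into absolute and relative effect measures."""
    risk_t = events_treated / n_treated
    risk_c = events_control / n_control
    ard = risk_c - risk_t  # absolute risk difference
    return {
        "risk_treated": risk_t,
        "risk_control": risk_c,
        "absolute_difference": ard,
        "relative_risk": risk_t / risk_c,
        # number needed to treat: patients treated per event prevented
        "nnt": float("inf") if ard == 0 else 1 / ard,
    }

# A "50% risk reduction" headline can describe a tiny absolute effect:
# 2 events per 1,000 controls vs 1 per 1,000 treated.
summary = risk_summary(1, 1000, 2, 1000)
print(summary["relative_risk"])        # 0.5   (impressive-sounding)
print(summary["absolute_difference"])  # 0.001 (one event per 1,000)
```

Reporting relative risk together with the absolute difference (or NNT) defuses this red flag.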
Common Statistical Errors
Error Detection Framework
CATEGORY 1: DESIGN ERRORS
- Sampling bias (convenience, voluntary response, survivorship)
- Confounding variables not controlled
- Insufficient sample size (underpowered study)
- No control group or inappropriate comparator
- Measurement instrument not validated
CATEGORY 2: ANALYSIS ERRORS
- Multiple comparisons without correction (p-hacking)
- Treating ordinal data as interval
- Assuming normality without checking
- Ignoring missing data patterns (MCAR vs MNAR)
- Using parametric tests when their assumptions are violated
CATEGORY 3: INTERPRETATION ERRORS
- Confusing statistical significance with practical significance
- Interpreting non-significant result as "no effect"
- Ecological fallacy (group-level applied to individuals)
- Simpson's paradox not checked
- Ignoring effect size and confidence intervals
CATEGORY 4: REPORTING ERRORS
- Selective reporting of favorable results
- Omitting negative or null findings
- Misleading axis scales in visualizations
- Presenting percentages without base numbers
- Switching between absolute and relative metrics
Error Severity Assessment
| Error Type | Severity | Impact on Conclusion |
|---|---|---|
| P-hacking / HARKing | Critical | Invalidates findings |
| Selection bias | Critical | Fundamentally flawed sample |
| Confounding not addressed | High | Alternative explanations remain |
| Wrong statistical test | High | Results may be artifactual |
| Multiple comparisons uncorrected | High | Inflated false positive rate |
| Small sample without power analysis | Medium | May miss real effects |
| Missing confidence intervals | Medium | Cannot judge precision |
| Misleading visualization | Medium | Misrepresents magnitude |
| Minor rounding errors | Low | Minimal impact |
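The "statistical significance vs practical significance" confusion in Category 3 can be made concrete: with a large enough sample, a trivial difference yields a tiny p-value. A sketch using only the standard library (the conversion rates and sample sizes are invented):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(p1, n1, p2, n2):
    """Two-sided z-test for the difference of two proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# 50.3% vs 50.0%: a 0.3-point difference nobody would act on,
# yet with a million users per arm it is "highly significant".
z, p = two_proportion_z(0.503, 1_000_000, 0.500, 1_000_000)
# z is around 4.2 and p well below 0.001 -- significant, but
# the effect size is practically negligible.
```

This is why the severity table treats a missing effect size or confidence interval as a real problem, not a formality.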
Significance Testing Framework
Test Selection Guide
CHOOSING THE RIGHT TEST:
DATA TYPE → COMPARISON → TEST
Continuous + 2 groups + independent → Independent t-test (or Mann-Whitney)
Continuous + 2 groups + paired → Paired t-test (or Wilcoxon signed-rank)
Continuous + 3+ groups + independent → One-way ANOVA (or Kruskal-Wallis)
Continuous + 2+ factors → Two-way ANOVA
Continuous + repeated measures (3+ conditions) → Repeated-measures ANOVA (or Friedman)
Continuous + continuous → Pearson correlation (or Spearman)
Categorical + 2 groups → Chi-square test (or Fisher's exact)
Categorical + ordered → Cochran-Armitage trend test
Binary outcome + predictors → Logistic regression
Time-to-event + groups → Log-rank test / Cox regression
Count data → Poisson regression
Proportion + large sample → Z-test for proportions
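The "Chi-square test (or Fisher's exact)" row above can be made concrete for small tables, where the chi-square approximation is unreliable. A self-contained pure-Python sketch of the two-sided exact test (the function name is ours; it sums hypergeometric probabilities of all tables no more likely than the observed one, a common two-sided convention):

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def table_prob(x):  # P(top-left cell == x) with all margins fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = table_prob(a)
    lo = max(0, row1 - (n - col1))  # smallest feasible top-left count
    hi = min(row1, col1)            # largest feasible top-left count
    return sum(p for p in (table_prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

# Fisher's "lady tasting tea": 3 of 4 cups classified correctly.
p = fisher_exact_2x2(3, 1, 1, 3)  # about 0.486 -- not significant
```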
P-Value Interpretation Guide
P-VALUE CONTEXT:
p-value = P(data at least this extreme | null hypothesis is true)
COMMON MISINTERPRETATIONS:
p = 0.03 does NOT mean:
- "There is a 3% chance the result is due to chance"
- "There is a 97% probability the hypothesis is true"
- "The effect is large or important"
- "The study will replicate"
p = 0.03 DOES mean:
- If the null hypothesis were true, data at least this extreme
would occur about 3% of the time by chance alone.
THRESHOLDS (conventional, not absolute):
p < 0.001 — strong evidence against null
p < 0.01 — moderate evidence against null
p < 0.05 — conventional threshold (context-dependent)
p > 0.05 — insufficient evidence to reject null
(NOT evidence of no effect)
ALWAYS COMPLEMENT WITH:
- Effect size (Cohen's d, odds ratio, etc.)
- Confidence interval (range of plausible values)
- Practical significance (is the effect meaningful?)
- Study power (could it have detected a real effect?)
Multiple Comparisons Correction
| Method | When to Use | Conservativeness |
|---|---|---|
| Bonferroni | Few comparisons, need strong control | Very conservative |
| Holm-Bonferroni | Moderate comparisons, step-down | Less conservative |
| Benjamini-Hochberg | Many comparisons (FDR control) | Liberal |
| Tukey's HSD | All pairwise comparisons after ANOVA | Moderate |
| Dunnett's | Multiple treatments vs one control | Moderate |
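These corrections matter because the family-wise false positive rate grows quickly: with 20 independent tests at alpha = 0.05, P(at least one false positive) = 1 - 0.95**20, roughly 0.64. The Benjamini-Hochberg step-up procedure from the table can be sketched in a few lines (the p-values are illustrative):

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a reject/keep flag per p-value, controlling FDR at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value sits under the
    # BH line k/m * q; reject everything up to that rank.
    max_k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            max_k = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_k:
            reject[i] = True
    return reject

p_vals = [0.01, 0.02, 0.03, 0.04, 0.20]
print(benjamini_hochberg(p_vals))
# [True, True, True, True, False] -- Bonferroni (0.05 / 5 = 0.01)
# would have rejected only the first.
```

The contrast in the final comment is why the table labels Bonferroni "very conservative" and Benjamini-Hochberg "liberal".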
Sample Size Validation
Quick Reference Table
MINIMUM SAMPLE SIZE GUIDELINES:
Survey (population estimate):
±3% margin, 95% CI → n ≈ 1,067
±5% margin, 95% CI → n ≈ 385
±10% margin, 95% CI → n ≈ 97
A/B Test (detecting a ~20% relative lift):
Baseline 10% conversion → n ≈ 3,200 per group
Baseline 5% conversion → n ≈ 6,400 per group
Baseline 2% conversion → n ≈ 16,000 per group
Clinical trial (medium effect d=0.5):
Two-group comparison, 80% power → n ≈ 64 per group
Two-group comparison, 90% power → n ≈ 86 per group
Correlation (detecting r=0.3):
80% power, alpha=0.05 → n ≈ 85
90% power, alpha=0.05 → n ≈ 113
Power Analysis Checklist
| Parameter | Must Specify | Source |
|---|---|---|
| Alpha (Type I error rate) | Yes | Convention (usually 0.05) |
| Power (1 - Type II error) | Yes | Usually 0.80 or 0.90 |
| Effect size | Yes | Prior research or MCID |
| Variance / SD | Yes | Pilot data or literature |
| Sample size | Calculated | Output of power analysis |
| Attrition rate | Recommended | Inflate N by expected dropout |
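Several of the quick-reference numbers above can be reproduced from standard closed-form approximations. A sketch using `statistics.NormalDist` (the normal approximation for the two-group case runs one or two below the exact t-based figures quoted in the table):

```python
from math import ceil
from statistics import NormalDist

_z = NormalDist().inv_cdf  # standard normal quantile function

def n_per_group(d, alpha=0.05, power=0.80):
    """Per-group n for a two-group mean comparison at effect size d."""
    return ceil(2 * ((_z(1 - alpha / 2) + _z(power)) / d) ** 2)

def survey_n(margin, conf=0.95, p=0.5):
    """Respondents needed to estimate a proportion within +/- margin."""
    z = _z(0.5 + conf / 2)
    return ceil(z * z * p * (1 - p) / margin ** 2)

print(n_per_group(0.5))                 # 63 (table's 64 adds a t-correction)
print(survey_n(0.05), survey_n(0.10))   # 385 97
```

Inflating the result for expected attrition (the last checklist row) is then a one-line division by the expected completion rate.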
Correlation vs Causation Checklist
Bradford Hill Criteria for Causation
DOES CORRELATION IMPLY CAUSATION? CHECK:
1. STRENGTH: Is the association large?
Larger effects harder to explain away.
2. CONSISTENCY: Replicated across settings, populations?
Multiple studies, same finding.
3. SPECIFICITY: Is X linked specifically to Y (not everything)?
Less useful for multifactorial diseases.
4. TEMPORALITY: Does X precede Y in time?
REQUIRED — cause must come before effect.
5. BIOLOGICAL GRADIENT: Does more X produce more Y (dose-response)?
Strong support for causation.
6. PLAUSIBILITY: Is there a credible mechanism?
Based on current knowledge.
7. COHERENCE: Consistent with known biology/theory?
No conflict with established facts.
8. EXPERIMENT: Does removing X reduce Y?
Strongest evidence (RCT).
9. ANALOGY: Similar exposures cause similar effects?
Weakest criterion, supporting only.
VERDICT:
Criteria 1-3 met + Temporality → Suggestive of causation
Criteria 1-6 met + Experiment → Strong evidence of causation
Only correlation observed → Association only, cannot infer cause
Common Third-Variable Confounders
| Observed Association | Likely Confounder |
|---|---|
| Ice cream sales and drowning | Warm weather (season) |
| Shoe size and reading ability | Age |
| Hospital visits and death rate | Illness severity |
| Organic food and health | Socioeconomic status |
| Screen time and depression | Social isolation, sleep |
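Confounders like these can do more than weaken an association; they can reverse it (Simpson's paradox, flagged in the interpretation errors above). A sketch using the well-known kidney-stone treatment counts, where the confounder is stone size:

```python
# (successes, patients) per treatment within each stratum of the
# confounder (stone size); the classic kidney-stone example.
data = {
    "small stones": {"A": (81, 87),   "B": (234, 270)},
    "large stones": {"A": (192, 263), "B": (55, 80)},
}

rate = lambda s, n: s / n

# Within EVERY stratum, treatment A has the higher success rate...
for stratum, arms in data.items():
    assert rate(*arms["A"]) > rate(*arms["B"])

# ...yet pooled over strata, treatment B looks better, because A was
# given mostly to the hard (large-stone) cases.
def pooled(arm):
    s = sum(data[k][arm][0] for k in data)
    n = sum(data[k][arm][1] for k in data)
    return s / n

print(round(pooled("A"), 3), round(pooled("B"), 3))  # 0.78 0.826
```

The practical rule: always check whether the association survives stratifying by the plausible confounder.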
Survey Methodology Review
Survey Quality Assessment
SURVEY METHODOLOGY CHECKLIST:
SAMPLING:
- [ ] Probability sampling method described?
- [ ] Sampling frame defined and appropriate?
- [ ] Response rate reported (acceptable: >60% mail, >80% in-person)?
- [ ] Non-response bias assessed?
QUESTIONNAIRE:
- [ ] Questions validated or adapted from validated instruments?
- [ ] Leading or double-barreled questions absent?
- [ ] Response options balanced and exhaustive?
- [ ] Pilot tested with target population?
ADMINISTRATION:
- [ ] Mode (online, phone, in-person) appropriate?
- [ ] Anonymity/confidentiality assured?
- [ ] Informed consent obtained?
- [ ] Social desirability bias mitigated?
ANALYSIS:
- [ ] Weighting applied for non-response or oversampling?
- [ ] Margin of error and confidence level reported?
- [ ] Subgroup analyses pre-specified (not exploratory)?
Data Visualization Integrity Checks
Chart Audit Checklist
| Check | What to Look For | Fail Condition |
|---|---|---|
| Y-axis starts at zero (bar charts) | Truncated axis exaggerates differences | Axis starts above zero without clear label |
| Consistent scale | Both axes have proportional increments | Non-linear scale without explanation |
| Area proportional to data | Bubble/icon size matches values | Area misrepresents magnitude |
| Time axis evenly spaced | Equal intervals between data points | Uneven spacing compresses/expands trends |
| Appropriate chart type | Data type matches visualization | Pie chart with 20+ categories |
| Context provided | Benchmarks, comparisons, baselines | Single data point with no reference |
| Source cited | Data origin traceable | No source attribution |
| Dual axes used responsibly | Two Y-axes can create false correlations | Arbitrary scaling implies relationship |
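The truncated-axis failure in the first row can be quantified: compare the ratio of bar heights as drawn against the true data ratio. A small sketch (the values are made up):

```python
def apparent_ratio(v1, v2, axis_min=0.0):
    """Ratio of two bar heights as drawn, given where the y-axis starts."""
    return (v2 - axis_min) / (v1 - axis_min)

true = apparent_ratio(50, 55)                 # 1.1 -- a 10% difference
drawn = apparent_ratio(50, 55, axis_min=48)   # 3.5 -- looks like 3.5x
exaggeration = drawn / true                   # roughly 3.2x visual inflation
```

When auditing a chart, computing this one number is often enough to decide whether the truncation is labeled honestly or is doing rhetorical work.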
Misleading Visualization Patterns
WATCH FOR THESE TRICKS:
1. TRUNCATED AXIS
Small differences look dramatic when baseline removed.
FIX: Always check if y-axis starts at zero for bar charts.
2. CHERRY-PICKED TIME WINDOW
Start/end dates chosen to show desired trend.
FIX: Ask for longer time series with consistent intervals.
3. 3D EFFECTS
Perspective distortion makes sizes unequal.
FIX: Use flat 2D charts for accurate comparison.
4. DUAL AXIS MANIPULATION
Two y-axes scaled to create apparent correlation.
FIX: Normalize data or use separate panels.
5. CUMULATIVE VS DAILY
Cumulative charts always go up — hides declining rates.
FIX: Show rate of change alongside cumulative.
Bias Detection Framework
Cognitive Biases in Data Analysis
BIAS DETECTION CHECKLIST:
CONFIRMATION BIAS
- Are they only presenting data that supports their hypothesis?
- Were negative results reported?
- Was the analysis plan pre-registered?
ANCHORING BIAS
- Is the first number presented influencing interpretation of later data?
- Are comparisons made to appropriate benchmarks?
SURVIVORSHIP BIAS
- Are only successful cases included (ignoring failures)?
- Is the denominator complete (not just survivors)?
AVAILABILITY BIAS
- Are dramatic or recent events overweighted?
- Is systematic data used rather than anecdotal evidence?
PUBLICATION BIAS
- Is there a funnel plot asymmetry (meta-analyses)?
- Are null results published or only significant ones?
TEXAS SHARPSHOOTER FALLACY
- Were clusters or patterns found after looking at data?
- Was the hypothesis formed before or after seeing results?
Bias Severity Matrix
| Bias | Detection Method | Mitigation |
|---|---|---|
| Selection bias | Compare sample to population demographics | Probability sampling, weighting |
| Measurement bias | Check instrument validity and calibration | Validated instruments, blinding |
| Reporting bias | Look for asymmetric funnel plots | Pre-registration, open data |
| Recall bias | Compare to objective records | Prospective data collection |
| Observer bias | Check if assessors were blinded | Double-blind design |
| Attrition bias | Compare completers vs dropouts | Intention-to-treat analysis |
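The "is the denominator complete?" question from the survivorship-bias checklist above is easy to demonstrate: averaging only over survivors flips the sign of the answer. A toy sketch with invented fund returns:

```python
# (annual return, still operating?) -- funds that blew up get excluded
# from naive "average fund return" figures.
funds = [
    (0.12, True), (0.09, True), (0.15, True), (-0.60, False),
    (-0.45, False), (0.07, True), (-0.30, False), (0.11, True),
]

alive = [r for r, ok in funds if ok]
survivor_avg = sum(alive) / len(alive)          # about +10.8%
true_avg = sum(r for r, _ in funds) / len(funds)  # about -10.1%

print(round(survivor_avg, 3), round(true_avg, 3))
```

The same denominator check applies to attrition bias in the matrix: comparing completers only is the clinical-trial version of counting surviving funds.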
Reproducibility Checklist
Study Reproducibility Assessment
REPRODUCIBILITY REQUIREMENTS:
DATA AVAILABILITY:
- [ ] Raw data accessible (repository, supplement, on request)?
- [ ] Data dictionary / codebook provided?
- [ ] Data collection protocol documented?
CODE / ANALYSIS:
- [ ] Analysis code shared (GitHub, OSF, supplement)?
- [ ] Software versions and packages specified?
- [ ] Random seeds set for reproducible computation?
- [ ] Pipeline documented end-to-end?
METHODOLOGY:
- [ ] Study pre-registered (OSF, ClinicalTrials.gov)?
- [ ] Deviations from protocol documented?
- [ ] All outcome measures reported (not just significant ones)?
- [ ] Sensitivity analyses included?
REPORTING:
- [ ] Follows reporting guidelines (CONSORT, STROBE, PRISMA)?
- [ ] Effect sizes and confidence intervals reported?
- [ ] Power analysis or sample size justification provided?
- [ ] Limitations section thorough and honest?
Reporting Standards by Study Type
| Study Type | Guideline | Key Elements |
|---|---|---|
| Randomized trial | CONSORT | Flow diagram, ITT analysis, blinding |
| Observational study | STROBE | Selection criteria, confounders, missing data |
| Systematic review | PRISMA | Search strategy, inclusion criteria, risk of bias |
| Diagnostic accuracy | STARD | Index test, reference standard, flow diagram |
| Qualitative research | COREQ | Research team, study design, data analysis |
| Prediction model | TRIPOD | Model development, validation, performance |
Quick Verification Workflow
FAST VERIFICATION (5 minutes):
1. Read the claim carefully — what exactly is being stated?
2. Check: source, sample size, study type
3. Ask: absolute or relative? What is the base rate?
4. Check: confidence interval or margin of error given?
5. Search: has this been replicated independently?
VERDICT CATEGORIES:
VERIFIED — multiple strong sources, robust methodology
PLAUSIBLE — reasonable evidence, some limitations
UNCERTAIN — mixed evidence, methodology concerns
MISLEADING — technically true but presented deceptively
FALSE — contradicted by strong evidence
UNVERIFIABLE — cannot assess with available information
See Also
- Data Science
- Research Presenter
- Product Analytics