add-golden


Add to Golden Dataset


Multi-agent curation workflow with quality score explanations, bias detection, and version tracking.

Quick Start


bash
/add-golden https://example.com/article
/add-golden https://arxiv.org/abs/2312.xxxxx


Task Management (CC 2.1.16)


python
# Create main curation task
TaskCreate(
    subject="Add to golden dataset: {url}",
    description="Multi-agent curation with quality explanation",
    activeForm="Curating document",
)

# Create subtasks for 9-phase process
phases = [
    "Fetch content", "Run quality analysis", "Explain scores",
    "Check bias", "Check diversity", "Validate",
    "Get approval", "Write to dataset", "Update version",
]
for phase in phases:
    TaskCreate(subject=phase, activeForm=f"{phase}ing")

---

Workflow Overview


| Phase | Activities | Output |
|---|---|---|
| 1. Input Collection | Get URL, detect content type | Document metadata |
| 2. Fetch and Extract | Parse document structure | Structured content |
| 3. Quality Analysis | 4 parallel agents evaluate | Raw scores |
| 4. Quality Explanation | Explain WHY each score | Score rationale |
| 5. Bias Detection | Check for bias in content | Bias report |
| 6. Diversity Check | Assess dataset balance | Diversity metrics |
| 7. Validation | Schema, duplicates, gates | Validation status |
| 8. Silver-to-Gold | Promote or mark as silver | Classification |
| 9. Version Tracking | Track changes, rollback | Version entry |


Phase 1-2: Input and Extraction


Detect content type: article, tutorial, documentation, research_paper.
Extract: title, sections, code blocks, key terms, metadata (author, date).
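
As a rough illustration of the detection step, the sketch below guesses one of the four content types from the URL alone. The function name `detect_content_type` and its URL heuristics are assumptions made for this example; the actual workflow may inspect the fetched content instead.

```python
from urllib.parse import urlparse


def detect_content_type(url: str) -> str:
    """Guess a content type from the URL (hypothetical heuristics)."""
    parsed = urlparse(url)
    host, path = parsed.netloc.lower(), parsed.path.lower()
    if host.endswith("arxiv.org"):
        return "research_paper"
    if host.startswith("docs.") or "/docs" in path:
        return "documentation"
    if "tutorial" in path:
        return "tutorial"
    # Fall back to the most general type when nothing else matches.
    return "article"
```

A URL-only guess is cheap enough to run before fetching, and it can be overridden once the document structure has been parsed.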


Phase 3: Parallel Quality Analysis (4 Agents)


Launch ALL agents in ONE message with `run_in_background=True`.

| Agent | Focus | Output |
|---|---|---|
| code-quality-reviewer | Accuracy, coherence, depth, relevance | Quality scores |
| workflow-architect | Keyword directness, paraphrase, reasoning | Difficulty level |
| data-pipeline-engineer | Primary/secondary domains, skill level | Tags |
| test-generator | Direct, paraphrased, multi-hop queries | Test queries |

See Quality Scoring for detailed criteria.
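
The fan-out and fan-in shape of this phase can be sketched with `asyncio`. This is only an analogy for launching all four agents at once and collecting their outputs; `run_agent` is a hypothetical stand-in and not the real agent invocation.

```python
import asyncio

# Agent names and output kinds taken from the table above.
AGENTS = {
    "code-quality-reviewer": "Quality scores",
    "workflow-architect": "Difficulty level",
    "data-pipeline-engineer": "Tags",
    "test-generator": "Test queries",
}


async def run_agent(name: str, document_id: str) -> tuple[str, str]:
    """Hypothetical placeholder for one agent's work."""
    await asyncio.sleep(0)  # yield control, as a real I/O-bound call would
    return name, f"{AGENTS[name]} for {document_id}"


async def analyze(document_id: str) -> dict[str, str]:
    """Start every agent together, then wait for all results."""
    results = await asyncio.gather(
        *(run_agent(name, document_id) for name in AGENTS)
    )
    return dict(results)
```

Calling `asyncio.run(analyze("doc-123"))` returns one entry per agent, keyed by agent name.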


Phase 4: Quality Explanation


Each dimension gets an explanation of WHY it received its score:

markdown
Accuracy: [N.NN]/1.0

Why this score:
  • [Specific reason with evidence]

What would improve it:
  • [Specific improvement]

---

Phase 5: Bias Detection


See Bias Detection Guide for patterns.
Check for:
  • Technology bias (favors specific tools)
  • Recency bias (ignores LTS versions)
  • Complexity bias (assumed knowledge)
  • Vendor bias (promotes products)
  • Geographic/cultural bias

| Bias Score | Action |
|---|---|
| 0-2 | Proceed normally |
| 3-5 | Add disclaimer |
| 6-8 | Require user review |
| 9-10 | Recommend against |
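
The score-to-action mapping can be written as a small lookup. This is a minimal sketch that assumes the bias score is an integer from 0 to 10; the function name `bias_action` is made up for illustration.

```python
def bias_action(score: int) -> str:
    """Map a 0-10 bias score to the action listed in the table above."""
    if not 0 <= score <= 10:
        raise ValueError(f"bias score out of range: {score}")
    if score <= 2:
        return "Proceed normally"
    if score <= 5:
        return "Add disclaimer"
    if score <= 8:
        return "Require user review"
    return "Recommend against"
```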


Phase 6: Diversity Dashboard


Track dataset balance across:
  • Domain distribution (AI/ML, Backend, Frontend, DevOps, Security)
  • Difficulty distribution (trivial, easy, medium, hard, adversarial)
Impact assessment: Does the new document improve or worsen dataset diversity?
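
One way to make the impact assessment concrete is to compare a balance metric before and after adding the document. The sketch below uses normalized Shannon entropy as that metric; the choice of entropy and the names `balance` and `diversity_impact` are assumptions for this example, not something the workflow prescribes.

```python
import math
from collections import Counter

DOMAINS = ["AI/ML", "Backend", "Frontend", "DevOps", "Security"]


def balance(labels: list[str], categories: list[str]) -> float:
    """Normalized Shannon entropy: 1.0 is perfectly even, 0.0 is one category."""
    counts = Counter(labels)
    total = sum(counts[c] for c in categories)
    if total == 0:
        return 0.0
    entropy = 0.0
    for category in categories:
        p = counts[category] / total
        if p > 0:
            entropy -= p * math.log(p)
    return entropy / math.log(len(categories))


def diversity_impact(existing: list[str], new_label: str,
                     categories: list[str]) -> str:
    """Report whether adding one document improves or worsens balance."""
    before = balance(existing, categories)
    after = balance(existing + [new_label], categories)
    if after > before:
        return "improves"
    if after < before:
        return "worsens"
    return "neutral"
```

The same two functions apply to the difficulty distribution by passing the difficulty levels as `categories`.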


Phase 7: Validation


  • URL validation (no placeholders)
  • Schema validation (required fields)
  • Duplicate check (>80% similarity)
  • Quality gates (min sections, content length)
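
Two of these checks can be sketched with the standard library. The placeholder markers and the use of `difflib.SequenceMatcher` as the similarity measure are assumptions for illustration; the source only states the ">80% similarity" threshold, not how similarity is computed.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse

# Hypothetical markers that suggest a URL was never filled in.
PLACEHOLDER_MARKERS = ("example.com", "xxxxx", "placeholder")


def is_valid_url(url: str) -> bool:
    """Accept only http(s) URLs that contain no placeholder marker."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    return not any(marker in url.lower() for marker in PLACEHOLDER_MARKERS)


def is_duplicate(candidate: str, existing: list[str],
                 threshold: float = 0.80) -> bool:
    """Flag the candidate if it is more than `threshold` similar to any document."""
    return any(
        SequenceMatcher(None, candidate, doc).ratio() > threshold
        for doc in existing
    )
```

`SequenceMatcher` is quadratic in the worst case, so a large dataset would likely need hashing or embeddings instead; the sketch only shows where the threshold applies.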


Phase 8: Silver-to-Gold Workflow


See Silver-Gold Promotion for criteria.

| Status | Criteria | Action |
|---|---|---|
| GOLD | Score >= 0.75, no bias | Add to main dataset |
| SILVER | Score 0.55-0.74 | Add to silver, track |
| REJECT | Score < 0.55 | Do not add |

Promotion criteria: 7+ days in silver, quality >= 0.75, no negative feedback.
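
The thresholds above translate into a short classifier. Two points are assumptions in this sketch: a document that scores 0.75 or higher but has detected bias falls back to SILVER (the table does not say), and the SILVER band is treated as every score from 0.55 up to, but not including, 0.75.

```python
GOLD_THRESHOLD = 0.75
SILVER_THRESHOLD = 0.55
MIN_DAYS_IN_SILVER = 7


def classify(score: float, has_bias: bool) -> str:
    """Return GOLD, SILVER, or REJECT for a newly curated document."""
    if score >= GOLD_THRESHOLD and not has_bias:
        return "GOLD"
    if score >= SILVER_THRESHOLD:
        # Assumption: a high-scoring but biased document is held in silver.
        return "SILVER"
    return "REJECT"


def can_promote(days_in_silver: int, score: float,
                negative_feedback: int) -> bool:
    """Check the three promotion criteria for a silver document."""
    return (
        days_in_silver >= MIN_DAYS_IN_SILVER
        and score >= GOLD_THRESHOLD
        and negative_feedback == 0
    )
```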


Phase 9: Version Tracking


json
{
  "version": "1.2.3",
  "change_type": "ADD|UPDATE|REMOVE|PROMOTE",
  "document_id": "doc-123",
  "quality_score": 0.82,
  "rollback_available": true
}

| Update Type | Version Bump |
|---|---|
| Add/Update document | Patch (0.0.X) |
| Remove document | Minor (0.X.0) |
| Schema change | Major (X.0.0) |
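
The bump rules can be sketched as a function over a `major.minor.patch` string. The `"SCHEMA"` label is a hypothetical name for a schema change, and `PROMOTE` is deliberately rejected because the table does not assign it a bump. Resetting the lower components to zero follows the usual semantic versioning convention.

```python
def bump_version(version: str, update_type: str) -> str:
    """Apply the bump rule for one update type to a semantic version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if update_type in ("ADD", "UPDATE"):
        return f"{major}.{minor}.{patch + 1}"
    if update_type == "REMOVE":
        return f"{major}.{minor + 1}.0"
    if update_type == "SCHEMA":  # hypothetical label for a schema change
        return f"{major + 1}.0.0"
    raise ValueError(f"no bump rule for update type: {update_type}")
```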


Quality Scoring


| Dimension | Weight |
|---|---|
| Accuracy | 0.25 |
| Coherence | 0.20 |
| Depth | 0.25 |
| Relevance | 0.30 |

Formula:
quality_score = accuracy*0.25 + coherence*0.20 + depth*0.25 + relevance*0.30
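
A minimal sketch of the formula as a function, assuming each dimension score is a float between 0 and 1 supplied in a dict keyed by dimension name:

```python
WEIGHTS = {
    "accuracy": 0.25,
    "coherence": 0.20,
    "depth": 0.25,
    "relevance": 0.30,
}


def quality_score(scores: dict[str, float]) -> float:
    """Weighted sum of the four dimension scores."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(scores[dim] * weight for dim, weight in WEIGHTS.items())
```

For example, scores of 0.9 (accuracy), 0.8 (coherence), 0.7 (depth), and 0.85 (relevance) give 0.225 + 0.16 + 0.175 + 0.255 = 0.815, which clears the 0.75 GOLD threshold.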


Key Decisions


| Decision | Choice | Rationale |
|---|---|---|
| Score explanation | Required | Transparency, actionable feedback |
| Bias detection | Dedicated agent | Prevent dataset contamination |
| Two-tier system | Silver + Gold | Allow docs time to mature |
| Version tracking | Semantic versioning | Clear history, safe rollbacks |


Related Skills


  • golden-dataset-validation - Validate existing datasets
  • llm-evaluation - LLM output evaluation patterns
  • test-data-management - Test data strategies

Version: 2.0.0 (January 2026)