add-golden
Add to Golden Dataset
Multi-agent curation workflow with quality score explanations, bias detection, and version tracking.
Quick Start
```bash
/add-golden https://example.com/article
/add-golden https://arxiv.org/abs/2312.xxxxx
```
Task Management (CC 2.1.16)
```python
# Create main curation task
TaskCreate(
    subject="Add to golden dataset: {url}",
    description="Multi-agent curation with quality explanation",
    activeForm="Curating document"
)

# Create subtasks for 9-phase process.
# Each phase carries its own in-progress label, because appending "ing"
# to the subject yields labels such as "Fetch contenting".
phases = [
    ("Fetch content", "Fetching content"),
    ("Run quality analysis", "Running quality analysis"),
    ("Explain scores", "Explaining scores"),
    ("Check bias", "Checking bias"),
    ("Check diversity", "Checking diversity"),
    ("Validate", "Validating"),
    ("Get approval", "Getting approval"),
    ("Write to dataset", "Writing to dataset"),
    ("Update version", "Updating version"),
]
for subject, active_form in phases:
    TaskCreate(subject=subject, activeForm=active_form)
```
---
Workflow Overview
| Phase | Activities | Output |
|---|---|---|
| 1. Input Collection | Get URL, detect content type | Document metadata |
| 2. Fetch and Extract | Parse document structure | Structured content |
| 3. Quality Analysis | 4 parallel agents evaluate | Raw scores |
| 4. Quality Explanation | Explain WHY each score | Score rationale |
| 5. Bias Detection | Check for bias in content | Bias report |
| 6. Diversity Check | Assess dataset balance | Diversity metrics |
| 7. Validation | Schema, duplicates, gates | Validation status |
| 8. Silver-to-Gold | Promote or mark as silver | Classification |
| 9. Version Tracking | Track changes, rollback | Version entry |
Phase 1-2: Input and Extraction
Detect content type: article, tutorial, documentation, research_paper.
Extract: title, sections, code blocks, key terms, metadata (author, date).
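The source does not say how the content type is detected. As a rough sketch only, assuming simple URL heuristics (the function name and every rule below are hypothetical, not part of the skill), detection could look like this:

```python
from urllib.parse import urlparse


def detect_content_type(url: str) -> str:
    """Guess one of the four content types from the URL alone.

    Hypothetical heuristics for illustration; a real implementation
    would likely inspect the fetched document as well.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    path = parsed.path.lower()

    if host.endswith("arxiv.org"):
        return "research_paper"
    if host.startswith("docs.") or "/docs/" in path:
        return "documentation"
    if "tutorial" in path:
        return "tutorial"
    # Fall back to the most general type.
    return "article"
```

Falling back to `article` keeps the function total, so later phases always receive one of the four listed types.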
Phase 3: Parallel Quality Analysis (4 Agents)
Launch ALL agents in ONE message with `run_in_background=True`.
| Agent | Focus | Output |
|---|---|---|
| code-quality-reviewer | Accuracy, coherence, depth, relevance | Quality scores |
| workflow-architect | Keyword directness, paraphrase, reasoning | Difficulty level |
| data-pipeline-engineer | Primary/secondary domains, skill level | Tags |
| test-generator | Direct, paraphrased, multi-hop queries | Test queries |
See Quality Scoring for detailed criteria.
Phase 4: Quality Explanation
Each dimension gets an explanation of WHY it received its score:
```markdown
Accuracy: [N.NN]/1.0

Why this score:
- [Specific reason with evidence]

What would improve it:
- [Specific improvement]
```
---
Phase 5: Bias Detection
See Bias Detection Guide for patterns.
Check for:
- Technology bias (favors specific tools)
- Recency bias (ignores LTS versions)
- Complexity bias (assumed knowledge)
- Vendor bias (promotes products)
- Geographic/cultural bias
| Bias Score | Action |
|---|---|
| 0-2 | Proceed normally |
| 3-5 | Add disclaimer |
| 6-8 | Require user review |
| 9-10 | Recommend against |
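The score-to-action table above can be expressed as a small lookup. This is a minimal sketch that assumes integer scores on the 0-10 scale; the function name `bias_action` is hypothetical.

```python
def bias_action(score: int) -> str:
    """Map an integer bias score (0-10) to the action from the table."""
    if not 0 <= score <= 10:
        raise ValueError("bias score must be between 0 and 10")
    if score <= 2:
        return "Proceed normally"
    if score <= 5:
        return "Add disclaimer"
    if score <= 8:
        return "Require user review"
    return "Recommend against"
```

Rejecting out-of-range scores early keeps a malformed agent output from silently mapping to "Recommend against".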
Phase 6: Diversity Dashboard
Track dataset balance across:
- Domain distribution (AI/ML, Backend, Frontend, DevOps, Security)
- Difficulty distribution (trivial, easy, medium, hard, adversarial)
Impact assessment: Does the new document improve or worsen diversity?
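The source does not define a diversity metric. One possible way to answer the impact question, assuming normalized Shannon entropy as the balance measure (an assumption, as are the names `balance` and `diversity_impact`), is sketched below for the domain distribution:

```python
import math

# Domain categories as listed above.
DOMAINS = ["AI/ML", "Backend", "Frontend", "DevOps", "Security"]


def balance(counts: dict, categories: list) -> float:
    """Normalized Shannon entropy: 1.0 is perfectly even, 0.0 is one category."""
    total = sum(counts.get(c, 0) for c in categories)
    if total == 0:
        return 0.0
    entropy = 0.0
    for c in categories:
        p = counts.get(c, 0) / total
        if p > 0:
            entropy -= p * math.log(p)
    return entropy / math.log(len(categories))


def diversity_impact(counts: dict, new_category: str, categories=DOMAINS) -> float:
    """Change in balance if one document in new_category is added.

    Positive means the document improves diversity; negative means it worsens it.
    """
    after = dict(counts)
    after[new_category] = after.get(new_category, 0) + 1
    return balance(after, categories) - balance(counts, categories)
```

The same two functions work for the difficulty distribution by passing the five difficulty levels as `categories`.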
Phase 7: Validation
- URL validation (no placeholders)
- Schema validation (required fields)
- Duplicate check (flag documents with >80% similarity to an existing entry)
- Quality gates (min sections, content length)
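Two of the checks above can be sketched with the standard library. The source does not say which markers count as placeholders or how similarity is measured, so the marker list and the use of `difflib.SequenceMatcher` are assumptions; the function names are hypothetical.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse

# Hypothetical placeholder markers; the source does not list them.
PLACEHOLDER_MARKERS = ("example.com", "xxxxx", "{url}")
# Threshold taken from the ">80% similarity" rule.
DUPLICATE_THRESHOLD = 0.80


def is_valid_url(url: str) -> bool:
    """Accept only http(s) URLs that contain no placeholder marker."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    return not any(marker in url.lower() for marker in PLACEHOLDER_MARKERS)


def find_duplicate(text: str, existing: dict):
    """Return the id of the first existing document above the threshold, else None.

    `existing` maps document ids to their text. SequenceMatcher compares
    characters, which is slow on long documents; it stands in here for
    whatever similarity measure the dataset actually uses.
    """
    for doc_id, other in existing.items():
        if SequenceMatcher(None, text, other).ratio() > DUPLICATE_THRESHOLD:
            return doc_id
    return None
```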
Phase 8: Silver-to-Gold Workflow
See Silver-Gold Promotion for criteria.
| Status | Criteria | Action |
|---|---|---|
| GOLD | Score >= 0.75, no bias | Add to main dataset |
| SILVER | Score 0.55-0.74 | Add to silver, track |
| REJECT | Score < 0.55 | Do not add |
Promotion criteria: 7+ days in silver, quality >= 0.75, no negative feedback.
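The classification table and the promotion criteria translate into two small predicates. This is a sketch under two assumptions the source leaves open: a document scoring 0.75 or higher with detected bias falls back to SILVER, and scores between 0.74 and 0.75 count as SILVER. The function and parameter names are hypothetical.

```python
def classify(quality_score: float, has_bias: bool) -> str:
    """Return GOLD, SILVER, or REJECT for a newly scored document."""
    if quality_score >= 0.75 and not has_bias:
        return "GOLD"
    # Assumption: high-scoring but biased documents are held in silver.
    if quality_score >= 0.55:
        return "SILVER"
    return "REJECT"


def eligible_for_promotion(days_in_silver: int, quality_score: float,
                           negative_feedback_count: int) -> bool:
    """Apply the silver-to-gold promotion criteria."""
    return (
        days_in_silver >= 7
        and quality_score >= 0.75
        and negative_feedback_count == 0
    )
```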
Phase 9: Version Tracking
```json
{
  "version": "1.2.3",
  "change_type": "ADD|UPDATE|REMOVE|PROMOTE",
  "document_id": "doc-123",
  "quality_score": 0.82,
  "rollback_available": true
}
```
| Update Type | Version Bump |
|---|---|
| Add/Update document | Patch (0.0.X) |
| Remove document | Minor (0.X.0) |
| Schema change | Major (X.0.0) |
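The bump rules in the table can be sketched as follows. `SCHEMA` is a hypothetical key for the "Schema change" row, since it is not one of the `change_type` values shown in the JSON. The table gives no rule for `PROMOTE`, so it is left out and raises `KeyError`.

```python
# Change type -> which part of the version to bump, per the table above.
# "SCHEMA" is a hypothetical key standing in for "Schema change".
BUMP_BY_CHANGE = {
    "ADD": "patch",
    "UPDATE": "patch",
    "REMOVE": "minor",
    "SCHEMA": "major",
}


def bump_version(version: str, change_type: str) -> str:
    """Return the next semantic version for the given change type."""
    major, minor, patch = (int(part) for part in version.split("."))
    level = BUMP_BY_CHANGE[change_type]
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

For example, starting from `1.2.3`, adding a document gives `1.2.4`, removing one gives `1.3.0`, and a schema change gives `2.0.0`.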
Quality Scoring
| Dimension | Weight |
|---|---|
| Accuracy | 0.25 |
| Coherence | 0.20 |
| Depth | 0.25 |
| Relevance | 0.30 |
Formula: `quality_score = accuracy*0.25 + coherence*0.20 + depth*0.25 + relevance*0.30`
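As a worked example of the weighted sum, here is a minimal sketch that assumes each dimension score lies in the range 0.0 to 1.0. The `WEIGHTS` dictionary mirrors the table; the function signature is hypothetical.

```python
# Weights from the table above; they sum to 1.0.
WEIGHTS = {
    "accuracy": 0.25,
    "coherence": 0.20,
    "depth": 0.25,
    "relevance": 0.30,
}


def quality_score(scores: dict) -> float:
    """Combine per-dimension scores (0.0-1.0) into one weighted score."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(scores[dim] * weight for dim, weight in WEIGHTS.items())
```

With accuracy 0.9, coherence 0.8, depth 0.7, and relevance 0.85, the result is 0.225 + 0.16 + 0.175 + 0.255 = 0.815.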
Key Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Score explanation | Required | Transparency, actionable feedback |
| Bias detection | Dedicated agent | Prevent dataset contamination |
| Two-tier system | Silver + Gold | Allow docs time to mature |
| Version tracking | Semantic versioning | Clear history, safe rollbacks |
Related Skills
- `golden-dataset-validation` - Validate existing datasets
- `llm-evaluation` - LLM output evaluation patterns
- `test-data-management` - Test data strategies
Version: 2.0.0 (January 2026)