ethics-data-governance

Ethics Data Governance

Handle ethics and data governance as part of the research record, not as a late appendix. This is especially important for platform, community, security, health, education, and other sensitive-domain research.

Read First

  • references/ethics-data-governance.md
  • references/repository-contract.md

Workflow

  1. Identify data subjects, platforms, communities, institutions, and affected groups.
  2. Classify sensitivity: public, restricted, personal, vulnerable, illegal, copyrighted, proprietary, or dual-use.
  3. Record collection boundaries, access method, terms-of-service risk, consent, IRB or ethics-board status, and retention policy in docs/ethics/data-governance.md.
  4. Keep raw sensitive data in ignored local folders unless sharing is approved.
  5. Prefer aggregate, redacted, or synthetic outputs in reports.
  6. Add risk notes to dataset pages and claim pages.
  7. Re-check governance before publishing artifacts, code, or examples.
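
One way to make steps 1 to 4 concrete is to keep a small structured record per dataset and render it into the governance file. The sketch below is illustrative only: the `GovernanceRecord` class, its field names, and the `sharing_approved` flag are assumptions made for this example, not a schema defined by this document. The sensitivity labels are taken from step 2.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    # Labels taken from step 2 of the workflow.
    PUBLIC = "public"
    RESTRICTED = "restricted"
    PERSONAL = "personal"
    VULNERABLE = "vulnerable"
    ILLEGAL = "illegal"
    COPYRIGHTED = "copyrighted"
    PROPRIETARY = "proprietary"
    DUAL_USE = "dual-use"


@dataclass
class GovernanceRecord:
    # Hypothetical schema: one record per dataset (steps 1 to 3).
    dataset: str
    data_subjects: list[str]
    sensitivity: list[Sensitivity]
    collection_boundaries: str
    access_method: str
    tos_risk: str
    consent: str
    ethics_board_status: str
    retention_policy: str
    sharing_approved: bool = False  # step 4: raw data stays local by default

    def may_share_raw(self) -> bool:
        # Sharing depends on explicit approval, never on a "public" label alone.
        return self.sharing_approved

    def to_markdown(self) -> str:
        # Render a section suitable for docs/ethics/data-governance.md.
        labels = ", ".join(s.value for s in self.sensitivity)
        lines = [
            f"## {self.dataset}",
            "",
            f"- Data subjects: {', '.join(self.data_subjects)}",
            f"- Sensitivity: {labels}",
            f"- Collection boundaries: {self.collection_boundaries}",
            f"- Access method: {self.access_method}",
            f"- Terms-of-service risk: {self.tos_risk}",
            f"- Consent: {self.consent}",
            f"- IRB / ethics-board status: {self.ethics_board_status}",
            f"- Retention policy: {self.retention_policy}",
            f"- Raw data sharing approved: {'yes' if self.sharing_approved else 'no'}",
        ]
        return "\n".join(lines) + "\n"
```

Appending the output of `to_markdown()` to docs/ethics/data-governance.md keeps the record next to the research it governs. `may_share_raw()` deliberately ignores the sensitivity labels, so a dataset marked public still needs an explicit approval before raw data leaves the ignored local folder.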

Red Flags

  • personal identifiers in raw or derived files
  • screenshots that expose users, handles, addresses, chats, or illegal content
  • scraping behind login or against platform restrictions
  • model outputs that can reveal memorized sensitive data
  • publishing datasets without license or consent analysis
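
The first red flag can be partly screened for automatically. The sketch below scans text for a few identifier shapes using regular expressions. The pattern set and the `find_identifiers` helper are assumptions made for this example; they catch only obvious cases (email addresses, @-handles, IPv4 addresses) and do not replace a manual review.

```python
import re

# Illustrative patterns only; real data will need a broader, domain-specific set.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # "@" not preceded by a word character, so email domains are not counted as handles.
    "handle": re.compile(r"(?<![\w@])@[A-Za-z0-9_]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_identifiers(text: str) -> list[tuple[int, str]]:
    """Return (line number, identifier kind) for every match in the text.

    The matched value itself is not returned, so the scan report
    does not become another copy of the identifiers.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            for _ in pattern.finditer(line):
                hits.append((lineno, kind))
    return hits
```

Running this over raw and derived files before they are committed or shared gives a list of lines to inspect. An empty result only means that none of these particular patterns matched.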

Do Not

  • Treat "publicly visible" as automatically safe to redistribute.
  • Put credentials, cookies, session exports, or scraped private content in git.
  • De-identify by hand without recording the transformation.
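
As an alternative to de-identifying by hand, the transformation can be scripted so that every step leaves a record. The sketch below replaces selected fields with keyed pseudonyms (HMAC-SHA256 from the Python standard library) and appends a log entry describing what was done. The function names, the log format, and the 16-character truncation are assumptions made for this example. Keyed pseudonymization is a swapped-in technique chosen for illustration; it is not anonymization, and whether it is sufficient depends on the dataset.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def pseudonymize(value: str, key: bytes) -> str:
    # Keyed hash: the same input and key always give the same pseudonym.
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def deidentify_rows(rows, fields, key, log):
    """Return copies of rows with the given fields pseudonymized.

    Appends one entry to `log` describing the transformation, so the
    step is recorded rather than done silently.
    """
    out = []
    for row in rows:
        clean = dict(row)  # copy, so the input rows are left unchanged
        for name in fields:
            if name in clean:
                clean[name] = pseudonymize(str(clean[name]), key)
        out.append(clean)
    log.append({
        "step": "pseudonymize",
        "method": "HMAC-SHA256, truncated to 16 hex characters",
        "fields": list(fields),
        "rows": len(out),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return out
```

The key is a secret in the same sense as a credential, so it belongs outside git. The log holds only field names, counts, and the method, which makes it safe to keep alongside the derived data.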