ethics-data-governance
Ethics Data Governance
Handle ethics and data governance as part of the research record, not as a late
appendix. This is especially important for platform, community, security, health,
education, and other sensitive-domain research.
Read First
- references/ethics-data-governance.md
- references/repository-contract.md
Workflow
- Identify data subjects, platforms, communities, institutions, and affected groups.
- Classify sensitivity: public, restricted, personal, vulnerable, illegal, copyrighted, proprietary, or dual-use.
- Record collection boundaries, access method, terms-of-service risk, consent,
IRB or ethics-board status, and retention policy in docs/ethics/data-governance.md.
- Keep raw sensitive data in ignored local folders unless sharing is approved.
- Prefer aggregate, redacted, or synthetic outputs in reports.
- Add risk notes to dataset pages and claim pages.
- Re-check governance before publishing artifacts, code, or examples.
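
The workflow above can be sketched as a small structured record that is filled in per dataset and re-checked before publishing. This is a minimal illustration, not a prescribed schema: the class name `GovernanceRecord`, its field names, and the `blocks_publication` check are all hypothetical; only the sensitivity classes and the recorded items come from the workflow list.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    """Sensitivity classes named in the workflow."""
    PUBLIC = "public"
    RESTRICTED = "restricted"
    PERSONAL = "personal"
    VULNERABLE = "vulnerable"
    ILLEGAL = "illegal"
    COPYRIGHTED = "copyrighted"
    PROPRIETARY = "proprietary"
    DUAL_USE = "dual-use"


@dataclass
class GovernanceRecord:
    """Hypothetical per-dataset entry for docs/ethics/data-governance.md."""
    dataset: str
    data_subjects: list[str]
    sensitivity: set[Sensitivity]
    access_method: str
    tos_risk: str
    consent: str
    ethics_board_status: str
    retention_policy: str
    sharing_approved: bool = False

    def missing_fields(self) -> list[str]:
        """Return the names of required text fields that were left blank."""
        required = ["access_method", "tos_risk", "consent",
                    "ethics_board_status", "retention_policy"]
        return [name for name in required if not getattr(self, name).strip()]

    def blocks_publication(self) -> list[str]:
        """List reasons this dataset should not be published yet."""
        problems = [f"missing: {name}" for name in self.missing_fields()]
        if not self.sensitivity:
            problems.append("sensitivity not classified")
        # PUBLIC here means "classified as public after review",
        # not merely "publicly visible".
        if self.sensitivity - {Sensitivity.PUBLIC} and not self.sharing_approved:
            problems.append("non-public data without sharing approval")
        return problems

    def to_markdown(self) -> str:
        """Render the record as a short list for the governance file."""
        classes = ", ".join(sorted(s.value for s in self.sensitivity))
        lines = [
            f"## {self.dataset}",
            f"- Data subjects: {', '.join(self.data_subjects)}",
            f"- Sensitivity: {classes}",
            f"- Access method: {self.access_method}",
            f"- Terms-of-service risk: {self.tos_risk}",
            f"- Consent: {self.consent}",
            f"- IRB or ethics-board status: {self.ethics_board_status}",
            f"- Retention policy: {self.retention_policy}",
            f"- Sharing approved: {'yes' if self.sharing_approved else 'no'}",
        ]
        return "\n".join(lines)
```

Calling `blocks_publication()` again just before release is one way to make the final "re-check governance" step explicit: an empty list means no recorded blocker, not that the dataset is safe.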
Red Flags
- personal identifiers in raw or derived files
- screenshots that expose users, handles, addresses, chats, or illegal content
- scraping behind login or against platform restrictions
- model outputs that can reveal memorized sensitive data
- publishing datasets without license or consent analysis
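
The first red flag, personal identifiers in raw or derived files, can be partly screened for automatically. The sketch below is a rough first pass under stated assumptions: the three patterns (email addresses, `@handle` mentions, IPv4-like strings) and the function name `find_identifiers` are illustrative choices, and regular expressions like these will miss many identifier types and produce false positives, so a match is a prompt for manual review rather than a verdict.

```python
import re

# Illustrative patterns only; real projects need a list tuned to their data.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # "@" must not follow a word character, so email addresses are not counted twice.
    "handle": re.compile(r"(?<![\w@])@[A-Za-z0-9_]{2,30}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_identifiers(text: str) -> dict[str, list[str]]:
    """Return every pattern label that matched, with the matched strings."""
    hits: dict[str, list[str]] = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits


sample = "Contact alice@example.com or @bob_92 from 192.168.0.1"
for label, matches in find_identifiers(sample).items():
    print(label, matches)
```

Running a check like this over derived files and report drafts, not only raw data, matches the red flag's wording: identifiers often survive into tables, logs, and figure captions.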
Do Not
- Treat "publicly visible" as automatically safe to redistribute.
- Put credentials, cookies, session exports, or scraped private content in git.
- De-identify by hand without recording the transformation.
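
As an alternative to de-identifying by hand, the transformation can be scripted so that it leaves a record of what was done. The following is a minimal sketch assuming keyed hashing (HMAC-SHA256 from the Python standard library) as the pseudonymization method; the function names, the 16-character truncation, and the log fields are hypothetical choices, and the secret key is assumed to be stored outside git.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def pseudonymize(value: str, key: bytes) -> str:
    """Map a value to a stable pseudonym using a keyed hash."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def deidentify(rows: list[dict[str, str]], fields: set[str], key: bytes):
    """Pseudonymize the chosen fields and return (cleaned rows, log entry)."""
    cleaned = [
        {name: pseudonymize(value, key) if name in fields else value
         for name, value in row.items()}
        for row in rows
    ]
    # The log describes the transformation but never contains the key.
    log_entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "method": "HMAC-SHA256, truncated to 16 hex characters",
        "fields": sorted(fields),
        "rows": len(rows),
    }
    return cleaned, log_entry


rows = [
    {"user": "alice", "text": "hi"},
    {"user": "alice", "text": "yo"},
    {"user": "bob", "text": "ok"},
]
cleaned, log_entry = deidentify(rows, {"user"}, key=b"replace-with-a-secret-key")
print(json.dumps(log_entry, indent=2))
```

The log entry could be appended to the governance record alongside the dataset. Free-text fields such as `text` are left untouched here and can still carry identifiers, so they need their own review.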