# AI Ethics
Comprehensive AI ethics skill covering bias detection, fairness assessment, responsible AI development, and regulatory compliance.
## When to Use This Skill
- Evaluating AI models for bias
- Implementing fairness measures
- Conducting ethical impact assessments
- Ensuring regulatory compliance (EU AI Act, etc.)
- Designing human-in-the-loop systems
- Creating AI transparency documentation
- Developing AI governance frameworks
## Ethical Principles

### Core AI Ethics Principles
| Principle | Description |
|---|---|
| Fairness | AI should not discriminate against individuals or groups |
| Transparency | AI decisions should be explainable |
| Privacy | Personal data must be protected |
| Accountability | Clear responsibility for AI outcomes |
| Safety | AI should not cause harm |
| Human Agency | Humans should maintain control |
### Stakeholder Considerations
- Users: How does this affect people using the system?
- Subjects: How does this affect people the AI makes decisions about?
- Society: What are broader societal implications?
- Environment: What is the environmental impact?
## Bias Detection & Mitigation

### Types of AI Bias
| Bias Type | Source | Example |
|---|---|---|
| Historical | Training data reflects past discrimination | Hiring models favoring male candidates |
| Representation | Underrepresented groups in training data | Face recognition failing on darker skin |
| Measurement | Proxy variables for protected attributes | ZIP code correlating with race |
| Aggregation | One model for diverse populations | Medical model trained only on one ethnicity |
| Evaluation | Biased evaluation metrics | Accuracy hiding disparate impact |
### Fairness Metrics
Group Fairness:
- Demographic Parity: Equal positive rates across groups
- Equalized Odds: Equal TPR and FPR across groups
- Predictive Parity: Equal precision across groups
Individual Fairness:
- Similar individuals should receive similar predictions
- Counterfactual fairness: Would outcome change if protected attribute differed?
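As an illustration, the group fairness metrics above can be computed directly from labels, predictions, and group membership. A minimal pure-Python sketch (function and key names are our own, not from any particular library):

```python
from collections import defaultdict

def group_rates(y_true, y_pred, groups):
    """Per-group positive rate (demographic parity), TPR and FPR
    (equalized odds), from binary labels/predictions."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        key = ("tp" if t and p else
               "fp" if not t and p else
               "fn" if t and not p else "tn")
        counts[g][key] += 1
    rates = {}
    for g, c in counts.items():
        n = sum(c.values())
        rates[g] = {
            # share of this group receiving the positive outcome
            "positive_rate": (c["tp"] + c["fp"]) / n,
            "tpr": c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0,
            "fpr": c["fp"] / (c["fp"] + c["tn"]) if c["fp"] + c["tn"] else 0.0,
        }
    return rates
```

Comparing `positive_rate` across groups checks demographic parity; comparing `tpr`/`fpr` pairs checks equalized odds.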
### Bias Mitigation Strategies
Pre-processing:
- Resampling/reweighting training data
- Removing biased features
- Data augmentation for underrepresented groups
In-processing:
- Fairness constraints in loss function
- Adversarial debiasing
- Fair representation learning
Post-processing:
- Threshold adjustment per group
- Calibration
- Reject option classification
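One concrete pre-processing technique from the list above is reweighting, in the style of Kamiran & Calders' reweighing: each (group, label) combination gets a weight that makes group membership and outcome look statistically independent in the weighted data. A sketch, assuming hashable group keys and labels:

```python
from collections import Counter

def reweighing(groups, labels):
    """Instance weights = expected count under independence
    divided by observed count, per (group, label) pair."""
    n = len(labels)
    g_count = Counter(groups)
    y_count = Counter(labels)
    gy_count = Counter(zip(groups, labels))
    return [
        (g_count[g] * y_count[y]) / (n * gy_count[(g, y)])
        for g, y in zip(groups, labels)
    ]
```

Underrepresented (group, label) pairs get weights above 1 and overrepresented ones below 1; the weights can then be passed to any learner that accepts sample weights.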
## Explainability & Transparency

### Explanation Types
| Type | Audience | Purpose |
|---|---|---|
| Global | Developers | Understand overall model behavior |
| Local | End users | Explain specific decisions |
| Counterfactual | Affected parties | What would need to change for different outcome |
### Explainability Techniques
- SHAP: per-feature contribution values for individual predictions
- LIME: local surrogate models that explain individual predictions
- Attention maps: for neural networks
- Decision trees: inherently interpretable
- Feature importance: global understanding of model behavior
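SHAP and LIME are full libraries; as a library-free illustration of the global feature-importance idea, here is a permutation-importance sketch (all names are hypothetical): shuffle one feature's column and measure how much a "higher is better" metric such as accuracy drops.

```python
import random

def permutation_importance(predict, X, y, feature_idx, metric,
                           n_repeats=10, seed=0):
    """Mean drop in metric when one feature column is shuffled.
    `predict` maps a row (list) to a prediction; `metric(y_true,
    y_pred)` returns a score where higher is better."""
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    column = [row[feature_idx] for row in X]
    drops = []
    for _ in range(n_repeats):
        shuffled = column[:]
        rng.shuffle(shuffled)
        # Rebuild rows with the shuffled column spliced back in.
        X_perm = [row[:feature_idx] + [v] + row[feature_idx + 1:]
                  for row, v in zip(X, shuffled)]
        drops.append(baseline - metric(y, [predict(row) for row in X_perm]))
    return sum(drops) / n_repeats
```

A feature the model ignores shows a drop near zero; features the model relies on show large drops.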
### Model Cards
Document for each model:
- Model purpose and intended use
- Training data description
- Performance metrics by subgroup
- Limitations and ethical considerations
- Version and update history
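The checklist above can be enforced in code by making the model card a structured object rather than free-form text. A minimal sketch (field and method names are our own):

```python
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    name: str
    version: str
    intended_use: str
    training_data: str
    subgroup_metrics: dict = field(default_factory=dict)  # e.g. {"group_a": {"accuracy": 0.91}}
    limitations: list = field(default_factory=list)

    def to_markdown(self):
        """Render the card as a markdown document."""
        lines = [
            f"# Model Card: {self.name} v{self.version}",
            f"**Intended use:** {self.intended_use}",
            f"**Training data:** {self.training_data}",
            "## Performance by subgroup",
        ]
        for group, metrics in self.subgroup_metrics.items():
            lines.append(f"- {group}: " +
                         ", ".join(f"{k}={v}" for k, v in metrics.items()))
        lines.append("## Limitations")
        lines.extend(f"- {item}" for item in self.limitations)
        return "\n".join(lines)
```

Keeping the card as data means required fields cannot silently go missing, and the rendered document stays in sync with what was actually recorded.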
## AI Governance

### AI Risk Assessment
Risk Categories (EU AI Act):
| Risk Level | Examples | Requirements |
|---|---|---|
| Unacceptable | Social scoring, manipulation | Prohibited |
| High | Healthcare, employment, credit | Strict requirements |
| Limited | Chatbots | Transparency obligations |
| Minimal | Spam filters | No requirements |
### Governance Framework
- Policy: Define ethical principles and boundaries
- Process: Review and approval workflows
- People: Roles and responsibilities (ethics board)
- Technology: Tools for monitoring and enforcement
### Documentation Requirements
- Data provenance and lineage
- Model training documentation
- Testing and validation results
- Deployment and monitoring plans
- Incident response procedures
## Human Oversight

### Human-in-the-Loop Patterns
| Pattern | Use Case | Example |
|---|---|---|
| Human-in-the-Loop | High-stakes decisions | Medical diagnosis confirmation |
| Human-on-the-Loop | Monitoring with intervention | Content moderation escalation |
| Human-out-of-Loop | Low-risk, high-volume | Spam filtering |
### Designing for Human Control
- Clear escalation paths
- Override capabilities
- Confidence thresholds for automation
- Audit trails
- Feedback mechanisms
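Confidence thresholds, escalation, and audit trails can be combined in one routing function. A minimal sketch (names and the 0.9 default are illustrative, not a recommended threshold):

```python
def route_decision(prediction, confidence, threshold=0.9, audit_log=None):
    """Automate only above the confidence threshold; escalate
    everything else to a human reviewer, recording both paths."""
    escalate = confidence < threshold
    entry = {
        "prediction": prediction,
        "confidence": confidence,
        "action": "human_review" if escalate else "auto_decide",
    }
    if audit_log is not None:
        audit_log.append(entry)  # audit trail for later review
    return entry
```

In a real system the threshold would be tuned per risk level, and the audit log would be an append-only store rather than an in-memory list.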
## Privacy Considerations

### Data Minimization
- Collect only necessary data
- Anonymize when possible
- Aggregate rather than individual data
- Delete data when no longer needed
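The "collect only necessary data" and "anonymize when possible" points can be applied at ingestion time. A sketch, assuming dict-shaped records; the salt value and field names are hypothetical, and a salted hash is pseudonymization, not full anonymization:

```python
import hashlib

def minimize_record(record, needed_fields, pseudonymize=("user_id",)):
    """Keep only the fields a downstream model needs; replace direct
    identifiers with a salted hash so records stay linkable without
    exposing the raw identifier."""
    SALT = "rotate-me"  # hypothetical; store and rotate securely in practice
    out = {}
    for name in needed_fields:
        value = record[name]
        if name in pseudonymize:
            value = hashlib.sha256((SALT + str(value)).encode()).hexdigest()[:16]
        out[name] = value
    return out
```

Anything not in `needed_fields` never enters the pipeline, which is the cheapest form of data minimization.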
### Privacy-Preserving Techniques
- Differential privacy
- Federated learning
- Secure multi-party computation
- Homomorphic encryption
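As a taste of the first item, the classic Laplace mechanism releases a numeric statistic with epsilon-differential privacy by adding noise scaled to the query's sensitivity. A sketch using inverse-transform sampling (function name is our own):

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon, seed=None):
    """Add Laplace(sensitivity / epsilon) noise to a statistic.
    Smaller epsilon = stronger privacy = more noise."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    # Inverse CDF of the Laplace distribution centered at 0.
    noise = -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_value + noise
```

For a counting query, sensitivity is 1 (one person changes the count by at most 1), so releasing `count + Laplace(1/epsilon)` satisfies epsilon-DP for that single query; repeated queries consume privacy budget additively.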
## Environmental Impact

### Considerations
- Training compute requirements
- Inference energy consumption
- Hardware lifecycle
- Data center energy sources
### Mitigation
- Efficient architectures
- Model distillation
- Transfer learning
- Green hosting providers
## Reference Files

- references/bias_assessment.md - Detailed bias evaluation methodology
- references/regulatory_compliance.md - AI regulation requirements
## Integration with Other Skills
- machine-learning - For model development
- testing - For bias testing
- documentation - For model cards