
AI Ethics


Comprehensive AI ethics skill covering bias detection, fairness assessment, responsible AI development, and regulatory compliance.

When to Use This Skill


  • Evaluating AI models for bias
  • Implementing fairness measures
  • Conducting ethical impact assessments
  • Ensuring regulatory compliance (EU AI Act, etc.)
  • Designing human-in-the-loop systems
  • Creating AI transparency documentation
  • Developing AI governance frameworks

Ethical Principles


Core AI Ethics Principles


  • Fairness: AI should not discriminate against individuals or groups
  • Transparency: AI decisions should be explainable
  • Privacy: Personal data must be protected
  • Accountability: Clear responsibility for AI outcomes
  • Safety: AI should not cause harm
  • Human Agency: Humans should maintain control

Stakeholder Considerations


  • Users: How does this affect people using the system?
  • Subjects: How does this affect people the AI makes decisions about?
  • Society: What are broader societal implications?
  • Environment: What is the environmental impact?

Bias Detection & Mitigation


Types of AI Bias


  • Historical: Training data reflects past discrimination (e.g., hiring models favoring male candidates)
  • Representation: Groups underrepresented in training data (e.g., face recognition failing on darker skin tones)
  • Measurement: Proxy variables stand in for protected attributes (e.g., ZIP code correlating with race)
  • Aggregation: One model applied to diverse populations (e.g., a medical model trained on a single ethnicity)
  • Evaluation: Biased evaluation metrics (e.g., overall accuracy hiding disparate impact)

Fairness Metrics


Group Fairness:
  • Demographic Parity: Equal positive rates across groups
  • Equalized Odds: Equal TPR and FPR across groups
  • Predictive Parity: Equal precision across groups
Individual Fairness:
  • Similar individuals should receive similar predictions
  • Counterfactual fairness: Would outcome change if protected attribute differed?
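As a minimal sketch (function and field names here are illustrative, not from any particular fairness library), the group-fairness metrics above can be computed directly from binary labels and predictions:

```python
import numpy as np

def group_fairness_report(y_true, y_pred, group):
    """Per-group rates for the metrics above. Assumes every group
    contains both positive and negative examples and predictions."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    report = {}
    for g in np.unique(group):
        m = group == g
        report[g] = {
            "positive_rate": y_pred[m].mean(),            # demographic parity
            "tpr": y_pred[m][y_true[m] == 1].mean(),      # equalized odds (TPR)
            "fpr": y_pred[m][y_true[m] == 0].mean(),      # equalized odds (FPR)
            "precision": y_true[m][y_pred[m] == 1].mean() # predictive parity
        }
    return report
```

Comparing these rates across groups (e.g., the gap in `positive_rate`) gives a simple first check for each fairness criterion.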

Bias Mitigation Strategies


Pre-processing:
  • Resampling/reweighting training data
  • Removing biased features
  • Data augmentation for underrepresented groups
In-processing:
  • Fairness constraints in loss function
  • Adversarial debiasing
  • Fair representation learning
Post-processing:
  • Threshold adjustment per group
  • Calibration
  • Reject option classification
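The pre-processing reweighting strategy can be sketched as follows, in the style of Kamiran and Calders' reweighing: each example gets a weight that makes group membership and label statistically independent in the weighted data. This is an illustrative sketch, not a drop-in replacement for a fairness toolkit.

```python
import numpy as np

def reweighing_weights(group, y):
    """Weight = P(group) * P(label) / P(group, label), so that the
    weighted positive rate is equal across groups."""
    group, y = np.asarray(group), np.asarray(y)
    w = np.empty(len(y), dtype=float)
    for g in np.unique(group):
        for label in np.unique(y):
            mask = (group == g) & (y == label)
            p_expected = (group == g).mean() * (y == label).mean()
            p_observed = mask.mean()
            w[mask] = p_expected / p_observed if p_observed > 0 else 0.0
    return w
```

These weights would then be passed as sample weights to the training procedure.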

Explainability & Transparency


Explanation Types


  • Global (for developers): Understand overall model behavior
  • Local (for end users): Explain specific decisions
  • Counterfactual (for affected parties): What would need to change for a different outcome

Explainability Techniques


  • SHAP: Feature importance values
  • LIME: Local surrogate models explaining individual predictions
  • Attention maps: For neural networks
  • Decision trees: Inherently interpretable
  • Feature importance: Global model understanding
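SHAP and LIME come from dedicated libraries; as a dependency-free sketch of the same global-importance idea, permutation importance measures how much a model's score degrades when one feature column is shuffled (all names here are illustrative):

```python
import numpy as np

def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Global feature importance: mean drop in `metric` when one
    feature column is randomly permuted (larger drop = more important)."""
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))
    importances = []
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break this feature's link to y
            drops.append(baseline - metric(y, predict(Xp)))
        importances.append(float(np.mean(drops)))
    return importances

def accuracy(y_true, y_pred):
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

A feature whose permutation barely changes the score contributes little to the model's decisions, which is useful evidence when auditing whether a proxy variable is driving outcomes.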

Model Cards


Document for each model:
  • Model purpose and intended use
  • Training data description
  • Performance metrics by subgroup
  • Limitations and ethical considerations
  • Version and update history
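A model card can be kept as a structured record alongside the model artifact. This is a minimal sketch; the field names loosely follow the items above and are not a standard schema:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class ModelCard:
    """Minimal model-card record, serializable for review and audit."""
    name: str
    version: str
    intended_use: str
    training_data: str
    subgroup_metrics: dict = field(default_factory=dict)  # e.g. {"group": {"accuracy": 0.91}}
    limitations: list = field(default_factory=list)
    update_history: list = field(default_factory=list)

    def to_dict(self):
        return asdict(self)
```

Serializing the card (e.g., to JSON or YAML) next to each released model version keeps the documentation in sync with the artifact it describes.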

AI Governance


AI Risk Assessment


Risk Categories (EU AI Act):
  • Unacceptable (social scoring, manipulation): Prohibited
  • High (healthcare, employment, credit): Strict requirements
  • Limited (chatbots): Transparency obligations
  • Minimal (spam filters): No requirements

Governance Framework


  1. Policy: Define ethical principles and boundaries
  2. Process: Review and approval workflows
  3. People: Roles and responsibilities (ethics board)
  4. Technology: Tools for monitoring and enforcement

Documentation Requirements


  • Data provenance and lineage
  • Model training documentation
  • Testing and validation results
  • Deployment and monitoring plans
  • Incident response procedures

Human Oversight


Human-in-the-Loop Patterns


  • Human-in-the-Loop (high-stakes decisions): e.g., medical diagnosis confirmation
  • Human-on-the-Loop (monitoring with intervention): e.g., content moderation escalation
  • Human-out-of-the-Loop (low-risk, high-volume): e.g., spam filtering

Designing for Human Control


  • Clear escalation paths
  • Override capabilities
  • Confidence thresholds for automation
  • Audit trails
  • Feedback mechanisms
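Confidence-threshold routing with an audit trail can be sketched in a few lines; the function and log structure below are illustrative, not a prescribed design:

```python
audit_log = []  # append-only audit trail of routing decisions

def route_decision(case_id, prediction, confidence, threshold=0.9):
    """Auto-apply the model's prediction only above the confidence
    threshold; otherwise escalate to a human reviewer. Every routing
    decision is logged for later audit."""
    routed_to = "automated" if confidence >= threshold else "human_review"
    audit_log.append({"case": case_id,
                      "confidence": confidence,
                      "routed_to": routed_to})
    return routed_to, (prediction if routed_to == "automated" else None)
```

Tuning the threshold trades automation volume against the rate of low-confidence cases that reach a human, and the log supports both auditing and the feedback loop listed above.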

Privacy Considerations


Data Minimization


  • Collect only necessary data
  • Anonymize when possible
  • Use aggregated rather than individual-level data
  • Delete data when no longer needed
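One common minimization step, pseudonymization, can be sketched with a keyed hash: records stay linkable without storing the raw identifier. The key handling below is purely illustrative; in practice the key lives in a secrets manager, and keyed hashing is pseudonymization, not full anonymization.

```python
import hashlib
import hmac

SECRET_KEY = b"illustrative-placeholder-rotate-in-production"

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier with a stable keyed hash (HMAC-SHA256),
    so the same person maps to the same token across records."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()
```

An HMAC rather than a bare hash is used so that an attacker without the key cannot confirm guesses by hashing candidate identifiers.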

Privacy-Preserving Techniques


  • Differential privacy
  • Federated learning
  • Secure multi-party computation
  • Homomorphic encryption
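As a sketch of the first technique, the Laplace mechanism releases a statistic with epsilon-differential privacy by adding noise scaled to the query's sensitivity (for a counting query, sensitivity is 1):

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release `true_value` with epsilon-differential privacy by adding
    Laplace noise with scale = sensitivity / epsilon."""
    rng = rng if rng is not None else np.random.default_rng()
    return true_value + rng.laplace(0.0, sensitivity / epsilon)
```

Smaller epsilon means stronger privacy but noisier answers; repeated queries consume the privacy budget, which a full deployment must track.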

Environmental Impact


Considerations


  • Training compute requirements
  • Inference energy consumption
  • Hardware lifecycle
  • Data center energy sources

Mitigation


  • Efficient architectures
  • Model distillation
  • Transfer learning
  • Green hosting providers

Reference Files


  • references/bias_assessment.md
    - Detailed bias evaluation methodology
  • references/regulatory_compliance.md
    - AI regulation requirements

Integration with Other Skills


  • machine-learning - For model development
  • testing - For bias testing
  • documentation - For model cards