privacy-research-engineer-safeguards
Privacy Research Engineer, Safeguards
When to Use
- Frame privacy research questions for safeguard and moderation stacks
- Design PII detection/redaction benchmarks — precision/recall, re-identification risk
- Evaluate de-identification techniques (mask, tokenize, synthetic replace) on realistic prompts
- Study memorization and extraction — can models or logs leak user content?
- Curate privacy-sensitive datasets — synthetic data, consent boundaries, labeling rules
- Run ablations on detector architecture, threshold, or post-processing
- Define logging minimization — what safety systems may store vs must discard
- Write research memos with privacy–utility trade-offs and production recommendations
- Specify promotion criteria for privacy mitigations before prod rollout
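The three de-identification techniques named above (mask, tokenize, synthetic replace) can be compared on the same prompt. The following is a minimal sketch, not a production detector: the email regex, the salt value, and the function names are illustrative assumptions, and only email addresses are handled.

```python
import hashlib
import re

# Deliberately simple email pattern for illustration; real detectors need far broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def mask(text: str) -> str:
    """Replace each match with a fixed placeholder."""
    return EMAIL_RE.sub("[EMAIL]", text)

def tokenize(text: str, salt: str = "demo-salt") -> str:
    """Replace each match with a salted, truncated hash so repeats map to the same token."""
    def _tok(m: re.Match) -> str:
        digest = hashlib.sha256((salt + m.group(0)).encode("utf-8")).hexdigest()[:8]
        return f"<EMAIL_{digest}>"
    return EMAIL_RE.sub(_tok, text)

def synthetic_replace(text: str) -> str:
    """Replace each match with a numbered address on the reserved example.com domain."""
    seen: dict[str, str] = {}
    def _syn(m: re.Match) -> str:
        original = m.group(0)
        if original not in seen:
            seen[original] = f"user{len(seen) + 1}@example.com"
        return seen[original]
    return EMAIL_RE.sub(_syn, text)

# Hypothetical prompt with a repeated address.
prompt = "Contact alice@corp.io or bob@corp.io; alice@corp.io replies fastest."
print(mask(prompt))
print(tokenize(prompt))
print(synthetic_replace(prompt))
```

Masking discards all linkage between mentions, while tokenization and synthetic replacement keep repeated mentions consistent. That consistency helps utility but is also a re-identification surface, so it belongs in the privacy–utility comparison.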
When NOT to Use
- Audit evidence pipelines for GDPR/SOC 2 attestations → compliance-engineer
- Legal DPIA, acceptable-use policy, regulatory mapping → ai-risk-governance
- Harm categories, jailbreak benchmarks, toxicity classifiers → ml-research-engineer-safeguards
- Deploy gateways, canaries, safety-path SLOs → ml-infrastructure-engineer-safeguards
- Red-team attack campaigns → ai-redteam
- Enterprise data governance architecture → data-architect
- Human-data platform product ethics (contributor labor) → product-management-human-data-platform
- General literature review unrelated to privacy in ML → ai-researcher
Related skills
| Need | Skill |
|---|---|
| Safety classifier research | |
| Safeguard production infra | |
| AI governance and DPIA framing | |
| Compliance controls and evidence | |
| Data classification and lineage | |
| Adversarial extraction testing | |
| General research methods | |
| Human-data platform privacy | |
| Release and incident ops | |
Core Workflows
1. Privacy research framing
Threat model, metrics, baselines.
See references/privacy_research_framing.md.
2. PII detection and redaction research
Detectors, redaction quality, evals.
See references/pii_detection_redaction_research.md.
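For the detector evals in this workflow, one common scoring choice is exact-match precision and recall over labeled spans. The sketch below assumes gold and predicted spans are already available as character offsets; the `Span` type and `span_prf` helper are hypothetical names, and exact matching is only one of several possible matching rules (partial-overlap scoring is another).

```python
from typing import NamedTuple

class Span(NamedTuple):
    start: int
    end: int      # exclusive
    label: str    # e.g. "EMAIL", "PHONE"

def span_prf(gold: set[Span], pred: set[Span]) -> dict[str, float]:
    """Exact-match precision, recall, and F1 over labeled spans."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical annotations for one document.
gold = {Span(8, 21, "EMAIL"), Span(30, 42, "PHONE")}
pred = {Span(8, 21, "EMAIL"), Span(50, 55, "NAME")}
print(span_prf(gold, pred))
```

Here the detector finds one of two gold spans and emits one spurious span, so precision and recall are both 0.5. For redaction, a missed span (recall) usually costs more than an extra one, which is worth stating explicitly in the benchmark spec.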
3. Memorization and extraction
Leakage studies, attack surfaces.
See references/memorization_and_extraction.md.
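A leakage study often starts from a simple measurable quantity. As a rough sketch, assuming synthetic canary strings were planted in the data under study and model or log outputs have been collected, the fraction of canaries that resurface verbatim can be computed as below. The canary values and the `leak_rate` helper are hypothetical, and verbatim substring matching gives only a lower bound because it misses paraphrased or partial leaks.

```python
def leak_rate(canaries: list[str], outputs: list[str]) -> float:
    """Fraction of canaries that appear verbatim in at least one output."""
    if not canaries:
        return 0.0
    leaked = sum(any(c in out for out in outputs) for c in canaries)
    return leaked / len(canaries)

# Hypothetical planted canaries and collected outputs.
canaries = ["canary-7f3a91", "canary-02bc44", "canary-e15d08"]
outputs = [
    "Sure, the reference code is canary-7f3a91.",
    "I cannot share that information.",
]
print(leak_rate(canaries, outputs))
```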
4. Privacy benchmarks and datasets
Corpora, labeling, versioning.
See references/privacy_benchmarks_datasets.md.
5. Logging and retention minimization
Safety observability without over-collection.
See references/logging_retention_minimization.md.
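One way to make "observability without over-collection" concrete is a field allowlist with per-field retention. The sketch below is illustrative only: the field names, TTL values, and `minimize` helper are assumptions, and real retention periods would come out of legal and governance review.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy: field name -> retention period. Fields not listed are dropped.
ALLOWED_FIELDS: dict[str, timedelta] = {
    "request_id": timedelta(days=30),
    "detector_version": timedelta(days=30),
    "pii_types_found": timedelta(days=7),
    "decision": timedelta(days=30),
}

def minimize(event: dict, now: datetime) -> dict:
    """Keep only allowlisted fields and attach an expiry timestamp to each."""
    return {
        name: {"value": event[name], "expires_at": (now + ttl).isoformat()}
        for name, ttl in ALLOWED_FIELDS.items()
        if name in event
    }

event = {
    "request_id": "r-123",
    "prompt_text": "my email is alice@corp.io",  # raw content: not allowlisted, so discarded
    "pii_types_found": ["EMAIL"],
    "decision": "redact",
}
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
record = minimize(event, now)
print(sorted(record))
```

An allowlist fails closed: a new field added upstream is discarded until someone explicitly approves it, which is usually the safer default for safety-path logs.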
6. Handoff to production
Promotion bar, monitoring hooks.
See references/privacy_to_production_handoff.md.
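A promotion bar can be written down as explicit thresholds and checked mechanically. The sketch below assumes three metrics (recall, precision, p95 latency) and placeholder threshold values; the `PromotionBar` and `evaluate` names are hypothetical, and the actual criteria are whatever the research memo specifies.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromotionBar:
    min_recall: float            # privacy side: share of PII caught
    min_precision: float         # utility side: limits over-redaction
    max_p95_latency_ms: float    # operational budget on the safety path

def evaluate(bar: PromotionBar, metrics: dict[str, float]) -> tuple[bool, list[str]]:
    """Return (go, failed_checks) for a candidate mitigation."""
    failures = []
    if metrics["recall"] < bar.min_recall:
        failures.append("recall")
    if metrics["precision"] < bar.min_precision:
        failures.append("precision")
    if metrics["p95_latency_ms"] > bar.max_p95_latency_ms:
        failures.append("p95_latency_ms")
    return (not failures, failures)

# Placeholder thresholds and metrics, for illustration only.
bar = PromotionBar(min_recall=0.95, min_precision=0.90, max_p95_latency_ms=50.0)
go, failed = evaluate(bar, {"recall": 0.97, "precision": 0.88, "p95_latency_ms": 41.0})
print("go" if go else "no-go", failed)
```

Returning the list of failed checks alongside the go/no-go decision keeps the recommendation auditable: the memo can say which side of the privacy–utility trade-off blocked promotion.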
Outputs
- Threat model — assets, adversaries, failure modes for privacy in safeguards
- Benchmark spec — PII types, locales, adversarial variants
- Results table — detection/redaction metrics by slice (language, format)
- Leakage study report — methodology, findings, confidence
- Logging policy draft — fields allowed, TTL, access controls (engineering input to legal)
- Promotion recommendation — go/no-go with privacy–utility summary
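The results table by slice can be produced by aggregating per-example counts before dividing, so that small slices are not hidden inside a global average. This is a minimal sketch with made-up rows; the tuple layout and the `recall_by_slice` helper are assumptions.

```python
from collections import defaultdict

# Hypothetical per-example outcomes: (language, format, gold PII count, correctly detected count)
rows = [
    ("en", "chat", 10, 9),
    ("en", "json", 8, 8),
    ("de", "chat", 5, 3),
    ("de", "chat", 5, 4),
]

def recall_by_slice(rows):
    """Micro-averaged recall per (language, format) slice."""
    totals = defaultdict(lambda: [0, 0])  # slice -> [gold total, detected total]
    for lang, fmt, gold, hit in rows:
        totals[(lang, fmt)][0] += gold
        totals[(lang, fmt)][1] += hit
    return {key: hit / gold for key, (gold, hit) in totals.items() if gold}

for key, value in sorted(recall_by_slice(rows).items()):
    print(key, round(value, 2))
```

In this invented data the pooled recall is 24/28, which would mask the weaker ("de", "chat") slice at 0.7.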
Principles
- Minimize data — collect and retain only what eval and ops truly need
- Separate privacy from safety metrics — low PII leak rate is not interchangeable with low toxicity FN
- Locale and format matter — email in one language ≠ global PII detector
- Synthetic ≠ risk-free — synthetic PII can still encode patterns; document limits
- Legal review for human data — research plans involving real user content need governance sign-off