llm-safety-patterns

LLM Safety Patterns

The Core Principle
Identifiers flow AROUND the LLM, not THROUGH it. The LLM sees only content. Attribution happens deterministically.
Why This Matters
When identifiers appear in prompts, several failure modes follow:
- Hallucination: LLM invents IDs that don't exist
- Confusion: LLM mixes up which ID belongs where
- Injection: Attacker manipulates IDs via prompt injection
- Leakage: IDs appear in logs, caches, traces
- Cross-tenant: LLM could reference other users' data
The Architecture
┌─────────────────────────────────────────────────────────────────────────┐
│ │
│ SYSTEM CONTEXT (flows around LLM) │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ user_id │ tenant_id │ analysis_id │ trace_id │ permissions │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────┐ ┌─────────┐ │
│ │ PRE-LLM │ ┌─────────────────────┐ │POST-LLM │ │
│ │ FILTER │──────▶│ LLM │───────────▶│ATTRIBUTE│ │
│ │ │ │ │ │ │ │
│ │ Returns │ │ Sees ONLY: │ │ Adds: │ │
│ │ CONTENT │ │ - content text │ │ - IDs │ │
│ │ (no IDs)│ │ - context text │ │ - refs │ │
│ └─────────┘ │ (NO IDs!) │ └─────────┘ │
│ └─────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘

What NEVER Goes in Prompts
OrchestKit Forbidden Parameters
| Parameter | Type | Why Forbidden |
|---|---|---|
| user_id | UUID | Can be hallucinated, enables cross-user access |
| tenant_id | UUID | Critical for multi-tenant isolation |
| analysis_id | UUID | Job tracking, not for LLM |
| document_id | UUID | Source tracking, not for LLM |
| artifact_id | UUID | Output tracking, not for LLM |
| chunk_id | UUID | RAG reference, not for LLM |
| session_id | str | Auth context, not for LLM |
| trace_id | str | Observability, not for LLM |
| Any UUID | UUID | Pattern: `[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}` |
Detection Pattern
```python
import re

FORBIDDEN_PATTERNS = [
    r'user[_-]?id',
    r'tenant[_-]?id',
    r'analysis[_-]?id',
    r'document[_-]?id',
    r'artifact[_-]?id',
    r'chunk[_-]?id',
    r'session[_-]?id',
    r'trace[_-]?id',
    r'[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}',
]

def audit_prompt(prompt: str) -> list[str]:
    """Check for forbidden patterns in prompt."""
    violations = []
    for pattern in FORBIDDEN_PATTERNS:
        if re.search(pattern, prompt, re.IGNORECASE):
            violations.append(pattern)
    return violations
```
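A quick self-check of the audit, with the pattern list trimmed to two entries for brevity so the snippet stands alone:

```python
import re

# Trimmed copy of the audit above: one keyword pattern, one raw-UUID pattern.
FORBIDDEN_PATTERNS = [
    r'user[_-]?id',
    r'[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}',
]

def audit_prompt(prompt: str) -> list[str]:
    """Return every forbidden pattern found in the prompt."""
    return [p for p in FORBIDDEN_PATTERNS if re.search(p, prompt, re.IGNORECASE)]

# A leaky prompt trips both the keyword check and the raw-UUID check.
leaky = "Summarize for user_id 3fa85f64-5717-4562-b3fc-2c963f66afa6"
assert len(audit_prompt(leaky)) == 2

# A content-only prompt passes.
assert audit_prompt("Summarize the attached chapter on photosynthesis") == []
```

Because the audit matches field *names* as well as UUID shapes, it also catches IDs that were serialized into retrieved context text, not just ones interpolated directly.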
The Three-Phase Pattern
Phase 1: Pre-LLM (Filter & Extract)
```python
async def prepare_for_llm(
    query: str,
    ctx: RequestContext,
) -> tuple[str, list[str], SourceRefs]:
    """
    Filter data and extract content for the LLM.

    Returns: (content, context_texts, source_references)
    """
    # 1. Retrieve with tenant filter
    documents = await semantic_search(
        query_embedding=embed(query),
        ctx=ctx,  # Filters by tenant_id, user_id
    )
    # Collect the chunks behind each retrieved document
    chunks = [chunk for doc in documents for chunk in doc.chunks]

    # 2. Save references for attribution
    source_refs = SourceRefs(
        document_ids=[d.id for d in documents],
        chunk_ids=[c.id for c in chunks],
    )

    # 3. Extract content only (no IDs)
    content_texts = [d.content for d in documents]
    return query, content_texts, source_refs
```
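`RequestContext` and `SourceRefs` are used above but never defined. A minimal sketch of what they might carry; any field not referenced in the surrounding code is an assumption:

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4

@dataclass(frozen=True)
class RequestContext:
    """System-provided identity and scope. Flows around the LLM, never into prompts."""
    user_id: UUID
    tenant_id: UUID
    resource_id: UUID   # e.g. the analysis being produced
    trace_id: str

@dataclass
class SourceRefs:
    """Provenance captured deterministically before the LLM call."""
    document_ids: list[UUID] = field(default_factory=list)
    chunk_ids: list[UUID] = field(default_factory=list)

# The context is built once per request by the system, never by the model.
ctx = RequestContext(
    user_id=uuid4(), tenant_id=uuid4(), resource_id=uuid4(), trace_id="trace-123"
)
refs = SourceRefs(document_ids=[uuid4()])
```

Freezing `RequestContext` makes the identity immutable for the lifetime of the request, so no downstream step can overwrite it with model output.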
Phase 2: LLM Call (Content Only)
```python
def build_prompt(content: str, context_texts: list[str]) -> str:
    """Build prompt with ONLY content, no identifiers."""
    prompt = f"""
Analyze the following content and provide insights.

CONTENT:
{content}

RELEVANT CONTEXT:
{chr(10).join(f"- {text}" for text in context_texts)}

Provide analysis covering:
1. Key concepts
2. Prerequisites
3. Learning objectives
"""
    # AUDIT: Verify no IDs leaked
    violations = audit_prompt(prompt)
    if violations:
        raise SecurityError(f"IDs leaked to prompt: {violations}")
    return prompt


async def call_llm(prompt: str) -> dict:
    """LLM only sees content, never IDs."""
    response = await llm.generate(prompt)
    return parse_response(response)
```
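`parse_response` is assumed above; a minimal sketch under the assumption that the model is instructed to return a single JSON object (this is an illustration, not OrchestKit's actual parser, and the greedy regex is deliberately naive):

```python
import json
import re

def parse_response(raw: str) -> dict:
    """Extract the first JSON object embedded in a model response."""
    match = re.search(r'\{.*\}', raw, re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in LLM response")
    return json.loads(match.group(0))

# Tolerates chatter around the payload:
parsed = parse_response('Here you go: {"analysis": "solid", "difficulty": "easy"}')
assert parsed == {"analysis": "solid", "difficulty": "easy"}
```

In production a schema-constrained decoding mode or the provider's structured-output feature is preferable to regex extraction, but the failure behavior is the same: reject rather than guess.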
Phase 3: Post-LLM (Attribute)
```python
async def save_with_attribution(
    llm_output: dict,
    ctx: RequestContext,
    source_refs: SourceRefs,
) -> Analysis:
    """
    Attach context and references to LLM output.

    Attribution is deterministic, not LLM-generated.
    """
    return await Analysis.create(
        # Generated
        id=uuid4(),
        # From RequestContext (system-provided)
        user_id=ctx.user_id,
        tenant_id=ctx.tenant_id,
        analysis_id=ctx.resource_id,
        trace_id=ctx.trace_id,
        # From pre-LLM refs (deterministic)
        source_document_ids=source_refs.document_ids,
        source_chunk_ids=source_refs.chunk_ids,
        # From LLM (content only)
        content=llm_output["analysis"],
        key_concepts=llm_output["key_concepts"],
        difficulty=llm_output["difficulty"],
        # Metadata
        created_at=datetime.now(timezone.utc),
        model_used=MODEL_NAME,
    )
```
Output Validation
After LLM returns, validate:
- Schema: Response matches expected structure
- Guardrails: No toxic/harmful content
- Grounding: Claims are supported by provided context
- No IDs: LLM didn't hallucinate any IDs
```python
async def validate_output(
    llm_output: dict,
    context_texts: list[str],
) -> ValidationResult:
    """Validate LLM output before use."""
    # 1. Schema validation
    try:
        parsed = AnalysisOutput.model_validate(llm_output)
    except ValidationError as e:
        return ValidationResult(valid=False, reason=f"Schema error: {e}")

    # 2. Guardrails
    if await contains_toxic_content(parsed.content):
        return ValidationResult(valid=False, reason="Toxic content detected")

    # 3. Grounding check
    if not is_grounded(parsed.content, context_texts):
        return ValidationResult(valid=False, reason="Ungrounded claims")

    # 4. No hallucinated IDs
    if contains_uuid_pattern(parsed.content):
        return ValidationResult(valid=False, reason="Hallucinated IDs")

    return ValidationResult(valid=True)
```
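`contains_uuid_pattern` and `is_grounded` are assumed helpers. Naive sketches follow; a real grounding check would use NLI or embedding similarity rather than the lexical-overlap heuristic shown here:

```python
import re

UUID_RE = re.compile(
    r'[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}',
    re.IGNORECASE,
)

def contains_uuid_pattern(text: str) -> bool:
    """True if the text contains anything shaped like a UUID."""
    return UUID_RE.search(text) is not None

def is_grounded(claim: str, context_texts: list[str], threshold: float = 0.5) -> bool:
    """Crude grounding check: the share of the claim's words that
    appear somewhere in the provided context must meet the threshold."""
    claim_words = set(re.findall(r'\w+', claim.lower()))
    if not claim_words:
        return True
    context_words = set(re.findall(r'\w+', ' '.join(context_texts).lower()))
    overlap = len(claim_words & context_words) / len(claim_words)
    return overlap >= threshold

assert contains_uuid_pattern("ref 3fa85f64-5717-4562-b3fc-2c963f66afa6")
assert not contains_uuid_pattern("a clean sentence")
assert is_grounded("the sky is blue", ["The sky is blue today"])
```

Note that `contains_uuid_pattern` doubles as the last line of defense: even if an ID somehow reached the prompt, it cannot survive into stored output.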
Integration Points in OrchestKit
Content Analysis Workflow
```
backend/app/workflows/
├── agents/
│   ├── execution.py            # Add context separation
│   └── prompts/                # Audit all prompts
├── tasks/
│   └── generate_artifact.py    # Add attribution
```

Services
```
backend/app/services/
├── embeddings/    # Pre-LLM filtering
└── analysis/      # Post-LLM attribution
```

Checklist Before Any LLM Call
- RequestContext available
- Data filtered by tenant_id and user_id
- Content extracted without IDs
- Source references saved
- Prompt passes audit (no forbidden patterns)
- Output validated before use
- Attribution uses context, not LLM output
Related Skills
- input-validation - Input sanitization patterns that complement LLM safety
- rag-retrieval - RAG pipeline patterns requiring tenant-scoped retrieval
- llm-evaluation - Output quality assessment including hallucination detection
- security-scanning - Automated security scanning for LLM integrations
- defense-in-depth - 8-layer security architecture including Tavily prompt injection firewall at Layer 2
Key Decisions
| Decision | Choice | Rationale |
|---|---|---|
| ID handling | Flow around LLM, never through | Prevents hallucination, injection, and cross-tenant leakage |
| Output validation | Schema + guardrails + grounding | Defense-in-depth for LLM outputs |
| Attribution approach | Deterministic post-LLM | System context provides IDs, not LLM |
| Prompt auditing | Regex pattern matching | Fast detection of forbidden identifiers |
Version: 1.0.0 (December 2025)
Capability Details
context-separation
Keywords: context separation, prompt context, id in prompt, parameterized
Solves:
- How do I prevent IDs from leaking into prompts?
- How do I separate system context from prompt content?
- What should never appear in LLM prompts?
pre-llm-filtering
Keywords: pre-llm, rag filter, data filter, tenant filter
Solves:
- How do I filter data before sending to LLM?
- How do I ensure tenant isolation in RAG?
- How do I scope retrieval to current user?
post-llm-attribution
Keywords: attribution, source tracking, provenance, citation
Solves:
- How do I track which sources the LLM used?
- How do I attribute results correctly?
- How do I avoid LLM-generated IDs?
output-guardrails
Keywords: guardrail, output validation, hallucination, toxicity
Solves:
- How do I validate LLM output?
- How do I detect hallucinations?
- How do I prevent toxic content generation?
prompt-audit
Keywords: prompt audit, prompt security, prompt injection
Solves:
- How do I verify no IDs leaked to prompts?
- How do I audit prompts for security?
- How do I prevent prompt injection?