llm-safety-patterns

LLM Safety Patterns

The Core Principle

Identifiers flow AROUND the LLM, not THROUGH it. The LLM sees only content. Attribution happens deterministically.

Why This Matters

When identifiers appear in prompts, bad things happen:
  1. Hallucination: LLM invents IDs that don't exist
  2. Confusion: LLM mixes up which ID belongs where
  3. Injection: Attacker manipulates IDs via prompt injection
  4. Leakage: IDs appear in logs, caches, traces
  5. Cross-tenant: LLM could reference other users' data

The Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│   SYSTEM CONTEXT (flows around LLM)                                     │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │ user_id │ tenant_id │ analysis_id │ trace_id │ permissions     │   │
│   └─────────────────────────────────────────────────────────────────┘   │
│        │                                                       │        │
│        │                                                       │        │
│        ▼                                                       ▼        │
│   ┌─────────┐                                           ┌─────────┐    │
│   │ PRE-LLM │       ┌─────────────────────┐            │POST-LLM │    │
│   │ FILTER  │──────▶│        LLM          │───────────▶│ATTRIBUTE│    │
│   │         │       │                     │            │         │    │
│   │ Returns │       │ Sees ONLY:          │            │ Adds:   │    │
│   │ CONTENT │       │ - content text      │            │ - IDs   │    │
│   │ (no IDs)│       │ - context text      │            │ - refs  │    │
│   └─────────┘       │ (NO IDs!)           │            └─────────┘    │
│                     └─────────────────────┘                            │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

What NEVER Goes in Prompts

OrchestKit Forbidden Parameters

| Parameter | Type | Why Forbidden |
|-----------|------|---------------|
| user_id | UUID | Can be hallucinated, enables cross-user access |
| tenant_id | UUID | Critical for multi-tenant isolation |
| analysis_id | UUID | Job tracking, not for LLM |
| document_id | UUID | Source tracking, not for LLM |
| artifact_id | UUID | Output tracking, not for LLM |
| chunk_id | UUID | RAG reference, not for LLM |
| session_id | str | Auth context, not for LLM |
| trace_id | str | Observability, not for LLM |
| Any UUID | UUID | Pattern: [0-9a-f]{8}-... |

Detection Pattern

```python
import re

FORBIDDEN_PATTERNS = [
    r'user[_-]?id',
    r'tenant[_-]?id',
    r'analysis[_-]?id',
    r'document[_-]?id',
    r'artifact[_-]?id',
    r'chunk[_-]?id',
    r'session[_-]?id',
    r'trace[_-]?id',
    r'[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}',
]

def audit_prompt(prompt: str) -> list[str]:
    """Check for forbidden patterns in prompt."""
    violations = []
    for pattern in FORBIDDEN_PATTERNS:
        if re.search(pattern, prompt, re.IGNORECASE):
            violations.append(pattern)
    return violations
```
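For instance, a prompt that accidentally interpolates an identifier is flagged immediately. This snippet uses a trimmed-down copy of the audit above so it runs standalone:

```python
import re

# Trimmed-down copy of the audit above so this snippet is self-contained
FORBIDDEN_PATTERNS = [
    r'user[_-]?id',
    r'[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}',
]

def audit_prompt(prompt: str) -> list[str]:
    return [p for p in FORBIDDEN_PATTERNS if re.search(p, prompt, re.IGNORECASE)]

clean = audit_prompt("Summarize the attached lecture transcript.")
leaky = audit_prompt("Fetch notes for user_id 7f3e8a21-4b6c-4d2e-9f1a-8c5d3e2b1a0f")
# clean is empty; leaky flags both the key name and the raw UUID
```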

The Three-Phase Pattern

Phase 1: Pre-LLM (Filter & Extract)

```python
async def prepare_for_llm(
    query: str,
    ctx: RequestContext,
) -> tuple[str, list[str], SourceRefs]:
    """
    Filter data and extract content for the LLM.
    Returns: (content, context_texts, source_references)
    """
    # 1. Retrieve with tenant filter
    documents = await semantic_search(
        query_embedding=embed(query),
        ctx=ctx,  # Filters by tenant_id, user_id
    )
    chunks = [chunk for doc in documents for chunk in doc.chunks]  # matched chunks

    # 2. Save references for attribution
    source_refs = SourceRefs(
        document_ids=[d.id for d in documents],
        chunk_ids=[c.id for c in chunks],
    )

    # 3. Extract content only (no IDs)
    content_texts = [d.content for d in documents]

    return query, content_texts, source_refs
```

Phase 2: LLM Call (Content Only)

```python
def build_prompt(content: str, context_texts: list[str]) -> str:
    """
    Build prompt with ONLY content, no identifiers.
    """
    prompt = f"""
    Analyze the following content and provide insights.

    CONTENT:
    {content}

    RELEVANT CONTEXT:
    {chr(10).join(f"- {text}" for text in context_texts)}

    Provide analysis covering:
    1. Key concepts
    2. Prerequisites
    3. Learning objectives
    """

    # AUDIT: Verify no IDs leaked
    violations = audit_prompt(prompt)
    if violations:
        raise SecurityError(f"IDs leaked to prompt: {violations}")

    return prompt

async def call_llm(prompt: str) -> dict:
    """LLM only sees content, never IDs"""
    response = await llm.generate(prompt)
    return parse_response(response)
```

Phase 3: Post-LLM (Attribute)

```python
async def save_with_attribution(
    llm_output: dict,
    ctx: RequestContext,
    source_refs: SourceRefs,
) -> Analysis:
    """
    Attach context and references to LLM output.
    Attribution is deterministic, not LLM-generated.
    """
    return await Analysis.create(
        # Generated
        id=uuid4(),

        # From RequestContext (system-provided)
        user_id=ctx.user_id,
        tenant_id=ctx.tenant_id,
        analysis_id=ctx.resource_id,
        trace_id=ctx.trace_id,

        # From Pre-LLM refs (deterministic)
        source_document_ids=source_refs.document_ids,
        source_chunk_ids=source_refs.chunk_ids,

        # From LLM (content only)
        content=llm_output["analysis"],
        key_concepts=llm_output["key_concepts"],
        difficulty=llm_output["difficulty"],

        # Metadata
        created_at=datetime.now(timezone.utc),
        model_used=MODEL_NAME,
    )
```

Output Validation

After the LLM returns, validate:
  1. Schema: Response matches expected structure
  2. Guardrails: No toxic/harmful content
  3. Grounding: Claims are supported by provided context
  4. No IDs: LLM didn't hallucinate any IDs
```python
async def validate_output(
    llm_output: dict,
    context_texts: list[str],
) -> ValidationResult:
    """Validate LLM output before use"""

    # 1. Schema validation
    try:
        parsed = AnalysisOutput.model_validate(llm_output)
    except ValidationError as e:
        return ValidationResult(valid=False, reason=f"Schema error: {e}")

    # 2. Guardrails
    if await contains_toxic_content(parsed.content):
        return ValidationResult(valid=False, reason="Toxic content detected")

    # 3. Grounding check
    if not is_grounded(parsed.content, context_texts):
        return ValidationResult(valid=False, reason="Ungrounded claims")

    # 4. No hallucinated IDs
    if contains_uuid_pattern(parsed.content):
        return ValidationResult(valid=False, reason="Hallucinated IDs")

    return ValidationResult(valid=True)
```
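validate_output assumes two helpers the document does not define. A minimal sketch of each follows; note that is_grounded here uses a naive lexical-overlap heuristic for illustration, whereas production systems typically use NLI models or embedding similarity:

```python
import re

# Sketch of the UUID check assumed by validate_output
UUID_RE = re.compile(
    r'[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}',
    re.IGNORECASE,
)

def contains_uuid_pattern(text: str) -> bool:
    """True if the text contains anything shaped like a UUID."""
    return UUID_RE.search(text) is not None

def is_grounded(claim: str, context_texts: list[str], threshold: float = 0.3) -> bool:
    """Naive grounding check: share of claim tokens found in the context."""
    claim_tokens = set(re.findall(r'\w+', claim.lower()))
    if not claim_tokens:
        return True
    context_tokens: set[str] = set()
    for text in context_texts:
        context_tokens |= set(re.findall(r'\w+', text.lower()))
    return len(claim_tokens & context_tokens) / len(claim_tokens) >= threshold
```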

Integration Points in OrchestKit

Content Analysis Workflow

```
backend/app/workflows/
├── agents/
│   ├── execution.py        # Add context separation
│   └── prompts/            # Audit all prompts
├── tasks/
│   └── generate_artifact.py  # Add attribution
```

Services

```
backend/app/services/
├── embeddings/            # Pre-LLM filtering
└── analysis/              # Post-LLM attribution
```

Checklist Before Any LLM Call

  • RequestContext available
  • Data filtered by tenant_id and user_id
  • Content extracted without IDs
  • Source references saved
  • Prompt passes audit (no forbidden patterns)
  • Output validated before use
  • Attribution uses context, not LLM output

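The checklist maps onto a single pipeline. Below is a minimal runnable sketch of the three-phase flow; the inner functions are hypothetical stand-ins for the real prepare_for_llm, build_prompt, and call_llm sketched earlier, stubbed so the example is self-contained:

```python
import asyncio
from uuid import uuid4

# Hypothetical stand-ins for the components sketched above,
# so the three-phase flow can run end to end in isolation.
async def prepare_for_llm(query, ctx):
    docs = [{"id": str(uuid4()), "content": "Photosynthesis converts light to chemical energy."}]
    source_refs = {"document_ids": [d["id"] for d in docs]}  # saved for Phase 3
    return query, [d["content"] for d in docs], source_refs

def build_prompt(content, context_texts):
    # Content only -- no identifiers ever reach this string
    return f"CONTENT:\n{content}\n\nCONTEXT:\n" + "\n".join(context_texts)

async def call_llm(prompt):
    return {"analysis": "The content covers energy conversion."}  # stubbed model reply

async def analyze(query, ctx):
    # Phase 1: filter & extract (IDs stay in source_refs)
    content, context_texts, source_refs = await prepare_for_llm(query, ctx)
    # Phase 2: content-only LLM call
    llm_output = await call_llm(build_prompt(content, context_texts))
    # Phase 3: deterministic attribution from ctx + refs, never from the LLM
    return {
        "content": llm_output["analysis"],
        "user_id": ctx["user_id"],
        "source_document_ids": source_refs["document_ids"],
    }

result = asyncio.run(analyze("Explain photosynthesis", {"user_id": str(uuid4())}))
```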

Related Skills

  • input-validation
    - Input sanitization patterns that complement LLM safety
  • rag-retrieval
    - RAG pipeline patterns requiring tenant-scoped retrieval
  • llm-evaluation
    - Output quality assessment including hallucination detection
  • security-scanning
    - Automated security scanning for LLM integrations
  • defense-in-depth
    - 8-layer security architecture including Tavily prompt injection firewall at Layer 2

Key Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| ID handling | Flow around LLM, never through | Prevents hallucination, injection, and cross-tenant leakage |
| Output validation | Schema + guardrails + grounding | Defense-in-depth for LLM outputs |
| Attribution approach | Deterministic post-LLM | System context provides IDs, not LLM |
| Prompt auditing | Regex pattern matching | Fast detection of forbidden identifiers |

Version: 1.0.0 (December 2025)

Capability Details

context-separation

Keywords: context separation, prompt context, id in prompt, parameterized
Solves:
  • How do I prevent IDs from leaking into prompts?
  • How do I separate system context from prompt content?
  • What should never appear in LLM prompts?

pre-llm-filtering

Keywords: pre-llm, rag filter, data filter, tenant filter
Solves:
  • How do I filter data before sending it to the LLM?
  • How do I ensure tenant isolation in RAG?
  • How do I scope retrieval to the current user?

post-llm-attribution

Keywords: attribution, source tracking, provenance, citation
Solves:
  • How do I track which sources the LLM used?
  • How do I attribute results correctly?
  • How do I avoid LLM-generated IDs?

output-guardrails

Keywords: guardrail, output validation, hallucination, toxicity
Solves:
  • How do I validate LLM output?
  • How do I detect hallucinations?
  • How do I prevent toxic content generation?

prompt-audit

Keywords: prompt audit, prompt security, prompt injection
Solves:
  • How do I verify no IDs leaked to prompts?
  • How do I audit prompts for security?
  • How do I prevent prompt injection?