llm-safety-patterns

LLM Safety Patterns

The Core Principle

Identifiers flow AROUND the LLM, not THROUGH it. The LLM sees only content. Attribution happens deterministically.

Why This Matters

When identifiers appear in prompts, bad things happen:
  1. Hallucination: LLM invents IDs that don't exist
  2. Confusion: LLM mixes up which ID belongs where
  3. Injection: Attacker manipulates IDs via prompt injection
  4. Leakage: IDs appear in logs, caches, traces
  5. Cross-tenant: LLM could reference other users' data

The Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│   SYSTEM CONTEXT (flows around LLM)                                     │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │ user_id │ tenant_id │ analysis_id │ trace_id │ permissions     │   │
│   └─────────────────────────────────────────────────────────────────┘   │
│        │                                                       │        │
│        │                                                       │        │
│        ▼                                                       ▼        │
│   ┌─────────┐                                           ┌─────────┐    │
│   │ PRE-LLM │       ┌─────────────────────┐            │POST-LLM │    │
│   │ FILTER  │──────▶│        LLM          │───────────▶│ATTRIBUTE│    │
│   │         │       │                     │            │         │    │
│   │ Returns │       │ Sees ONLY:          │            │ Adds:   │    │
│   │ CONTENT │       │ - content text      │            │ - IDs   │    │
│   │ (no IDs)│       │ - context text      │            │ - refs  │    │
│   └─────────┘       │ (NO IDs!)           │            └─────────┘    │
│                     └─────────────────────┘                            │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

What NEVER Goes in Prompts

OrchestKit Forbidden Parameters

| Parameter | Type | Why Forbidden |
|-----------|------|---------------|
| user_id | UUID | Can be hallucinated, enables cross-user access |
| tenant_id | UUID | Critical for multi-tenant isolation |
| analysis_id | UUID | Job tracking, not for LLM |
| document_id | UUID | Source tracking, not for LLM |
| artifact_id | UUID | Output tracking, not for LLM |
| chunk_id | UUID | RAG reference, not for LLM |
| session_id | str | Auth context, not for LLM |
| trace_id | str | Observability, not for LLM |
| Any UUID | UUID | Pattern: [0-9a-f]{8}-... |

Detection Pattern

```python
import re

FORBIDDEN_PATTERNS = [
    r'user[_-]?id',
    r'tenant[_-]?id',
    r'analysis[_-]?id',
    r'document[_-]?id',
    r'artifact[_-]?id',
    r'chunk[_-]?id',
    r'session[_-]?id',
    r'trace[_-]?id',
    r'[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}',
]

def audit_prompt(prompt: str) -> list[str]:
    """Check for forbidden patterns in prompt."""
    violations = []
    for pattern in FORBIDDEN_PATTERNS:
        if re.search(pattern, prompt, re.IGNORECASE):
            violations.append(pattern)
    return violations
```
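For instance, a prompt that accidentally interpolates an identifier is flagged immediately. This snippet uses a trimmed-down copy of the audit above so it runs standalone:

```python
import re

# Trimmed-down copy of the audit above so this snippet is self-contained
FORBIDDEN_PATTERNS = [
    r'user[_-]?id',
    r'[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}',
]

def audit_prompt(prompt: str) -> list[str]:
    return [p for p in FORBIDDEN_PATTERNS if re.search(p, prompt, re.IGNORECASE)]

clean = audit_prompt("Summarize the attached lecture transcript.")
leaky = audit_prompt("Fetch notes for user_id 7f3e8a21-4b6c-4d2e-9f1a-8c5d3e2b1a0f")
# clean is empty; leaky flags both the key name and the raw UUID
```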

The Three-Phase Pattern

Phase 1: Pre-LLM (Filter & Extract)

```python
async def prepare_for_llm(
    query: str,
    ctx: RequestContext,
) -> tuple[str, list[str], SourceRefs]:
    """
    Filter data and extract content for the LLM.
    Returns: (content, context_texts, source_references)
    """
    # 1. Retrieve with tenant filter
    documents = await semantic_search(
        query_embedding=embed(query),
        ctx=ctx,  # Filters by tenant_id, user_id
    )
    chunks = [chunk for doc in documents for chunk in doc.chunks]  # matched chunks

    # 2. Save references for attribution
    source_refs = SourceRefs(
        document_ids=[d.id for d in documents],
        chunk_ids=[c.id for c in chunks],
    )

    # 3. Extract content only (no IDs)
    content_texts = [d.content for d in documents]

    return query, content_texts, source_refs
```

Phase 2: LLM Call (Content Only)

```python
def build_prompt(content: str, context_texts: list[str]) -> str:
    """
    Build prompt with ONLY content, no identifiers.
    """
    prompt = f"""
    Analyze the following content and provide insights.

    CONTENT:
    {content}

    RELEVANT CONTEXT:
    {chr(10).join(f"- {text}" for text in context_texts)}

    Provide analysis covering:
    1. Key concepts
    2. Prerequisites
    3. Learning objectives
    """

    # AUDIT: Verify no IDs leaked
    violations = audit_prompt(prompt)
    if violations:
        raise SecurityError(f"IDs leaked to prompt: {violations}")

    return prompt

async def call_llm(prompt: str) -> dict:
    """LLM only sees content, never IDs"""
    response = await llm.generate(prompt)
    return parse_response(response)
```

Phase 3: Post-LLM (Attribute)

```python
async def save_with_attribution(
    llm_output: dict,
    ctx: RequestContext,
    source_refs: SourceRefs,
) -> Analysis:
    """
    Attach context and references to LLM output.
    Attribution is deterministic, not LLM-generated.
    """
    return await Analysis.create(
        # Generated
        id=uuid4(),

        # From RequestContext (system-provided)
        user_id=ctx.user_id,
        tenant_id=ctx.tenant_id,
        analysis_id=ctx.resource_id,
        trace_id=ctx.trace_id,

        # From Pre-LLM refs (deterministic)
        source_document_ids=source_refs.document_ids,
        source_chunk_ids=source_refs.chunk_ids,

        # From LLM (content only)
        content=llm_output["analysis"],
        key_concepts=llm_output["key_concepts"],
        difficulty=llm_output["difficulty"],

        # Metadata
        created_at=datetime.now(timezone.utc),
        model_used=MODEL_NAME,
    )
```

Output Validation

After the LLM returns, validate:
  1. Schema: Response matches expected structure
  2. Guardrails: No toxic/harmful content
  3. Grounding: Claims are supported by provided context
  4. No IDs: LLM didn't hallucinate any IDs
```python
async def validate_output(
    llm_output: dict,
    context_texts: list[str],
) -> ValidationResult:
    """Validate LLM output before use"""

    # 1. Schema validation
    try:
        parsed = AnalysisOutput.model_validate(llm_output)
    except ValidationError as e:
        return ValidationResult(valid=False, reason=f"Schema error: {e}")

    # 2. Guardrails
    if await contains_toxic_content(parsed.content):
        return ValidationResult(valid=False, reason="Toxic content detected")

    # 3. Grounding check
    if not is_grounded(parsed.content, context_texts):
        return ValidationResult(valid=False, reason="Ungrounded claims")

    # 4. No hallucinated IDs
    if contains_uuid_pattern(parsed.content):
        return ValidationResult(valid=False, reason="Hallucinated IDs")

    return ValidationResult(valid=True)
```
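validate_output assumes two helpers the document does not define. A minimal sketch of each follows; note that is_grounded here uses a naive lexical-overlap heuristic for illustration, whereas production systems typically use NLI models or embedding similarity:

```python
import re

# Sketch of the UUID check assumed by validate_output
UUID_RE = re.compile(
    r'[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}',
    re.IGNORECASE,
)

def contains_uuid_pattern(text: str) -> bool:
    """True if the text contains anything shaped like a UUID."""
    return UUID_RE.search(text) is not None

def is_grounded(claim: str, context_texts: list[str], threshold: float = 0.3) -> bool:
    """Naive grounding check: share of claim tokens found in the context."""
    claim_tokens = set(re.findall(r'\w+', claim.lower()))
    if not claim_tokens:
        return True
    context_tokens: set[str] = set()
    for text in context_texts:
        context_tokens |= set(re.findall(r'\w+', text.lower()))
    return len(claim_tokens & context_tokens) / len(claim_tokens) >= threshold
```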

Integration Points in OrchestKit

Content Analysis Workflow

```
backend/app/workflows/
├── agents/
│   ├── execution.py        # Add context separation
│   └── prompts/            # Audit all prompts
├── tasks/
│   └── generate_artifact.py  # Add attribution
```

Services

```
backend/app/services/
├── embeddings/            # Pre-LLM filtering
└── analysis/              # Post-LLM attribution
```

Checklist Before Any LLM Call

  • RequestContext available
  • Data filtered by tenant_id and user_id
  • Content extracted without IDs
  • Source references saved
  • Prompt passes audit (no forbidden patterns)
  • Output validated before use
  • Attribution uses context, not LLM output

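The checklist maps onto a single pipeline. Below is a minimal runnable sketch of the three-phase flow; the inner functions are hypothetical stand-ins for the real prepare_for_llm, build_prompt, and call_llm sketched earlier, stubbed so the example is self-contained:

```python
import asyncio
from uuid import uuid4

# Hypothetical stand-ins for the components sketched above,
# so the three-phase flow can run end to end in isolation.
async def prepare_for_llm(query, ctx):
    docs = [{"id": str(uuid4()), "content": "Photosynthesis converts light to chemical energy."}]
    source_refs = {"document_ids": [d["id"] for d in docs]}  # saved for Phase 3
    return query, [d["content"] for d in docs], source_refs

def build_prompt(content, context_texts):
    # Content only -- no identifiers ever reach this string
    return f"CONTENT:\n{content}\n\nCONTEXT:\n" + "\n".join(context_texts)

async def call_llm(prompt):
    return {"analysis": "The content covers energy conversion."}  # stubbed model reply

async def analyze(query, ctx):
    # Phase 1: filter & extract (IDs stay in source_refs)
    content, context_texts, source_refs = await prepare_for_llm(query, ctx)
    # Phase 2: content-only LLM call
    llm_output = await call_llm(build_prompt(content, context_texts))
    # Phase 3: deterministic attribution from ctx + refs, never from the LLM
    return {
        "content": llm_output["analysis"],
        "user_id": ctx["user_id"],
        "source_document_ids": source_refs["document_ids"],
    }

result = asyncio.run(analyze("Explain photosynthesis", {"user_id": str(uuid4())}))
```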

Related Skills

  • input-validation
    - Input sanitization patterns that complement LLM safety
  • rag-retrieval
    - RAG pipeline patterns requiring tenant-scoped retrieval
  • llm-evaluation
    - Output quality assessment including hallucination detection
  • security-scanning
    - Automated security scanning for LLM integrations
  • defense-in-depth
    - 8-layer security architecture including Tavily prompt injection firewall at Layer 2

Key Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| ID handling | Flow around LLM, never through | Prevents hallucination, injection, and cross-tenant leakage |
| Output validation | Schema + guardrails + grounding | Defense-in-depth for LLM outputs |
| Attribution approach | Deterministic post-LLM | System context provides IDs, not LLM |
| Prompt auditing | Regex pattern matching | Fast detection of forbidden identifiers |

Version: 1.0.0 (December 2025)

Capability Details

context-separation

Keywords: context separation, prompt context, id in prompt, parameterized
Solves:
  • How do I prevent IDs from leaking into prompts?
  • How do I separate system context from prompt content?
  • What should never appear in LLM prompts?

pre-llm-filtering

Keywords: pre-llm, rag filter, data filter, tenant filter
Solves:
  • How do I filter data before sending it to the LLM?
  • How do I ensure tenant isolation in RAG?
  • How do I scope retrieval to the current user?

post-llm-attribution

Keywords: attribution, source tracking, provenance, citation
Solves:
  • How do I track which sources the LLM used?
  • How do I attribute results correctly?
  • How do I avoid LLM-generated IDs?

output-guardrails

Keywords: guardrail, output validation, hallucination, toxicity
Solves:
  • How do I validate LLM output?
  • How do I detect hallucinations?
  • How do I prevent toxic content generation?

prompt-audit

Keywords: prompt audit, prompt security, prompt injection
Solves:
  • How do I verify no IDs leaked to prompts?
  • How do I audit prompts for security?
  • How do I prevent prompt injection?