prompt-repetition
Prompt Repetition
Problem Being Solved
LLMs are trained as Causal Language Models, where each token attends only to previous tokens. This leads to:
- Context-Question Problem: The question is unknown when processing context
- Options-First MCQ Problem: Cannot fully understand the question context when viewing answer choices
- Position/Index Problem: Attention weights weaken for specific position information in long lists
Prompt repetition enables the second pass to reference the entire first pass, effectively mimicking some benefits of bidirectional attention.
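The causal constraint above can be made concrete with the attention mask it implies: token i may only attend to positions j ≤ i, i.e. a lower-triangular mask. A minimal sketch in plain Python (no ML framework assumed, purely illustrative):

```python
# Lower-triangular causal attention mask for a 4-token sequence:
# mask[i][j] == 1 means token i may attend to token j.
n = 4
mask = [[1 if j <= i else 0 for j in range(n)] for i in range(n)]
for row in mask:
    print(row)
# The first rows show that an early Context token can never attend
# to Question tokens that arrive later in the sequence.
```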
When to use this skill
- When using lightweight models: claude-haiku, gemini-flash, gpt-4o-mini, etc.
- Options-First MCQ: Multiple choice where answer choices appear before the question
- Context + Question: Searching for specific information in long contexts
- Index/Position Tasks: Position-based queries in inventories or lists
- NPC Dialogue: Maintaining consistency for game AI characters
- Non-Reasoning Tasks: Tasks that do not use Chain-of-Thought
How It Works
Limitations of Causal Attention
```
[Context] → [Question]
     ↓
Cannot reference Question content when processing Context tokens.
Attention weights for Context are already finalized by the time Question tokens appear.
```
How Prompt Repetition Solves This
```
[First Pass]              [Second Pass]
Context → Question   →    Context' → Question'
                                ↑
                 Can reference entire first pass
```
In the second pass, the model reprocesses all the information in the first prompt and strengthens attention weights on key concepts, resulting in improved performance.
Note: This does not change the model architecture to bidirectional; it is a prompt-engineering technique that mitigates the limitations of causal models.
Research Results (Google Research 2025)
| Metric | Result |
|---|---|
| Significant improvement (p < 0.1) | 47 / 70 benchmarks |
| Performance degradation | 0 |
| Neutral | 23 |
| Improvement rate | 67% |
Most dramatic improvement: Gemini 2.0 Flash-Lite on NameIndex: 21.33% → 97.33% (+76%p)
Tested Models
- Gemini 2.0 Flash / Flash Lite
- GPT-4o / GPT-4o-mini
- Claude 3.7 Sonnet / Claude 3 Haiku
- Deepseek V3
Tested Benchmarks
- ARC (Challenge) - Scientific reasoning
- OpenBookQA - Open-domain QA
- GSM8K - Math problems
- MMLU-Pro - Multitask language understanding
- MATH - Mathematical problem solving
- NameIndex / MiddleMatch - Custom position tasks
Application Procedure
Step 1: Verify Auto-Apply Target Models
| Provider | Auto-apply models | Excluded models |
|---|---|---|
| Claude | haiku series | opus, sonnet |
| Gemini | flash, flash-lite | pro, ultra |
| OpenAI | gpt-4o-mini, gpt-low | gpt-4o, gpt-4 |
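The table above reduces to a substring check on the model name. A minimal sketch (the pattern list mirrors the table; the exact model-ID strings are an assumption):

```python
# Substring patterns for auto-apply targets (from the table above).
AUTO_APPLY_PATTERNS = ["haiku", "flash", "flash-lite", "gpt-4o-mini", "gpt-low"]

def is_auto_apply_target(model: str) -> bool:
    """True if the model name matches a lightweight-model pattern."""
    m = model.lower()
    return any(p in m for p in AUTO_APPLY_PATTERNS)

print(is_auto_apply_target("claude-3-haiku"))  # True
print(is_auto_apply_target("gpt-4o"))          # False ("mini" absent)
```

Matching the full pattern `gpt-4o-mini` (rather than `gpt-4o`) keeps the excluded `gpt-4o` from triggering a false positive.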
Step 2: Determine Repetition Count by Task Type
| Task Type | Keyword Pattern | Repetitions | Expected Improvement |
|---|---|---|---|
| Options-First MCQ | Answer choices appear before the question | 2× | +15-40%p |
| Index/Position | `slot`, `position`, `index`, etc. | 3× | +50-76%p |
| Context + Question | General question | 2× | +5-15%p |
| With CoT | `step by step`, `let's think`, etc. | 0× (not applied) | ~0% |
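The routing in the table can be sketched as a small dispatch function; the keyword regexes are borrowed from the COT_PATTERNS and POSITION_PATTERNS lists in the reference implementation further below:

```python
import re

def repetitions_for(prompt: str) -> int:
    """Map a prompt to a repetition count per the task-type table."""
    p = prompt.lower()
    # CoT detected -> do not repeat (send the prompt once)
    if re.search(r"step by step|let's think|chain of thought", p):
        return 1
    # Position/index task -> 3x
    if re.search(r"(slot|position|index|item|row|column) \d+", p):
        return 3
    # General default (options-first MCQ, context + question) -> 2x
    return 2

print(repetitions_for("What item is in slot 25?"))  # 3
```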
Step 3: Check Token Limits
```python
# Check context before auto-apply
max_context = model_context_window * 0.8  # 80% safety margin
if len(prompt_tokens) * repetitions > max_context:
    repetitions = max(1, int(max_context / len(prompt_tokens)))
```
Step 4: Prompt Transformation
```python
def apply_prompt_repetition(prompt: str, times: int = 2) -> str:
    """Repeat the prompt a specified number of times.

    Args:
        prompt: Original prompt
        times: Number of repetitions (default 2)

    Returns:
        The repeated prompt
    """
    if times <= 1:
        return prompt
    return "\n\n".join([prompt] * times)
```
Practical Examples
Example 1: Options-First MCQ (Greatest Effect)
Before:
```
A. Paris
B. London
C. Berlin
D. Madrid
Which city is the capital of France?
Reply with one letter.
```
After (repetition ×2 applied):
```
A. Paris
B. London
C. Berlin
D. Madrid
Which city is the capital of France?
Reply with one letter.

A. Paris
B. London
C. Berlin
D. Madrid
Which city is the capital of France?
Reply with one letter.
```
Expected output:
```
A
```
Accuracy: original 78% → after repetition 93% (+15%p)
Example 2: Index/Position Tasks (Maximum Effect)
Before:
```
Inventory:
1. Iron Sword
2. Leather Armor
3. Health Potion (x5)
4. Magic Staff
...
25. Dragon Scale
...
50. Ancient Map
What item is in slot 25?
```
After (repetition ×3 applied):
```
(prompt repeated 3 times)
```
Expected output:
```
Dragon Scale
```
Accuracy: original 21% → after repetition 97% (+76%p)
Example 3: Tool Call Prompt Handling
Note: Prompts containing tool call instructions are also repeated in their entirety. The full-repetition approach was adopted for implementation simplicity and consistency.
Before:
```
Use the calculator tool to compute 234 * 567.
What is the result?
```
After (repetition ×2):
```
Use the calculator tool to compute 234 * 567.
What is the result?

Use the calculator tool to compute 234 * 567.
What is the result?
```
Research results show that full repetition including tool call sections is also effective.
Production-Ready Implementation
Auto-Apply Transformer
```python
"""prompt_repetition_transformer.py"""
from dataclasses import dataclass
from typing import Optional, Callable, List
import re

# Context window per model (in tokens)
MODEL_CONTEXT_WINDOWS = {
    "claude-3-haiku": 200_000,
    "claude-haiku": 200_000,
    "gemini-flash": 1_000_000,
    "gemini-flash-lite": 1_000_000,
    "gemini-2.0-flash": 1_000_000,
    "gpt-4o-mini": 128_000,
    "gpt-low": 128_000,
}

# Models targeted for auto-apply
AUTO_APPLY_MODELS = list(MODEL_CONTEXT_WINDOWS.keys())

# CoT patterns (excluded from apply)
COT_PATTERNS = [
    r"step by step",
    r"think through",
    r"let's think",
    r"reasoning:",
    r"chain of thought",
]

# Position/Index patterns (3× repetition)
POSITION_PATTERNS = [
    r"slot \d+",
    r"position \d+",
    r"index \d+",
    r"\d+(st|nd|rd|th)",
    r"item \d+",
    r"row \d+",
    r"column \d+",
]


@dataclass
class PromptRepetitionConfig:
    """Prompt repetition configuration"""
    default_repetitions: int = 2
    position_repetitions: int = 3
    separator: str = "\n\n"
    max_context_ratio: float = 0.8
    applied_marker: str = "<!-- prompt-repetition-applied -->"


class PromptRepetitionTransformer:
    """Auto-apply prompt repetition transformer for lightweight models"""

    def __init__(self, config: Optional[PromptRepetitionConfig] = None):
        self.config = config or PromptRepetitionConfig()

    def should_apply(self, model: str, prompt: str) -> bool:
        """Determine whether to auto-apply"""
        # Skip if already applied
        if self.config.applied_marker in prompt:
            return False
        # Check target model
        model_lower = model.lower()
        if not any(m in model_lower for m in AUTO_APPLY_MODELS):
            return False
        # Skip when a CoT pattern is detected
        prompt_lower = prompt.lower()
        for pattern in COT_PATTERNS:
            if re.search(pattern, prompt_lower):
                return False
        return True

    def determine_repetitions(self, prompt: str, model: str) -> int:
        """Determine repetition count based on task type"""
        prompt_lower = prompt.lower()
        # Position/Index pattern detected → 3×
        for pattern in POSITION_PATTERNS:
            if re.search(pattern, prompt_lower):
                return self.config.position_repetitions
        return self.config.default_repetitions

    def estimate_tokens(self, text: str) -> int:
        """Simple token count estimation (speed over precision)"""
        # Estimate approximately 4 characters = 1 token
        return len(text) // 4

    def transform(self, prompt: str, model: str) -> str:
        """Apply repetition to the prompt"""
        if not self.should_apply(model, prompt):
            return prompt
        repetitions = self.determine_repetitions(prompt, model)
        # Check context limit
        model_lower = model.lower()
        max_tokens = 128_000  # Default value
        for m, tokens in MODEL_CONTEXT_WINDOWS.items():
            if m in model_lower:
                max_tokens = tokens
                break
        max_allowed = int(max_tokens * self.config.max_context_ratio)
        prompt_tokens = self.estimate_tokens(prompt)
        # Reduce repetitions if the token limit would be exceeded
        while prompt_tokens * repetitions > max_allowed and repetitions > 1:
            repetitions -= 1
        if repetitions <= 1:
            return prompt
        # Apply repetition + add marker
        repeated = self.config.separator.join([prompt] * repetitions)
        return f"{self.config.applied_marker}\n{repeated}"

    def wrap_llm_call(self, llm_fn: Callable, model: str) -> Callable:
        """Wrap an LLM call function"""
        def wrapped(prompt: str, **kwargs):
            transformed = self.transform(prompt, model)
            return llm_fn(transformed, **kwargs)
        return wrapped
```
How to Measure Effectiveness (Verification)
A/B Testing Method
```python
def run_ab_test(prompts: List[str], llm_fn, model: str, ground_truth: List[str]):
    """A/B test for prompt repetition effectiveness"""
    transformer = PromptRepetitionTransformer()
    results = {"baseline": [], "repeated": []}
    for prompt, expected in zip(prompts, ground_truth):
        # Baseline
        response_a = llm_fn(prompt)
        results["baseline"].append(response_a == expected)
        # With repetition
        repeated_prompt = transformer.transform(prompt, model)
        response_b = llm_fn(repeated_prompt)
        results["repeated"].append(response_b == expected)
    baseline_acc = sum(results["baseline"]) / len(prompts)
    repeated_acc = sum(results["repeated"]) / len(prompts)
    print(f"Baseline accuracy: {baseline_acc:.2%}")
    print(f"Repeated accuracy: {repeated_acc:.2%}")
    print(f"Improvement: {repeated_acc - baseline_acc:+.2%}p")
```
Key Metrics
| Metric | Measurement Method |
|---|---|
| Accuracy | Compare correct answer rates |
| Consistency | Variance across 10 runs of same prompt |
| Token cost | Input token increase rate |
| Latency | Compare p50, p99 latency |
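Of these, consistency is the least standard to measure; one simple approach is the fraction of runs that agree with the modal answer. A sketch (the 10-run answer lists are illustrative, not measured data):

```python
from collections import Counter

def consistency(answers):
    """Fraction of runs agreeing with the most common answer (1.0 = fully consistent)."""
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)

# 10 runs of the same prompt, without and with repetition (illustrative)
baseline_runs = ["A", "A", "B", "A", "C", "A", "A", "B", "A", "A"]
repeated_runs = ["A"] * 10
print(consistency(baseline_runs))  # 0.7
print(consistency(repeated_runs))  # 1.0
```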
When NOT to Use
| Case | Reason |
|---|---|
| Using CoT | Reasoning process already provides context |
| Reasoning models (opus, sonnet) | Already optimized; minimal effect |
| Very long prompts | Risk of exceeding context limit |
| Already repeated | Duplicate application wastes tokens |
Cost-Accuracy Analysis
| Metric | Baseline | With Repetition | Change |
|---|---|---|---|
| Input tokens | 500/req | 1000/req | +100% |
| Output tokens | 100/req | 100/req | 0% |
| Latency (p50) | 450ms | 460ms | +2% |
| Latency (p99) | 1200ms | 1250ms | +4% |
| Accuracy | 78% | 89% | +11%p |
| Cost per correct answer | $0.019 | $0.020 | +5% |
Key insight: The prefill phase is highly parallelized on GPU, so doubling input tokens has minimal impact on latency.
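The cost-per-correct-answer column follows from a simple formula: cost per request divided by accuracy. A sketch with illustrative per-token prices (the prices are assumptions chosen only to show the mechanics, not vendor pricing):

```python
def cost_per_correct(in_tokens, out_tokens, accuracy,
                     price_in=0.1e-6, price_out=2.0e-6):
    """Expected spend per correct answer; price_* are illustrative $/token."""
    cost_per_request = in_tokens * price_in + out_tokens * price_out
    return cost_per_request / accuracy

baseline = cost_per_correct(500, 100, 0.78)   # no repetition
repeated = cost_per_correct(1000, 100, 0.89)  # input doubled, accuracy up
print(f"{(repeated / baseline - 1):+.0%}")    # roughly +5%
```

Because output tokens are typically priced far above input tokens, doubling only the input moves the per-request cost much less than 2×, and the accuracy gain nearly cancels the rest.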
Multi-Agent Integration
Auto-Apply Strategy Per Agent
| Agent | Model | Repetition Applied | Applied At |
|---|---|---|---|
| Claude Orchestrator | opus/sonnet | Optional | - |
| Claude Executor | haiku | Auto | skill_loader.py |
| Gemini Analyst | flash | Auto | On MCP call |
| OpenAI | gpt-4o-mini | Auto | skill_loader.py |
Preventing Duplicate Application
To prevent duplicate application in multi-agent pipelines:
- Use markers: detect already-applied prompts via the marker `<!-- prompt-repetition-applied -->`
- Pass metadata: pass the header `x-prompt-repetition-applied: true` between agents
- Orchestrator management: the Claude Orchestrator tracks whether repetition has been applied when calling sub-agents
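The marker mechanism makes the transformation idempotent; a minimal self-contained sketch (independent of the full transformer class):

```python
MARKER = "<!-- prompt-repetition-applied -->"

def apply_once(prompt: str, times: int = 2) -> str:
    """Repeat the prompt, but make a second call a no-op via the marker."""
    if MARKER in prompt:
        return prompt  # already applied by an upstream agent
    return MARKER + "\n" + "\n\n".join([prompt] * times)

p1 = apply_once("What is 2 + 2?")
p2 = apply_once(p1)  # marker detected -> unchanged
print(p1 == p2)      # True
```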
Application Pattern
```
[Claude Sonnet] Planning (no repetition needed)
      ↓
[Gemini Flash] Analysis (repetition ×2 auto-applied, marker added)
      ↓
[Claude Haiku] Execution (marker detected → skip duplicate apply)
```
skill_loader.py Integration Guide
Recommended Implementation
```python
# Code to add to skill_loader.py
from prompt_repetition_transformer import PromptRepetitionTransformer

class SkillLoader:
    def __init__(self, ...):
        # ... existing code ...
        self.prompt_transformer = PromptRepetitionTransformer()

    def apply_auto_skills(self, prompt: str, model: str) -> str:
        """Handle auto-apply skills"""
        # Auto-apply prompt-repetition
        for skill in self.skills.values():
            auto_apply = skill.get('data', {}).get('auto-apply', {})
            if auto_apply.get('trigger') == 'auto':
                target_models = auto_apply.get('models', [])
                if any(m in model.lower() for m in target_models):
                    prompt = self.prompt_transformer.transform(prompt, model)
        return prompt
```
Constraints
Required Rules
- Lightweight models first: Most effective for haiku, flash, mini series
- Limit repetitions: 2× for general tasks, max 3× for position tasks
- Context monitoring: Be cautious of context overflow due to repetition
- Check markers: Mandatory marker check to prevent duplicate application
Prohibited Rules
- No padding substitution: merely increasing prompt length with filler text has no effect (per the research)
- Do not combine with CoT: the effects cancel out
- Do not force-apply to reasoning models: they are already optimized
- No duplicate application: consecutive application without markers wastes tokens
Quick Reference
```
=== Auto-Apply Target Models ===
claude-3-haiku, claude-haiku
gemini-flash, gemini-flash-lite, gemini-2.0-flash
gpt-4o-mini, gpt-low

=== Repetition Count ===
General tasks: 2×
Position/Index (slot/position/index keywords): 3×
With CoT: 0× (not applied)

=== Effect (Google Research 2025) ===
Improvement rate: 67% (47/70 benchmarks)
Performance degradation: 0 cases
Maximum improvement: +76%p (NameIndex)

=== Cost ===
Input tokens: +100%
Latency: +2% (prefill parallelization)
Cost per correct answer: +5%

=== Duplicate Application Prevention ===
Marker: <!-- prompt-repetition-applied -->
```