prompt-repetition


Prompt Repetition

Problem Being Solved

LLMs are trained as Causal Language Models, where each token attends only to previous tokens. This leads to:
  1. Context-Question Problem: The question is unknown when processing context
  2. Options-First MCQ Problem: Cannot fully understand the question context when viewing answer choices
  3. Position/Index Problem: Attention weights weaken for specific position information in long lists
Prompt repetition enables the second pass to reference the entire first pass, effectively mimicking some benefits of bidirectional attention.


When to use this skill

  • When using lightweight models: claude-haiku, gemini-flash, gpt-4o-mini, etc.
  • Options-First MCQ: Multiple choice where answer choices appear before the question
  • Context + Question: Searching for specific information in long contexts
  • Index/Position Tasks: Position-based queries in inventories or lists
  • NPC Dialogue: Maintaining consistency for game AI characters
  • Non-Reasoning Tasks: Tasks that do not use Chain-of-Thought


How It Works

Limitations of Causal Attention

[Context] → [Question]
Cannot reference Question content when processing Context tokens
Attention weights for Context are already finalized by the time Question tokens appear

How Prompt Repetition Solves This

[First Pass]                [Second Pass]
Context → Question    →    Context' → Question'
                              ↑         ↑
                          Can reference entire first pass
In the second repetition, the model reprocesses information across the entire first prompt and strengthens attention weights on key concepts, resulting in improved performance.
Note: This does not change the model architecture to bidirectional; it is a prompt engineering technique to mitigate the limitations of causal models.


Research Results (Google Research 2025)

| Metric | Result |
| --- | --- |
| Significant improvement (p < 0.1) | 47 / 70 benchmarks |
| Performance degradation | 0 |
| Neutral | 23 |
| Improvement rate | 67% |

Most dramatic improvement: Gemini 2.0 Flash-Lite on NameIndex: 21.33% → 97.33% (+76%p)

Tested Models

  • Gemini 2.0 Flash / Flash Lite
  • GPT-4o / GPT-4o-mini
  • Claude 3.7 Sonnet / Claude 3 Haiku
  • Deepseek V3

Tested Benchmarks

  • ARC (Challenge) - Scientific reasoning
  • OpenBookQA - Open-domain QA
  • GSM8K - Math problems
  • MMLU-Pro - Multitask language understanding
  • MATH - Mathematical problem solving
  • NameIndex / MiddleMatch - Custom position tasks


Application Procedure

Step 1: Verify Auto-Apply Target Models

| Provider | Auto-apply models | Excluded models |
| --- | --- | --- |
| Claude | haiku series | opus, sonnet |
| Gemini | flash, flash-lite | pro, ultra |
| OpenAI | gpt-4o-mini, gpt-low | gpt-4o, gpt-4 |
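The routing in the table above can be approximated with simple substring checks. A sketch (model IDs and tier lists are illustrative, not an official mapping):

```python
def is_auto_apply(model: str) -> bool:
    """Rough tier check: apply repetition to lightweight models only."""
    m = model.lower()
    exclude = ("opus", "sonnet", "pro", "ultra")          # reasoning-tier models
    include = ("haiku", "flash", "gpt-4o-mini", "gpt-low")  # lightweight models
    if any(t in m for t in exclude):
        return False
    return any(t in m for t in include)

print(is_auto_apply("claude-3-haiku"))    # True
print(is_auto_apply("gemini-2.0-flash"))  # True
print(is_auto_apply("gpt-4o"))            # False
```

Note the exclusion check runs first, so `gemini-1.5-pro` is rejected even though it shares a prefix with flash-tier names.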

Step 2: Determine Repetition Count by Task Type

| Task Type | Keyword Pattern | Repetitions | Expected Improvement |
| --- | --- | --- | --- |
| Options-First MCQ | `A. B. C. D.` choices appear first | 2 | +15-40%p |
| Index/Position | `slot`, `position`, `index`, `N-th` | 3 | +50-76%p |
| Context + Question | General question | 2 | +5-15%p |
| With CoT | `step by step`, `think through` | 0 (not applied) | ~0% |

Step 3: Check Token Limits

```python
# Check context before auto-apply
max_context = model_context_window * 0.8  # 80% safety margin
if len(prompt_tokens) * repetitions > max_context:
    repetitions = max(1, int(max_context / len(prompt_tokens)))
```
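For reference, the capping rule can be packaged as a standalone helper that takes an estimated token count (names here are illustrative, not from the research):

```python
def cap_repetitions(prompt_tokens: int, repetitions: int,
                    context_window: int, safety_ratio: float = 0.8) -> int:
    """Reduce the repetition count so the repeated prompt stays within
    a safety margin of the model's context window."""
    max_context = int(context_window * safety_ratio)
    if prompt_tokens * repetitions > max_context:
        repetitions = max(1, max_context // prompt_tokens)
    return repetitions

print(cap_repetitions(30_000, 3, 128_000))   # fits: stays at 3
print(cap_repetitions(40_000, 3, 128_000))   # 120k > 102.4k budget: drops to 2
print(cap_repetitions(200_000, 2, 128_000))  # prompt alone exceeds budget: 1
```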

Step 4: Prompt Transformation

```python
def apply_prompt_repetition(prompt: str, times: int = 2) -> str:
    """Repeat the prompt a specified number of times

    Args:
        prompt: Original prompt
        times: Number of repetitions (default 2)

    Returns:
        Repeated prompt
    """
    if times <= 1:
        return prompt
    return "\n\n".join([prompt] * times)
```

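A quick sanity check of this function (restated so the snippet runs on its own):

```python
def apply_prompt_repetition(prompt: str, times: int = 2) -> str:
    if times <= 1:
        return prompt
    return "\n\n".join([prompt] * times)

doubled = apply_prompt_repetition("Which city is the capital of France?")
print(doubled.count("capital of France"))  # 2: both copies are verbatim
```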

Practical Examples

Example 1: Options-First MCQ (Greatest Effect)

Before:
A. Paris
B. London
C. Berlin
D. Madrid

Which city is the capital of France?
Reply with one letter.
After (repetition ×2 applied):
A. Paris
B. London
C. Berlin
D. Madrid

Which city is the capital of France?
Reply with one letter.

A. Paris
B. London
C. Berlin
D. Madrid

Which city is the capital of France?
Reply with one letter.
Expected output:
A
Accuracy: original 78% → after repetition 93% (+15%p)


Example 2: Index/Position Tasks (Maximum Effect)

Before:
Inventory:
1. Iron Sword
2. Leather Armor
3. Health Potion (x5)
4. Magic Staff
...
25. Dragon Scale
...
50. Ancient Map

What item is in slot 25?
After (repetition ×3 applied): Prompt repeated 3 times
Expected output:
Dragon Scale
Accuracy: original 21% → after repetition 97% (+76%p)


Example 3: Tool Call Prompt Handling

Note: Prompts containing tool call instructions are also repeated in their entirety. The full-repetition approach was adopted for implementation simplicity and consistency.
Before:
Use the calculator tool to compute 234 * 567.
What is the result?
After (repetition ×2):
Use the calculator tool to compute 234 * 567.
What is the result?

Use the calculator tool to compute 234 * 567.
What is the result?
Research results show that full repetition including tool call sections is also effective.


Production-Ready Implementation

Auto-Apply Transformer

```python
"""prompt_repetition_transformer.py"""
from dataclasses import dataclass
from typing import Optional, Callable
import re

# Context window per model (in tokens)
MODEL_CONTEXT_WINDOWS = {
    "claude-3-haiku": 200_000,
    "claude-haiku": 200_000,
    "gemini-flash": 1_000_000,
    "gemini-flash-lite": 1_000_000,
    "gemini-2.0-flash": 1_000_000,
    "gpt-4o-mini": 128_000,
    "gpt-low": 128_000,
}

# Models targeted for auto-apply
AUTO_APPLY_MODELS = list(MODEL_CONTEXT_WINDOWS.keys())

# CoT patterns (excluded from apply)
COT_PATTERNS = [
    r"step by step",
    r"think through",
    r"let's think",
    r"reasoning:",
    r"chain of thought",
]

# Position/Index patterns (3x repetition)
POSITION_PATTERNS = [
    r"slot \d+",
    r"position \d+",
    r"index \d+",
    r"\d+(st|nd|rd|th)",
    r"item \d+",
    r"row \d+",
    r"column \d+",
]


@dataclass
class PromptRepetitionConfig:
    """Prompt repetition configuration"""
    default_repetitions: int = 2
    position_repetitions: int = 3
    separator: str = "\n\n"
    max_context_ratio: float = 0.8
    applied_marker: str = "<!-- prompt-repetition-applied -->"


class PromptRepetitionTransformer:
    """Auto-apply prompt repetition transformer for lightweight models"""

    def __init__(self, config: Optional[PromptRepetitionConfig] = None):
        self.config = config or PromptRepetitionConfig()

    def should_apply(self, model: str, prompt: str) -> bool:
        """Determine whether to auto-apply"""
        # Skip if already applied
        if self.config.applied_marker in prompt:
            return False

        # Check target model
        model_lower = model.lower()
        if not any(m in model_lower for m in AUTO_APPLY_MODELS):
            return False

        # Skip when CoT pattern detected
        prompt_lower = prompt.lower()
        for pattern in COT_PATTERNS:
            if re.search(pattern, prompt_lower):
                return False

        return True

    def determine_repetitions(self, prompt: str, model: str) -> int:
        """Determine repetition count based on task type"""
        prompt_lower = prompt.lower()

        # Position/Index pattern detected → 3×
        for pattern in POSITION_PATTERNS:
            if re.search(pattern, prompt_lower):
                return self.config.position_repetitions

        return self.config.default_repetitions

    def estimate_tokens(self, text: str) -> int:
        """Simple token count estimation (speed over precision)"""
        # Estimate approximately 4 characters = 1 token
        return len(text) // 4

    def transform(self, prompt: str, model: str) -> str:
        """Apply repetition to prompt"""
        if not self.should_apply(model, prompt):
            return prompt

        repetitions = self.determine_repetitions(prompt, model)

        # Check context limit
        model_lower = model.lower()
        max_tokens = 128_000  # Default value
        for m, tokens in MODEL_CONTEXT_WINDOWS.items():
            if m in model_lower:
                max_tokens = tokens
                break

        max_allowed = int(max_tokens * self.config.max_context_ratio)
        prompt_tokens = self.estimate_tokens(prompt)

        # Reduce repetitions if token limit exceeded
        while prompt_tokens * repetitions > max_allowed and repetitions > 1:
            repetitions -= 1

        if repetitions <= 1:
            return prompt

        # Apply repetition + add marker
        repeated = self.config.separator.join([prompt] * repetitions)
        return f"{self.config.applied_marker}\n{repeated}"

    def wrap_llm_call(self, llm_fn: Callable, model: str) -> Callable:
        """Wrap LLM call function"""
        def wrapped(prompt: str, **kwargs):
            transformed = self.transform(prompt, model)
            return llm_fn(transformed, **kwargs)
        return wrapped
```

---

How to Measure Effectiveness (Verification)

A/B Testing Method

```python
from typing import List

def run_ab_test(prompts: List[str], llm_fn, model: str, ground_truth: List[str]):
    """A/B test for prompt repetition effectiveness"""
    transformer = PromptRepetitionTransformer()

    results = {"baseline": [], "repeated": []}

    for prompt, expected in zip(prompts, ground_truth):
        # Baseline
        response_a = llm_fn(prompt)
        results["baseline"].append(response_a == expected)

        # With repetition
        repeated_prompt = transformer.transform(prompt, model)
        response_b = llm_fn(repeated_prompt)
        results["repeated"].append(response_b == expected)

    baseline_acc = sum(results["baseline"]) / len(prompts)
    repeated_acc = sum(results["repeated"]) / len(prompts)

    print(f"Baseline accuracy: {baseline_acc:.2%}")
    print(f"Repeated accuracy: {repeated_acc:.2%}")
    print(f"Improvement: {repeated_acc - baseline_acc:+.2%}p")
```

Key Metrics

| Metric | Measurement Method |
| --- | --- |
| Accuracy | Compare correct-answer rates |
| Consistency | Variance across 10 runs of the same prompt |
| Token cost | Input token increase rate |
| Latency | Compare p50 and p99 latency |

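The consistency metric can be sketched as majority-agreement across repeated runs. In this sketch, `stub_llm` is a stand-in for a real model call, seeded so the run is reproducible:

```python
import random
from collections import Counter

def consistency(llm_fn, prompt: str, runs: int = 10) -> float:
    """Fraction of runs agreeing with the majority answer (1.0 = fully consistent)."""
    answers = [llm_fn(prompt) for _ in range(runs)]
    return Counter(answers).most_common(1)[0][1] / runs

# Stub model for illustration: answers "A" with probability ~0.8.
rng = random.Random(0)
def stub_llm(prompt: str) -> str:
    return "A" if rng.random() < 0.8 else "B"

score = consistency(stub_llm, "Which letter?")
print(score)
```

To compare baseline against repetition, compute this score for both prompt variants and compare.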

When NOT to Use

| Case | Reason |
| --- | --- |
| Using CoT | Reasoning process already provides context |
| Reasoning models (opus, sonnet) | Already optimized; minimal effect |
| Very long prompts | Risk of exceeding the context limit |
| Already repeated | Duplicate application wastes tokens |


Cost-Accuracy Analysis

| Metric | Baseline | With Repetition | Change (relative) |
| --- | --- | --- | --- |
| Input tokens | 500/req | 1000/req | +100% |
| Output tokens | 100/req | 100/req | 0% |
| Latency (p50) | 450 ms | 460 ms | +2% |
| Latency (p99) | 1200 ms | 1250 ms | +4% |
| Accuracy | 78% | 89% | +14% |
| Cost per correct answer | $0.019 | $0.020 | +5% |

Key insight: the prefill phase is highly parallelized on GPUs, so doubling input tokens has minimal impact on latency.

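The cost-per-correct-answer arithmetic can be checked directly. The per-token prices below are hypothetical, chosen only to illustrate the shape of the trade-off (output tokens priced well above input, which is why doubled input moves the total so little):

```python
def cost_per_correct(input_toks: int, output_toks: int, accuracy: float,
                     in_price: float, out_price: float) -> float:
    """Expected spend per correct answer = request cost / accuracy."""
    return (input_toks * in_price + output_toks * out_price) / accuracy

IN_PRICE, OUT_PRICE = 1.0, 20.0  # hypothetical relative prices
baseline = cost_per_correct(500, 100, 0.78, IN_PRICE, OUT_PRICE)
repeated = cost_per_correct(1000, 100, 0.89, IN_PRICE, OUT_PRICE)
print(f"{repeated / baseline - 1:+.1%}")  # roughly +5%
```

Doubling input cost while lifting accuracy from 78% to 89% nearly cancels out, which is the point of the table above.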

Multi-Agent Integration

Auto-Apply Strategy Per Agent

| Agent | Model | Repetition Applied | Applied At |
| --- | --- | --- | --- |
| Claude Orchestrator | opus/sonnet | Optional | - |
| Claude Executor | haiku | Auto | skill_loader.py |
| Gemini Analyst | flash | Auto | On MCP call |
| OpenAI | gpt-4o-mini | Auto | skill_loader.py |

Preventing Duplicate Application

To prevent duplicate application in multi-agent pipelines:
  1. Use markers: detect already-applied prompts via the `<!-- prompt-repetition-applied -->` marker
  2. Pass metadata: pass an `x-prompt-repetition-applied: true` header between agents
  3. Orchestrator management: the Claude Orchestrator tracks whether repetition has been applied when calling sub-agents
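The marker strategy reduces to a simple idempotence check. A minimal sketch:

```python
MARKER = "<!-- prompt-repetition-applied -->"

def repeat_once(prompt: str, times: int = 2) -> str:
    """Idempotent repetition: prompts carrying the marker pass through unchanged."""
    if MARKER in prompt:
        return prompt
    return MARKER + "\n" + "\n\n".join([prompt] * times)

p1 = repeat_once("What item is in slot 25?")
p2 = repeat_once(p1)  # a second agent applying it again is a no-op
print(p1 == p2)  # True
```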

Application Pattern

[Claude Sonnet] Planning (no repetition needed)
[Gemini Flash] Analysis (repetition ×2 auto-applied, marker added)
[Claude Haiku] Execution (marker detected → skip duplicate apply)


skill_loader.py Integration Guide

Recommended Implementation

```python
# Code to add to skill_loader.py
from prompt_repetition_transformer import PromptRepetitionTransformer

class SkillLoader:
    def __init__(self, ...):
        # ... existing code ...
        self.prompt_transformer = PromptRepetitionTransformer()

    def apply_auto_skills(self, prompt: str, model: str) -> str:
        """Handle auto-apply skills"""
        # Auto-apply prompt-repetition
        for skill in self.skills.values():
            auto_apply = skill.get('data', {}).get('auto-apply', {})
            if auto_apply.get('trigger') == 'auto':
                target_models = auto_apply.get('models', [])
                if any(m in model.lower() for m in target_models):
                    prompt = self.prompt_transformer.transform(prompt, model)

        return prompt
```

---
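For reference, a hypothetical skill metadata fragment that `apply_auto_skills` would match; the field names are assumptions mirroring the `data.auto-apply.trigger` / `models` lookups in the loader code:

```yaml
name: prompt-repetition
data:
  auto-apply:
    trigger: auto
    models: ["haiku", "flash", "flash-lite", "gpt-4o-mini", "gpt-low"]
```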

Constraints

Required Rules

  1. Lightweight models first: Most effective for haiku, flash, mini series
  2. Limit repetitions: 2× for general tasks, max 3× for position tasks
  3. Context monitoring: Be cautious of context overflow due to repetition
  4. Check markers: Mandatory marker check to prevent duplicate application

Prohibitions

  1. No padding substitution: increasing length with filler characters such as `.` has no effect (per the research)
  2. Do not combine with CoT: the effects cancel out
  3. Do not force-apply to reasoning models: they are already optimized
  4. No duplicate application: consecutive application without markers wastes tokens


Quick Reference

=== Auto-Apply Target Models ===
claude-3-haiku, claude-haiku
gemini-flash, gemini-flash-lite, gemini-2.0-flash
gpt-4o-mini, gpt-low

=== Repetition Count ===
General tasks: 2×
Position/Index (slot/position/index keywords): 3×
With CoT: 0× (not applied)

=== Effect (Google Research 2025) ===
Improvement rate: 67% (47/70 benchmarks)
Performance degradation: 0 cases
Maximum improvement: +76%p (NameIndex)

=== Cost ===
Input tokens: +100%
Latency: +2% (Prefill parallelization)
Cost per correct answer: +5%

=== Duplicate Application Prevention ===
Marker: <!-- prompt-repetition-applied -->


References