prompt-repetition
Prompt Repetition
Problem Being Solved
LLMs are trained as Causal Language Models, where each token attends only to previous tokens. This leads to:
- Context-Question Problem: The question is unknown when processing context
- Options-First MCQ Problem: Cannot fully understand the question context when viewing answer choices
- Position/Index Problem: Attention weights weaken for specific position information in long lists
Prompt repetition enables the second pass to reference the entire first pass, effectively mimicking some benefits of bidirectional attention.
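The causal constraint above can be made concrete with the attention mask it implies: token i may only attend to positions j ≤ i, i.e. a lower-triangular mask. A minimal sketch in plain Python (no ML framework assumed, purely illustrative):

```python
# Lower-triangular causal attention mask for a 4-token sequence:
# mask[i][j] == 1 means token i may attend to token j.
n = 4
mask = [[1 if j <= i else 0 for j in range(n)] for i in range(n)]
for row in mask:
    print(row)
# The first rows show that an early Context token can never attend
# to Question tokens that arrive later in the sequence.
```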
When to use this skill
- When using lightweight models: claude-haiku, gemini-flash, gpt-4o-mini, etc.
- Options-First MCQ: Multiple choice where answer choices appear before the question
- Context + Question: Searching for specific information in long contexts
- Index/Position Tasks: Position-based queries in inventories or lists
- NPC Dialogue: Maintaining consistency for game AI characters
- Non-Reasoning Tasks: Tasks that do not use Chain-of-Thought
How It Works
Limitations of Causal Attention
```
[Context] → [Question]
     ↓
Cannot reference Question content when processing Context tokens.
Attention weights for Context are already finalized by the time Question tokens appear.
```
How Prompt Repetition Solves This
```
[First Pass]              [Second Pass]
Context → Question   →    Context' → Question'
                                ↑
                 Can reference entire first pass
```
In the second pass, the model reprocesses all the information in the first prompt and strengthens attention weights on key concepts, resulting in improved performance.
Note: This does not change the model architecture to bidirectional; it is a prompt-engineering technique that mitigates the limitations of causal models.
Research Results (Google Research 2025)
| Metric | Result |
|---|---|
| Significant improvement (p < 0.1) | 47 / 70 benchmarks |
| Performance degradation | 0 |
| Neutral | 23 |
| Improvement rate | 67% |
Most dramatic improvement: Gemini 2.0 Flash-Lite on NameIndex: 21.33% → 97.33% (+76%p)
Tested Models
- Gemini 2.0 Flash / Flash Lite
- GPT-4o / GPT-4o-mini
- Claude 3.7 Sonnet / Claude 3 Haiku
- Deepseek V3
Tested Benchmarks
- ARC (Challenge) - Scientific reasoning
- OpenBookQA - Open-domain QA
- GSM8K - Math problems
- MMLU-Pro - Multitask language understanding
- MATH - Mathematical problem solving
- NameIndex / MiddleMatch - Custom position tasks
Application Procedure
Step 1: Verify Auto-Apply Target Models
| Provider | Auto-apply models | Excluded models |
|---|---|---|
| Claude | haiku series | opus, sonnet |
| Gemini | flash, flash-lite | pro, ultra |
| OpenAI | gpt-4o-mini, gpt-low | gpt-4o, gpt-4 |
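The table above reduces to a substring check on the model name. A minimal sketch (the pattern list mirrors the table; the exact model-ID strings are an assumption):

```python
# Substring patterns for auto-apply targets (from the table above).
AUTO_APPLY_PATTERNS = ["haiku", "flash", "flash-lite", "gpt-4o-mini", "gpt-low"]

def is_auto_apply_target(model: str) -> bool:
    """True if the model name matches a lightweight-model pattern."""
    m = model.lower()
    return any(p in m for p in AUTO_APPLY_PATTERNS)

print(is_auto_apply_target("claude-3-haiku"))  # True
print(is_auto_apply_target("gpt-4o"))          # False ("mini" absent)
```

Matching the full pattern `gpt-4o-mini` (rather than `gpt-4o`) keeps the excluded `gpt-4o` from triggering a false positive.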
Step 2: Determine Repetition Count by Task Type
| Task Type | Keyword Pattern | Repetitions | Expected Improvement |
|---|---|---|---|
| Options-First MCQ | Answer choices appear before the question | 2× | +15-40%p |
| Index/Position | `slot`, `position`, `index`, etc. | 3× | +50-76%p |
| Context + Question | General question | 2× | +5-15%p |
| With CoT | `step by step`, `let's think`, etc. | 0× (not applied) | ~0% |
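The routing in the table can be sketched as a small dispatch function; the keyword regexes are borrowed from the COT_PATTERNS and POSITION_PATTERNS lists in the reference implementation further below:

```python
import re

def repetitions_for(prompt: str) -> int:
    """Map a prompt to a repetition count per the task-type table."""
    p = prompt.lower()
    # CoT detected -> do not repeat (send the prompt once)
    if re.search(r"step by step|let's think|chain of thought", p):
        return 1
    # Position/index task -> 3x
    if re.search(r"(slot|position|index|item|row|column) \d+", p):
        return 3
    # General default (options-first MCQ, context + question) -> 2x
    return 2

print(repetitions_for("What item is in slot 25?"))  # 3
```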
Step 3: Check Token Limits
```python
# Check context before auto-apply
max_context = model_context_window * 0.8  # 80% safety margin
if len(prompt_tokens) * repetitions > max_context:
    repetitions = max(1, int(max_context / len(prompt_tokens)))
```
Step 4: Prompt Transformation
```python
def apply_prompt_repetition(prompt: str, times: int = 2) -> str:
    """Repeat the prompt a specified number of times.

    Args:
        prompt: Original prompt
        times: Number of repetitions (default 2)

    Returns:
        The repeated prompt
    """
    if times <= 1:
        return prompt
    return "\n\n".join([prompt] * times)
```
Practical Examples
Example 1: Options-First MCQ (Greatest Effect)
Before:
```
A. Paris
B. London
C. Berlin
D. Madrid
Which city is the capital of France?
Reply with one letter.
```
After (repetition ×2 applied):
```
A. Paris
B. London
C. Berlin
D. Madrid
Which city is the capital of France?
Reply with one letter.

A. Paris
B. London
C. Berlin
D. Madrid
Which city is the capital of France?
Reply with one letter.
```
Expected output:
```
A
```
Accuracy: original 78% → after repetition 93% (+15%p)
Example 2: Index/Position Tasks (Maximum Effect)
Before:
```
Inventory:
1. Iron Sword
2. Leather Armor
3. Health Potion (x5)
4. Magic Staff
...
25. Dragon Scale
...
50. Ancient Map
What item is in slot 25?
```
After (repetition ×3 applied):
```
(prompt repeated 3 times)
```
Expected output:
```
Dragon Scale
```
Accuracy: original 21% → after repetition 97% (+76%p)
Example 3: Tool Call Prompt Handling
Note: Prompts containing tool call instructions are also repeated in their entirety. The full-repetition approach was adopted for implementation simplicity and consistency.
Before:
```
Use the calculator tool to compute 234 * 567.
What is the result?
```
After (repetition ×2):
```
Use the calculator tool to compute 234 * 567.
What is the result?

Use the calculator tool to compute 234 * 567.
What is the result?
```
Research results show that full repetition including tool call sections is also effective.
Production-Ready Implementation
Auto-Apply Transformer
```python
"""prompt_repetition_transformer.py"""
from dataclasses import dataclass
from typing import Optional, Callable, List
import re

# Context window per model (in tokens)
MODEL_CONTEXT_WINDOWS = {
    "claude-3-haiku": 200_000,
    "claude-haiku": 200_000,
    "gemini-flash": 1_000_000,
    "gemini-flash-lite": 1_000_000,
    "gemini-2.0-flash": 1_000_000,
    "gpt-4o-mini": 128_000,
    "gpt-low": 128_000,
}

# Models targeted for auto-apply
AUTO_APPLY_MODELS = list(MODEL_CONTEXT_WINDOWS.keys())

# CoT patterns (excluded from apply)
COT_PATTERNS = [
    r"step by step",
    r"think through",
    r"let's think",
    r"reasoning:",
    r"chain of thought",
]

# Position/Index patterns (3× repetition)
POSITION_PATTERNS = [
    r"slot \d+",
    r"position \d+",
    r"index \d+",
    r"\d+(st|nd|rd|th)",
    r"item \d+",
    r"row \d+",
    r"column \d+",
]


@dataclass
class PromptRepetitionConfig:
    """Prompt repetition configuration"""
    default_repetitions: int = 2
    position_repetitions: int = 3
    separator: str = "\n\n"
    max_context_ratio: float = 0.8
    applied_marker: str = "<!-- prompt-repetition-applied -->"


class PromptRepetitionTransformer:
    """Auto-apply prompt repetition transformer for lightweight models"""

    def __init__(self, config: Optional[PromptRepetitionConfig] = None):
        self.config = config or PromptRepetitionConfig()

    def should_apply(self, model: str, prompt: str) -> bool:
        """Determine whether to auto-apply"""
        # Skip if already applied
        if self.config.applied_marker in prompt:
            return False
        # Check target model
        model_lower = model.lower()
        if not any(m in model_lower for m in AUTO_APPLY_MODELS):
            return False
        # Skip when a CoT pattern is detected
        prompt_lower = prompt.lower()
        for pattern in COT_PATTERNS:
            if re.search(pattern, prompt_lower):
                return False
        return True

    def determine_repetitions(self, prompt: str, model: str) -> int:
        """Determine repetition count based on task type"""
        prompt_lower = prompt.lower()
        # Position/Index pattern detected → 3×
        for pattern in POSITION_PATTERNS:
            if re.search(pattern, prompt_lower):
                return self.config.position_repetitions
        return self.config.default_repetitions

    def estimate_tokens(self, text: str) -> int:
        """Simple token count estimation (speed over precision)"""
        # Estimate approximately 4 characters = 1 token
        return len(text) // 4

    def transform(self, prompt: str, model: str) -> str:
        """Apply repetition to the prompt"""
        if not self.should_apply(model, prompt):
            return prompt
        repetitions = self.determine_repetitions(prompt, model)
        # Check context limit
        model_lower = model.lower()
        max_tokens = 128_000  # Default value
        for m, tokens in MODEL_CONTEXT_WINDOWS.items():
            if m in model_lower:
                max_tokens = tokens
                break
        max_allowed = int(max_tokens * self.config.max_context_ratio)
        prompt_tokens = self.estimate_tokens(prompt)
        # Reduce repetitions if the token limit would be exceeded
        while prompt_tokens * repetitions > max_allowed and repetitions > 1:
            repetitions -= 1
        if repetitions <= 1:
            return prompt
        # Apply repetition + add marker
        repeated = self.config.separator.join([prompt] * repetitions)
        return f"{self.config.applied_marker}\n{repeated}"

    def wrap_llm_call(self, llm_fn: Callable, model: str) -> Callable:
        """Wrap an LLM call function"""
        def wrapped(prompt: str, **kwargs):
            transformed = self.transform(prompt, model)
            return llm_fn(transformed, **kwargs)
        return wrapped
```
How to Measure Effectiveness (Verification)
A/B Testing Method
```python
def run_ab_test(prompts: List[str], llm_fn, model: str, ground_truth: List[str]):
    """A/B test for prompt repetition effectiveness"""
    transformer = PromptRepetitionTransformer()
    results = {"baseline": [], "repeated": []}
    for prompt, expected in zip(prompts, ground_truth):
        # Baseline
        response_a = llm_fn(prompt)
        results["baseline"].append(response_a == expected)
        # With repetition
        repeated_prompt = transformer.transform(prompt, model)
        response_b = llm_fn(repeated_prompt)
        results["repeated"].append(response_b == expected)
    baseline_acc = sum(results["baseline"]) / len(prompts)
    repeated_acc = sum(results["repeated"]) / len(prompts)
    print(f"Baseline accuracy: {baseline_acc:.2%}")
    print(f"Repeated accuracy: {repeated_acc:.2%}")
    print(f"Improvement: {repeated_acc - baseline_acc:+.2%}p")
```
Key Metrics
| Metric | Measurement Method |
|---|---|
| Accuracy | Compare correct answer rates |
| Consistency | Variance across 10 runs of same prompt |
| Token cost | Input token increase rate |
| Latency | Compare p50, p99 latency |
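Of these, consistency is the least standard to measure; one simple approach is the fraction of runs that agree with the modal answer. A sketch (the 10-run answer lists are illustrative, not measured data):

```python
from collections import Counter

def consistency(answers):
    """Fraction of runs agreeing with the most common answer (1.0 = fully consistent)."""
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)

# 10 runs of the same prompt, without and with repetition (illustrative)
baseline_runs = ["A", "A", "B", "A", "C", "A", "A", "B", "A", "A"]
repeated_runs = ["A"] * 10
print(consistency(baseline_runs))  # 0.7
print(consistency(repeated_runs))  # 1.0
```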
When NOT to Use
| Case | Reason |
|---|---|
| Using CoT | Reasoning process already provides context |
| Reasoning models (opus, sonnet) | Already optimized; minimal effect |
| Very long prompts | Risk of exceeding context limit |
| Already repeated | Duplicate application wastes tokens |
Cost-Accuracy Analysis
| Metric | Baseline | With Repetition | Change |
|---|---|---|---|
| Input tokens | 500/req | 1000/req | +100% |
| Output tokens | 100/req | 100/req | 0% |
| Latency (p50) | 450ms | 460ms | +2% |
| Latency (p99) | 1200ms | 1250ms | +4% |
| Accuracy | 78% | 89% | +11%p |
| Cost per correct answer | $0.019 | $0.020 | +5% |
Key insight: The prefill phase is highly parallelized on GPU, so doubling input tokens has minimal impact on latency.
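The cost-per-correct-answer column follows from a simple formula: cost per request divided by accuracy. A sketch with illustrative per-token prices (the prices are assumptions chosen only to show the mechanics, not vendor pricing):

```python
def cost_per_correct(in_tokens, out_tokens, accuracy,
                     price_in=0.1e-6, price_out=2.0e-6):
    """Expected spend per correct answer; price_* are illustrative $/token."""
    cost_per_request = in_tokens * price_in + out_tokens * price_out
    return cost_per_request / accuracy

baseline = cost_per_correct(500, 100, 0.78)   # no repetition
repeated = cost_per_correct(1000, 100, 0.89)  # input doubled, accuracy up
print(f"{(repeated / baseline - 1):+.0%}")    # roughly +5%
```

Because output tokens are typically priced far above input tokens, doubling only the input moves the per-request cost much less than 2×, and the accuracy gain nearly cancels the rest.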
Multi-Agent Integration
Auto-Apply Strategy Per Agent
| Agent | Model | Repetition Applied | Applied At |
|---|---|---|---|
| Claude Orchestrator | opus/sonnet | Optional | - |
| Claude Executor | haiku | Auto | skill_loader.py |
| Gemini Analyst | flash | Auto | On MCP call |
| OpenAI | gpt-4o-mini | Auto | skill_loader.py |
Preventing Duplicate Application
To prevent duplicate application in multi-agent pipelines:
- Use markers: detect already-applied prompts via the marker `<!-- prompt-repetition-applied -->`
- Pass metadata: pass the header `x-prompt-repetition-applied: true` between agents
- Orchestrator management: the Claude Orchestrator tracks whether repetition has been applied when calling sub-agents
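The marker mechanism makes the transformation idempotent; a minimal self-contained sketch (independent of the full transformer class):

```python
MARKER = "<!-- prompt-repetition-applied -->"

def apply_once(prompt: str, times: int = 2) -> str:
    """Repeat the prompt, but make a second call a no-op via the marker."""
    if MARKER in prompt:
        return prompt  # already applied by an upstream agent
    return MARKER + "\n" + "\n\n".join([prompt] * times)

p1 = apply_once("What is 2 + 2?")
p2 = apply_once(p1)  # marker detected -> unchanged
print(p1 == p2)      # True
```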
Application Pattern
```
[Claude Sonnet] Planning (no repetition needed)
      ↓
[Gemini Flash] Analysis (repetition ×2 auto-applied, marker added)
      ↓
[Claude Haiku] Execution (marker detected → skip duplicate apply)
```
skill_loader.py Integration Guide
Recommended Implementation
```python
# Code to add to skill_loader.py
from prompt_repetition_transformer import PromptRepetitionTransformer

class SkillLoader:
    def __init__(self, ...):
        # ... existing code ...
        self.prompt_transformer = PromptRepetitionTransformer()

    def apply_auto_skills(self, prompt: str, model: str) -> str:
        """Handle auto-apply skills"""
        # Auto-apply prompt-repetition
        for skill in self.skills.values():
            auto_apply = skill.get('data', {}).get('auto-apply', {})
            if auto_apply.get('trigger') == 'auto':
                target_models = auto_apply.get('models', [])
                if any(m in model.lower() for m in target_models):
                    prompt = self.prompt_transformer.transform(prompt, model)
        return prompt
```
Constraints
Required Rules
- Lightweight models first: Most effective for haiku, flash, mini series
- Limit repetitions: 2× for general tasks, max 3× for position tasks
- Context monitoring: Be cautious of context overflow due to repetition
- Check markers: Mandatory marker check to prevent duplicate application
Prohibited Rules
- No padding substitution: merely increasing prompt length with filler text has no effect (per the research)
- Do not combine with CoT: the effects cancel out
- Do not force-apply to reasoning models: they are already optimized
- No duplicate application: consecutive application without markers wastes tokens
Quick Reference
```
=== Auto-Apply Target Models ===
claude-3-haiku, claude-haiku
gemini-flash, gemini-flash-lite, gemini-2.0-flash
gpt-4o-mini, gpt-low

=== Repetition Count ===
General tasks: 2×
Position/Index (slot/position/index keywords): 3×
With CoT: 0× (not applied)

=== Effect (Google Research 2025) ===
Improvement rate: 67% (47/70 benchmarks)
Performance degradation: 0 cases
Maximum improvement: +76%p (NameIndex)

=== Cost ===
Input tokens: +100%
Latency: +2% (prefill parallelization)
Cost per correct answer: +5%

=== Duplicate Application Prevention ===
Marker: <!-- prompt-repetition-applied -->
```