A prompt repetition technique for improving LLM accuracy. It produced statistically significant gains on 67% (47 of 70) of the benchmarks tested, and it is applied automatically to lightweight models (haiku, flash, mini).
```bash
npx skill4agent add akillness/oh-my-skills prompt-repetition
```

Without repetition:

```
[Context] → [Question]
     ↓
Cannot reference Question content when processing Context tokens
Context tokens are encoded before any Question token appears, so their representations cannot depend on the Question
```

With repetition:

```
[First Pass]           [Second Pass]
Context → Question → Context' → Question'
                        ↑          ↑
                 Can reference entire first pass
```

Note: This does not make the model architecture bidirectional; it is a prompt engineering technique that mitigates the limitations of causal (decoder-only) models.
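To make the attention argument concrete, here is a minimal segment-level sketch. It assumes a standard causal mask in which position `i` can attend only to positions `j <= i`; the segment labels and helper names are illustrative and not part of the skill.

```python
from typing import List


def can_attend(query_idx: int, key_idx: int) -> bool:
    """Causal (decoder-only) mask: a position sees itself and everything before it."""
    return key_idx <= query_idx


def context_can_see_question(segments: List[str]) -> bool:
    """True if any Context segment can attend to any Question segment."""
    for i, query in enumerate(segments):
        if not query.startswith("Context"):
            continue
        for j, key in enumerate(segments):
            if key.startswith("Question") and can_attend(i, j):
                return True
    return False


single = ["Context", "Question"]
repeated = ["Context", "Question", "Context'", "Question'"]

print(context_can_see_question(single))    # False: Question comes after Context
print(context_can_see_question(repeated))  # True: Context' can attend to the first Question
```

Only the second copy gains this access; tokens in the first copy still see only their own prefix.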
| Metric | Result |
|---|---|
| Significant improvement (p < 0.1) | 47 / 70 benchmarks |
| Performance degradation | 0 |
| Neutral | 23 |
| Improvement rate | 67% |
| Provider | Auto-apply models | Excluded models |
|---|---|---|
| Claude | haiku series | opus, sonnet |
| Gemini | flash, flash-lite | pro, ultra |
| OpenAI | gpt-4o-mini, gpt-low | gpt-4o, gpt-4 |
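The provider table can be expressed as an allowlist check. This is a hedged sketch: the keyword lists below are an assumption derived from the "Auto-apply models" column, and `is_auto_apply_model` is a hypothetical helper. Matching against an allowlist rather than excluding names avoids a substring pitfall, since `gpt-4o` is also a substring of `gpt-4o-mini`.

```python
from typing import Dict, List

# Hypothetical keyword allowlist derived from the "Auto-apply models" column.
AUTO_APPLY_KEYWORDS: Dict[str, List[str]] = {
    "claude": ["haiku"],
    "gemini": ["flash"],  # substring also covers flash-lite
    "openai": ["gpt-4o-mini", "gpt-low"],
}


def is_auto_apply_model(model: str) -> bool:
    """Return True only for models in the auto-apply column.

    Everything else (opus, sonnet, pro, ultra, gpt-4o, gpt-4) is excluded
    implicitly, so "gpt-4o" never has to be matched as an exclusion.
    """
    name = model.lower()
    return any(kw in name for kws in AUTO_APPLY_KEYWORDS.values() for kw in kws)


for m in ["claude-3-5-haiku", "claude-sonnet-4", "gemini-2.0-flash-lite",
          "gemini-1.5-pro", "gpt-4o-mini", "gpt-4o"]:
    print(m, is_auto_apply_model(m))
```

The model names in the loop are examples only; real model identifiers vary by provider and version.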
| Task Type | Keyword Pattern | Repetitions | Expected Improvement |
|---|---|---|---|
| Options-First MCQ | | 2× | +15-40%p |
| Index/Position | `slot`, `position`, `index`, `item`, `row`, `column` + number; ordinals (`25th`) | 3× | +50-76%p |
| Context + Question | General question | 2× | +5-15%p |
| With CoT | `step by step`, `think through`, `let's think`, `reasoning:`, `chain of thought` | 0× (not applied) | ~0% |
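The table's routing logic fits in a few lines. This is a minimal sketch, assuming the keyword patterns above are matched case-insensitively as regular expressions; `repetitions_for` is a hypothetical name. The table gives no keyword pattern for options-first MCQ, so those prompts fall through to the default 2×.

```python
import re

# Keyword patterns from the table above (a subset; extend as needed).
COT_PATTERNS = [r"step by step", r"think through", r"let's think", r"chain of thought"]
POSITION_PATTERNS = [r"\b(slot|position|index|item|row|column) \d+", r"\b\d+(st|nd|rd|th)\b"]


def repetitions_for(prompt: str) -> int:
    """Map a prompt to a repetition count; 0 means 'do not apply'."""
    text = prompt.lower()
    if any(re.search(p, text) for p in COT_PATTERNS):
        return 0  # CoT already supplies the extra context
    if any(re.search(p, text) for p in POSITION_PATTERNS):
        return 3  # index/position lookups
    return 2  # options-first MCQ and general context + question


print(repetitions_for("What item is in slot 25?"))                # 3
print(repetitions_for("Let's think step by step about slot 3."))  # 0
print(repetitions_for("Which city is the capital of France?"))    # 2
```

CoT is checked first so that a prompt containing both a CoT phrase and a position keyword is left unrepeated.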
```python
# Check context before auto-apply
def cap_repetitions(prompt_tokens: list, repetitions: int, model_context_window: int) -> int:
    max_context = model_context_window * 0.8  # 80% safety margin
    if len(prompt_tokens) * repetitions > max_context:
        # Fall back to as many copies as fit, but never fewer than one
        repetitions = max(1, int(max_context / len(prompt_tokens)))
    return repetitions
```

```python
def apply_prompt_repetition(prompt: str, times: int = 2) -> str:
    """Repeat the prompt a specified number of times

    Args:
        prompt: Original prompt
        times: Number of repetitions (default 2)

    Returns:
        Repeated prompt
    """
    if times <= 1:
        return prompt
    return "\n\n".join([prompt] * times)
```

Options-first MCQ, original prompt:

```
A. Paris
B. London
C. Berlin
D. Madrid
Which city is the capital of France?
Reply with one letter.
```

With repetition (2×):

```
A. Paris
B. London
C. Berlin
D. Madrid
Which city is the capital of France?
Reply with one letter.

A. Paris
B. London
C. Berlin
D. Madrid
Which city is the capital of France?
Reply with one letter.
```

Expected response: `A`

Index/position lookup (3× when auto-applied), original prompt:

```
Inventory:
1. Iron Sword
2. Leather Armor
3. Health Potion (x5)
4. Magic Staff
...
25. Dragon Scale
...
50. Ancient Map
What item is in slot 25?
```

Expected response: `Dragon Scale`

Tool use, original prompt:

```
Use the calculator tool to compute 234 * 567.
What is the result?
```

With repetition (2×):

```
Use the calculator tool to compute 234 * 567.
What is the result?

Use the calculator tool to compute 234 * 567.
What is the result?
```

Research results show that full repetition, including the tool-call sections, is also effective.
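As a quick check of the two snippets above, the sketch below restates them with a token count instead of a token list (`capped_repetitions` is a hypothetical variant of the context check) and works through two budget cases. The 70k and 150k prompt sizes are illustrative inputs, not measurements.

```python
def apply_prompt_repetition(prompt: str, times: int = 2) -> str:
    """Join `times` copies of the prompt with a blank line between them."""
    if times <= 1:
        return prompt
    return "\n\n".join([prompt] * times)


def capped_repetitions(prompt_tokens: int, repetitions: int,
                       context_window: int, ratio: float = 0.8) -> int:
    """Same rule as the context check above, but taking a token count."""
    max_context = context_window * ratio
    if prompt_tokens * repetitions > max_context:
        repetitions = max(1, int(max_context / prompt_tokens))
    return repetitions


mcq = ("A. Paris\nB. London\nC. Berlin\nD. Madrid\n"
       "Which city is the capital of France?\nReply with one letter.")
print(apply_prompt_repetition(mcq, 2))

# 70k-token prompt, 3x requested, 200k window: 210k > 160k budget -> 2 copies
print(capped_repetitions(70_000, 3, 200_000))   # 2
# 150k-token prompt on a 128k window: no extra copy fits -> 1 (unchanged)
print(capped_repetitions(150_000, 2, 128_000))  # 1
```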
```python
"""prompt_repetition_transformer.py"""
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

# Context window per model (in tokens)
MODEL_CONTEXT_WINDOWS = {
    "claude-3-haiku": 200_000,
    "claude-haiku": 200_000,
    "gemini-flash": 1_000_000,
    "gemini-flash-lite": 1_000_000,
    "gemini-2.0-flash": 1_000_000,
    "gpt-4o-mini": 128_000,
    "gpt-low": 128_000,
}

# Models targeted for auto-apply (matched as substrings of the model name)
AUTO_APPLY_MODELS = list(MODEL_CONTEXT_WINDOWS.keys())

# CoT patterns (excluded from apply)
COT_PATTERNS = [
    r"step by step",
    r"think through",
    r"let's think",
    r"reasoning:",
    r"chain of thought",
]

# Position/Index patterns (3× repetition)
POSITION_PATTERNS = [
    r"slot \d+",
    r"position \d+",
    r"index \d+",
    r"\d+(st|nd|rd|th)",
    r"item \d+",
    r"row \d+",
    r"column \d+",
]


@dataclass
class PromptRepetitionConfig:
    """Prompt repetition configuration"""
    default_repetitions: int = 2
    position_repetitions: int = 3
    separator: str = "\n\n"
    max_context_ratio: float = 0.8
    applied_marker: str = "<!-- prompt-repetition-applied -->"


class PromptRepetitionTransformer:
    """Auto-apply prompt repetition transformer for lightweight models"""

    def __init__(self, config: Optional[PromptRepetitionConfig] = None):
        self.config = config or PromptRepetitionConfig()

    def should_apply(self, model: str, prompt: str) -> bool:
        """Determine whether to auto-apply"""
        # Skip if already applied
        if self.config.applied_marker in prompt:
            return False
        # Check target model
        model_lower = model.lower()
        if not any(m in model_lower for m in AUTO_APPLY_MODELS):
            return False
        # Skip when CoT pattern detected
        prompt_lower = prompt.lower()
        for pattern in COT_PATTERNS:
            if re.search(pattern, prompt_lower):
                return False
        return True

    def determine_repetitions(self, prompt: str, model: str) -> int:
        """Determine repetition count based on task type"""
        prompt_lower = prompt.lower()
        # Position/Index pattern detected → 3×
        for pattern in POSITION_PATTERNS:
            if re.search(pattern, prompt_lower):
                return self.config.position_repetitions
        return self.config.default_repetitions

    def estimate_tokens(self, text: str) -> int:
        """Simple token count estimation (speed over precision)"""
        # Estimate approximately 4 characters = 1 token
        return len(text) // 4

    def transform(self, prompt: str, model: str) -> str:
        """Apply repetition to prompt"""
        if not self.should_apply(model, prompt):
            return prompt
        repetitions = self.determine_repetitions(prompt, model)

        # Check context limit
        model_lower = model.lower()
        max_tokens = 128_000  # Default value
        for m, tokens in MODEL_CONTEXT_WINDOWS.items():
            if m in model_lower:
                max_tokens = tokens
                break
        max_allowed = int(max_tokens * self.config.max_context_ratio)
        prompt_tokens = self.estimate_tokens(prompt)

        # Reduce repetitions if token limit exceeded
        while prompt_tokens * repetitions > max_allowed and repetitions > 1:
            repetitions -= 1
        if repetitions <= 1:
            return prompt

        # Apply repetition + add marker
        repeated = self.config.separator.join([prompt] * repetitions)
        return f"{self.config.applied_marker}\n{repeated}"

    def wrap_llm_call(self, llm_fn: Callable, model: str) -> Callable:
        """Wrap LLM call function"""
        def wrapped(prompt: str, **kwargs):
            transformed = self.transform(prompt, model)
            return llm_fn(transformed, **kwargs)
        return wrapped
```

```python
from typing import Callable, List

from prompt_repetition_transformer import PromptRepetitionTransformer


def run_ab_test(prompts: List[str], llm_fn: Callable[[str], str], model: str,
                ground_truth: List[str]):
    """A/B test for prompt repetition effectiveness"""
    if not prompts or len(prompts) != len(ground_truth):
        raise ValueError("prompts and ground_truth must be non-empty and the same length")
    transformer = PromptRepetitionTransformer()
    results = {"baseline": [], "repeated": []}
    for prompt, expected in zip(prompts, ground_truth):
        # Baseline (whitespace-insensitive exact match)
        response_a = llm_fn(prompt)
        results["baseline"].append(response_a.strip() == expected.strip())
        # With repetition (prompt is unchanged if the model is not an auto-apply target)
        repeated_prompt = transformer.transform(prompt, model)
        response_b = llm_fn(repeated_prompt)
        results["repeated"].append(response_b.strip() == expected.strip())
    baseline_acc = sum(results["baseline"]) / len(prompts)
    repeated_acc = sum(results["repeated"]) / len(prompts)
    print(f"Baseline accuracy: {baseline_acc:.2%}")
    print(f"Repeated accuracy: {repeated_acc:.2%}")
    print(f"Improvement: {repeated_acc - baseline_acc:+.2%}p")
    return baseline_acc, repeated_acc
```

| Metric | Measurement Method |
|---|---|
| Accuracy | Compare correct answer rates |
| Consistency | Variance across 10 runs of same prompt |
| Token cost | Input token increase rate |
| Latency | Compare p50, p99 latency |
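The four metrics can be computed with the standard library. This is a sketch under assumptions: "consistency" is taken to mean the population variance of per-run accuracy, `summarize_metrics` is a hypothetical helper, and the sample numbers are illustrative inputs, not results.

```python
import statistics
from typing import Dict, List


def summarize_metrics(run_accuracies: List[float], latencies_ms: List[float],
                      baseline_input_tokens: int,
                      repeated_input_tokens: int) -> Dict[str, float]:
    """Compute the comparison metrics from the table for one condition."""
    return {
        "accuracy": statistics.mean(run_accuracies),
        # Consistency: population variance of accuracy across repeated runs
        "consistency_variance": statistics.pvariance(run_accuracies),
        # Relative input-token increase versus the baseline prompt
        "token_increase_rate": repeated_input_tokens / baseline_input_tokens - 1,
        "latency_p50_ms": statistics.median(latencies_ms),
        # 99 cut points; index 98 is p99. "inclusive" keeps it within observed values.
        "latency_p99_ms": statistics.quantiles(latencies_ms, n=100, method="inclusive")[98],
    }


runs = [0.88, 0.90, 0.89, 0.87, 0.90, 0.91, 0.88, 0.89, 0.90, 0.88]  # 10 runs
latencies = [440.0, 455.0, 460.0, 470.0, 1250.0]
metrics = summarize_metrics(runs, latencies, 500, 1000)
print(metrics)
```

With very small latency samples the default `"exclusive"` method can extrapolate past the largest observation, which is why the sketch uses `"inclusive"`.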
| Case | Reason |
|---|---|
| Using CoT | Reasoning process already provides context |
| Reasoning models (opus, sonnet) | Already optimized; minimal effect |
| Very long prompts | Risk of exceeding context limit |
| Already repeated | Duplicate application wastes tokens |
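The marker is the primary guard against duplicate application. As a hedged complement for prompts that arrive without the marker, a heuristic can test whether a prompt is exactly `k` identical copies joined by the separator. `looks_already_repeated` is a hypothetical helper, not part of the skill, and it will flag a prompt that legitimately repeats itself.

```python
def looks_already_repeated(prompt: str, sep: str = "\n\n", max_times: int = 3) -> bool:
    """Heuristic: True if the prompt is exactly k identical copies joined by `sep`.

    Works even when a single copy itself contains `sep`, because it tries
    every copy count k instead of splitting on the separator.
    """
    for k in range(2, max_times + 1):
        body_len = len(prompt) - (k - 1) * len(sep)
        if body_len <= 0 or body_len % k:
            continue  # lengths cannot divide into k equal copies
        unit = prompt[: body_len // k]
        if sep.join([unit] * k) == prompt:
            return True
    return False


original = "Context line\n\nQuestion line"
print(looks_already_repeated(original))                        # False
print(looks_already_repeated(original + "\n\n" + original))    # True
```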
| Metric | Baseline | With Repetition | Change |
|---|---|---|---|
| Input tokens | 500/req | 1000/req | +100% |
| Output tokens | 100/req | 100/req | 0% |
| Latency (p50) | 450ms | 460ms | +2% |
| Latency (p99) | 1200ms | 1250ms | +4% |
| Accuracy | 78% | 89% | +11%p (+14% relative) |
| Cost per correct answer | $0.019 | $0.020 | +5% |
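The accuracy row mixes absolute and relative change, and cost per correct answer is simply per-request cost divided by accuracy. The worked check below uses only the table's accuracy figures; `cost_per_correct` is a hypothetical helper.

```python
def cost_per_correct(cost_per_request: float, accuracy: float) -> float:
    """Average spend needed to obtain one correct answer."""
    return cost_per_request / accuracy


baseline_acc, repeated_acc = 0.78, 0.89  # accuracy row of the table above

# Absolute change (percentage points) vs. relative change
print(f"{(repeated_acc - baseline_acc) * 100:.0f}%p")     # 11%p
print(f"{(repeated_acc / baseline_acc - 1) * 100:.0f}%")  # 14%

# Cost per correct answer stays flat as long as per-request cost grows
# by less than this factor (the accuracy ratio).
print(f"{repeated_acc / baseline_acc:.3f}")               # 1.141
```

Because input tokens double while output tokens do not, the per-request cost increase depends on the provider's input/output price ratio; the +5% figure holds only under the pricing used for that measurement.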
| Agent | Model | Repetition Applied | Applied At |
|---|---|---|---|
| Claude Orchestrator | opus/sonnet | Optional | - |
| Claude Executor | haiku | Auto | skill_loader.py |
| Gemini Analyst | flash | Auto | On MCP call |
| OpenAI | gpt-4o-mini | Auto | skill_loader.py |
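The handoff between agents relies on the marker travelling with the prompt. The sketch below uses a simplified stand-in for `PromptRepetitionTransformer.transform` (the keyword tuple and function are assumptions for illustration) and assumes each stage forwards the same prompt text to the next.

```python
MARKER = "<!-- prompt-repetition-applied -->"
LIGHTWEIGHT = ("haiku", "flash", "gpt-4o-mini")  # simplified auto-apply list


def transform(prompt: str, model: str, times: int = 2) -> str:
    """Minimal stand-in for PromptRepetitionTransformer.transform."""
    if MARKER in prompt or not any(k in model.lower() for k in LIGHTWEIGHT):
        return prompt  # already repeated upstream, or not a target model
    return MARKER + "\n" + "\n\n".join([prompt] * times)


prompt = "Summarize the error log and name the failing service."
for stage, model in [("plan", "claude-sonnet"), ("analyze", "gemini-flash"),
                     ("execute", "claude-haiku")]:
    prompt = transform(prompt, model)
    # Prints: stage, model, marker count, number of prompt copies
    print(stage, model, prompt.count(MARKER), prompt.count("Summarize"))
```

The planning stage leaves the prompt alone, the Flash stage repeats it once and adds the marker, and the Haiku stage sees the marker and skips a second application.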
Marker prepended to a repeated prompt:

```
<!-- prompt-repetition-applied -->
```

Header-style form of the same flag:

```
x-prompt-repetition-applied: true
```

```
[Claude Sonnet] Planning (no repetition needed)
        ↓
[Gemini Flash] Analysis (repetition ×2 auto-applied, marker added)
        ↓
[Claude Haiku] Execution (marker detected → skip duplicate apply)
```

```python
# Code to add to skill_loader.py
from prompt_repetition_transformer import PromptRepetitionTransformer


class SkillLoader:
    def __init__(self, ...):
        # ... existing code ...
        self.prompt_transformer = PromptRepetitionTransformer()

    def apply_auto_skills(self, prompt: str, model: str) -> str:
        """Handle auto-apply skills"""
        # Auto-apply prompt-repetition
        for skill in self.skills.values():
            auto_apply = skill.get('data', {}).get('auto-apply', {})
            if auto_apply.get('trigger') == 'auto':
                target_models = auto_apply.get('models', [])
                if any(m in model.lower() for m in target_models):
                    # transform() checks the marker, so repeated calls are safe
                    prompt = self.prompt_transformer.transform(prompt, model)
        return prompt
```

```
=== Auto-Apply Target Models ===
claude-3-haiku, claude-haiku
gemini-flash, gemini-flash-lite, gemini-2.0-flash
gpt-4o-mini, gpt-low

=== Repetition Count ===
General tasks: 2×
Position/Index (slot/position/index keywords): 3×
With CoT: 0× (not applied)

=== Effect (Google Research 2025) ===
Improvement rate: 67% (47/70 benchmarks)
Performance degradation: 0 cases
Maximum improvement: +76%p (NameIndex)

=== Cost ===
Input tokens: +100%
Latency: +2% (prefill is parallelized)
Cost per correct answer: +5%

=== Duplicate Application Prevention ===
Marker: <!-- prompt-repetition-applied -->
```