auto-claude-optimization

Auto-Claude Optimization

Performance tuning, cost reduction, and efficiency improvements.

Performance Overview

Key Metrics

| Metric | Impact | Optimization |
| --- | --- | --- |
| API latency | Build speed | Model selection, caching |
| Token usage | Cost | Prompt efficiency, context limits |
| Memory queries | Speed | Embedding model, index tuning |
| Build iterations | Time | Spec quality, QA settings |

Model Optimization

Model Selection

| Model | Speed | Cost | Quality | Use Case |
| --- | --- | --- | --- | --- |
| claude-opus-4-5-20251101 | Slow | High | Best | Complex features |
| claude-sonnet-4-5-20250929 | Fast | Medium | Good | Standard features |

bash
# Override model in .env
AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929

Extended Thinking Tokens

Configure the thinking budget per agent:

| Agent | Default | Recommended |
| --- | --- | --- |
| Spec creation | 16000 | Keep default for quality |
| Planning | 5000 | Reduce to 3000 for speed |
| Coding | 0 | Keep disabled |
| QA Review | 10000 | Reduce to 5000 for speed |

python
# In agent configuration
max_thinking_tokens=5000  # or None to disable
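
One way to keep these budgets in one place is a small lookup keyed by agent. This is a minimal sketch under assumptions: the agent keys and the `thinking_tokens` helper are hypothetical and not part of Auto-Claude; only the numbers come from the table above.

```python
# Per-agent thinking budgets from the table above: (default, recommended for speed).
# The agent keys are hypothetical labels, not identifiers from the Auto-Claude codebase.
THINKING_BUDGETS = {
    "spec_creation": (16000, 16000),  # keep default for quality
    "planning": (5000, 3000),
    "coding": (0, 0),                 # keep disabled
    "qa_review": (10000, 5000),
}


def thinking_tokens(agent: str, prefer_speed: bool = False):
    """Return the thinking budget for an agent, or None when thinking is disabled."""
    default, recommended = THINKING_BUDGETS[agent]
    budget = recommended if prefer_speed else default
    # Mirror the "or None to disable" convention: a zero budget means thinking is off.
    return budget or None


print(thinking_tokens("planning", prefer_speed=True))
print(thinking_tokens("coding"))
```

The returned value could then be passed as `max_thinking_tokens` in the agent configuration.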

Token Optimization

Reduce Context Size

  1. Smaller spec files
    bash
    # Keep specs concise
    # Bad: 5000 word spec
    # Good: 500 word spec with clear criteria
  2. Limit codebase scanning
    python
    # In context/builder.py
    MAX_CONTEXT_FILES = 50  # Reduce from 100
  3. Use targeted searches
    bash
    # Instead of full codebase scan
    # Focus on relevant directories
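
The second and third items can be combined in a small helper that scans only chosen directories and stops at a file cap. This is an illustrative sketch: `select_context_files` and its parameters are hypothetical, and the actual logic in context/builder.py may differ.

```python
from pathlib import Path

MAX_CONTEXT_FILES = 50  # the reduced cap suggested in the list above


def select_context_files(root, relevant_dirs, limit=MAX_CONTEXT_FILES):
    """Collect at most `limit` Python files from the given subdirectories of `root`."""
    selected = []
    for rel in relevant_dirs:
        base = Path(root) / rel
        if not base.is_dir():
            continue  # skip directories that do not exist
        for path in sorted(base.rglob("*.py")):
            selected.append(path)
            if len(selected) >= limit:
                return selected  # stop scanning as soon as the cap is reached
    return selected


# Example: scan only one directory instead of the whole codebase.
files = select_context_files(".", ["apps/backend"])
print(f"{len(files)} files selected for context")
```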

Efficient Prompts

Optimize system prompts in apps/backend/prompts/:
markdown
<!-- Bad: Verbose -->
You are an expert software developer who specializes in building
high-quality, production-ready applications. You have extensive
experience with many programming languages and frameworks...

<!-- Good: Concise -->
Expert full-stack developer. Build production-quality code.
Follow existing patterns. Test thoroughly.

Memory Optimization

bash
# Use efficient embedding model
OPENAI_EMBEDDING_MODEL=text-embedding-3-small

# Or offline with smaller model
OLLAMA_EMBEDDING_MODEL=all-minilm
OLLAMA_EMBEDDING_DIM=384

Speed Optimization

Parallel Execution

bash
# Enable more parallel agents (default: 4)
MAX_PARALLEL_AGENTS=8

Reduce QA Iterations

bash
# Limit QA loop iterations
MAX_QA_ITERATIONS=10  # Default: 50

# Skip QA for quick iterations
python run.py --spec 001 --skip-qa
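
To illustrate what the iteration cap bounds, here is a minimal sketch of a review-and-fix loop. `run_qa_loop`, `review`, and `fix` are hypothetical names; Auto-Claude's real QA loop is not shown in this document and may work differently.

```python
MAX_QA_ITERATIONS = 10  # the reduced limit from the example above


def run_qa_loop(review, fix, max_iterations=MAX_QA_ITERATIONS):
    """Call review() until it reports no issues or the iteration budget runs out.

    `review` returns a list of issues (empty means approved); `fix` receives them.
    Both callables are hypothetical placeholders for real agent calls.
    """
    for iteration in range(1, max_iterations + 1):
        issues = review()
        if not issues:
            return {"approved": True, "iterations": iteration}
        fix(issues)
    return {"approved": False, "iterations": max_iterations}


# Toy usage: a "build" with three defects, one of which is fixed per round.
defects = ["missing test", "lint error", "broken import"]
result = run_qa_loop(lambda: list(defects), lambda issues: defects.pop())
print(result)
```

A lower cap trades some chance of an unapproved build for a hard bound on review cost.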

Faster Spec Creation

bash
# Force simple complexity for quick tasks
python spec_runner.py --task "Fix typo" --complexity simple

# Skip research phase
SKIP_RESEARCH_PHASE=true python spec_runner.py --task "..."

API Timeout Tuning

bash
# Reduce timeout for faster failure detection
API_TIMEOUT_MS=120000  # 2 minutes (default: 10 minutes)

Cost Management

Monitor Token Usage

bash
# Enable cost tracking
ENABLE_COST_TRACKING=true

# View usage report
python usage_report.py --spec 001

Cost Reduction Strategies

  1. Use cheaper models for simple tasks
    bash
    # For simple specs
    AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929 python spec_runner.py --task "..."
  2. Limit context window
    bash
    MAX_CONTEXT_TOKENS=50000  # Reduce from 100000
  3. Batch similar tasks
    bash
    # Create specs together, run together
    python spec_runner.py --task "Add feature A"
    python spec_runner.py --task "Add feature B"
    python run.py --spec 001
    python run.py --spec 002
  4. Use local models for memory
    bash
    # Ollama for memory (free)
    GRAPHITI_LLM_PROVIDER=ollama
    GRAPHITI_EMBEDDER_PROVIDER=ollama

Cost Estimation

| Operation | Estimated Tokens | Cost (Opus) | Cost (Sonnet) |
| --- | --- | --- | --- |
| Simple spec | 10k | ~$0.30 | ~$0.06 |
| Standard spec | 50k | ~$1.50 | ~$0.30 |
| Complex spec | 200k | ~$6.00 | ~$1.20 |
| Build (simple) | 50k | ~$1.50 | ~$0.30 |
| Build (standard) | 200k | ~$6.00 | ~$1.20 |
| Build (complex) | 500k | ~$15.00 | ~$3.00 |
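
The table is consistent with a flat blended rate of roughly $30 per million tokens for Opus and $6 per million for Sonnet. The sketch below back-derives those rates from the table rows; they are assumptions for estimation, not official pricing, and `estimate_cost` is a hypothetical helper.

```python
# Blended USD rates per million tokens, back-derived from the table above
# (for example, 10k tokens at about $0.30 on Opus). Assumed, not official pricing.
RATE_PER_MILLION = {"opus": 30.0, "sonnet": 6.0}


def estimate_cost(tokens: int, model: str) -> float:
    """Return the approximate USD cost for a token count under the assumed rates."""
    return round(tokens / 1_000_000 * RATE_PER_MILLION[model], 2)


for label, tokens in [("Simple spec", 10_000), ("Build (complex)", 500_000)]:
    opus = estimate_cost(tokens, "opus")
    sonnet = estimate_cost(tokens, "sonnet")
    print(f"{label}: Opus ~${opus:.2f}, Sonnet ~${sonnet:.2f}")
```

Under these rates Sonnet costs one fifth as much as Opus for the same token count, which is the ratio every row of the table shows.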

Memory System Optimization

Embedding Performance

bash
# Faster embeddings
OPENAI_EMBEDDING_MODEL=text-embedding-3-small  # 1536 dim, fast

# Higher quality (slower)
OPENAI_EMBEDDING_MODEL=text-embedding-3-large  # 3072 dim

# Offline (fastest, free)
OLLAMA_EMBEDDING_MODEL=all-minilm
OLLAMA_EMBEDDING_DIM=384

Query Optimization

python
# Limit search results
memory.search("query", limit=10)  # Instead of 100

# Use semantic caching (an environment variable, set in .env rather than in Python)
ENABLE_MEMORY_CACHE=true
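
As a rough illustration of why caching helps, the sketch below memoizes repeated identical queries with `functools.lru_cache`. Note that this is exact-match caching, a simpler stand-in for semantic caching, and `FakeMemory` is a hypothetical placeholder for the real memory client.

```python
from functools import lru_cache


class FakeMemory:
    """Hypothetical stand-in for the memory client; counts backend calls."""

    def __init__(self):
        self.calls = 0

    def search(self, query, limit=10):
        self.calls += 1
        return [f"{query}-result-{i}" for i in range(limit)]


memory = FakeMemory()


@lru_cache(maxsize=256)
def cached_search(query: str, limit: int = 10) -> tuple:
    # Return a tuple so callers cannot mutate the cached result.
    return tuple(memory.search(query, limit=limit))


cached_search("auth flow")
cached_search("auth flow")  # identical query: served from the cache
print(f"backend calls: {memory.calls}")
```

A semantic cache would additionally match queries that are worded differently but embed close together; that requires an embedding lookup and is not shown here.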

Database Maintenance

bash
# Compact database periodically
python -c "from integrations.graphiti.memory import compact_database; compact_database()"

# Clear old episodes
python query_memory.py --cleanup --older-than 30d

Build Efficiency

Spec Quality = Build Speed

High-quality specs reduce iterations:
markdown
# Good spec (fewer iterations)
## Acceptance Criteria
- User can log in with email/password
- Invalid credentials show error message
- Successful login redirects to /dashboard
- Session persists for 24 hours

# Bad spec (more iterations)
## Acceptance Criteria
- Login works

Subtask Granularity

Optimal subtask size:
  • Too large: the agent gets stuck and needs recovery
  • Too small: per-subtask overhead dominates
  • Optimal: 30-60 minutes of work each
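
The guideline can be expressed as a simple check on an estimated duration. This is a sketch only; `classify_subtask` is a hypothetical helper, and treating 30 and 60 minutes as inclusive bounds is an assumption.

```python
def classify_subtask(estimated_minutes: float) -> str:
    """Classify a subtask by estimated effort, using the 30-60 minute guideline."""
    if estimated_minutes < 30:
        return "too small"  # per-subtask overhead dominates
    if estimated_minutes > 60:
        return "too large"  # the agent may get stuck and need recovery
    return "optimal"


for minutes in (10, 45, 180):
    print(minutes, classify_subtask(minutes))
```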

Parallel Work

Let agents spawn subagents for parallel execution:
Main Coder
├── Subagent 1: Frontend (parallel)
├── Subagent 2: Backend (parallel)
└── Subagent 3: Tests (parallel)
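
The fan-out pattern in the diagram can be sketched with a thread pool. This is not how Auto-Claude spawns subagents; `run_subagent` is a hypothetical placeholder that stands in for real agent work.

```python
from concurrent.futures import ThreadPoolExecutor

SUBAGENTS = ["Frontend", "Backend", "Tests"]


def run_subagent(name: str) -> str:
    # Hypothetical placeholder: a real subagent would do I/O-bound work here.
    return f"{name}: done"


def run_parallel(names, max_workers=4):
    """Run one subagent per name concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_subagent, names))


results = run_parallel(SUBAGENTS)
print(results)
```

`pool.map` preserves input order, so the main coder can match each result to its subagent without extra bookkeeping.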

Environment Tuning

Optimal .env Configuration

bash
# Performance-focused configuration
AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929
API_TIMEOUT_MS=180000
MAX_PARALLEL_AGENTS=6

# Memory optimization
GRAPHITI_LLM_PROVIDER=ollama
GRAPHITI_EMBEDDER_PROVIDER=ollama
OLLAMA_LLM_MODEL=llama3.2:3b
OLLAMA_EMBEDDING_MODEL=all-minilm
OLLAMA_EMBEDDING_DIM=384

# Reduce verbosity
DEBUG=false
ENABLE_FANCY_UI=false
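
When reading these settings from Python, it can help to apply the documented defaults and reject bad values early. A minimal sketch, assuming the defaults stated in this document (4 parallel agents, 50 QA iterations, a 10-minute timeout); `load_int_settings` is a hypothetical helper, not Auto-Claude's own loader.

```python
import os

# Defaults as stated in this document; how Auto-Claude parses them is an assumption.
DEFAULTS = {
    "MAX_PARALLEL_AGENTS": 4,
    "MAX_QA_ITERATIONS": 50,
    "API_TIMEOUT_MS": 600_000,  # 10 minutes
}


def load_int_settings(env=None):
    """Read integer settings from a mapping (os.environ by default), applying defaults."""
    env = os.environ if env is None else env
    settings = {}
    for key, default in DEFAULTS.items():
        raw = env.get(key)
        value = default if raw is None else int(raw)
        if value <= 0:
            raise ValueError(f"{key} must be a positive integer, got {value}")
        settings[key] = value
    return settings


print(load_int_settings({"MAX_PARALLEL_AGENTS": "6", "API_TIMEOUT_MS": "180000"}))
```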

Resource Limits

bash
# Use the system allocator instead of pymalloc (changes the allocator; does not cap memory)
export PYTHONMALLOC=malloc

# Set max file descriptors
ulimit -n 4096

Benchmarking

Measure Build Time

bash
# Time a build
time python run.py --spec 001

# Compare models
time AUTO_BUILD_MODEL=claude-opus-4-5-20251101 python run.py --spec 001
time AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929 python run.py --spec 001
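
The same comparison can be scripted from Python so that timings are collected programmatically. A minimal sketch: `time_command` is a hypothetical helper, and the demonstration times a trivial interpreter start rather than a real build.

```python
import os
import subprocess
import sys
import time


def time_command(cmd, env_overrides=None):
    """Run a command and return (elapsed_seconds, return_code)."""
    # Layer the overrides on top of the current environment.
    env = {**os.environ, **(env_overrides or {})}
    start = time.perf_counter()
    completed = subprocess.run(cmd, env=env)
    return time.perf_counter() - start, completed.returncode


# Demonstration with a trivial command; swap in the build command to benchmark it.
elapsed, code = time_command([sys.executable, "-c", "pass"])
print(f"exit={code} elapsed={elapsed:.2f}s")
```

For a real comparison, pass `["python", "run.py", "--spec", "001"]` as the command and set `env_overrides={"AUTO_BUILD_MODEL": "claude-sonnet-4-5-20250929"}` for one run and the Opus model ID for the other.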

Profile Memory Usage

bash
# Monitor memory
watch -n 1 'ps aux | grep python | head -5'

# Profile script (cProfile reports time per function, not memory)
python -m cProfile -o profile.stats run.py --spec 001
python -c "import pstats; p = pstats.Stats('profile.stats'); p.sort_stats('cumulative').print_stats(20)"

Quick Wins

Immediate Optimizations

  1. Switch to Sonnet for most tasks
    bash
    AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929
  2. Use Ollama for memory
    bash
    GRAPHITI_LLM_PROVIDER=ollama
    GRAPHITI_EMBEDDER_PROVIDER=ollama
  3. Skip QA for prototypes
    bash
    python run.py --spec 001 --skip-qa
  4. Force simple complexity for small tasks
    bash
    python spec_runner.py --task "..." --complexity simple

Medium-Term Improvements

  1. Optimize prompts in apps/backend/prompts/
  2. Configure project-specific security allowlist
  3. Set up memory caching
  4. Tune parallel agent count

Long-Term Strategies

  1. Self-hosted LLM for memory (Ollama)
  2. Caching layer for common operations
  3. Incremental context building
  4. Project-specific prompt optimization

Related Skills

  • auto-claude-memory: Memory configuration
  • auto-claude-build: Build process
  • auto-claude-troubleshooting: Debugging