Auto-Claude Optimization

Performance tuning, cost reduction, and efficiency improvements.

Performance Overview

Key Metrics

Metric	Impact	Optimization
API latency	Build speed	Model selection, caching
Token usage	Cost	Prompt efficiency, context limits
Memory queries	Speed	Embedding model, index tuning
Build iterations	Time	Spec quality, QA settings

Model Optimization

Model Selection

Model	Speed	Cost	Quality	Use Case
claude-opus-4-5-20251101	Slow	High	Best	Complex features
claude-sonnet-4-5-20250929	Fast	Medium	Good	Standard features

bash

# Override model in .env
AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929
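
The trade-off in the table can be expressed as a small selection rule. The sketch below is illustrative only: `choose_model` and the complexity labels are hypothetical names, and the only assumption taken from this page is that `AUTO_BUILD_MODEL` overrides the model.

```python
import os

# Model IDs copied from the table above.
OPUS = "claude-opus-4-5-20251101"
SONNET = "claude-sonnet-4-5-20250929"


def choose_model(complexity: str, env=None) -> str:
    """Pick a model ID; an explicit AUTO_BUILD_MODEL always wins (hypothetical helper)."""
    env = os.environ if env is None else env
    override = env.get("AUTO_BUILD_MODEL")
    if override:
        return override
    # Assumption: only "complex" work justifies the slower, costlier model.
    return OPUS if complexity == "complex" else SONNET
```

Passing a plain dict as `env` makes the rule easy to check without touching the real environment.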

Extended Thinking Tokens

Configure thinking budget per agent:

Agent	Default	Recommended
Spec creation	16000	Keep default for quality
Planning	5000	Reduce to 3000 for speed
Coding	0	Keep disabled
QA Review	10000	Reduce to 5000 for speed

python

# In agent configuration
max_thinking_tokens=5000  # or None to disable
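
One way to keep these budgets in a single place is a lookup table. This is a sketch, not the project's actual configuration: the agent keys and the `thinking_budget` helper are hypothetical, and the numbers are the ones listed in the table above.

```python
# Budgets copied from the table above: (default, speed-oriented value).
THINKING_BUDGETS = {
    "spec_creation": (16000, 16000),
    "planning": (5000, 3000),
    "coding": (0, 0),
    "qa_review": (10000, 5000),
}


def thinking_budget(agent: str, prefer_speed: bool = False):
    """Return the token budget for an agent, or None when thinking is disabled."""
    default, fast = THINKING_BUDGETS[agent]
    tokens = fast if prefer_speed else default
    # A budget of 0 means "disabled", which the setting above expresses as None.
    return tokens or None
```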

Token Optimization

Reduce Context Size

Smaller spec files

bash

# Keep specs concise
# Bad: 5000 word spec
# Good: 500 word spec with clear criteria

Limit codebase scanning

python

# In context/builder.py
MAX_CONTEXT_FILES = 50  # Reduce from 100

Use targeted searches

bash

# Instead of full codebase scan
# Focus on relevant directories
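
The file cap and the directory focus described above could be combined roughly as follows. This is a minimal sketch under stated assumptions: `select_context_files` is a hypothetical helper, not a function from `context/builder.py`, and paths are treated as POSIX-style strings.

```python
from pathlib import PurePosixPath

MAX_CONTEXT_FILES = 50  # same cap as the snippet above


def select_context_files(paths, focus_dirs, max_files=MAX_CONTEXT_FILES):
    """Keep only files under the focus directories, capped at max_files."""
    focus = [PurePosixPath(d) for d in focus_dirs]
    selected = []
    for raw in paths:
        if len(selected) >= max_files:
            break
        path = PurePosixPath(raw)
        # A file qualifies when any focus directory is one of its ancestors.
        if any(root in path.parents for root in focus):
            selected.append(raw)
    return selected
```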

Efficient Prompts

Optimize system prompts in apps/backend/prompts/:

markdown

<!-- Bad: Verbose -->
You are an expert software developer who specializes in building
high-quality, production-ready applications. You have extensive
experience with many programming languages and frameworks...

<!-- Good: Concise -->
Expert full-stack developer. Build production-quality code.
Follow existing patterns. Test thoroughly.
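
To get a feel for the savings, the two prompts can be compared with a rough size estimate. The sketch assumes about 4 characters per token, which is only a common rule of thumb for English text and not a real tokenizer; `rough_token_count` is a hypothetical helper.

```python
def rough_token_count(text: str) -> int:
    """Very rough estimate: about 4 characters per token for English prose."""
    return max(1, len(text) // 4)


verbose = (
    "You are an expert software developer who specializes in building "
    "high-quality, production-ready applications. You have extensive "
    "experience with many programming languages and frameworks..."
)
concise = (
    "Expert full-stack developer. Build production-quality code. "
    "Follow existing patterns. Test thoroughly."
)

saved = rough_token_count(verbose) - rough_token_count(concise)
print(f"verbose={rough_token_count(verbose)} concise={rough_token_count(concise)} saved={saved}")
```

Because a system prompt is sent with every request, even a small per-prompt saving is multiplied by the number of API calls in a build.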

Memory Optimization

bash

# Use efficient embedding model
OPENAI_EMBEDDING_MODEL=text-embedding-3-small

# Or offline with smaller model
OLLAMA_EMBEDDING_MODEL=all-minilm
OLLAMA_EMBEDDING_DIM=384

Speed Optimization

Parallel Execution

bash

# Enable more parallel agents (default: 4)
MAX_PARALLEL_AGENTS=8
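
The effect of a parallelism cap can be illustrated with a bounded worker pool. This is a generic sketch, not Auto-Claude's scheduler: `run_in_parallel` is a hypothetical helper, and the only detail taken from this page is the `MAX_PARALLEL_AGENTS` setting with its default of 4.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(tasks, max_workers=None):
    """Run zero-argument callables on a bounded pool; results keep input order."""
    if max_workers is None:
        # The default of 4 mirrors the documented default above.
        max_workers = int(os.environ.get("MAX_PARALLEL_AGENTS", "4"))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]
```

Raising the cap only helps while tasks are independent and the API rate limit has headroom; past that point extra workers mostly wait.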

Reduce QA Iterations

bash

# Limit QA loop iterations
MAX_QA_ITERATIONS=10  # Default: 50

# Skip QA for quick iterations
python run.py --spec 001 --skip-qa
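
An iteration cap bounds the worst-case cost of a check-and-fix loop. The sketch below shows the general pattern only; `run_qa_loop`, `check`, and `fix` are hypothetical names and do not describe how the real QA agents are wired together.

```python
def run_qa_loop(check, fix, max_iterations=10):
    """Alternate check/fix until check passes or the cap is reached.

    Returns (passed, iterations_used).
    """
    for iteration in range(1, max_iterations + 1):
        if check():
            return True, iteration
        fix()
    return False, max_iterations
```

With a cap of 10 instead of 50, a spec that can never pass stops after at most 10 review rounds rather than 50.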

Faster Spec Creation

bash

# Force simple complexity for quick tasks
python spec_runner.py --task "Fix typo" --complexity simple

# Skip research phase
SKIP_RESEARCH_PHASE=true python spec_runner.py --task "..."

API Timeout Tuning

bash

# Reduce timeout for faster failure detection
API_TIMEOUT_MS=120000  # 2 minutes (default: 10 minutes)
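
The value is in milliseconds, while most Python timeout parameters take seconds, so a conversion step is easy to get wrong. A minimal sketch, assuming the setting is read from the environment; `api_timeout_seconds` is a hypothetical helper.

```python
import os

DEFAULT_TIMEOUT_MS = 600_000  # 10 minutes, the documented default


def api_timeout_seconds(env=None) -> float:
    """Read API_TIMEOUT_MS and convert it to seconds."""
    env = os.environ if env is None else env
    raw = env.get("API_TIMEOUT_MS", str(DEFAULT_TIMEOUT_MS))
    # Drop an inline "# comment" in case raw .env text is passed through.
    millis = int(raw.split("#", 1)[0].strip())
    if millis <= 0:
        raise ValueError("API_TIMEOUT_MS must be positive")
    return millis / 1000
```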

Cost Management

Monitor Token Usage

bash

# Enable cost tracking
ENABLE_COST_TRACKING=true

# View usage report
python usage_report.py --spec 001

Cost Reduction Strategies

Use cheaper models for simple tasks

bash

# For simple specs
AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929 python spec_runner.py --task "..."

Limit context window

bash

MAX_CONTEXT_TOKENS=50000  # Reduce from 100000

Batch similar tasks

bash

# Create specs together, run together
python spec_runner.py --task "Add feature A"
python spec_runner.py --task "Add feature B"
python run.py --spec 001
python run.py --spec 002

Use local models for memory

bash

# Ollama for memory (free)
GRAPHITI_LLM_PROVIDER=ollama
GRAPHITI_EMBEDDER_PROVIDER=ollama

Cost Estimation

Operation	Estimated Tokens	Cost (Opus)	Cost (Sonnet)
Simple spec	10k	~$0.30	~$0.06
Standard spec	50k	~$1.50	~$0.30
Complex spec	200k	~$6.00	~$1.20
Build (simple)	50k	~$1.50	~$0.30
Build (standard)	200k	~$6.00	~$1.20
Build (complex)	500k	~$15.00	~$3.00
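
Every row of the table follows the same arithmetic: tokens divided by one million, times a flat rate. The rates below are inferred from the table itself (10k tokens at about $0.30 implies roughly $30 per million for Opus, and $0.06 implies roughly $6 per million for Sonnet). They are blended estimates, not published prices, and `estimate_cost` is a hypothetical helper.

```python
# Blended USD-per-million-token rates implied by the table above (assumption).
IMPLIED_USD_PER_MILLION = {"opus": 30.0, "sonnet": 6.0}


def estimate_cost(tokens: int, model: str) -> float:
    """Estimated cost in USD, rounded to cents."""
    return round(tokens / 1_000_000 * IMPLIED_USD_PER_MILLION[model], 2)


for label, tokens in [("Simple spec", 10_000), ("Build (complex)", 500_000)]:
    print(label, estimate_cost(tokens, "opus"), estimate_cost(tokens, "sonnet"))
```

Actual spend depends on the input/output token mix and on current pricing, so treat these figures as order-of-magnitude guidance.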

Memory System Optimization

Embedding Performance

bash

# Faster embeddings
OPENAI_EMBEDDING_MODEL=text-embedding-3-small  # 1536 dim, fast

# Higher quality (slower)
OPENAI_EMBEDDING_MODEL=text-embedding-3-large  # 3072 dim

# Offline (fastest, free)
OLLAMA_EMBEDDING_MODEL=all-minilm
OLLAMA_EMBEDDING_DIM=384

Query Optimization

python

# Limit search results
memory.search("query", limit=10)  # Instead of 100

# Use semantic caching (an environment setting, not Python code)
ENABLE_MEMORY_CACHE=true
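
The two ideas, a small result limit and a cache in front of the memory store, can be sketched together. Note that this example uses an exact-match cache (`functools.lru_cache`), a simpler stand-in for semantic caching, and `FakeMemory` is a hypothetical substitute for the real memory client.

```python
from functools import lru_cache


class FakeMemory:
    """Hypothetical stand-in for the memory client; counts backend calls."""

    def __init__(self, documents):
        self.documents = documents
        self.calls = 0

    def search(self, query, limit=10):
        self.calls += 1
        hits = [d for d in self.documents if query.lower() in d.lower()]
        return hits[:limit]


def make_cached_search(memory, limit=10, cache_size=256):
    """Wrap memory.search so repeated identical queries skip the backend."""

    @lru_cache(maxsize=cache_size)
    def cached(query: str):
        # Tuples are immutable, so callers cannot corrupt cached results.
        return tuple(memory.search(query, limit=limit))

    return cached
```

A true semantic cache would also match paraphrased queries by embedding similarity; the exact-match version only helps when the same query string repeats.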

Database Maintenance

bash

# Compact database periodically
python -c "from integrations.graphiti.memory import compact_database; compact_database()"

# Clear old episodes
python query_memory.py --cleanup --older-than 30d

Build Efficiency

Spec Quality = Build Speed

High-quality specs reduce iterations:

markdown

Good spec (fewer iterations)

Acceptance Criteria

User can log in with email/password
Invalid credentials show error message
Successful login redirects to /dashboard
Session persists for 24 hours

Bad spec (more iterations)

Acceptance Criteria

Login works

Subtask Granularity

Optimal subtask size:

Too large: Agent gets stuck, needs recovery
Too small: Overhead per subtask
Optimal: 30-60 minutes of work each

Parallel Work

Let agents spawn subagents for parallel execution:

Main Coder
├── Subagent 1: Frontend (parallel)
├── Subagent 2: Backend (parallel)
└── Subagent 3: Tests (parallel)
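
The fan-out in the diagram above maps naturally onto concurrent tasks. The sketch uses `asyncio.gather` purely to illustrate the shape; `run_subagent` is a hypothetical placeholder that sleeps instead of doing real work, and it is not how Auto-Claude spawns subagents.

```python
import asyncio


async def run_subagent(name: str, delay: float = 0.01) -> str:
    """Hypothetical subagent: sleeps to simulate work, then reports back."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def main_coder() -> list:
    # gather() runs the three coroutines concurrently and keeps argument order.
    return await asyncio.gather(
        run_subagent("Frontend"),
        run_subagent("Backend"),
        run_subagent("Tests"),
    )


results = asyncio.run(main_coder())
print(results)
```

Total wall time is close to the slowest subagent rather than the sum of all three, which is where the speedup comes from.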

Environment Tuning

Optimal .env Configuration

bash

# Performance-focused configuration
AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929
API_TIMEOUT_MS=180000
MAX_PARALLEL_AGENTS=6

# Memory optimization
GRAPHITI_LLM_PROVIDER=ollama
GRAPHITI_EMBEDDER_PROVIDER=ollama
OLLAMA_LLM_MODEL=llama3.2:3b
OLLAMA_EMBEDDING_MODEL=all-minilm
OLLAMA_EMBEDDING_DIM=384

# Reduce verbosity
DEBUG=false
ENABLE_FANCY_UI=false
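
A quick way to sanity-check such a file before a build is to parse it and inspect the values. This is a deliberately small sketch: `parse_env` is a hypothetical helper that handles only `KEY=VALUE` lines and full-line comments, without quoting or inline comments, so a dedicated dotenv library is the better choice for real use.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines; skips blank lines and '#' comment lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


sample = """
# Performance-focused configuration
API_TIMEOUT_MS=180000
MAX_PARALLEL_AGENTS=6
"""
config = parse_env(sample)
print(int(config["API_TIMEOUT_MS"]) / 1000, "seconds")
```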

Resource Limits

bash

# Use the C library malloc for Python allocations
# (this changes the allocator; it does not cap memory usage)
export PYTHONMALLOC=malloc

# Set max file descriptors
ulimit -n 4096

Benchmarking

Measure Build Time

bash

# Time a build
time python run.py --spec 001

# Compare models
time AUTO_BUILD_MODEL=claude-opus-4-5-20251101 python run.py --spec 001
time AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929 python run.py --spec 001
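
For comparisons inside Python, for example timing two prompt-building functions, a small harness avoids relying on a single noisy run. `timed` and `compare` are hypothetical helpers; taking the minimum over several repeats is one common convention, not the only reasonable one.

```python
import time


def timed(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def compare(candidates, repeats=3):
    """Map each label to its best (minimum) elapsed time over several runs."""
    return {
        label: min(timed(fn)[1] for _ in range(repeats))
        for label, fn in candidates.items()
    }
```

Full builds involve network latency and model variance, so a handful of repeats per configuration gives a fairer picture than one timing of each.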

Profile Memory Usage

bash

# Monitor memory
watch -n 1 'ps aux | grep python | head -5'

# Profile script
python -m cProfile -o profile.stats run.py --spec 001
python -c "import pstats; p = pstats.Stats('profile.stats'); p.sort_stats('cumulative').print_stats(20)"
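
`cProfile` reports where time is spent, not how much memory is allocated. For memory, the standard-library `tracemalloc` module is one option. The sketch below wraps a single call; `peak_memory_bytes` is a hypothetical helper, and it only sees allocations made by Python code in the current process.

```python
import tracemalloc


def peak_memory_bytes(fn, *args, **kwargs):
    """Return (result, peak_bytes) for Python allocations made during fn."""
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


data, peak = peak_memory_bytes(lambda: [0] * 100_000)
print(f"peak: {peak / 1024:.0f} KiB")
```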

Quick Wins

Immediate Optimizations

Switch to Sonnet for most tasks

bash

AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929

Use Ollama for memory

bash

GRAPHITI_LLM_PROVIDER=ollama
GRAPHITI_EMBEDDER_PROVIDER=ollama

Skip QA for prototypes
bash
```
python run.py --spec 001 --skip-qa
```

Force simple complexity for small tasks

bash

python spec_runner.py --task "..." --complexity simple

Medium-Term Improvements

Optimize prompts in apps/backend/prompts/
Configure project-specific security allowlist
Set up memory caching
Tune parallel agent count

Long-Term Strategies

Self-hosted LLM for memory (Ollama)
Caching layer for common operations
Incremental context building
Project-specific prompt optimization
