HyperAgents Self-Improving AI Skill
Overview
HyperAgents is a framework for building self-referential self-improving AI agents that can optimize for any computable task. The system uses a meta-agent to iteratively improve a task-agent by generating and evaluating code modifications. The framework supports multiple domains (code generation, reasoning, math, etc.) and uses foundation models to drive the self-improvement loop.
Key Capabilities:
- Self-referential meta-learning where agents modify their own code
- Multi-domain support (code, math, reasoning tasks)
- Iterative improvement through generation-evaluation loops
- Integration with OpenAI, Anthropic, and Google Gemini models
- Docker-based safe execution environment
Installation
Prerequisites
bash
# Install system dependencies (Fedora/RHEL)
sudo dnf install -y python3.12-devel graphviz graphviz-devel cmake ninja-build bzip2-devel zlib-devel ncurses-devel libffi-devel

# For Ubuntu/Debian:
sudo apt-get install -y python3.12-dev graphviz libgraphviz-dev cmake ninja-build libbz2-dev zlib1g-dev libncurses-dev libffi-dev
Setup
bash
# Clone the repository
git clone https://github.com/facebookresearch/HyperAgents.git
cd HyperAgents

# Create virtual environment
python3.12 -m venv venv_nat
source venv_nat/bin/activate

# Install dependencies
pip install -r requirements.txt
pip install -r requirements_dev.txt

# Build Docker container for safe execution
docker build --network=host -t hyperagents .
Environment Configuration
Create a .env file with your API keys:
bash
# .env file
OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here
GEMINI_API_KEY=your_gemini_key_here
Initialize Agents
bash
# Set up initial agent implementations
bash ./setup_initial.sh
Core Concepts
Architecture
- Task Agent: Solves domain-specific tasks (code generation, math, etc.)
- Meta Agent: Observes task agent performance and generates improvements
- Generation Loop: Iteratively evolves agents through self-improvement cycles
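The three roles above can be illustrated with a toy loop. This is a minimal sketch, not the framework's implementation: the "task agent" is a plain function, `propose_candidates` is a hypothetical stand-in for the meta agent (which in HyperAgents would call a foundation model), and a candidate is kept only when it scores strictly higher on the evaluation tasks.

```python
from typing import Callable, List, Tuple

Agent = Callable[[int], int]  # hypothetical task agent: maps an input to an answer


def evaluate(agent: Agent, tasks: List[Tuple[int, int]]) -> float:
    """Fraction of (input, expected) pairs the agent answers correctly."""
    return sum(agent(x) == y for x, y in tasks) / len(tasks)


def propose_candidates(step: int) -> List[Agent]:
    """Stand-in for the meta agent: returns candidate replacement agents."""
    return [lambda x, k=step: x * k, lambda x, k=step: x + k]


def improvement_loop(tasks: List[Tuple[int, int]], max_iterations: int = 3):
    best_agent: Agent = lambda x: x  # initial task agent
    best_score = evaluate(best_agent, tasks)
    for step in range(1, max_iterations + 1):
        for candidate in propose_candidates(step):
            score = evaluate(candidate, tasks)
            if score > best_score:  # keep a candidate only if it scores higher
                best_agent, best_score = candidate, score
    return best_agent, best_score


tasks = [(1, 2), (2, 4), (3, 6)]  # toy task: double the input
agent, score = improvement_loop(tasks)
print(score, agent(5))
```

Accepting only strict improvements keeps the loop from drifting to equally scored but different agents; a real run would also need a held-out task set to avoid overfitting to the evaluation tasks.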
File Structure
HyperAgents/
├── agent/               # Foundation model interfaces
├── domains/             # Task-specific implementations
├── utils/               # Common utilities
├── meta_agent.py        # Meta-agent implementation
├── task_agent.py        # Task-agent implementation
├── generate_loop.py     # Main entry point
└── run_meta_agent.py    # Meta-agent execution script
Usage
Running the Self-Improvement Loop
bash
# Basic usage with default settings
python generate_loop.py --domains code_generation

# Multiple domains
python generate_loop.py --domains math reasoning

# Custom configuration
python generate_loop.py \
  --domains code_generation \
  --max_iterations 10 \
  --output_dir ./my_outputs \
  --model_name gpt-4
Key Command-Line Arguments
python
# Common arguments for generate_loop.py
--domains          # Domain(s) to optimize (code_generation, math, reasoning, etc.)
--max_iterations   # Maximum improvement iterations
--output_dir       # Directory for outputs (default: outputs/)
--model_name       # Foundation model to use
--baseline         # Baseline agent to compare against
--temperature      # Sampling temperature for generation
--num_samples      # Number of samples per iteration
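To make the flag list concrete, here is a sketch of how such arguments could be declared with `argparse`. It is not the parser from `generate_loop.py`: only the flag names and the `outputs/` default come from the list above, and every other default and type is an assumption chosen for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser; defaults other than --output_dir are assumptions."""
    parser = argparse.ArgumentParser(description="Self-improvement loop (illustrative)")
    parser.add_argument("--domains", nargs="+", required=True)      # one or more domains
    parser.add_argument("--max_iterations", type=int, default=10)   # assumed default
    parser.add_argument("--output_dir", default="outputs/")         # default stated above
    parser.add_argument("--model_name", default="gpt-4")            # assumed default
    parser.add_argument("--baseline", default=None)                 # assumed optional
    parser.add_argument("--temperature", type=float, default=0.7)   # assumed default
    parser.add_argument("--num_samples", type=int, default=1)       # assumed default
    return parser


args = build_parser().parse_args(
    ["--domains", "math", "reasoning", "--max_iterations", "5"]
)
print(args.domains, args.max_iterations, args.output_dir)
```

Using `nargs="+"` matches the multi-domain invocation shown earlier, where several domain names follow a single `--domains` flag.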
Working with Task Agents
Creating a Custom Task Agent
python
# task_agent.py - Basic structure
from typing import Any, Dict

from agent.base_agent import BaseAgent


class MyTaskAgent(BaseAgent):
    """Custom task agent for specific domain."""

    def __init__(self, config: Dict[str, Any]):
        super().__init__(config)
        self.domain = config.get('domain', 'custom')

    def solve_task(self, task_input: str) -> str:
        """
        Main method to solve a task.

        Args:
            task_input: Input task specification

        Returns:
            Solution to the task
        """
        # Generate prompt for the model
        prompt = self._create_prompt(task_input)

        # Get model response
        response = self.model.generate(
            prompt=prompt,
            temperature=self.config.get('temperature', 0.7),
            max_tokens=self.config.get('max_tokens', 2048)
        )

        # Post-process response
        solution = self._parse_solution(response)
        return solution

    def _create_prompt(self, task_input: str) -> str:
        """Create prompt for the model."""
        return f"""Solve the following task:

Task: {task_input}

Solution:"""

    def _parse_solution(self, response: str) -> str:
        """Extract solution from model response."""
        # Custom parsing logic
        return response.strip()

    def evaluate(self, task_input: str, solution: str) -> float:
        """
        Evaluate solution quality.

        Returns:
            Score between 0 and 1
        """
        # Domain-specific evaluation; _compute_score must be supplied by the subclass
        return self._compute_score(task_input, solution)
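The `evaluate` method above delegates to `_compute_score`, which the snippet leaves undefined. As one possible shape for it, the sketch below scores a code-generation answer with a purely syntactic heuristic. The function name, the three score levels, and the decision to ignore `task_input` are all assumptions for illustration; a real domain would compare against tests or reference answers.

```python
import ast


def compute_score(task_input: str, solution: str) -> float:
    """Hypothetical heuristic score in [0, 1] for a Python code answer.

    0.0 if the solution is not valid Python, 0.5 if it parses,
    1.0 if it parses and defines at least one function.
    task_input is unused here; it is kept only to match evaluate()'s signature.
    """
    try:
        tree = ast.parse(solution)
    except SyntaxError:
        return 0.0
    has_function = any(isinstance(node, ast.FunctionDef) for node in ast.walk(tree))
    return 1.0 if has_function else 0.5


print(compute_score("Write fib", "def fib(n):\n    return n"))
print(compute_score("Write fib", "x = 1"))
print(compute_score("Write fib", "def ("))
```

A syntactic check like this says nothing about correctness, so it is best treated as a cheap pre-filter before running actual test cases.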
Using the Task Agent
python
from task_agent import MyTaskAgent

# Initialize agent
config = {
    'domain': 'custom',
    'model_name': 'gpt-4',
    'temperature': 0.7,
    'max_tokens': 2048
}
agent = MyTaskAgent(config)

# Solve a task
task = "Write a function to compute Fibonacci numbers"
solution = agent.solve_task(task)
score = agent.evaluate(task, solution)
print(f"Solution: {solution}")
print(f"Score: {score}")
Working with Meta Agents
Meta Agent Structure
python
# meta_agent.py - Core implementation
from typing import Dict, List, Any
import difflib


class MetaAgent:
    """Meta-agent that improves task agents."""

    def __init__(self, config: Dict[str, Any]):
        self.config = config
        # _initialize_model and _extract_patterns are not shown in this snippet
        self.model = self._initialize_model()
        self.history = []

    def generate_improvement(
        self,
        current_code: str,
        performance_data: List[Dict[str, Any]]
    ) -> str:
        """
        Generate improved version of task agent.

        Args:
            current_code: Current task agent implementation
            performance_data: Performance metrics from recent runs

        Returns:
            Improved code implementation
        """
        # Analyze performance
        insights = self._analyze_performance(performance_data)

        # Generate improvement prompt
        prompt = self._create_meta_prompt(current_code, insights)

        # Generate new code
        improved_code = self.model.generate(
            prompt=prompt,
            temperature=self.config.get('meta_temperature', 0.8),
            max_tokens=self.config.get('meta_max_tokens', 4096)
        )

        # Validate and extract code
        validated_code = self._validate_code(improved_code)

        # Store in history
        self.history.append({
            'original': current_code,
            'improved': validated_code,
            'insights': insights
        })
        return validated_code

    def _analyze_performance(
        self,
        performance_data: List[Dict[str, Any]]
    ) -> Dict[str, Any]:
        """Analyze performance metrics to identify improvement areas."""
        # Compute statistics (guard against an empty run to avoid dividing by zero)
        scores = [d['score'] for d in performance_data]
        avg_score = sum(scores) / len(scores) if scores else 0.0

        # Identify failure patterns
        failures = [d for d in performance_data if d['score'] < 0.5]
        return {
            'average_score': avg_score,
            'num_failures': len(failures),
            'failure_patterns': self._extract_patterns(failures)
        }

    def _create_meta_prompt(
        self,
        current_code: str,
        insights: Dict[str, Any]
    ) -> str:
        """Create prompt for meta-level improvement."""
        return (
            "You are a meta-agent tasked with improving an AI task agent.\n\n"
            "Current Implementation:\n"
            f"```python\n{current_code}\n```\n\n"
            "Performance Analysis:\n"
            f"- Average Score: {insights['average_score']:.2f}\n"
            f"- Failures: {insights['num_failures']}\n"
            f"- Common Issues: {insights.get('failure_patterns', 'None identified')}\n\n"
            "Generate an improved version that addresses these issues.\n"
            "Output only the complete improved code.\n\n"
            "Improved Implementation:\n"
            "```python\n"
        )

    def _validate_code(self, code: str) -> str:
        """Validate and extract code from response."""
        # Extract code block
        if '```python' in code:
            code = code.split('```python')[1].split('```')[0]

        # Basic syntax validation
        try:
            compile(code, '<string>', 'exec')
        except SyntaxError as e:
            raise ValueError(f"Generated code has syntax error: {e}")
        return code.strip()

    def compute_diff(self, old_code: str, new_code: str) -> List[str]:
        """Compute diff between code versions."""
        diff = difflib.unified_diff(
            old_code.splitlines(keepends=True),
            new_code.splitlines(keepends=True),
            fromfile='old_agent.py',
            tofile='new_agent.py'
        )
        return list(diff)
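`_analyze_performance` above calls `_extract_patterns`, which is not shown. One simple way to fill that gap is to count recurring failure labels. The sketch below assumes each failing record may carry an `error` string (a field the evaluation code in this document does not guarantee) and falls back to a generic `low_score` label otherwise; the function name and output format are illustrative.

```python
from collections import Counter
from typing import Any, Dict, List


def extract_patterns(failures: List[Dict[str, Any]], top_k: int = 3) -> List[str]:
    """Return the most frequent failure labels, most common first.

    Records without an 'error' key are grouped under 'low_score'.
    """
    labels = Counter(str(f.get("error", "low_score")) for f in failures)
    return [f"{label} (x{count})" for label, count in labels.most_common(top_k)]


failures = [
    {"score": 0.1, "error": "timeout"},
    {"score": 0.2, "error": "timeout"},
    {"score": 0.0},
]
print(extract_patterns(failures))
```

Returning short strings keeps the result easy to interpolate into the meta prompt's "Common Issues" line.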
Running Meta Agent
python
# run_meta_agent.py - Example usage
from typing import Any, Dict, List

from meta_agent import MetaAgent


def run_meta_improvement_cycle(
    initial_agent_code: str,
    test_tasks: List[str],
    num_iterations: int = 5
):
    """Run multiple iterations of meta-improvement."""
    # Initialize meta-agent
    meta_config = {
        'model_name': 'gpt-4',
        'meta_temperature': 0.8,
        'meta_max_tokens': 4096
    }
    meta_agent = MetaAgent(meta_config)
    current_code = initial_agent_code

    for iteration in range(num_iterations):
        print(f"\n=== Iteration {iteration + 1} ===")

        # Evaluate current agent
        performance_data = evaluate_agent(current_code, test_tasks)
        avg_score = sum(d['score'] for d in performance_data) / len(performance_data)
        print(f"Current Performance: {avg_score:.3f}")

        # Generate improvement
        improved_code = meta_agent.generate_improvement(
            current_code,
            performance_data
        )

        # Show diff
        diff = meta_agent.compute_diff(current_code, improved_code)
        print("Changes:")
        print(''.join(diff[:20]))  # Show first 20 lines

        # Update current code
        current_code = improved_code

        # Save checkpoint
        with open(f'agent_iteration_{iteration}.py', 'w') as f:
            f.write(current_code)

    return current_code


def evaluate_agent(agent_code: str, test_tasks: List[str]) -> List[Dict[str, Any]]:
    """Evaluate agent on test tasks."""
    # Create agent from code. exec runs model-generated code in this process,
    # so run it only inside the isolated Docker environment.
    namespace = {}
    exec(agent_code, namespace)
    AgentClass = namespace['MyTaskAgent']
    agent = AgentClass({'model_name': 'gpt-4'})

    results = []
    for task in test_tasks:
        solution = agent.solve_task(task)
        score = agent.evaluate(task, solution)
        results.append({
            'task': task,
            'solution': solution,
            'score': score
        })
    return results


# Usage
if __name__ == '__main__':
    # Read initial agent code
    with open('initial_agent.py', 'r') as f:
        initial_code = f.read()

    # Define test tasks
    test_tasks = [
        "Implement binary search",
        "Write a function to reverse a linked list",
        "Create a trie data structure"
    ]

    # Run improvement loop
    final_code = run_meta_improvement_cycle(
        initial_code,
        test_tasks,
        num_iterations=5
    )
    print("\nFinal agent saved!")
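Because `evaluate_agent` executes generated code with `exec`, a faulty candidate can hang or crash the whole loop. A lighter-weight complement to the Docker container is to run each candidate in a child process with a timeout. The sketch below shows that idea with the standard library only; `run_untrusted` is a hypothetical helper, and process separation alone is not a security sandbox.

```python
import subprocess
import sys
from typing import Tuple


def run_untrusted(code: str, timeout_s: float = 5.0) -> Tuple[bool, str]:
    """Run code in a separate Python process; return (succeeded, captured output)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],  # same interpreter, fresh process
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"
    succeeded = proc.returncode == 0
    # Report stdout on success and stderr on failure
    return succeeded, proc.stdout if succeeded else proc.stderr


ok, output = run_untrusted("print(2 + 3)")
print(ok, output.strip())
```

The timeout bounds runaway loops, and a non-zero exit code or traceback stays confined to the child process instead of terminating the improvement loop.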
Domain-Specific Implementation
Code Generation Domain
python
# domains/code_generation/agent.py
from typing import Dict, Any, List
import ast


class CodeGenerationAgent:
    """Agent specialized for code generation tasks."""
    # self.model is assumed to be set elsewhere (for example by a base class)

    def generate_code(self, specification: str) -> str:
        """Generate code from specification."""
        prompt = (
            "Generate Python code for the following specification:\n\n"
            f"{specification}\n\n"
            "Requirements:\n"
            "- Include proper error handling\n"
            "- Add docstrings\n"
            "- Follow PEP 8 style guide\n\n"
            "Code:\n"
            "```python\n"
        )
        code = self.model.generate(prompt)
        return self._extract_code(code)

    def test_code(self, code: str, test_cases: List[Dict[str, Any]]) -> float:
        """Test generated code against test cases."""
        try:
            # Create temporary module
            namespace = {}
            exec(code, namespace)

            passed = 0
            for test in test_cases:
                func_name = test['function']
                inputs = test['inputs']
                expected = test['expected']
                func = namespace[func_name]
                result = func(*inputs)
                if result == expected:
                    passed += 1
            return passed / len(test_cases)
        except Exception as e:
            # Any error (including an empty test list) counts as a failed run
            print(f"Test error: {e}")
            return 0.0

    def _extract_code(self, response: str) -> str:
        """Extract code from model response."""
        if '```python' in response:
            code = response.split('```python')[1].split('```')[0]
        else:
            code = response

        # Validate syntax
        try:
            ast.parse(code)
        except SyntaxError:
            raise ValueError("Generated code has syntax errors")
        return code.strip()
Math Reasoning Domain
python
# domains/math/agent.py
import re
from typing import Any, Dict, Optional


class MathReasoningAgent:
    """Agent for mathematical reasoning tasks."""
    # self.model is assumed to be set elsewhere (for example by a base class)

    def solve_math_problem(self, problem: str) -> Dict[str, Any]:
        """Solve a math problem with step-by-step reasoning."""
        prompt = f"""Solve the following math problem step by step:

Problem: {problem}

Show your work clearly. Format your final answer as: ANSWER: <value>

Solution:"""
        response = self.model.generate(prompt)
        return {
            'reasoning': response,
            'answer': self._extract_answer(response)
        }

    def _extract_answer(self, response: str) -> Optional[str]:
        """Extract final answer from reasoning."""
        # Look for ANSWER: pattern
        match = re.search(r'ANSWER:\s*([^\n]+)', response, re.IGNORECASE)
        if match:
            return match.group(1).strip()

        # Look for boxed answer (LaTeX)
        match = re.search(r'\\boxed\{([^}]+)\}', response)
        if match:
            return match.group(1).strip()

        # Try to find number at end
        numbers = re.findall(r'-?\d+\.?\d*', response)
        if numbers:
            return numbers[-1]
        return None

    def evaluate_answer(
        self,
        predicted: str,
        ground_truth: str,
        tolerance: float = 1e-5
    ) -> bool:
        """Evaluate if answer is correct."""
        try:
            pred_val = float(predicted)
            true_val = float(ground_truth)
            return abs(pred_val - true_val) < tolerance
        except (ValueError, TypeError):
            # Fallback to string comparison
            return predicted.strip() == ground_truth.strip()
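The extraction logic above tries three patterns in order, which is easiest to see on concrete inputs. The sketch below copies the same regular expressions into a standalone function (named `extract_answer` here only so it can run outside the class) and applies it to one response per fallback level.

```python
import re
from typing import Optional


def extract_answer(response: str) -> Optional[str]:
    """Standalone copy of the fallback chain: ANSWER:, then \\boxed{}, then last number."""
    match = re.search(r"ANSWER:\s*([^\n]+)", response, re.IGNORECASE)
    if match:
        return match.group(1).strip()
    match = re.search(r"\\boxed\{([^}]+)\}", response)
    if match:
        return match.group(1).strip()
    numbers = re.findall(r"-?\d+\.?\d*", response)
    return numbers[-1] if numbers else None


print(extract_answer("2 + 2 = 4\nANSWER: 4"))              # explicit marker
print(extract_answer(r"Thus the result is \boxed{3/4}"))   # LaTeX box
print(extract_answer("So we get 12 apples"))               # last number
print(extract_answer("No digits here"))                    # nothing found
```

One limitation worth knowing: the boxed pattern stops at the first closing brace, so a nested expression such as `\boxed{\frac{1}{2}}` is captured only partially.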
Configuration Patterns
Agent Configuration
python
# config.py - Common configuration patterns
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentConfig:
    """Configuration for task agents."""
    model_name: str = 'gpt-4'
    temperature: float = 0.7
    max_tokens: int = 2048
    top_p: float = 1.0
    frequency_penalty: float = 0.0
    presence_penalty: float = 0.0
    timeout: int = 60
    max_retries: int = 3


@dataclass
class MetaAgentConfig:
    """Configuration for meta-agents."""
    model_name: str = 'gpt-4'
    meta_temperature: float = 0.8
    meta_max_tokens: int = 4096
    improvement_iterations: int = 5
    min_improvement_threshold: float = 0.05
    use_reflection: bool = True


@dataclass
class ExperimentConfig:
    """Configuration for experiments."""
    domain: str
    num_iterations: int = 10
    num_eval_samples: int = 100
    output_dir: str = './outputs'
    save_checkpoints: bool = True
    checkpoint_interval: int = 1
    seed: Optional[int] = None


# Usage
agent_config = AgentConfig(
    model_name='gpt-4-turbo',
    temperature=0.5,
    max_tokens=4096
)
meta_config = MetaAgentConfig(
    improvement_iterations=10,
    min_improvement_threshold=0.02
)
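The agent classes elsewhere in this document read their settings with `config.get(...)`, which works on a dictionary but not on a dataclass instance. One way to bridge the two, sketched below under the assumption that the agents keep expecting plain dictionaries, is `dataclasses.asdict`. The trimmed `AgentConfig` here repeats only three fields so the example stays self-contained.

```python
from dataclasses import asdict, dataclass


@dataclass
class AgentConfig:
    """Trimmed copy of the configuration dataclass, for illustration only."""
    model_name: str = "gpt-4"
    temperature: float = 0.7
    max_tokens: int = 2048


# Convert the dataclass to the dict form that config.get(...) expects
config = asdict(AgentConfig(temperature=0.5))
print(config.get("temperature"), config.get("top_k", "unset"))
```

The dataclass gives typed, defaulted fields at construction time, while the resulting dictionary keeps `.get` lookups with fallbacks working unchanged.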
Loading Models
python
# agent/base_agent.py - Model initialization
from typing import Dict, Any
import os

from dotenv import load_dotenv


class BaseAgent:
    """Base class for all agents."""

    def __init__(self, config: Dict[str, Any]):
        load_dotenv()
        self.config = config
        self.model = self._initialize_model()

    def _initialize_model(self):
        """Initialize the foundation model."""
        model_name = self.config.get('model_name', 'gpt-4')

        if 'gpt' in model_name.lower():
            from openai import OpenAI
            api_key = os.getenv('OPENAI_API_KEY')
            client = OpenAI(api_key=api_key)
            return OpenAIModel(client, model_name)
        elif 'claude' in model_name.lower():
            from anthropic import Anthropic
            api_key = os.getenv('ANTHROPIC_API_KEY')
            client = Anthropic(api_key=api_key)
            # AnthropicModel is not defined in this snippet
            return AnthropicModel(client, model_name)
        elif 'gemini' in model_name.lower():
            import google.generativeai as genai
            api_key = os.getenv('GEMINI_API_KEY')
            genai.configure(api_key=api_key)
            # GeminiModel is not defined in this snippet
            return GeminiModel(model_name)
        else:
            raise ValueError(f"Unsupported model: {model_name}")


class OpenAIModel:
    """Wrapper for OpenAI models."""

    def __init__(self, client, model_name: str):
        self.client = client
        self.model_name = model_name

    def generate(
        self,
        prompt: str,
        temperature: float = 0.7,
        max_tokens: int = 2048,
        **kwargs
    ) -> str:
        """Generate response from OpenAI model."""
        response = self.client.chat.completions.create(
            model=self.model_name,
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
            max_tokens=max_tokens,
            **kwargs
        )
        return response.choices[0].message.content
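The Claude branch above returns an `AnthropicModel`, which is referenced but never defined. The sketch below shows what a wrapper with the same `generate()` interface as `OpenAIModel` might look like. It is an assumption, not the repository's class: it relies on the Anthropic Messages API shape (`client.messages.create(...)` returning an object whose `content[0].text` holds the reply), and it is exercised here with a stub client so it runs without a network connection or API key.

```python
from types import SimpleNamespace


class AnthropicModel:
    """Hypothetical wrapper mirroring OpenAIModel's generate() interface."""

    def __init__(self, client, model_name: str):
        self.client = client
        self.model_name = model_name

    def generate(self, prompt: str, temperature: float = 0.7,
                 max_tokens: int = 2048, **kwargs) -> str:
        """Send a single user message and return the text of the first content block."""
        response = self.client.messages.create(
            model=self.model_name,
            max_tokens=max_tokens,
            temperature=temperature,
            messages=[{"role": "user", "content": prompt}],
            **kwargs,
        )
        return response.content[0].text


# Stub client standing in for anthropic.Anthropic(), for offline demonstration only
def _fake_create(**params):
    text = f"echo: {params['messages'][0]['content']}"
    return SimpleNamespace(content=[SimpleNamespace(text=text)])


stub_client = SimpleNamespace(messages=SimpleNamespace(create=_fake_create))
model = AnthropicModel(stub_client, "claude-placeholder")  # placeholder model name
reply = model.generate("hello")
print(reply)
```

Keeping every provider behind the same `generate(prompt, temperature, max_tokens)` signature is what lets the task and meta agents stay provider-agnostic.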
Advanced Patterns
Batched Evaluation
python
# utils/evaluation.py
from typing import List, Dict, Any
from concurrent.futures import ThreadPoolExecutor, as_completed

import numpy as np


class BatchEvaluator:
    """Efficiently evaluate agents on multiple tasks."""

    def __init__(self, max_workers: int = 10):
        self.max_workers = max_workers

    def evaluate_batch(
        self,
        agent,
        tasks: List[str],
        ground_truths: List[Any]
    ) -> Dict[str, Any]:
        """Evaluate agent on batch of tasks in parallel."""
        results = []
        with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
            futures = {
                executor.submit(self._evaluate_single, agent, task, truth): idx
                for idx, (task, truth) in enumerate(zip(tasks, ground_truths))
            }
            for future in as_completed(futures):
                idx = futures[future]
                try:
                    result = future.result()
                    results.append((idx, result))
                except Exception as e:
                    print(f"Task {idx} failed: {e}")
                    results.append((idx, {'score': 0.0, 'error': str(e)}))

        # Sort by original order
        results.sort(key=lambda x: x[0])
        results = [r[1] for r in results]
        return {
            'results': results,
            'mean_score': np.mean([r['score'] for r in results]),
            'std_score': np.std([r['score'] for r in results]),
            'success_rate': sum(1 for r in results if r['score'] > 0.5) / len(results)
        }

    def _evaluate_single(self, agent, task: str, ground_truth: Any) -> Dict[str, Any]:
        """Evaluate single task."""
        solution = agent.solve_task(task)
        # Assumes an evaluate() that also accepts the ground truth,
        # unlike the two-argument version shown for MyTaskAgent
        score = agent.evaluate(task, solution, ground_truth)
        return {
            'task': task,
            'solution': solution,
            'score': score,
            'correct': score > 0.5
        }
Checkpointing
python
undefinedpython
undefinedutils/checkpointing.py
utils/checkpointing.py
import json
import pickle
from pathlib import Path
from typing import Any, Dict
class CheckpointManager:
"""Manage experiment checkpoints."""
def __init__(self, output_dir: str):
self.output_dir = Path(output_dir)
self.output_dir.mkdir(parents=True, exist_ok=True)
def save_checkpoint(
self,
iteration: int,
agent_code: str,
performance_data: Dict[str, Any],
meta_data: Dict[str, Any]
):
"""Save checkpoint for an iteration."""
checkpoint_dir = self.output_dir / f'iteration_{iteration}'
checkpoint_dir.mkdir(exist_ok=True)
# Save agent code
with open(checkpoint_dir / 'agent.py', 'w') as f:
f.write(agent_code)
# Save performance data
with open(checkpoint_dir / 'performance.json', 'w') as f:
json.dump(performance_data, f, indent=2)
# Save metadata
with open(checkpoint_dir / 'metadata.pkl', 'wb') as f:
pickle.dump(meta_data, f)
print(f"Checkpoint saved to {checkpoint_dir}")
def load_checkpoint(self, iteration: int) -> Dict[str, Any]:
"""Load checkpoint from an iteration."""
checkpoint_dir = self.output_dir / f'iteration_{iteration}'
with open(checkpoint_dir / 'agent.py', 'r') as f:
agent_code = f.read()
with open(checkpoint_dir / 'performance.json', 'r') as f:
performance_data = json.load(f)
with open(checkpoint_dir / 'metadata.pkl', 'rb') as f:
meta_data = pickle.load(f)
return {
'agent_code': agent_code,
'performance_data': performance_data,
'meta_data': meta_data
}
def list_checkpoints(self) -> List[int]:
"""List available checkpoints."""
iterations = []
for path in self.output_dir.glob('iteration_*'):
if path.is_dir():
iteration = int(path.name.split('_')[1])
iterations.append(iteration)
return sorted(iterations)undefinedimport json
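Because each checkpoint stores its scores in `performance.json`, the best iteration can be located without unpickling anything. The following is a minimal sketch, not part of HyperAgents: the function name `find_best_checkpoint` is hypothetical, and it assumes `performance.json` contains a numeric `mean_score` field like the one the evaluator returns.

```python
import json
from pathlib import Path
from typing import List, Optional, Tuple


def find_best_checkpoint(output_dir: str, metric: str = "mean_score") -> Optional[Tuple[int, float]]:
    """Return (iteration, score) for the best checkpoint, or None if none qualify."""
    candidates: List[Tuple[int, float]] = []
    for path in Path(output_dir).glob("iteration_*"):
        suffix = path.name.split("_", 1)[1]
        perf_file = path / "performance.json"
        # Skip stray directories and checkpoints that were only partly written.
        if not (path.is_dir() and suffix.isdigit() and perf_file.is_file()):
            continue
        with open(perf_file, "r") as f:
            score = json.load(f).get(metric)
        if isinstance(score, (int, float)):
            candidates.append((int(suffix), float(score)))
    if not candidates:
        return None
    # Highest score wins; ties go to the earliest iteration.
    return max(candidates, key=lambda c: (c[1], -c[0]))
```

The returned iteration number can then be passed to `load_checkpoint` to restore that version of the agent.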
Troubleshooting
Common Issues
**1. API Key Errors**
```python
# Verify environment variables
import os
from dotenv import load_dotenv

load_dotenv()

required_keys = ['OPENAI_API_KEY', 'ANTHROPIC_API_KEY', 'GEMINI_API_KEY']
for key in required_keys:
    value = os.getenv(key)
    if value:
        print(f"{key}: {'*' * 20} (set)")
    else:
        print(f"{key}: NOT SET")
```
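A run usually needs only the key for the provider it is configured to use, so it can help to fail fast on just those. The following is a minimal sketch: `missing_keys` and the `PROVIDER_KEYS` mapping are hypothetical helpers, not part of HyperAgents, and only the variable names come from the check above.

```python
import os
from typing import Dict, List, Mapping, Optional

# Hypothetical provider-to-variable mapping (variable names as checked above).
PROVIDER_KEYS: Dict[str, str] = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def missing_keys(providers: List[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the variable names that are unset or blank for the given providers."""
    env = os.environ if env is None else env
    return [
        PROVIDER_KEYS[p]
        for p in providers
        if not env.get(PROVIDER_KEYS[p], "").strip()
    ]
```

Calling `missing_keys(["openai"])` before starting the loop and aborting on a non-empty result surfaces a configuration problem before any API call is made.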
**2. Docker Execution Errors**
```bash
# Verify Docker is running
docker ps

# Rebuild container if needed
docker build --no-cache --network=host -t hyperagents .

# Check container logs
docker logs <container_id>
```
**3. Code Generation Syntax Errors**
```python
# Add validation wrapper
import ast


def validate_generated_code(code: str) -> bool:
    """Validate Python syntax before execution."""
    try:
        ast.parse(code)
        return True
    except SyntaxError as e:
        print(f"Syntax error at line {e.lineno}: {e.msg}")
        print(f"Text: {e.text}")
        return False


# Use in meta-agent
if validate_generated_code(improved_code):
    current_code = improved_code
else:
    print("Generated code has errors, keeping current version")
```
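Code that parses cleanly can still drop a method the evaluator depends on. The following is a minimal sketch of a structural check to run after the syntax check. The helper name `missing_definitions` is hypothetical; the method names `solve_task` and `evaluate` are the ones the evaluation code calls on the agent.

```python
import ast
from typing import Iterable, Set


def missing_definitions(code: str, required: Iterable[str]) -> Set[str]:
    """Return the required function or method names that `code` does not define.

    Raises SyntaxError if `code` cannot be parsed.
    """
    tree = ast.parse(code)
    defined = {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    return set(required) - defined
```

For example, `missing_definitions(improved_code, ["solve_task", "evaluate"])` returns an empty set when both methods are present. The check matches names only; it does not verify signatures or behavior.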
**4. Performance Degradation**
```python
# Track performance over iterations
from typing import Dict, List


def monitor_performance(history: List[Dict[str, float]]):
    """Monitor for performance degradation."""
    if len(history) < 3:
        return

    recent_scores = [h['score'] for h in history[-3:]]
    if all(recent_scores[i] < recent_scores[i-1] for i in range(1, len(recent_scores))):
        print("WARNING: Performance degrading for 3 consecutive iterations")
        print("Consider:")
        print(" - Reducing temperature")
        print(" - Changing meta-agent prompt")
        print(" - Rolling back to earlier checkpoint")
```
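The rollback suggestion can be automated by picking the best earlier iteration once a decline is detected. The following is a minimal sketch under two assumptions: the helper `rollback_target` is hypothetical, and the position of an entry in `history` equals its iteration number.

```python
from typing import Dict, List, Optional


def rollback_target(history: List[Dict[str, float]], patience: int = 3) -> Optional[int]:
    """Return the index of the best-scoring iteration if the last `patience`
    scores are strictly decreasing, otherwise None."""
    if patience < 2 or len(history) < patience:
        return None
    recent = [h["score"] for h in history[-patience:]]
    declining = all(later < earlier for earlier, later in zip(recent, recent[1:]))
    if not declining:
        return None
    scores = [h["score"] for h in history]
    # max() returns the first maximum, so ties resolve to the earliest iteration.
    return max(range(len(scores)), key=scores.__getitem__)
```

With `patience=3` the trigger is the same as in the monitor above: the last three scores must be strictly decreasing.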
**5. Memory Issues with Large Contexts**
```python
# Implement context truncation
def truncate_context(
    context: str,
    max_tokens: int = 8000,
    tokenizer=None
) -> str:
    """Truncate context to fit within token limit."""
    if tokenizer is None:
        # Rough approximation: 4 chars per token
        max_chars = max_tokens * 4
        if len(context) > max_chars:
            return context[:max_chars] + "\n... (truncated)"
    else:
        tokens = tokenizer.encode(context)
        if len(tokens) > max_tokens:
            truncated = tokenizer.decode(tokens[:max_tokens])
            return truncated + "\n... (truncated)"

    return context
```
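Cutting off the tail discards the most recent content, which is often where error messages and final results sit. An alternative technique, sketched below with a hypothetical helper `truncate_middle`, keeps both ends and drops the middle. It works on characters only; the default of 32000 corresponds to 8000 tokens under the rough 4-characters-per-token estimate.

```python
def truncate_middle(
    context: str,
    max_chars: int = 32000,
    marker: str = "\n... (truncated) ...\n",
) -> str:
    """Keep the start and end of `context`, dropping the middle if it is too long."""
    if len(context) <= max_chars:
        return context
    budget = max_chars - len(marker)
    if budget <= 0:
        # No room for the marker; fall back to a plain prefix cut.
        return context[:max_chars]
    head = budget // 2
    tail = budget - head  # tail >= 1 whenever budget >= 1
    return context[:head] + marker + context[len(context) - tail:]
```

The result, marker included, never exceeds `max_chars`.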
Debugging Tips

```python
# Enable verbose logging
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('hyperagents.log'),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger('hyperagents')

# Use in code
logger.debug(f"Generating improvement for iteration {iteration}")
logger.info(f"Current score: {current_score:.3f}")
logger.warning(f"Low performance detected: {score:.3f}")
logger.error(f"Failed to generate valid code: {error}")
```
Safety Checks

```python
# Implement safety checks before executing generated code
from typing import Any, Dict


def safety_check(code: str) -> Dict[str, Any]:
    """Check for potentially dangerous operations.

    Substring matching is a heuristic, not a sandbox.
    """
    checks = {
        'no_file_deletion': 'os.remove' not in code and 'shutil.rmtree' not in code,
        'no_system_calls': 'os.system' not in code and 'subprocess.call' not in code,
        'no_network': 'requests.' not in code and 'urllib' not in code,
        'no_eval': 'eval(' not in code and 'exec(' not in code,
    }
    all_safe = all(checks.values())
    return {
        'safe': all_safe,
        'checks': checks
    }


# Use before execution
safety_result = safety_check(generated_code)
if not safety_result['safe']:
    print("WARNING: Potentially unsafe code detected!")
    print(f"Failed checks: {[k for k, v in safety_result['checks'].items() if not v]}")
    # Decide whether to proceed
```
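Substring checks flag a comment or string literal that merely mentions `os.system`, and they miss calls such as `subprocess.run`. Inspecting the parsed syntax tree avoids the first problem. The following is a minimal sketch: the deny-lists and the helper names `find_unsafe_usage` and `_dotted_name` are illustrative assumptions, not HyperAgents APIs.

```python
import ast
from typing import List

# Illustrative deny-lists; extend them for your own threat model.
BLOCKED_CALLS = {
    "os.remove", "os.system", "shutil.rmtree",
    "subprocess.call", "subprocess.run", "subprocess.Popen",
    "eval", "exec",
}
BLOCKED_IMPORTS = {"requests", "urllib", "socket"}


def _dotted_name(node: ast.AST) -> str:
    """Rebuild a dotted name such as 'os.system' from a call target."""
    parts: List[str] = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return ""  # e.g. a call on a subscript or on another call's result


def find_unsafe_usage(code: str) -> List[str]:
    """Return a sorted list of blocked calls and imports found in `code`."""
    findings = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Call):
            name = _dotted_name(node.func)
            if name in BLOCKED_CALLS:
                findings.add(f"call:{name}")
        elif isinstance(node, ast.Import):
            for alias in node.names:
                root = alias.name.split(".")[0]
                if root in BLOCKED_IMPORTS:
                    findings.add(f"import:{root}")
        elif isinstance(node, ast.ImportFrom) and node.module:
            root = node.module.split(".")[0]
            if root in BLOCKED_IMPORTS:
                findings.add(f"import:{root}")
    return sorted(findings)
```

This is still a static heuristic. Aliased imports such as `from os import system` and dynamic lookups through `getattr` pass undetected, so it complements the Docker sandbox and does not replace it.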
Best Practices
- Always load API keys from environment variables; never hardcode them
- Checkpoint frequently to avoid losing progress
- Validate generated code before execution
- Monitor performance across iterations to detect degradation
- Use Docker containers for safe code execution
- Implement timeouts for long-running operations
- Log extensively for debugging and analysis
- Test on small batches before full-scale runs
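The timeout practice can be sketched with the standard library alone. The helper below, `run_with_timeout`, is a hypothetical name, not a HyperAgents function. It runs a code string in a separate Python process and gives up after a deadline.

```python
import subprocess
import sys
from typing import Dict, Optional, Union


def run_with_timeout(
    code: str, timeout_s: float = 10.0
) -> Dict[str, Union[str, int, bool, None]]:
    """Run `code` in a fresh Python subprocess, killing it after `timeout_s` seconds."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return {"timed_out": True, "returncode": None, "stdout": "", "stderr": ""}
    return {
        "timed_out": False,
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```

A subprocess bounds run time but does not isolate the code, so untrusted generated code should still run inside the Docker container.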