hyperagents-self-improving-ai

HyperAgents Self-Improving AI Skill

Skill by ara.so — AI Agent Skills collection.

Overview

HyperAgents is a framework for building self-referential self-improving AI agents that can optimize for any computable task. The system uses a meta-agent to iteratively improve a task-agent by generating and evaluating code modifications. The framework supports multiple domains (code generation, reasoning, math, etc.) and uses foundation models to drive the self-improvement loop.

Key Capabilities:

Self-referential meta-learning where agents modify their own code
Multi-domain support (code, math, reasoning tasks)
Iterative improvement through generation-evaluation loops
Integration with OpenAI, Anthropic, and Google Gemini models
Docker-based safe execution environment

Installation

Prerequisites

Install system dependencies (Fedora/RHEL)

sudo dnf install -y python3.12-devel graphviz graphviz-devel cmake ninja-build bzip2-devel zlib-devel ncurses-devel libffi-devel

For Ubuntu/Debian:

sudo apt-get install -y python3.12-dev graphviz libgraphviz-dev cmake ninja-build libbz2-dev zlib1g-dev libncurses-dev libffi-dev

Setup

Clone the repository

git clone https://github.com/facebookresearch/HyperAgents.git
cd HyperAgents

Create virtual environment

python3.12 -m venv venv_nat
source venv_nat/bin/activate

Install dependencies

pip install -r requirements.txt
pip install -r requirements_dev.txt

Build Docker container for safe execution

docker build --network=host -t hyperagents .

Environment Configuration

Create a .env file with your API keys:

.env file

OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here
GEMINI_API_KEY=your_gemini_key_here
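
Before starting a run, it can help to confirm that the keys are actually visible to the process. The following is a minimal sketch, assuming the three variable names shown above; the `missing_api_keys` helper is hypothetical and not part of HyperAgents, and it reads the process environment rather than parsing the .env file itself.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the .env example above
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY")


def missing_api_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required keys that are unset or blank (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing keys:", ", ".join(missing))
    else:
        print("All API keys are set.")
```

Passing a mapping explicitly makes the check easy to exercise without touching the real environment.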

Initialize Agents

Set up initial agent implementations

bash ./setup_initial.sh

Core Concepts

Architecture

Task Agent: Solves domain-specific tasks (code generation, math, etc.)
Meta Agent: Observes task agent performance and generates improvements
Generation Loop: Iteratively evolves agents through self-improvement cycles
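
The interplay of the three pieces can be sketched with toy stand-ins. This is an illustration only: the function names are hypothetical, the "agent" is just a string, and the greedy accept-only-if-better rule is an assumption rather than the selection strategy HyperAgents itself uses.

```python
from typing import Callable, List, Tuple


def self_improvement_loop(
    agent: str,
    evaluate: Callable[[str], float],
    propose: Callable[[str, float], str],
    iterations: int,
) -> Tuple[str, List[float]]:
    """Keep the best-scoring agent seen across a fixed number of proposals."""
    best_agent, best_score = agent, evaluate(agent)
    scores = [best_score]
    for _ in range(iterations):
        candidate = propose(best_agent, best_score)  # meta agent step
        candidate_score = evaluate(candidate)        # task agent evaluation
        if candidate_score > best_score:             # accept only improvements
            best_agent, best_score = candidate, candidate_score
        scores.append(best_score)
    return best_agent, scores


# Toy stand-ins: a longer string counts as a "better" agent.
def toy_evaluate(agent: str) -> float:
    return min(len(agent) / 10, 1.0)


def toy_propose(agent: str, score: float) -> str:
    return agent + "+"


best, history = self_improvement_loop("v0", toy_evaluate, toy_propose, iterations=3)
print(best, history)
```

In a real run, `evaluate` would score a task agent on domain tasks and `propose` would call the meta agent with the current code and its performance data.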

File Structure

HyperAgents/
├── agent/              # Foundation model interfaces
├── domains/            # Task-specific implementations
├── utils/              # Common utilities
├── meta_agent.py       # Meta-agent implementation
├── task_agent.py       # Task-agent implementation
├── generate_loop.py    # Main entry point
└── run_meta_agent.py   # Meta-agent execution script

Usage

Running the Self-Improvement Loop

Basic usage with default settings

python generate_loop.py --domains code_generation

Multiple domains

python generate_loop.py --domains math reasoning

Custom configuration

python generate_loop.py \
  --domains code_generation \
  --max_iterations 10 \
  --output_dir ./my_outputs \
  --model_name gpt-4

Key Command-Line Arguments

Common arguments for generate_loop.py

--domains          # Domain(s) to optimize (code_generation, math, reasoning, etc.)
--max_iterations   # Maximum improvement iterations
--output_dir       # Directory for outputs (default: outputs/)
--model_name       # Foundation model to use
--baseline         # Baseline agent to compare against
--temperature      # Sampling temperature for generation
--num_samples      # Number of samples per iteration
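
To make the shape of these flags concrete, here is a hedged `argparse` sketch that mirrors the list above. It is not the parser from generate_loop.py: the `build_parser` name, the help strings, and every default other than `outputs/` are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the documented flags (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Sketch of a generate_loop-style CLI")
    parser.add_argument("--domains", nargs="+", required=True,
                        help="one or more domains, e.g. code_generation math")
    parser.add_argument("--max_iterations", type=int, default=10)
    parser.add_argument("--output_dir", default="outputs/")
    parser.add_argument("--model_name", default="gpt-4")
    parser.add_argument("--baseline", default=None)
    parser.add_argument("--temperature", type=float, default=0.7)
    parser.add_argument("--num_samples", type=int, default=1)
    return parser


# Parse an explicit argument list instead of sys.argv so the example is repeatable
args = build_parser().parse_args(
    ["--domains", "math", "reasoning", "--max_iterations", "5"]
)
print(args.domains, args.max_iterations, args.output_dir)
```

Using `nargs="+"` is what allows several domains to follow a single `--domains` flag, as in the multi-domain command shown earlier.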

Working with Task Agents

Creating a Custom Task Agent

task_agent.py - Basic structure

from typing import Any, Dict

from agent.base_agent import BaseAgent


class MyTaskAgent(BaseAgent):
    """Custom task agent for specific domain."""

    def __init__(self, config: Dict[str, Any]):
        super().__init__(config)
        self.domain = config.get('domain', 'custom')

    def solve_task(self, task_input: str) -> str:
        """
        Main method to solve a task.

        Args:
            task_input: Input task specification

        Returns:
            Solution to the task
        """
        # Generate prompt for the model
        prompt = self._create_prompt(task_input)

        # Get model response
        response = self.model.generate(
            prompt=prompt,
            temperature=self.config.get('temperature', 0.7),
            max_tokens=self.config.get('max_tokens', 2048)
        )

        # Post-process response
        solution = self._parse_solution(response)
        return solution

    def _create_prompt(self, task_input: str) -> str:
        """Create prompt for the model."""
        return f"""Solve the following task:

Task: {task_input}

Solution:"""

    def _parse_solution(self, response: str) -> str:
        """Extract solution from model response."""
        # Custom parsing logic
        return response.strip()

    def evaluate(self, task_input: str, solution: str) -> float:
        """
        Evaluate solution quality.

        Returns:
            Score between 0 and 1
        """
        # Domain-specific evaluation. _compute_score is not defined in this
        # excerpt and must be implemented for your domain.
        return self._compute_score(task_input, solution)

Using the Task Agent

from task_agent import MyTaskAgent

Initialize agent

config = {
    'domain': 'custom',
    'model_name': 'gpt-4',
    'temperature': 0.7,
    'max_tokens': 2048
}

agent = MyTaskAgent(config)

Solve a task

task = "Write a function to compute Fibonacci numbers"
solution = agent.solve_task(task)
score = agent.evaluate(task, solution)

print(f"Solution: {solution}")
print(f"Score: {score}")

Working with Meta Agents

Meta Agent Structure

meta_agent.py - Core implementation

import difflib
from typing import Any, Dict, List

# Three backticks, built at runtime so the prompt template below can emit
# code fences without embedding literal ones in this listing
FENCE = '`' * 3


class MetaAgent:
    """Meta-agent that improves task agents."""

    def __init__(self, config: Dict[str, Any]):
        self.config = config
        # _initialize_model() and _extract_patterns() are not shown in this
        # excerpt; the model must expose generate(prompt=..., ...)
        self.model = self._initialize_model()
        self.history = []

    def generate_improvement(
        self,
        current_code: str,
        performance_data: List[Dict[str, Any]]
    ) -> str:
        """
        Generate improved version of task agent.

        Args:
            current_code: Current task agent implementation
            performance_data: Performance metrics from recent runs

        Returns:
            Improved code implementation
        """
        # Analyze performance
        insights = self._analyze_performance(performance_data)

        # Generate improvement prompt
        prompt = self._create_meta_prompt(current_code, insights)

        # Generate new code
        improved_code = self.model.generate(
            prompt=prompt,
            temperature=self.config.get('meta_temperature', 0.8),
            max_tokens=self.config.get('meta_max_tokens', 4096)
        )

        # Validate and extract code
        validated_code = self._validate_code(improved_code)

        # Store in history
        self.history.append({
            'original': current_code,
            'improved': validated_code,
            'insights': insights
        })

        return validated_code

    def _analyze_performance(
        self,
        performance_data: List[Dict[str, Any]]
    ) -> Dict[str, Any]:
        """Analyze performance metrics to identify improvement areas."""
        # Compute statistics (guard against an empty run)
        scores = [d['score'] for d in performance_data]
        avg_score = sum(scores) / len(scores) if scores else 0.0

        # Identify failure patterns
        failures = [d for d in performance_data if d['score'] < 0.5]

        return {
            'average_score': avg_score,
            'num_failures': len(failures),
            'failure_patterns': self._extract_patterns(failures)
        }

    def _create_meta_prompt(
        self,
        current_code: str,
        insights: Dict[str, Any]
    ) -> str:
        """Create prompt for meta-level improvement."""
        return f"""You are a meta-agent tasked with improving an AI task agent.

Current Implementation:
{FENCE}python
{current_code}
{FENCE}

Performance Analysis:
Average Score: {insights['average_score']:.2f}
Failures: {insights['num_failures']}
Common Issues: {insights.get('failure_patterns', 'None identified')}

Generate an improved version that addresses these issues. Output only the complete improved code.

Improved Implementation:
{FENCE}python
"""

    def _validate_code(self, code: str) -> str:
        """Validate and extract code from response."""
        # Extract code block
        if '```python' in code:
            code = code.split('```python')[1].split('```')[0]

        # Basic syntax validation
        try:
            compile(code, '<string>', 'exec')
        except SyntaxError as e:
            raise ValueError(f"Generated code has syntax error: {e}")

        return code.strip()

    def compute_diff(self, old_code: str, new_code: str) -> List[str]:
        """Compute diff between code versions."""
        diff = difflib.unified_diff(
            old_code.splitlines(keepends=True),
            new_code.splitlines(keepends=True),
            fromfile='old_agent.py',
            tofile='new_agent.py'
        )
        return list(diff)
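
The extraction and diff steps in `_validate_code` and `compute_diff` can be tried without a model call. The sketch below is a standalone approximation, not the class above: `extract_python_block` and `short_diff` are hypothetical names, and the model reply is a hard-coded string.

```python
import difflib
from typing import List

FENCE = "`" * 3  # three backticks, built at runtime


def extract_python_block(response: str) -> str:
    """Return the first fenced python block, or the whole response if unfenced."""
    marker = FENCE + "python"
    if marker in response:
        response = response.split(marker, 1)[1].split(FENCE, 1)[0]
    code = response.strip()
    compile(code, "<generated>", "exec")  # raises SyntaxError on invalid code
    return code


def short_diff(old: str, new: str) -> List[str]:
    """Unified diff as a list of lines without trailing newlines."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="old_agent.py", tofile="new_agent.py", lineterm="",
    ))


reply = f"Here is the fix:\n{FENCE}python\ndef add(a, b):\n    return a + b\n{FENCE}\nDone."
new_code = extract_python_block(reply)
old_code = "def add(a, b):\n    return a - b"
for line in short_diff(old_code, new_code):
    print(line)
```

Note that `compile` only checks syntax; it says nothing about whether the generated code is safe or correct to run.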

Running Meta Agent

run_meta_agent.py - Example usage

from typing import Any, Dict, List

from meta_agent import MetaAgent


def run_meta_improvement_cycle(
    initial_agent_code: str,
    test_tasks: List[str],
    num_iterations: int = 5
):
    """Run multiple iterations of meta-improvement."""

    # Initialize meta-agent
    meta_config = {
        'model_name': 'gpt-4',
        'meta_temperature': 0.8,
        'meta_max_tokens': 4096
    }
    meta_agent = MetaAgent(meta_config)

    current_code = initial_agent_code

    for iteration in range(num_iterations):
        print(f"\n=== Iteration {iteration + 1} ===")

        # Evaluate current agent
        performance_data = evaluate_agent(current_code, test_tasks)

        avg_score = sum(d['score'] for d in performance_data) / len(performance_data)
        print(f"Current Performance: {avg_score:.3f}")

        # Generate improvement
        improved_code = meta_agent.generate_improvement(
            current_code,
            performance_data
        )

        # Show diff
        diff = meta_agent.compute_diff(current_code, improved_code)
        print("Changes:")
        print(''.join(diff[:20]))  # Show first 20 lines

        # Update current code
        current_code = improved_code

        # Save checkpoint
        with open(f'agent_iteration_{iteration}.py', 'w') as f:
            f.write(current_code)

    return current_code


def evaluate_agent(agent_code: str, test_tasks: List[str]) -> List[Dict[str, Any]]:
    """Evaluate agent on test tasks."""
    # Create agent from code. exec() runs model-generated code in this
    # process, so run this script inside the Docker container.
    namespace = {}
    exec(agent_code, namespace)
    AgentClass = namespace['MyTaskAgent']

    agent = AgentClass({'model_name': 'gpt-4'})

    results = []
    for task in test_tasks:
        solution = agent.solve_task(task)
        score = agent.evaluate(task, solution)
        results.append({
            'task': task,
            'solution': solution,
            'score': score
        })

    return results

Usage

if __name__ == '__main__':
    # Read initial agent code
    with open('initial_agent.py', 'r') as f:
        initial_code = f.read()

    # Define test tasks
    test_tasks = [
        "Implement binary search",
        "Write a function to reverse a linked list",
        "Create a trie data structure"
    ]

    # Run improvement loop
    final_code = run_meta_improvement_cycle(
        initial_code,
        test_tasks,
        num_iterations=5
    )

    print("\nFinal agent saved!")

Domain-Specific Implementation

Code Generation Domain

domains/code_generation/agent.py

import ast
from typing import Any, Dict, List

# Three backticks, built at runtime so the prompt template below can emit
# a code fence without embedding a literal one in this listing
FENCE = '`' * 3


class CodeGenerationAgent:
    """Agent specialized for code generation tasks."""

    # self.model is not initialized in this excerpt; it is expected to be an
    # object with a generate(prompt) method, set up elsewhere.

    def generate_code(self, specification: str) -> str:
        """Generate code from specification."""
        prompt = f"""Generate Python code for the following specification:

{specification}

Requirements:
Include proper error handling
Add docstrings
Follow PEP 8 style guide

Code:
{FENCE}python
"""

        code = self.model.generate(prompt)
        return self._extract_code(code)

    def test_code(self, code: str, test_cases: List[Dict[str, Any]]) -> float:
        """Test generated code against test cases."""
        if not test_cases:
            return 0.0

        try:
            # Create temporary module
            namespace = {}
            exec(code, namespace)

            passed = 0
            for test in test_cases:
                func_name = test['function']
                inputs = test['inputs']
                expected = test['expected']

                func = namespace[func_name]
                result = func(*inputs)

                if result == expected:
                    passed += 1

            return passed / len(test_cases)

        except Exception as e:
            # Any exception, including one raised by a single test, scores 0
            print(f"Test error: {e}")
            return 0.0

    def _extract_code(self, response: str) -> str:
        """Extract code from model response."""
        if '```python' in response:
            code = response.split('```python')[1].split('```')[0]
        else:
            code = response

        # Validate syntax
        try:
            ast.parse(code)
        except SyntaxError:
            raise ValueError("Generated code has syntax errors")

        return code.strip()
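
`test_code` above runs generated code with `exec` in the current process, so an infinite loop or a crash in that code affects the evaluator too. One possible mitigation, sketched here as an assumption rather than as what HyperAgents does, is to run the code in a fresh interpreter with a timeout. The `run_isolated` helper is hypothetical, and a separate process is not a security sandbox; the Docker container remains the stronger isolation boundary.

```python
import subprocess
import sys
from typing import Tuple


def run_isolated(code: str, timeout: float = 5.0) -> Tuple[bool, str]:
    """Run code in a fresh interpreter; return (succeeded, combined output)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"
    return proc.returncode == 0, (proc.stdout + proc.stderr).strip()


ok, output = run_isolated("print(sum(range(5)))")
print(ok, output)
```

A non-zero exit status or a timeout is reported as a failure, which maps naturally onto a score of 0 for that test.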

Math Reasoning Domain

domains/math/agent.py

import re
from typing import Any, Dict, Optional


class MathReasoningAgent:
    """Agent for mathematical reasoning tasks."""

    # self.model is not initialized in this excerpt; it is expected to be an
    # object with a generate(prompt) method, set up elsewhere.

    def solve_math_problem(self, problem: str) -> Dict[str, Any]:
        """Solve a math problem with step-by-step reasoning."""
        prompt = f"""Solve the following math problem step by step:

Problem: {problem}

Show your work clearly. Format your final answer as: ANSWER: <value>

Solution:"""

        response = self.model.generate(prompt)

        return {
            'reasoning': response,
            'answer': self._extract_answer(response)
        }

    def _extract_answer(self, response: str) -> Optional[str]:
        """Extract final answer from reasoning."""
        # Look for ANSWER: pattern
        match = re.search(r'ANSWER:\s*([^\n]+)', response, re.IGNORECASE)
        if match:
            return match.group(1).strip()

        # Look for boxed answer (LaTeX)
        match = re.search(r'\\boxed\{([^}]+)\}', response)
        if match:
            return match.group(1).strip()

        # Try to find number at end
        numbers = re.findall(r'-?\d+\.?\d*', response)
        if numbers:
            return numbers[-1]

        return None

    def evaluate_answer(
        self,
        predicted: Optional[str],
        ground_truth: str,
        tolerance: float = 1e-5
    ) -> bool:
        """Evaluate if answer is correct."""
        # _extract_answer may return None when nothing could be parsed
        if predicted is None:
            return False
        try:
            pred_val = float(predicted)
            true_val = float(ground_truth)
            return abs(pred_val - true_val) < tolerance
        except (ValueError, TypeError):
            # Fallback to string comparison
            return predicted.strip() == ground_truth.strip()
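
Because `evaluate_answer` falls back to string comparison when `float()` fails, equivalent answers such as `3/4` and `0.75` would be judged different. The sketch below shows one possible extension using the standard `fractions` module; it is not part of the source, and the `to_number` and `answers_match` names are hypothetical.

```python
from fractions import Fraction
from typing import Optional


def to_number(text: str) -> Optional[Fraction]:
    """Parse '3/4', '0.75', or '-2' into an exact Fraction; None if not numeric."""
    try:
        return Fraction(text.strip().replace(" ", ""))
    except (ValueError, ZeroDivisionError):
        return None


def answers_match(
    predicted: Optional[str], ground_truth: str, tolerance: float = 1e-5
) -> bool:
    """Compare numerically when both sides parse, otherwise compare as strings."""
    if predicted is None:
        return False
    pred, truth = to_number(predicted), to_number(ground_truth)
    if pred is not None and truth is not None:
        return abs(pred - truth) < tolerance
    return predicted.strip() == ground_truth.strip()


print(answers_match("3/4", "0.75"))    # numeric comparison
print(answers_match("x + 1", "x + 1"))  # string fallback
```

Symbolic answers still rely on exact string equality here; comparing those properly would need a computer algebra system, which is outside this sketch.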

Configuration Patterns

Agent Configuration

config.py - Common configuration patterns

from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentConfig:
    """Configuration for task agents."""
    model_name: str = 'gpt-4'
    temperature: float = 0.7
    max_tokens: int = 2048
    top_p: float = 1.0
    frequency_penalty: float = 0.0
    presence_penalty: float = 0.0
    timeout: int = 60
    max_retries: int = 3


@dataclass
class MetaAgentConfig:
    """Configuration for meta-agents."""
    model_name: str = 'gpt-4'
    meta_temperature: float = 0.8
    meta_max_tokens: int = 4096
    improvement_iterations: int = 5
    min_improvement_threshold: float = 0.05
    use_reflection: bool = True


@dataclass
class ExperimentConfig:
    """Configuration for experiments."""
    domain: str
    num_iterations: int = 10
    num_eval_samples: int = 100
    output_dir: str = './outputs'
    save_checkpoints: bool = True
    checkpoint_interval: int = 1
    seed: Optional[int] = None

Usage

agent_config = AgentConfig(
    model_name='gpt-4-turbo',
    temperature=0.5,
    max_tokens=4096
)

meta_config = MetaAgentConfig(
    improvement_iterations=10,
    min_improvement_threshold=0.02
)
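
The agent examples in this document read settings with `config.get(...)`, which expects a plain dict rather than a dataclass instance. A small sketch of bridging the two with the standard `dataclasses` helpers follows; the trimmed `AgentConfig` here is redefined only to keep the example self-contained.

```python
from dataclasses import asdict, dataclass, replace


@dataclass
class AgentConfig:
    """Trimmed copy of the config above, for illustration only."""
    model_name: str = "gpt-4"
    temperature: float = 0.7
    max_tokens: int = 2048


base = AgentConfig()
low_temp = replace(base, temperature=0.2)  # copy with one field changed
config_dict = asdict(low_temp)             # plain dict for code that calls config.get(...)
print(config_dict)
```

`replace` leaves the original instance untouched, which keeps a shared baseline config safe when several experiments derive variants from it.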

Loading Models

agent/base_agent.py - Model initialization

import os
from typing import Any, Dict

from dotenv import load_dotenv


class BaseAgent:
    """Base class for all agents."""

    def __init__(self, config: Dict[str, Any]):
        load_dotenv()
        self.config = config
        self.model = self._initialize_model()

    def _initialize_model(self):
        """Initialize the foundation model."""
        model_name = self.config.get('model_name', 'gpt-4')

        if 'gpt' in model_name.lower():
            from openai import OpenAI
            api_key = os.getenv('OPENAI_API_KEY')
            client = OpenAI(api_key=api_key)
            return OpenAIModel(client, model_name)

        elif 'claude' in model_name.lower():
            from anthropic import Anthropic
            api_key = os.getenv('ANTHROPIC_API_KEY')
            client = Anthropic(api_key=api_key)
            # AnthropicModel is not defined in this excerpt
            return AnthropicModel(client, model_name)

        elif 'gemini' in model_name.lower():
            import google.generativeai as genai
            api_key = os.getenv('GEMINI_API_KEY')
            genai.configure(api_key=api_key)
            # GeminiModel is not defined in this excerpt
            return GeminiModel(model_name)

        else:
            raise ValueError(f"Unsupported model: {model_name}")


class OpenAIModel:
    """Wrapper for OpenAI models."""

    def __init__(self, client, model_name: str):
        self.client = client
        self.model_name = model_name

    def generate(
        self,
        prompt: str,
        temperature: float = 0.7,
        max_tokens: int = 2048,
        **kwargs
    ) -> str:
        """Generate response from OpenAI model."""
        response = self.client.chat.completions.create(
            model=self.model_name,
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
            max_tokens=max_tokens,
            **kwargs
        )
        return response.choices[0].message.content
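
`BaseAgent` refers to an `AnthropicModel` wrapper that this section does not show. A possible shape for it, written to mirror `OpenAIModel.generate`, is sketched below. This is an assumption about the wrapper, not the project's implementation; it takes an already-constructed client so it can be exercised offline with a stand-in, and `_FakeMessages` plus the model name string are placeholders.

```python
from types import SimpleNamespace


class AnthropicModel:
    """Sketch of a wrapper exposing the same generate() shape as OpenAIModel."""

    def __init__(self, client, model_name: str):
        self.client = client
        self.model_name = model_name

    def generate(
        self,
        prompt: str,
        temperature: float = 0.7,
        max_tokens: int = 2048,
        **kwargs,
    ) -> str:
        response = self.client.messages.create(
            model=self.model_name,
            max_tokens=max_tokens,
            temperature=temperature,
            messages=[{"role": "user", "content": prompt}],
            **kwargs,
        )
        # The Messages API returns a list of content blocks; use the first text block
        return response.content[0].text


# Offline check with a stand-in client, so no API key or network is needed
class _FakeMessages:
    def create(self, **params):
        text = f"echo: {params['messages'][0]['content']}"
        return SimpleNamespace(content=[SimpleNamespace(text=text)])


fake_client = SimpleNamespace(messages=_FakeMessages())
model = AnthropicModel(fake_client, "placeholder-model-name")
print(model.generate("hello"))
```

Keeping the same `generate(prompt, temperature, max_tokens)` signature across providers is what lets the task and meta agents stay unaware of which backend they are talking to.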

Advanced Patterns

Batched Evaluation

utils/evaluation.py

from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Dict, List

import numpy as np


class BatchEvaluator:
    """Efficiently evaluate agents on multiple tasks."""

    def __init__(self, max_workers: int = 10):
        self.max_workers = max_workers

    def evaluate_batch(
        self,
        agent,
        tasks: List[str],
        ground_truths: List[Any]
    ) -> Dict[str, Any]:
        """Evaluate agent on batch of tasks in parallel."""
        results = []

        with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
            futures = {
                executor.submit(self._evaluate_single, agent, task, truth): idx
                for idx, (task, truth) in enumerate(zip(tasks, ground_truths))
            }

            for future in as_completed(futures):
                idx = futures[future]
                try:
                    result = future.result()
                    results.append((idx, result))
                except Exception as e:
                    print(f"Task {idx} failed: {e}")
                    results.append((idx, {'score': 0.0, 'error': str(e)}))

        # Sort by original order
        results.sort(key=lambda x: x[0])
        results = [r[1] for r in results]

        # Guard against an empty batch before computing statistics
        scores = [r['score'] for r in results]
        return {
            'results': results,
            'mean_score': float(np.mean(scores)) if scores else 0.0,
            'std_score': float(np.std(scores)) if scores else 0.0,
            'success_rate': sum(1 for s in scores if s > 0.5) / len(scores) if scores else 0.0
        }

    def _evaluate_single(self, agent, task: str, ground_truth: Any) -> Dict[str, Any]:
        """Evaluate single task."""
        solution = agent.solve_task(task)
        # Note: this passes a ground truth, unlike the two-argument
        # evaluate(task_input, solution) in the MyTaskAgent example
        score = agent.evaluate(task, solution, ground_truth)

        return {
            'task': task,
            'solution': solution,
            'score': score,
            'correct': score > 0.5
        }

undefined

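`BatchEvaluator` collects futures with `as_completed` and then re-sorts them by index. As a rough alternative, the sketch below uses `ThreadPoolExecutor.map`, which yields results in input order, so no sorting step is needed. `EchoAgent` and `evaluate_in_order` are hypothetical names introduced for illustration; only the `solve_task`/`evaluate` interface is taken from the evaluator code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List


class EchoAgent:
    """Hypothetical stub exposing the solve_task/evaluate interface."""

    def solve_task(self, task: str) -> str:
        return task.upper()

    def evaluate(self, task: str, solution: str, ground_truth: Any) -> float:
        return 1.0 if solution == ground_truth else 0.0


def evaluate_in_order(
    agent,
    tasks: List[str],
    ground_truths: List[Any],
    max_workers: int = 4,
) -> List[Dict[str, Any]]:
    """Evaluate tasks in parallel; results come back in input order."""

    def safe_eval(pair):
        task, truth = pair
        try:
            solution = agent.solve_task(task)
            return {'task': task, 'score': agent.evaluate(task, solution, truth)}
        except Exception as exc:
            # A failing task scores 0 instead of aborting the whole batch
            return {'task': task, 'score': 0.0, 'error': str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map preserves the order of its input iterable
        return list(executor.map(safe_eval, zip(tasks, ground_truths)))


results = evaluate_in_order(EchoAgent(), ['a', 'b', 'c'], ['A', 'x', 'C'])
scores = [r['score'] for r in results]
```

Catching exceptions inside `safe_eval` matters here: without it, the first failing task would raise while iterating over `executor.map` and the remaining results would be lost.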

Checkpointing

```python
# utils/checkpointing.py
import json
import pickle
from pathlib import Path
from typing import Any, Dict, List


class CheckpointManager:
    """Manage experiment checkpoints."""

    def __init__(self, output_dir: str):
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)

    def save_checkpoint(
        self,
        iteration: int,
        agent_code: str,
        performance_data: Dict[str, Any],
        meta_data: Dict[str, Any]
    ):
        """Save checkpoint for an iteration."""
        checkpoint_dir = self.output_dir / f'iteration_{iteration}'
        checkpoint_dir.mkdir(exist_ok=True)

        # Save agent code
        with open(checkpoint_dir / 'agent.py', 'w') as f:
            f.write(agent_code)

        # Save performance data
        with open(checkpoint_dir / 'performance.json', 'w') as f:
            json.dump(performance_data, f, indent=2)

        # Save metadata
        with open(checkpoint_dir / 'metadata.pkl', 'wb') as f:
            pickle.dump(meta_data, f)

        print(f"Checkpoint saved to {checkpoint_dir}")

    def load_checkpoint(self, iteration: int) -> Dict[str, Any]:
        """Load checkpoint from an iteration."""
        checkpoint_dir = self.output_dir / f'iteration_{iteration}'

        with open(checkpoint_dir / 'agent.py', 'r') as f:
            agent_code = f.read()

        with open(checkpoint_dir / 'performance.json', 'r') as f:
            performance_data = json.load(f)

        # Only unpickle files you wrote yourself: pickle.load can run arbitrary code
        with open(checkpoint_dir / 'metadata.pkl', 'rb') as f:
            meta_data = pickle.load(f)

        return {
            'agent_code': agent_code,
            'performance_data': performance_data,
            'meta_data': meta_data
        }

    def list_checkpoints(self) -> List[int]:
        """List available checkpoints."""
        iterations = []
        for path in self.output_dir.glob('iteration_*'):
            if path.is_dir():
                # Directory names have the form iteration_<N>
                iteration = int(path.name.split('_')[1])
                iterations.append(iteration)
        return sorted(iterations)
```

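When a run degrades, it helps to know which saved iteration to roll back to. The following is a minimal sketch, assuming the `iteration_<N>/performance.json` layout written by `save_checkpoint` and assuming the JSON holds a numeric `mean_score` entry. The helper name `find_best_checkpoint` and the `metric` parameter are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def find_best_checkpoint(output_dir: str, metric: str = 'mean_score') -> Optional[int]:
    """Return the iteration with the highest metric, or None if none is usable."""
    best_iteration, best_value = None, float('-inf')
    for path in Path(output_dir).glob('iteration_*'):
        suffix = path.name.split('_', 1)[1]
        perf_file = path / 'performance.json'
        # Skip stray directories and incomplete checkpoints
        if not (path.is_dir() and suffix.isdigit() and perf_file.exists()):
            continue
        with open(perf_file, 'r') as f:
            value = json.load(f).get(metric)
        # Ignore checkpoints where the metric is missing or not numeric
        if isinstance(value, (int, float)) and value > best_value:
            best_iteration, best_value = int(suffix), value
    return best_iteration
```

The returned iteration number can be passed to `CheckpointManager.load_checkpoint` to restore that version of the agent code.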

Troubleshooting

Common Issues

**1. API Key Errors**
```python
# Verify environment variables
import os
from dotenv import load_dotenv

load_dotenv()

required_keys = ['OPENAI_API_KEY', 'ANTHROPIC_API_KEY', 'GEMINI_API_KEY']
for key in required_keys:
    value = os.getenv(key)
    if value:
        print(f"{key}: {'*' * 20} (set)")
    else:
        print(f"{key}: NOT SET")
```


**2. Docker Execution Errors**
```bash
# Verify Docker is running
docker ps

# Rebuild container if needed
docker build --no-cache --network=host -t hyperagents .

# Check container logs
docker logs <container_id>
```


**3. Code Generation Syntax Errors**
```python
# Add validation wrapper
import ast

def validate_generated_code(code: str) -> bool:
    """Validate Python syntax before execution."""
    try:
        ast.parse(code)
        return True
    except SyntaxError as e:
        print(f"Syntax error at line {e.lineno}: {e.msg}")
        print(f"Text: {e.text}")
        return False

# Use in meta-agent
if validate_generated_code(improved_code):
    current_code = improved_code
else:
    print("Generated code has errors, keeping current version")
```
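
Instead of discarding an invalid candidate outright, the meta-agent could ask the model to try again and include the parser error in the next prompt. This is a minimal sketch of that idea, not the framework's own retry logic: `generate_valid_code`, its `generate` callable, and the feedback wording are all assumptions made for illustration.

```python
import ast
from typing import Callable, Optional


def generate_valid_code(
    generate: Callable[[str], str],
    prompt: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Call generate() until its output parses as Python, or give up."""
    feedback = ""
    for _ in range(max_attempts):
        code = generate(prompt + feedback)
        try:
            ast.parse(code)
            return code
        except SyntaxError as e:
            # Feed the parser error back so the next attempt can correct it
            feedback = f"\n\nPrevious attempt failed: line {e.lineno}: {e.msg}"
    # Caller keeps the current agent version when no attempt parses
    return None
```

Passing in any callable that maps a prompt to a string, such as a bound `generate` method of a model wrapper, keeps the retry logic independent of the provider.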


**4. Performance Degradation**
```python
# Track performance over iterations
from typing import Dict, List

def monitor_performance(history: List[Dict[str, float]]):
    """Monitor for performance degradation."""
    if len(history) < 3:
        return

    # Warn only when each of the last three scores is lower than the one before it
    recent_scores = [h['score'] for h in history[-3:]]
    if all(recent_scores[i] < recent_scores[i-1] for i in range(1, len(recent_scores))):
        print("WARNING: Performance degrading for 3 consecutive iterations")
        print("Consider:")
        print("  - Reducing temperature")
        print("  - Changing meta-agent prompt")
        print("  - Rolling back to earlier checkpoint")
```


**5. Memory Issues with Large Contexts**
```python
# Implement context truncation
def truncate_context(
    context: str,
    max_tokens: int = 8000,
    tokenizer=None
) -> str:
    """Truncate context to fit within token limit."""
    if tokenizer is None:
        # Rough approximation: 4 chars per token
        max_chars = max_tokens * 4
        if len(context) > max_chars:
            return context[:max_chars] + "\n... (truncated)"
    else:
        tokens = tokenizer.encode(context)
        if len(tokens) > max_tokens:
            truncated = tokenizer.decode(tokens[:max_tokens])
            return truncated + "\n... (truncated)"

    return context
```

Debugging Tips

```python
# Enable verbose logging
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('hyperagents.log'),
        logging.StreamHandler()
    ]
)

logger = logging.getLogger('hyperagents')

# Use in code
logger.debug(f"Generating improvement for iteration {iteration}")
logger.info(f"Current score: {current_score:.3f}")
logger.warning(f"Low performance detected: {score:.3f}")
logger.error(f"Failed to generate valid code: {error}")
```

Safety Checks

```python
# Implement safety checks before executing generated code
from typing import Any, Dict

def safety_check(code: str) -> Dict[str, Any]:
    """Check for potentially dangerous operations."""
    # Substring matching is a coarse heuristic, not a sandbox
    checks = {
        'no_file_deletion': 'os.remove' not in code and 'shutil.rmtree' not in code,
        'no_system_calls': 'os.system' not in code and 'subprocess.call' not in code,
        'no_network': 'requests.' not in code and 'urllib' not in code,
        'no_eval': 'eval(' not in code and 'exec(' not in code,
    }

    all_safe = all(checks.values())

    return {
        'safe': all_safe,
        'checks': checks
    }

# Use before execution
safety_result = safety_check(generated_code)
if not safety_result['safe']:
    print("WARNING: Potentially unsafe code detected!")
    print(f"Failed checks: {[k for k, v in safety_result['checks'].items() if not v]}")
    # Decide whether to proceed
```
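
Substring checks flag harmless text (a comment that mentions `os.system`) and miss variants such as `subprocess.run`. One possible refinement, sketched here under assumptions, inspects the parsed syntax tree instead. The deny-lists and the names `find_unsafe_nodes` and `_dotted_name` are hypothetical, and the check is still a heuristic: aliased imports such as `from os import system` get past it, so it complements the Docker sandbox and does not replace it.

```python
import ast
from typing import List

# Hypothetical deny-lists for illustration; tune them to your own threat model
BLOCKED_MODULES = {'subprocess', 'socket', 'shutil', 'requests', 'urllib'}
BLOCKED_CALLS = {'eval', 'exec', 'os.system', 'os.remove'}


def _dotted_name(node: ast.AST) -> str:
    """Rebuild names like 'os.system' from Name/Attribute nodes."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        prefix = _dotted_name(node.value)
        return f"{prefix}.{node.attr}" if prefix else ""
    return ""


def find_unsafe_nodes(code: str) -> List[str]:
    """Return a description of each blocked import or call found in code."""
    findings = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            modules = []
        for module in modules:
            # Compare the top-level package, so urllib.request matches urllib
            if module.split('.')[0] in BLOCKED_MODULES:
                findings.append(f"line {node.lineno}: import {module}")
        if isinstance(node, ast.Call):
            name = _dotted_name(node.func)
            if name in BLOCKED_CALLS:
                findings.append(f"line {node.lineno}: call {name}()")
    return findings
```

An empty list means nothing on the deny-lists was found. `ast.parse` raises `SyntaxError` on invalid input, so run the syntax validation step first.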

Best Practices


Always use environment variables for API keys, never hardcode
Checkpoint frequently to avoid losing progress
Validate generated code before execution
Monitor performance across iterations to detect degradation
Use Docker containers for safe code execution
Implement timeouts for long-running operations
Log extensively for debugging and analysis
Test on small batches before full-scale runs

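
The timeout recommendation above can be sketched with the standard library alone. This example runs a code string in a separate Python process and stops it after a time limit; the helper name `run_with_timeout` and the shape of its return value are assumptions, not part of the framework. A child process bounds runtime only and provides no isolation, so in practice it would run inside the Docker container.

```python
import subprocess
import sys
from typing import Any, Dict


def run_with_timeout(code: str, timeout_seconds: float = 10.0) -> Dict[str, Any]:
    """Run code in a separate Python process and stop it after a time limit."""
    try:
        completed = subprocess.run(
            [sys.executable, '-c', code],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising
        return {'timed_out': True, 'returncode': None, 'stdout': '', 'stderr': ''}
    return {
        'timed_out': False,
        'returncode': completed.returncode,
        'stdout': completed.stdout,
        'stderr': completed.stderr,
    }
```

A timed-out candidate can then be scored as a failure in the evaluation loop, the same way a task that raises an exception is.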