symbolic-equation

Symbolic Equation Discovery

Discover interpretable scientific equations from data using LLM-guided evolutionary search.

Input

  • $0: Dataset description, variable names, and physical context

References

  • LLM-SR patterns (prompts, evolution, sampling):
    ~/.claude/skills/symbolic-equation/references/llmsr-patterns.md

Workflow (from LLM-SR)

Step 1: Define Problem Specification

Create a specification with:
  1. Input variables: Physical quantities with types (e.g., x: np.ndarray, v: np.ndarray)
  2. Output variable: Target quantity to predict
  3. Evaluation function: Fitness metric (typically negative MSE with parameter optimization)
  4. Physical context: Domain knowledge to guide equation discovery
python
# Example specification
import numpy as np

@equation.evolve
def equation(x: np.ndarray, v: np.ndarray, params: np.ndarray) -> np.ndarray:
    """Describe the acceleration of a damped nonlinear oscillator."""
    return params[0] * x
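
The evaluation function from item 3 could look roughly like the sketch below. It fits the free parameters with `scipy.optimize.minimize` (BFGS) and returns the negative MSE. The names `evaluate` and `MAX_PARAMS`, the column layout of `inputs`, the plain undecorated `equation`, and the synthetic data are all assumptions made for illustration, not part of the LLM-SR specification.

```python
import numpy as np
from scipy.optimize import minimize

MAX_PARAMS = 10  # assumed upper bound on free parameters per equation


def equation(x: np.ndarray, v: np.ndarray, params: np.ndarray) -> np.ndarray:
    """Candidate equation: linear restoring force plus linear damping."""
    return params[0] * x + params[1] * v


def evaluate(inputs: np.ndarray, outputs: np.ndarray) -> float:
    """Fit params with BFGS and return negative MSE (higher is better)."""
    x, v = inputs[:, 0], inputs[:, 1]  # assumed column order: x, then v

    def loss(params: np.ndarray) -> float:
        return float(np.mean((equation(x, v, params) - outputs) ** 2))

    result = minimize(loss, x0=np.ones(MAX_PARAMS), method="BFGS")
    if not np.isfinite(result.fun):
        return float("-inf")  # treat numerically broken candidates as invalid
    return -float(result.fun)


# Synthetic data from a known law, a = -2.0 * x - 0.5 * v (illustration only).
rng = np.random.default_rng(0)
inputs = rng.normal(size=(200, 2))
outputs = -2.0 * inputs[:, 0] - 0.5 * inputs[:, 1]
score = evaluate(inputs, outputs)
```

Because the candidate has the same form as the generating law, the fitted score should be very close to zero. A candidate with the wrong structure would score clearly lower.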

Step 2: Initialize Multi-Island Buffer

  • Create N islands (default: 10) for population diversity
  • Each island maintains independent clusters of equations
  • Clusters group equations by performance signature
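
One possible data layout for this buffer is sketched below. The class names, the choice of rounded per-test scores as the signature, and the mean as the cluster score are assumptions for illustration. The text above only fixes that islands are independent and that clusters are keyed by a performance signature.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Equations that share one performance signature."""
    score: float
    programs: list[str] = field(default_factory=list)


@dataclass
class Island:
    """An independent sub-population whose clusters are keyed by signature."""
    clusters: dict[tuple, Cluster] = field(default_factory=dict)

    def register(self, program: str, per_test_scores: tuple[float, ...]) -> None:
        # Assumption: signature = rounded per-test scores, cluster score = mean.
        signature = tuple(round(s, 6) for s in per_test_scores)
        score = sum(per_test_scores) / len(per_test_scores)
        cluster = self.clusters.setdefault(signature, Cluster(score))
        cluster.programs.append(program)

    def best_score(self) -> float:
        return max((c.score for c in self.clusters.values()), default=float("-inf"))


def make_buffer(num_islands: int = 10) -> list[Island]:
    """Create the islands; each starts empty and evolves independently."""
    return [Island() for _ in range(num_islands)]


buffer = make_buffer()
buffer[0].register("return params[0] * x", (-0.42,))
buffer[0].register("return params[0] * x + 0 * v", (-0.42,))  # same signature
buffer[0].register("return params[0] * x + params[1] * v", (-0.10,))
```

Here the first two equations land in the same cluster because they score identically, while the third opens a new cluster.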

Step 3: Evolutionary Search Loop

Repeat until convergence or max samples:
  1. Select island: Pick one island at random
  2. Build prompt: Sample top equations from clusters (softmax-weighted by score)
  3. LLM proposes: The model writes a new equation as an improved version of the sampled ones
  4. Evaluate: Execute on test data, compute fitness score
  5. Register: Add to island's cluster if valid
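
The five steps above can be sketched as a toy loop. Everything here is a stand-in: `propose` is a random stub in place of the LLM call, equations are plain expression strings in `x`, islands are flat lists rather than clustered buffers, parents are the top two by score rather than a softmax sample, and the hidden target law is invented for the demo. `eval` is used only because the candidates come from a fixed list of terms.

```python
import random
from typing import Optional

XS = [0.1 * i for i in range(-10, 11)]
TARGET = [-2.0 * x - 0.5 * x**3 for x in XS]  # hidden law for this toy run


def evaluate(expr: str) -> Optional[float]:
    """Return negative MSE of the expression on the toy data, or None if invalid."""
    try:
        preds = [eval(expr, {"__builtins__": {}}, {"x": x}) for x in XS]
    except Exception:
        return None
    return -sum((p - t) ** 2 for p, t in zip(preds, TARGET)) / len(XS)


def propose(parents: list[str], rng: random.Random) -> str:
    """Stub standing in for the LLM: extend the best parent with one random term."""
    term = rng.choice(["- 2.0 * x", "- 0.5 * x**3", "+ 1.0", "+ x**2"])
    return f"{parents[-1]} {term}"


def search(num_islands: int = 3, max_samples: int = 200, seed: int = 0):
    rng = random.Random(seed)
    start = "0.0 * x"
    islands = [[(evaluate(start), start)] for _ in range(num_islands)]
    for _ in range(max_samples):
        island = rng.choice(islands)                   # 1. select island
        parents = [e for _, e in sorted(island)[-2:]]  # 2. build prompt (best last)
        candidate = propose(parents, rng)              # 3. "LLM" proposes
        score = evaluate(candidate)                    # 4. evaluate
        if score is not None:                          # 5. register if valid
            island.append((score, candidate))
    return max(max(island) for island in islands)


best_score, best_expr = search()
```

Since every valid candidate is kept, the best score per island can only stay the same or improve as the loop runs.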

Step 4: Prompt Construction

Present previous equations as versioned sequence:
python
def equation_v0(x, v, params):
    """Initial version."""
    return params[0] * x

def equation_v1(x, v, params):
    """Improved version of equation_v0."""
    return params[0] * x + params[1] * v

def equation_v2(x, v, params):
    """Improved version of equation_v1."""
    # LLM completes this
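
A prompt in this shape could be assembled with a small helper like the one below. `build_prompt` and its arguments are hypothetical names, and the sketch assumes each sampled equation body is a single line and that bodies are passed in worst-first order.

```python
def build_prompt(bodies: list[str], signature: str = "x, v, params") -> str:
    """Render bodies as equation_v0..v{n-1}, then an open equation_v{n} header."""
    if not bodies:
        raise ValueError("need at least one previous equation")
    parts = []
    for i, body in enumerate(bodies):
        doc = "Initial version." if i == 0 else f"Improved version of equation_v{i - 1}."
        parts.append(f'def equation_v{i}({signature}):\n    """{doc}"""\n    {body}\n')
    n = len(bodies)
    # The last function has a header and docstring only; the LLM writes the body.
    parts.append(f'def equation_v{n}({signature}):\n    """Improved version of equation_v{n - 1}."""\n')
    return "\n".join(parts)


prompt = build_prompt([
    "return params[0] * x",
    "return params[0] * x + params[1] * v",
])
```

Ordering the versions from worst to best lets the "Improved version of" docstrings describe a real trend, which is what the open final function is asked to continue.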

Step 5: Island Reset (Diversity Maintenance)

Periodically (default: every 4 hours):
  1. Sort islands by best score
  2. Reset bottom 50% of islands
  3. Seed each reset island with best equation from a surviving island
  4. Restart cluster sampling temperature
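
A minimal sketch of the reset, assuming each island is a flat list of `(score, equation)` pairs plus a `num_programs` counter that drives the temperature schedule. The `Island` class and `reset_islands` function are hypothetical names, and the timing logic (calling this every 4 hours) is left to the caller.

```python
import random
from dataclasses import dataclass


@dataclass
class Island:
    programs: list[tuple[float, str]]
    num_programs: int = 0  # drives the sampling temperature schedule

    def best(self) -> tuple[float, str]:
        return max(self.programs)


def reset_islands(islands: list[Island], rng: random.Random) -> None:
    """Replace the worse half of the islands with freshly seeded ones, in place."""
    order = sorted(range(len(islands)), key=lambda i: islands[i].best()[0])
    half = len(islands) // 2
    losers, survivors = order[:half], order[half:]
    for i in losers:
        donor = islands[rng.choice(survivors)]
        # A new Island starts with num_programs = 0, which restarts the temperature.
        islands[i] = Island(programs=[donor.best()])


islands = [
    Island([(-4.0, "a")], num_programs=900),
    Island([(-3.0, "b")], num_programs=800),
    Island([(-2.0, "c")], num_programs=700),
    Island([(-1.0, "d")], num_programs=600),
]
reset_islands(islands, random.Random(0))
```

After the call, the two worst islands each hold a single seed equation copied from one of the two survivors, and the survivors are untouched.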

Step 6: Extract Best Equations

After search completes:
  1. Collect best equation from each island
  2. Rank by fitness score
  3. Simplify if possible (algebraic simplification)
  4. Report with physical interpretation
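
The collection and ranking steps might be sketched as follows, again assuming islands are lists of `(score, equation)` pairs. `extract_best` is a hypothetical helper. Algebraic simplification and the physical interpretation are not covered here.

```python
def extract_best(islands: list[list[tuple[float, str]]]) -> list[tuple[float, str]]:
    """Collect the best equation per island and rank them, best first."""
    best_per_island = [max(island) for island in islands if island]
    # Islands seeded from the same survivor can share a best equation, so deduplicate.
    return sorted(set(best_per_island), reverse=True)


ranked = extract_best([
    [(-0.5, "params[0] * x"), (-0.1, "params[0] * x + params[1] * v")],
    [(-0.1, "params[0] * x + params[1] * v")],
    [(-0.3, "params[0] * x**3")],
    [],
])
```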

Cluster Sampling

Temperature-scheduled softmax over cluster scores:
temperature = T_init * (1 - (num_programs % period) / period)
probabilities = softmax(cluster_scores / temperature)
  • Higher temperature → more exploration
  • Lower temperature → more exploitation of best clusters
  • Within clusters: shorter programs are preferred (Occam's razor)
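
The schedule above can be written out with NumPy as in this sketch. The default values for `t_init` and `period` are assumptions chosen for the example, as is the particular length-based weighting used inside a cluster. Only the temperature formula and the softmax come from the text.

```python
import numpy as np


def cluster_probabilities(cluster_scores, num_programs, t_init=0.1, period=30_000):
    """Softmax over cluster scores with a temperature that decays within each period."""
    scores = np.asarray(cluster_scores, dtype=float)
    temperature = t_init * (1 - (num_programs % period) / period)
    logits = scores / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def program_probabilities(program_lengths):
    """Within a cluster, weight shorter programs higher (assumed softmax on -length)."""
    lengths = np.asarray(program_lengths, dtype=float)
    normalized = (lengths - lengths.min()) / (lengths.max() + 1e-6)
    weights = np.exp(-normalized)
    return weights / weights.sum()


scores = [-1.0, -0.5, -0.1]
early = cluster_probabilities(scores, num_programs=0)       # warmest point
late = cluster_probabilities(scores, num_programs=29_999)   # coldest point
within = program_probabilities([40, 80, 120])

rng = np.random.default_rng(0)
chosen_cluster = rng.choice(len(scores), p=early)
```

Early in a period the weaker clusters still receive some probability mass. Near the end of the period almost all of it goes to the best cluster, and then the modulo wraps around and exploration starts again.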

Rules

  • Equations must use only standard mathematical operations
  • Parameter optimization via SciPy's BFGS or an Adam optimizer
  • Fitness = negative MSE (higher is better)
  • Timeout protection for equation evaluation
  • No recursive equations allowed
  • Physical interpretability is preferred over pure fit
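
Two of these rules (standard operations only, no recursion) can be checked statically before an equation is run. The sketch below uses Python's `ast` module. The `ALLOWED_CALLS` whitelist and the `violates_rules` name are assumptions for illustration, and timeout protection is not covered.

```python
import ast

# Assumed whitelist of "standard mathematical operations" available as calls.
ALLOWED_CALLS = {"np.sin", "np.cos", "np.exp", "np.log", "np.sqrt", "np.abs", "np.tanh"}


def violates_rules(source: str, func_name: str = "equation") -> list[str]:
    """Return a list of rule violations found in the equation source."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = ast.unparse(node.func)
            if name == func_name:
                problems.append("recursive call")
            elif name not in ALLOWED_CALLS:
                problems.append(f"call to {name} is not allowed")
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            problems.append("imports are not allowed")
    return problems


good = "def equation(x, v, params):\n    return params[0] * x + params[1] * np.sin(v)\n"
bad = "def equation(x, v, params):\n    return equation(x, v, params) + open('f').read()\n"
good_problems = violates_rules(good)
bad_problems = violates_rules(bad)
```

Arithmetic operators pass through untouched because only call and import nodes are inspected. A stricter checker could also whitelist names and attribute access.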

Related Skills

  • Upstream: data-analysis, math-reasoning
  • Downstream: paper-writing-section
  • See also: algorithm-design