symbolic-equation
Symbolic Equation Discovery
Discover interpretable scientific equations from data using LLM-guided evolutionary search.
Input
- $0: Dataset description, variable names, and physical context
References
- LLM-SR patterns (prompts, evolution, sampling):
~/.claude/skills/symbolic-equation/references/llmsr-patterns.md
Workflow (from LLM-SR)
Step 1: Define Problem Specification
Create a specification with:
- Input variables: Physical quantities with types (e.g., x: np.ndarray, v: np.ndarray)
- Output variable: Target quantity to predict
- Evaluation function: Fitness metric (typically negative MSE with parameter optimization)
- Physical context: Domain knowledge to guide equation discovery
python
# Example specification
import numpy as np

@equation.evolve
def equation(x: np.ndarray, v: np.ndarray, params: np.ndarray) -> np.ndarray:
    """Describe the acceleration of a damped nonlinear oscillator."""
    return params[0] * x
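The evaluation function listed in the specification could look roughly like the sketch below. It assumes SciPy's BFGS optimizer, a two-parameter candidate, and synthetic data. The names `evaluate` and `N_PARAMS` and the data-generating coefficients are hypothetical, chosen for illustration rather than taken from LLM-SR.

```python
import numpy as np
from scipy.optimize import minimize

N_PARAMS = 2  # hypothetical: number of free constants in this candidate


def equation(x: np.ndarray, v: np.ndarray, params: np.ndarray) -> np.ndarray:
    """Candidate skeleton: acceleration as a linear function of x and v."""
    return params[0] * x + params[1] * v


def evaluate(data: dict) -> float:
    """Fit params with BFGS and return the negative MSE (higher is better)."""
    x, v, a_true = data["x"], data["v"], data["a"]

    def loss(params: np.ndarray) -> float:
        return float(np.mean((equation(x, v, params) - a_true) ** 2))

    result = minimize(loss, x0=np.ones(N_PARAMS), method="BFGS")
    return -float(result.fun)


# Synthetic data generated from a = -2.0 * x - 0.5 * v (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
v = rng.uniform(-1.0, 1.0, 200)
data = {"x": x, "v": v, "a": -2.0 * x - 0.5 * v}
score = evaluate(data)
```

Because the candidate has the same functional form as the synthetic data, the fitted score should be very close to zero. A candidate with the wrong structure would leave a residual error and score lower.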
Step 2: Initialize Multi-Island Buffer
- Create N islands (default: 10) for population diversity
- Each island maintains independent clusters of equations
- Clusters group equations by performance signature
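One possible data layout for this buffer is sketched below. The class names `Cluster` and `Island`, the `register` method, and the use of a score tuple as the performance signature are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Equations that share the same performance signature."""
    score: float
    programs: list[str] = field(default_factory=list)


@dataclass
class Island:
    """An independent sub-population, keyed by performance signature."""
    clusters: dict[tuple, Cluster] = field(default_factory=dict)

    def register(self, program: str, signature: tuple, score: float) -> None:
        # Create the cluster on first sight of this signature, then append.
        cluster = self.clusters.setdefault(signature, Cluster(score))
        cluster.programs.append(program)

    def best_score(self) -> float:
        return max((c.score for c in self.clusters.values()), default=float("-inf"))


NUM_ISLANDS = 10  # default number of islands
islands = [Island() for _ in range(NUM_ISLANDS)]

# Two programs with the same signature land in the same cluster.
islands[0].register("return params[0] * x", signature=(-0.42,), score=-0.42)
islands[0].register("return x * params[0]", signature=(-0.42,), score=-0.42)
islands[0].register("return params[0] * x + params[1] * v", signature=(-0.10,), score=-0.10)
```

Keying clusters by signature means that syntactically different but behaviorally identical equations do not crowd out other candidates when prompts are sampled.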
Step 3: Evolutionary Search Loop
Repeat until convergence or max samples:
- Select island: Random island selection
- Build prompt: Sample top equations from clusters (softmax-weighted by score)
- LLM proposes: Generate new equation as improved version
- Evaluate: Execute on test data, compute fitness score
- Register: Add to island's cluster if valid
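The loop above can be sketched on a toy problem. In this illustration a candidate is a single number instead of an equation, `fitness` stands in for the negative-MSE evaluation, and `propose` is a hypothetical stand-in for the LLM call (it perturbs the sampled parent). None of these names come from LLM-SR.

```python
import math
import random


def fitness(candidate: float) -> float:
    """Toy stand-in for negative MSE: the best possible candidate is 3.0."""
    return -(candidate - 3.0) ** 2


def propose(parent: float, rng: random.Random) -> float:
    """Hypothetical stand-in for the LLM: perturb the sampled parent."""
    return parent + rng.gauss(0.0, 0.5)


def softmax_pick(scored: list[tuple[float, float]], rng: random.Random) -> float:
    """Sample one candidate with probability proportional to exp(score)."""
    top = max(score for _, score in scored)
    weights = [math.exp(score - top) for _, score in scored]
    return rng.choices([cand for cand, _ in scored], weights=weights, k=1)[0]


def search(num_islands: int = 10, max_samples: int = 500, seed: int = 0) -> float:
    rng = random.Random(seed)
    # Every island starts from the same seed candidate.
    islands = [[(0.0, fitness(0.0))] for _ in range(num_islands)]
    for _ in range(max_samples):
        island = rng.choice(islands)        # select an island at random
        parent = softmax_pick(island, rng)  # sample a strong parent
        child = propose(parent, rng)        # the "LLM" proposes a variant
        score = fitness(child)              # evaluate the proposal
        if math.isfinite(score):            # register only valid results
            island.append((child, score))
    return max(score for island in islands for _, score in island)


best = search()
```

In a real run the validity check would also reject equations that raise exceptions or exceed the evaluation timeout.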
Step 4: Prompt Construction
Present previous equations as a versioned sequence, ending with an empty function for the LLM to complete:
python
def equation_v0(x, v, params):
    """Initial version."""
    return params[0] * x

def equation_v1(x, v, params):
    """Improved version of equation_v0."""
    return params[0] * x + params[1] * v

def equation_v2(x, v, params):
    """Improved version of equation_v1."""
    # LLM completes this
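A versioned prompt of this shape could be assembled with a small helper like the one below. `build_prompt` is a hypothetical name, and the sketch assumes the sampled equation bodies are already ordered from weakest to strongest.

```python
def build_prompt(bodies: list[str], args: str = "x, v, params") -> str:
    """Render equation bodies as equation_v0..vN-1 plus an open equation_vN stub."""
    if not bodies:
        raise ValueError("at least one previous equation is required")
    parts = []
    for i, body in enumerate(bodies):
        doc = "Initial version." if i == 0 else f"Improved version of equation_v{i - 1}."
        parts.append(f'def equation_v{i}({args}):\n    """{doc}"""\n    {body}\n')
    n = len(bodies)
    # The last function has a signature and docstring only; the LLM writes the body.
    parts.append(
        f'def equation_v{n}({args}):\n    """Improved version of equation_v{n - 1}."""\n'
    )
    return "\n".join(parts)


prompt = build_prompt([
    "return params[0] * x",
    "return params[0] * x + params[1] * v",
])
print(prompt)
```

Ending the prompt on an unfinished function nudges the model to continue the pattern with a new body instead of commenting on the earlier versions.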
python
def equation_v0(x, v, params):
"""初始版本。"""
return params[0] * x
def equation_v1(x, v, params):
"""equation_v0的改进版本。"""
return params[0] * x + params[1] * v
def equation_v2(x, v, params):
"""equation_v1的改进版本。"""
# LLM完成此部分Step 5: Island Reset (Diversity Maintenance)
步骤5:岛屿重置(多样性维护)
Periodically (default: every 4 hours):
- Sort islands by best score
- Reset bottom 50% of islands
- Seed each reset island with best equation from a surviving island
- Restart cluster sampling temperature
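The reset procedure can be sketched as follows, modelling each island as a list of (program, score) pairs. The helper names `best` and `reset_islands` are hypothetical, and picking the donor island at random among the survivors is an assumption.

```python
import random

Island = list[tuple[str, float]]  # (program text, fitness score) pairs


def best(island: Island) -> tuple[str, float]:
    """Return the highest-scoring (program, score) pair on an island."""
    return max(island, key=lambda item: item[1])


def reset_islands(islands: list[Island], rng: random.Random) -> list[int]:
    """Reset the weaker half in place and return the indices that were reset."""
    order = sorted(range(len(islands)), key=lambda i: best(islands[i])[1])
    half = len(islands) // 2
    reset, survivors = order[:half], order[half:]
    for i in reset:
        donor = islands[rng.choice(survivors)]
        # The reset island restarts with a single seed: a survivor's best equation.
        islands[i] = [best(donor)]
    return reset


islands = [
    [("A", -5.0), ("B", -6.0)],
    [("C", -1.0)],
    [("D", -3.0)],
    [("E", -0.5), ("F", -2.0)],
]
reset = reset_islands(islands, random.Random(0))
```

A fuller version would also reset each reset island's program counter so that its sampling temperature starts again from the initial value.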
Step 6: Extract Best Equations
After search completes:
- Collect best equation from each island
- Rank by fitness score
- Simplify if possible (algebraic simplification)
- Report with physical interpretation
Cluster Sampling
Temperature-scheduled softmax over cluster scores:
temperature = T_init * (1 - (num_programs % period) / period)
probabilities = softmax(cluster_scores / temperature)
- Higher temperature → more exploration
- Lower temperature → more exploitation of best clusters
- Within clusters: shorter programs are preferred (Occam's razor)
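The schedule above can be made concrete with a short sketch. The function name `cluster_probabilities` and the default values for `t_init` and `period` are illustrative assumptions, not values confirmed by this document.

```python
import math


def cluster_probabilities(
    cluster_scores: list[float],
    num_programs: int,
    t_init: float = 0.1,   # assumed initial temperature
    period: int = 30_000,  # assumed schedule length in registered programs
) -> list[float]:
    """Softmax over cluster scores with a temperature that decays within each period."""
    temperature = t_init * (1 - (num_programs % period) / period)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(cluster_scores)
    weights = [math.exp((s - top) / temperature) for s in cluster_scores]
    total = sum(weights)
    return [w / total for w in weights]


scores = [-1.0, -0.5, -0.1]
early = cluster_probabilities(scores, num_programs=0)       # start of a period
late = cluster_probabilities(scores, num_programs=29_999)   # end of a period
```

Because `num_programs % period` is always strictly less than `period`, the temperature stays positive. Late in a period the distribution concentrates almost entirely on the best cluster, and the modulo makes the temperature jump back to `t_init` when the next period begins.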
Rules
- Equations must use only standard mathematical operations
- Parameter optimization via scipy BFGS or Adam
- Fitness = negative MSE (higher is better)
- Timeout protection for equation evaluation
- No recursive equations allowed
- Physical interpretability is preferred over pure fit
Related Skills
- Upstream: data-analysis, math-reasoning
- Downstream: paper-writing-section
- See also: algorithm-design