**Python training loop:**
```python
import pufferlib
from pufferlib import PuffeRL
```
**For comprehensive training guidance**, read `references/training.md` for:
- Complete training workflow and CLI options
- Hyperparameter tuning with Protein
- Distributed multi-GPU/multi-node training
- Logger integration (Weights & Biases, Neptune)
- Checkpointing and resume training
- Performance optimization tips
- Curriculum learning patterns
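The core collect-then-update cycle such a trainer runs can be sketched in plain Python. This is a schematic with hypothetical helper names (`collect_rollout`, toy `step`/`policy` lambdas), not the PuffeRL API:

```python
import numpy as np

def collect_rollout(step_fn, policy_fn, horizon):
    """Gather (obs, action, reward) tuples for one learning update."""
    rollout = []
    obs = np.zeros(4, dtype=np.float32)
    for _ in range(horizon):
        action = policy_fn(obs)
        obs, reward = step_fn(obs, action)
        rollout.append((obs.copy(), action, reward))
    return rollout

# Toy dynamics: observation drifts, reward equals the chosen action
step = lambda obs, action: (obs + 1.0, float(action))
policy = lambda obs: 1

rollout = collect_rollout(step, policy, horizon=5)
total_reward = sum(r for _, _, r in rollout)
print(total_reward)  # 5.0
```

A real trainer repeats this cycle, feeding each rollout into a gradient update; the CLI and configuration details are in `references/training.md`.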
```python
import numpy as np
from pufferlib import PufferEnv

class MyEnvironment(PufferEnv):
    def __init__(self, buf=None):
        super().__init__(buf)
        # Define spaces
        self.observation_space = self.make_space((4,))
        self.action_space = self.make_discrete(4)
        self.reset()

    def reset(self):
        # Reset state and return initial observation
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        # Execute action, compute reward, check done
        obs = self._get_observation()
        reward = self._compute_reward()
        done = self._is_done()
        info = {}
        return obs, reward, done, info
```
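The `reset()`/`step()` contract above can be exercised with a dependency-free stand-in (a hypothetical `GridWalk` class, no PufferLib required):

```python
import numpy as np

class GridWalk:
    """Dependency-free stand-in for the contract above: an agent on a
    1-D line of length 4 walks right until it reaches the last cell."""

    def __init__(self):
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self._obs()

    def step(self, action):
        # action 1 moves right; anything else stays put
        if action == 1:
            self.pos = min(self.pos + 1, 3)
        reward = 1.0 if self.pos == 3 else 0.0
        done = self.pos == 3
        return self._obs(), reward, done, {}

    def _obs(self):
        # One-hot position, mirroring the (4,) observation space above
        obs = np.zeros(4, dtype=np.float32)
        obs[self.pos] = 1.0
        return obs

env = GridWalk()
obs = env.reset()
total, done = 0.0, False
while not done:
    obs, reward, done, info = env.step(1)
    total += reward
print(total)  # 1.0: a single reward on reaching the goal cell
```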
**Key optimizations:**
- Shared memory buffers for zero-copy observation passing
- Busy-wait flags instead of pipes/queues
- Surplus environments for async returns
- Multiple environments per worker
**For vectorization optimization**, read `references/vectorization.md` for:
- Architecture and performance characteristics
- Worker and batch size configuration
- Serial vs multiprocessing vs async modes
- Shared memory and zero-copy patterns
- Hierarchical vectorization for large scale
- Multi-agent vectorization strategies
- Performance profiling and troubleshooting
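The serial backend's batching idea can be sketched in a few lines (hypothetical `ToyEnv` and `SerialVector` names): every env writes into one preallocated batch buffer, the same layout the multiprocessing backends fill through shared memory:

```python
import numpy as np

class ToyEnv:
    """Hypothetical single environment with a (4,) observation."""
    def __init__(self, seed):
        self.rng = np.random.default_rng(seed)
    def reset(self):
        return self.rng.random(4).astype(np.float32)
    def step(self, action):
        return self.rng.random(4).astype(np.float32), 0.0, False, {}

class SerialVector:
    """Minimal serial backend: step each env in a loop and write the
    results into one preallocated batch buffer."""
    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]
        self.obs = np.zeros((len(self.envs), 4), dtype=np.float32)
    def reset(self):
        for i, env in enumerate(self.envs):
            self.obs[i] = env.reset()
        return self.obs
    def step(self, actions):
        rewards = np.zeros(len(self.envs), dtype=np.float32)
        for i, (env, action) in enumerate(zip(self.envs, actions)):
            self.obs[i], rewards[i], done, info = env.step(action)
        return self.obs, rewards

vec = SerialVector([lambda s=s: ToyEnv(s) for s in range(8)])
batch = vec.reset()
obs, rewards = vec.step([0] * 8)
print(obs.shape, rewards.shape)  # (8, 4) (8,)
```

Serial mode like this is useful for debugging; the multiprocessing and async backends keep the same batched interface while distributing the env loop across workers.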
```python
import torch.nn as nn
from pufferlib.pytorch import layer_init

class Policy(nn.Module):
    def __init__(self, observation_space, action_space):
        super().__init__()
        obs_dim = observation_space.shape[0]
        num_actions = action_space.n
        # Encoder
        self.encoder = nn.Sequential(
            layer_init(nn.Linear(obs_dim, 256)),
            nn.ReLU(),
            layer_init(nn.Linear(256, 256)),
            nn.ReLU(),
        )
        # Actor and critic heads
        self.actor = layer_init(nn.Linear(256, num_actions), std=0.01)
        self.critic = layer_init(nn.Linear(256, 1), std=1.0)

    def forward(self, observations):
        features = self.encoder(observations)
        return self.actor(features), self.critic(features)
```
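`layer_init` above comes from `pufferlib.pytorch`. A common definition, assumed here since PufferLib's exact version may differ, is orthogonal weight initialization scaled by `std` plus a constant bias, as in CleanRL-style PPO code:

```python
import math
import torch
import torch.nn as nn

def layer_init(layer, std=math.sqrt(2), bias_const=0.0):
    # Orthogonal weights scaled by std, constant bias: the assumed
    # behavior of pufferlib.pytorch.layer_init (details may differ)
    nn.init.orthogonal_(layer.weight, std)
    nn.init.constant_(layer.bias, bias_const)
    return layer

# A small std on the actor head keeps initial action logits near
# uniform, while std=1.0 gives the critic a reasonable value scale
actor = layer_init(nn.Linear(256, 3), std=0.01)
critic = layer_init(nn.Linear(256, 1), std=1.0)
features = torch.zeros(2, 256)
print(actor(features).shape, critic(features).shape)
```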
**Gymnasium:**
```python
import gymnasium as gym
import pufferlib
```
**PettingZoo multi-agent:**
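A PettingZoo parallel environment keys observations, actions, and rewards by agent name. A plain-Python stand-in of that contract (hypothetical `ParallelToy` class, not the real integration, which goes through PufferLib's emulation layer):

```python
import numpy as np

class ParallelToy:
    """Hypothetical two-agent parallel env: every return value is a
    dict keyed by agent name, as in PettingZoo's parallel API."""
    agents = ["agent_0", "agent_1"]

    def reset(self):
        return {a: np.zeros(2, dtype=np.float32) for a in self.agents}

    def step(self, actions):
        obs = {a: np.ones(2, dtype=np.float32) for a in self.agents}
        rewards = {a: float(actions[a]) for a in self.agents}
        dones = {a: True for a in self.agents}
        infos = {a: {} for a in self.agents}
        return obs, rewards, dones, infos

env = ParallelToy()
obs = env.reset()
obs, rewards, dones, infos = env.step({"agent_0": 1, "agent_1": 0})
print(rewards)  # {'agent_0': 1.0, 'agent_1': 0.0}
```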
**Supported frameworks:**
- Gymnasium / OpenAI Gym
- PettingZoo (parallel and AEC)
- Atari (ALE)
- Procgen
- NetHack / MiniHack
- Minigrid
- Neural MMO
- Crafter
- GPUDrive
- MicroRTS
- Griddly
- And more...
**For integration details**, read `references/integration.md` for:
- Complete integration examples for each framework
- Custom wrappers (observation, reward, frame stacking, action repeat)
- Space flattening and unflattening
- Environment registration
- Compatibility patterns
- Performance considerations
- Integration debugging
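Space flattening, one of the topics above, reduces a structured observation to a single vector and back. A minimal sketch with hypothetical `flatten`/`unflatten` helpers (not PufferLib's API) over a dict observation:

```python
import numpy as np

# Hypothetical structured observation: emulation layers flatten dicts
# like this so every env exposes one flat vector space
obs = {"position": np.array([0.5, -0.25], dtype=np.float32),
       "health": np.array([1.0], dtype=np.float32)}

def flatten(obs):
    # Sorted keys give a stable component order
    return np.concatenate([obs[k].ravel() for k in sorted(obs)])

def unflatten(flat, template):
    out, i = {}, 0
    for k in sorted(template):
        n = template[k].size
        out[k] = flat[i:i + n].reshape(template[k].shape)
        i += n
    return out

flat = flatten(obs)
print(flat)  # [ 1.    0.5  -0.25] (sorted keys: health first)
restored = unflatten(flat, obs)
print(restored["position"])  # [ 0.5  -0.25]
```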
**Install:**
```shell
uv pip install pufferlib
```