Deploy and run ML experiments on local or remote GPU servers. Use this skill when the user says "run experiment", "deploy to server", or "跑实验" (Chinese for "run experiment"), or needs to launch training jobs.
## Installation

```bash
npx skill4agent add wanshuiyin/auto-claude-code-research-in-sleep run-experiment
```

## Workflow

**Step 1: Read server config.** The SSH alias, conda environment, code directory, and all options below come from your `CLAUDE.md` (see the example at the end).

**Step 2: Check GPU availability.**

```bash
ssh <server> nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader
# or for Mac MPS:
python -c "import torch; print('MPS available:', torch.backends.mps.is_available())"
```

**Step 3: Sync code.** By default (`code_sync: rsync`), Python sources are synced with `rsync`:

```bash
rsync -avz --include='*.py' --exclude='*' <local_src>/ <server>:<remote_dst>/
```

With `code_sync: git`, a push/pull workflow is used instead:

```bash
# 1. Push from local
git add -A && git commit -m "sync: experiment deployment" && git push
# 2. Pull on server
ssh <server> "cd <remote_dst> && git pull"
```
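Each row of the `nvidia-smi --query-gpu` CSV output above looks like `0, 11178 MiB, 81920 MiB`, so picking the card with the most free memory is a short parse. A minimal stdlib sketch; `pick_free_gpu` is a hypothetical helper, not part of the skill:

```python
def pick_free_gpu(csv_output: str) -> int:
    """Return the GPU index with the most free memory.

    Expects nvidia-smi --format=csv,noheader rows of the form
    "index, memory.used [MiB], memory.total [MiB]".
    """
    best_idx, best_free = -1, -1
    for line in csv_output.strip().splitlines():
        idx, used, total = [field.strip() for field in line.split(",")]
        # "81920 MiB" -> 81920; free = total - used
        free = int(total.split()[0]) - int(used.split()[0])
        if free > best_free:
            best_idx, best_free = int(idx), free
    return best_idx
```

The returned index can then be passed straight to `CUDA_VISIBLE_DEVICES` when launching.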
**Step 4: Add W&B logging (optional).** With `wandb: true` in `CLAUDE.md` (default: `false`), the skill adds `import wandb` and a `wandb.init` call to the training script:

```python
import wandb
wandb.init(project=WANDB_PROJECT, name=EXP_NAME, config={...hyperparams...})
# Inside training loop:
wandb.log({"train/loss": loss, "train/lr": lr, "step": step})
# After eval:
wandb.log({"eval/loss": eval_loss, "eval/ppl": ppl, "eval/accuracy": acc})
# At end:
wandb.finish()
```

Logged metrics include `train/loss`, `train/lr`, `eval/loss`, `eval/ppl`, `eval/accuracy`, `gpu/memory_used` (via `torch.cuda.max_memory_allocated()`), and `speed/samples_per_sec`.

Before launching, verify the server is logged in to W&B:

```bash
ssh <server> "wandb status"  # should show logged in
# If not logged in:
ssh <server> "wandb login <WANDB_API_KEY>"
```

The W&B project name and API key come from `CLAUDE.md` (see the example below). The experiment name is auto-generated from the script name + timestamp.
**Step 5: Launch.** On a remote server, the experiment runs in a detached `screen` session:

```bash
ssh <server> "screen -dmS <exp_name> bash -c '\
eval \"\$(<conda_path>/conda shell.bash hook)\" && \
conda activate <env> && \
CUDA_VISIBLE_DEVICES=<gpu_id> python <script> <args> 2>&1 | tee <log_file>'"
```

For a local run:

```bash
# Linux with CUDA
CUDA_VISIBLE_DEVICES=<gpu_id> python <script> <args> 2>&1 | tee <log_file>
# Mac with MPS (PyTorch uses MPS automatically)
python <script> <args> 2>&1 | tee <log_file>
```

**Step 6: Monitor.** With `run_in_background: true`, the job keeps running after you disconnect; list running sessions with `ssh <server> "screen -ls"`. If `~/.claude/feishu.json` is configured and its `experiment_done` setting is not `"off"`, a notification is sent when the experiment finishes. Either way, output is captured to the log file via `tee`.

The skill is configured via `CLAUDE.md`, for example:

## Remote Server
- SSH: `ssh my-gpu-server`
- GPU: 4x A100 (80GB each)
- Conda: `eval "$(/opt/conda/bin/conda shell.bash hook)" && conda activate research`
- Code dir: `/home/user/experiments/`
- code_sync: rsync # default. Or set to "git" for git push/pull workflow
- wandb: false # set to "true" to auto-add W&B logging to experiment scripts
- wandb_project: my-project # W&B project name (required if wandb: true)
- wandb_entity: my-team # W&B team/user (optional, uses default if omitted)
## Local Environment
- Mac MPS / Linux CUDA
- Conda env: `ml` (Python 3.10 + PyTorch)

**W&B setup:** Run `wandb login` on your server once (or set the `WANDB_API_KEY` env var). The skill reads the project/entity from `CLAUDE.md` and adds `wandb.init()` + `wandb.log()` calls to your training scripts automatically. Dashboard: `https://wandb.ai/<entity>/<project>`.
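A config section like the one above can be read with a few lines of stdlib Python. This sketch (`parse_claude_md` is a hypothetical name) mirrors the bullet format shown, not the skill's actual parser:

```python
import re

def parse_claude_md(text: str) -> dict:
    """Extract "- key: value" bullets from a CLAUDE.md section.

    Inline "# ..." comments are stripped, surrounding backticks removed,
    and keys lowercased. Headings and prose lines are ignored.
    """
    config = {}
    for line in text.splitlines():
        m = re.match(r"-\s*([\w ]+?):\s*(.+)", line.strip())
        if m:
            key, value = m.group(1).strip().lower(), m.group(2)
            config[key] = value.split("#", 1)[0].strip().strip("`")
    return config
```

Given the example above, `config["code_sync"]` would be `"rsync"` and `config["wandb"]` would be `"false"`.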