One skill for the full Railway operator loop: status → debug → fix → deploy → verify. It wraps the `railway` CLI in a non-interactive, JSON-first style that an agent can drive without prompts, and it leans on `RAILWAY_TOKEN` from the environment instead of an interactive `railway login`.

This skill is repo-agnostic. It assumes the project is hosted on Railway (railway.com) and that a `RAILWAY_TOKEN` is exported in the environment. It makes no assumptions about the stack (Node, Python, Go, Docker, Nixpacks/Railpack; Railway's builder figures it out).
When this skill triggers
Phrases that should route here:

- Deploy / build
  - "deploy this to railway"
  - "push to railway", "ship to railway", "railway up"
  - "build is failing on railway", "why did my build fail"
- Logs / debugging
  - "show me the railway logs", "tail the logs", "railway logs --since 1h"
  - "why is my service crashing on railway"
  - "show me the 500s on railway", "show http logs", "show slow requests"
  - "find the request id abc123 in railway logs"
- Ops
  - "redeploy on railway", "restart the api service", "roll back the last deploy"
  - "scale my railway service", "remove the latest deployment"
- State / discovery
  - "list my railway projects", "what services are in this project", "list deployments"
  - "what's the status of my railway project", "is my service healthy"
- Variables
  - "set a railway env var FOO=bar", "list railway variables", "delete a railway var"
- Run / connect
  - "run this script with railway production env", "open a shell with railway env"
  - "ssh into my railway service", "connect to my railway postgres / redis / mongo"
- Metrics
  - "what's the cpu/memory on railway", "is my service hitting limits"
  - "p95 latency on railway", "request rate on /api"

Skip when:
- The host is not Railway (Fly, Render, Vercel, AWS, …). This skill knows the `railway` CLI; it does not generalise.
- The fix is a code change with no operational lever. Let the normal dev-process skills handle the code, and come back here once it's time to deploy or read logs.
- CLI on PATH. `railway` should resolve. If not, install with `npm install -g @railway/cli` (or use the official installer at https://docs.railway.com/guides/cli). Minimum version: 4.x (this skill assumes the modern subcommand layout, with `--json` available on most commands).
- Auth via env var. `echo "${RAILWAY_TOKEN:0:8}…"` should print a non-empty prefix. The CLI reads `RAILWAY_TOKEN` directly; do not run `railway login` in agent sessions. Two token shapes exist:
  - Account / personal token (created at https://railway.com/account/tokens): works across every workspace, project, and environment the user has access to. Required for workspace-level commands and any cross-project view.
  - Project token (created in a project's Settings → Tokens, scoped to one project+environment): works for that single project/env. Workspace-level commands return `Unauthorized` with this kind; commands scoped to that project and environment all work.

  The first call that returns `Unauthorized` is the signal to ask the user which shape they configured and whether they need to widen scope.
- No interactive prompts. Always pass explicit `--project` / `--service` / `--environment` flags (and `-y`, `--json`, `--ci` where they exist) instead of relying on linked state. Linked state is stored per directory and survives across CLI calls, but in fresh agent sessions there is no link yet, so set the scope on every call until the user explicitly asks to link.
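
The token check and the explicit-scope rule above can be wrapped once so that no call depends on linked state. The following is a minimal sketch, not part of the skill itself: `token_prefix` and `railway_scoped` are hypothetical helper names, and `PROJECT_ID`, `SERVICE`, and `ENVIRONMENT` are assumed to be shell variables you set yourself.

```bash
# Hypothetical helpers; names and variables are illustrative assumptions.

# Print the first 8 characters of RAILWAY_TOKEN, or fail if it is unset/empty.
token_prefix() {
  [ -n "${RAILWAY_TOKEN:-}" ] || return 1
  printf '%.8s\n' "$RAILWAY_TOKEN"
}

# Run a railway subcommand with the scope flags appended explicitly.
# Assumes the subcommand accepts all three flags; not every one does.
railway_scoped() {
  if [ -z "${PROJECT_ID:-}" ] || [ -z "${SERVICE:-}" ] || [ -z "${ENVIRONMENT:-}" ]; then
    echo "set PROJECT_ID, SERVICE and ENVIRONMENT first" >&2
    return 1
  fi
  railway "$@" --project "$PROJECT_ID" --service "$SERVICE" --environment "$ENVIRONMENT"
}
```

With those variables set, `railway_scoped logs --lines 50 --json` expands to the fully scoped form used throughout this document.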
The skill is a small state machine. Pick the smallest entry point that answers the user's question; don't run discovery they didn't ask for.
┌──────────┐
│ Discover │  list projects / services / deployments / env / vars
└────┬─────┘
     ▼
┌──────────┐        ┌──────────┐
│  Debug   │◄──────►│ Metrics  │  logs (build/deploy/http) + cpu/mem/p95
└────┬─────┘        └──────────┘
     ▼
┌──────────┐
│   Fix    │  variables set / code edit / config change
└────┬─────┘
     ▼
┌──────────┐        ┌──────────┐
│  Deploy  │───────►│  Verify  │  up / redeploy / restart / down / roll back
└──────────┘        └──────────┘
Step 1 — discover
Always start by capturing the IDs you need, so subsequent calls are explicit. Prefer JSON output for parsing.

```bash
# Workspace-wide view (requires an account token).
railway list --json

# A single project's structure (project token works here too if you pass --project).
railway status --json --project "$PROJECT_ID"

# Services and environments inside that project.
railway service list --json --project "$PROJECT_ID" --environment production
railway environment list --json --project "$PROJECT_ID"

# Deployments for a specific service (most recent first).
railway deployment list \
  --project "$PROJECT_ID" \
  --service api \
  --environment production \
  --limit 20 --json
```

Capture from the JSON: project id, environment id/name (typically `production`, `staging`, plus PR preview envs), service ids/names, the latest deployment id and its `status` (`SUCCESS`, `FAILED`, `BUILDING`, `DEPLOYING`, `CRASHED`, `REMOVED`). The deployment status is what tells you whether the symptom is a build problem, a startup problem, or a steady-state runtime problem; pick the right log stream accordingly.
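
The step from deployment status to log stream can be written down as a small lookup. This is a sketch under assumptions: `pick_log_stream` is a hypothetical helper, and the mapping is one reading of the guidance above, not behaviour built into the CLI.

```bash
# Hypothetical helper: map a deployment status to the log stream to read first.
# The mapping is an assumption drawn from the guidance above.
pick_log_stream() {
  case "$1" in
    FAILED|BUILDING) echo build ;;   # never reached runtime: read build logs
    *)               echo deploy ;;  # started at some point: read deploy/runtime logs
  esac
}
```

A caller would add `--build` to `railway logs` when the function prints `build`, and use the default stream otherwise.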
Step 2 — debug (logs first)
Railway exposes three log streams. Pick the one that matches the failure mode; mixing them makes the tail unreadable.

| Stream | Flag | Use when |
|---|---|---|
| Deploy / runtime | (default) | The app is up but misbehaving, or it crashed after starting. |
| Build | `railway logs --build [DEPLOYMENT_ID]` | Deployment is `FAILED` and never reached runtime. |
| HTTP | `--http` | The symptom is a status code, a latency spike, a specific request. |

Default to historical, non-streaming queries in agent sessions; streaming hangs the shell. Any of `--lines`, `--since`, or `--until` disables streaming.
```bash
# Last 200 deploy log lines, JSON for parsing.
railway logs \
  --project "$PROJECT_ID" --service api --environment production \
  --lines 200 --json

# Build logs for the failed deployment specifically.
railway logs --build "$DEPLOYMENT_ID" \
  --project "$PROJECT_ID" --service api --environment production \
  --lines 500

# Errors only in the last hour.
railway logs --since 1h --filter "@level:error" --lines 100 \
  --project "$PROJECT_ID" --service api --environment production

# All 5xx HTTP responses in the last 30 minutes.
railway logs --http --since 30m --status ">=500" --lines 200 \
  --project "$PROJECT_ID" --service api --environment production --json

# Slow GETs on a specific path.
railway logs --http --method GET --path /api/users \
  --filter "@totalDuration:>=1000" --lines 50 --json \
  --project "$PROJECT_ID" --service api --environment production

# Trace one request end-to-end.
railway logs --http --request-id "$REQUEST_ID" --lines 50 --json \
  --project "$PROJECT_ID" --service api --environment production
```

Filter syntax (Railway query language, also accepted in the dashboard):

- **Text**: bare words → substring match; `"two words"` → phrase.
- **Level** (deploy/build only): `@level:error`, `@level:warn`, `@level:info`.
- **HTTP fields**: `@httpStatus`, `@method`, `@path`, `@host`, `@requestId`, `@clientUa`, `@srcIp`, `@edgeRegion`, `@upstreamAddress`, `@upstreamProto`, `@downstreamProto`, `@responseDetails`, `@deploymentId`, `@deploymentInstanceId`, `@totalDuration`, `@responseTime`, `@upstreamRqDuration`, `@txBytes`, `@rxBytes`, `@upstreamErrors`.
- **Operators**: `> >= < <= ..` (range, e.g. `@httpStatus:200..299`); `AND`, `OR`, `-` (negation), parentheses.

If the user asks for "logs from the latest deployment even if it failed", add `--latest`. Otherwise `railway logs` walks back to the most recent **successful** deployment, which is usually not what you want when debugging a regression.
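
Filters with several clauses are easier to read when the clauses are assembled separately and joined with `AND`. A minimal sketch follows; `join_filter` is a hypothetical helper, and the combined expression is built only from the field and operator forms listed above, so treat its acceptance by the CLI as an assumption to confirm.

```bash
# Hypothetical helper: join filter clauses with AND into one --filter value.
join_filter() {
  [ "$#" -gt 0 ] || return 1
  local out clause
  out=$1
  shift
  for clause in "$@"; do
    out="$out AND $clause"
  done
  printf '%s\n' "$out"
}
```

The result would be passed as a single quoted argument, for example `--filter "$(join_filter '@httpStatus:500..599' '@totalDuration:>=1000')"`.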
Step 3 — metrics (sanity check resource state)
Pair logs with metrics when the symptom could be a resource ceiling (OOM kills, CPU throttling, egress bursts, volume full).

```bash
# Compact summary for the linked service, last hour.
railway metrics --json \
  --project "$PROJECT_ID" --service api --environment production

# Specific dimensions.
railway metrics --cpu --memory --since 6h --json \
  --project "$PROJECT_ID" --service api --environment production

# HTTP percentiles + RPS for a path.
railway metrics --http --path /api/users --json --since 1h \
  --project "$PROJECT_ID" --service api --environment production

# Table across every service in the project.
railway metrics --all --json \
  --project "$PROJECT_ID" --environment production
```

Read these together with the deploy log: a memory line climbing into the service's limit followed by a sudden gap is an OOM; sustained CPU at 100% with growing p95 is a throttle. Don't editorialise beyond what the numbers show.
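
The OOM reading described above (memory near the limit, then a gap in samples) can be sketched as a small check. Everything here is an assumption made for illustration: the input format (one `epoch_seconds memory_mb` pair per line, oldest first), the 90% threshold, and the name `looks_like_oom`. It is not the shape of `railway metrics --json` output, which you would need to convert first.

```bash
# Hypothetical check. stdin: "epoch_seconds memory_mb" per line, oldest first.
# $1 = memory limit in MB, $2 = largest normal gap between samples in seconds.
# Succeeds when a sample at >= 90% of the limit is followed by an abnormal gap.
looks_like_oom() {
  awk -v limit="$1" -v gap="$2" '
    NR > 1 && ($1 - prev_t) > gap && prev_m >= 0.9 * limit { hit = 1 }
    { prev_t = $1; prev_m = $2 }
    END { exit (hit ? 0 : 1) }
  '
}
```

A match is only a prompt to look at the deploy log around that timestamp; it does not prove the process was killed for memory.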
Step 4 — variables (read-only first, write only on confirmation)
Variables are usually where misconfiguration hides. Listing variables prints secret values: treat the output as confidential, never echo raw values back into the chat, and use `--json` so you can summarise (key names + value lengths) instead of pasting plaintext secrets.

```bash
# Read. JSON includes raw values; redact before surfacing.
railway variable list --json \
  --project "$PROJECT_ID" --service api --environment production

# Write. Explicit confirmation required before running.
railway variable set "FEATURE_FLAG=on" \
  --project "$PROJECT_ID" --service api --environment production

# Setting a variable triggers a redeploy by default; add --skip-deploys
# (top-level, before the subcommand) to set without redeploying.

# Delete.
railway variable delete FEATURE_FLAG \
  --project "$PROJECT_ID" --service api --environment production
```

Default to listing first ("here are the keys configured on production; which one do you want to change?") and only run `set` / `delete` after the user picks a target. For new secrets, prefer reading from stdin so the plaintext never enters the agent's argv buffer (visible in `ps`): pipe the value into `railway variable set --stdin KEY` (a top-level option on the legacy `variable` form; the modern flow is `railway variable set "KEY=$(< file)"` from a local file the user controls).
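
The "key names + value lengths" summary can be produced without the values ever reaching the chat. A minimal sketch, assuming the variables have already been converted to `KEY=value` lines (that conversion and the name `redact_vars` are hypothetical, not something the CLI provides here):

```bash
# Hypothetical helper. stdin: KEY=value lines. stdout: "KEY (N chars)" per line.
# Lines without "=" are skipped; the value itself is never printed.
redact_vars() {
  local line key value
  while IFS= read -r line; do
    case "$line" in
      *=*) ;;
      *) continue ;;
    esac
    key=${line%%=*}     # text before the first "="
    value=${line#*=}    # text after the first "="
    printf '%s (%d chars)\n' "$key" "${#value}"
  done
}
```

Only the summarised output should be surfaced; the raw input stays in the pipe.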
Step 5 — fix and deploy
Three deploy verbs, in increasing order of intent:

| Verb | Effect | Use when |
|---|---|---|
| `railway restart` | Restart the latest deployment without rebuilding. | Process is wedged but the build artefact is fine. |
| `railway redeploy` | Re-run the latest deployment (or `--from-source` to pull the newest commit / image). | A transient failure or you want to redeploy the same artefact. Use `--from-source` to pick up new commits without uploading. |
| `railway up` | Upload the current working directory and deploy it. | The fix is a code change in this repo. |

Non-interactive defaults:

```bash
# Restart (no rebuild). -y skips the confirmation dialog.
railway restart -y --json \
  --project "$PROJECT_ID" --service api --environment production

# Redeploy the latest deployment.
railway redeploy -y --json \
  --project "$PROJECT_ID" --service api --environment production

# Redeploy and pull the newest commit / image from the configured source.
railway redeploy -y --from-source --json \
  --project "$PROJECT_ID" --service api --environment production

# Upload and deploy this directory. --ci streams build logs only, then exits;
# perfect for agent sessions (no interactive log attach).
railway up --ci \
  --project "$PROJECT_ID" --service api --environment production \
  --message "fix: bump httpx to 0.27 to pick up TLS bug fix"

# Remove the most recent deployment (rollback to whatever was before it).
railway down -y \
  --project "$PROJECT_ID" --service api --environment production
```

`railway up --ci` is the agent-friendly form: it implies `CI=true`, streams build logs to stdout, and exits with non-zero on build failure. Without `--ci` the CLI tries to attach a live log pager; in an automation context that hangs.

After deploy, **always verify** by sampling the new deployment's logs and a tiny metrics window; don't just trust the exit code. The Railway build can succeed and the runtime can still crashloop on startup.

```bash
# Quick verification loop.
railway deployment list --json --limit 3 \
  --project "$PROJECT_ID" --service api --environment production
railway logs --lines 50 --since 2m \
  --project "$PROJECT_ID" --service api --environment production
```
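
The choice between the three verbs can be made explicit before anything runs. The sketch below is illustrative only: `deploy_command_for` and its situation labels (`wedged`, `transient`, `new-commit`, `local-change`) are hypothetical names for the "Use when" cases in this step, and the function prints a command rather than executing it so the scope can still be confirmed with the user.

```bash
# Hypothetical helper: print (not run) the deploy command for a situation.
deploy_command_for() {
  case "$1" in
    wedged)       echo "railway restart -y --json" ;;                # no rebuild
    transient)    echo "railway redeploy -y --json" ;;               # same artefact
    new-commit)   echo "railway redeploy -y --from-source --json" ;; # pull source
    local-change) echo "railway up --ci" ;;                          # upload cwd
    *)            echo "unknown situation: $1" >&2; return 1 ;;
  esac
}
```

The scope flags (`--project`, `--service`, `--environment`) still have to be appended to whatever it prints.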
Step 6 — run, shell, ssh, db connect
For development workflows that need production env vars locally, or a shell on the live container:

```bash
# Run a one-shot command with the linked service's variables injected.
railway run --service api --environment production -- node scripts/migrate.js

# Open a subshell with the same env (interactive; only run when the user is at the terminal).
railway shell --service api --environment production --silent

# SSH into the running container of a service. -i picks an identity file if Railway
# can't find a usable key in ~/.ssh.
railway ssh \
  --project "$PROJECT_ID" --service api --environment production

# One-shot remote command (non-interactive).
railway ssh \
  --project "$PROJECT_ID" --service api --environment production \
  -- ls /app

# Open a database shell against a Railway-managed DB service.
railway connect postgres \
  --project "$PROJECT_ID" --environment production
```

`railway run env` and `railway run printenv` will print every secret variable for that service; treat the output as you would `railway variable list --json` and never paste it back.
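
Since some of the commands in this step are interactive, an agent can guard them with a terminal check. A minimal sketch, assuming that "user is at the terminal" can be approximated by stdin and stdout both being TTYs (`require_tty` is a hypothetical name):

```bash
# Hypothetical guard: succeed only when both stdin and stdout are terminals.
require_tty() {
  [ -t 0 ] && [ -t 1 ]
}
```

Used as `require_tty && railway shell --service api --environment production --silent`, the interactive command is skipped in a piped or headless session instead of hanging.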
Common failure shapes
Unauthorized. Please check that your RAILWAY_TOKEN is valid
Either no token, an expired one, or a project-scoped token being used against a workspace-level command. Ask the user which token shape they configured; if they need workspace-level commands, they need an account token from https://railway.com/account/tokens.
When a deployment never reaches runtime, the failure is in the build logs, not the default deploy stream:

```bash
railway logs --build "$DEPLOYMENT_ID" --lines 500 \
  --project "$PROJECT_ID" --service api --environment production
```

If the deployment id is unknown, `railway deployment list --json --limit 5` gives you the most recent failed one.
`CRASHED` deployment, deploy logs end with the start command

App is dying during startup. Read the tail of the deploy logs for the actual exception. Common shapes:

- Missing env var (something like `panic: required environment variable …`) → `railway variable list --json` to confirm, then `railway variable set`.
- Port binding wrong: Railway sets `PORT`; the service must bind to `$PORT`, not a hardcoded port.
- DB connection refused: check the linked DB service is in the same environment and the private network DNS (e.g. `postgres.railway.internal`) is what the app expects.
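
The "missing env var" check above amounts to comparing the keys the app requires against the keys that are configured. A minimal sketch, assuming the configured variables are available as `KEY=value` lines on stdin (that conversion and the name `missing_keys` are hypothetical):

```bash
# Hypothetical helper. stdin: KEY=value lines; arguments: required key names.
# Prints each required key that is absent from the input. Values are ignored.
missing_keys() {
  local present key
  present=$(cut -d= -f1)   # keep only the key column from stdin
  for key in "$@"; do
    printf '%s\n' "$present" | grep -qxF -- "$key" || printf '%s\n' "$key"
  done
}
```

Empty output means every required key is present; any printed name is a candidate for `railway variable set` once the user confirms.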
Build succeeds, runtime 502 / Bad Gateway from the edge
The app didn't bind to `$PORT` in time (default healthcheck window). Either the app is slow to start (raise the healthcheck timeout in the service's Railway config, or fix the slow startup), or it's listening on an address or port other than the one Railway routes to. Cross-check with `railway logs --http --status 502 --lines 50` to confirm the edge is the source.
For a suspected OOM kill, pair the deploy log with `railway metrics --memory --since 30m --json`. If memory climbs into the service limit and the gap aligns with the kill, raise the service's memory cap (dashboard or config). Don't silently raise it without telling the user; call out that you saw the ceiling hit.
`railway up` hangs in an agent session

You forgot `--ci`. The default mode attaches a live pager that doesn't exit. Kill it, re-run with `--ci`.
Variable changes "didn't take effect"
`railway variable set` triggers a redeploy by default, but if `--skip-deploys` was passed, the variable is staged and the running deployment still has the old value. Either redeploy explicitly (`railway redeploy -y`) or rerun the set without `--skip-deploys`.
- JSON-first. Add `--json` to every command that supports it, and parse the JSON rather than scraping human-readable output. Layouts change; the JSON keys are stable.
- Explicit scope every call. Pass `--project`, `--service`, `--environment` on every command in an agent session. Don't rely on linked state; it's invisible to the user and confusing when it drifts.
- Non-streaming logs by default. Always combine `railway logs` with `--lines`, `--since`, or `--until`. Streaming is for humans at a terminal, not agents.
- Never paste secrets. `railway variable list`, `railway run env`, and `railway run printenv` all surface plaintext secrets. Summarise (key names, value lengths) instead. If the user explicitly asks for a value, paste it in a code block and remind them it's a secret.
- Confirm before destructive ops. Commands such as `railway restart`, `railway redeploy`, `railway down`, `railway variable set`, `railway variable delete`, `railway environment delete`, and deleting the project itself all change live state. Repeat the scope back to the user ("restart <service> in <environment> of project <project>?") and wait for explicit confirmation, even if `-y` is technically available.
- Verify after deploy. Don't end on a success line. Pull the latest deployment's status and a 50-line log sample so the user sees the actual runtime state, not just the build outcome.
- One failure mode per investigation. Build vs. crashloop vs. 5xx vs. OOM are distinct shapes with distinct log streams. Don't blend their tails in one report.
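
The confirm-before-destructive rule can be enforced with a small gate that repeats the scope back and proceeds only on an explicit yes. This is a sketch under assumptions: `confirm_scope` is a hypothetical helper, and reading the reply from stdin stands in for however the agent actually collects the user's answer.

```bash
# Hypothetical gate: echo the scope back and require an explicit "y" or "yes".
# Arguments: action, service, environment, project. Reply is read from stdin.
confirm_scope() {
  local reply
  printf '%s %s in %s of project %s? [y/N] ' "$1" "$2" "$3" "$4" >&2
  IFS= read -r reply || return 1          # no answer counts as "no"
  [ "$reply" = "y" ] || [ "$reply" = "yes" ]
}
```

For example, `confirm_scope restart api production "$PROJECT_ID" && railway restart -y ...` runs the restart only after the user has seen and accepted the exact target.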
Onsager-bundled scripts (optional, repo-specific)
The `scripts/` directory ships three shell wrappers tuned for the Onsager monorepo. They are pinned to that repo's deployment shape (service name, env var, production URL https://onsager-production.up.railway.app, targets). Repos other than Onsager can ignore these or fork them; the generic operating procedure above covers every project.
When in the Onsager repo:
| Task | Command |
|---|---|
| Pre-deploy check | |
| Diagnose failure | `sh scripts/debug.sh [service]` |
| Verify live deploy | `sh scripts/smoke.sh [url]` |
- — runs before any deploy or while triaging a build
failure. Checks lockfiles (, ) are
tracked in git, Dockerfile COPY sources resolve, Railway vars don't
leak , and points at the Railway Postgres
plugin. Exits non-zero on any failure; skips Railway variable
checks if is not set.
- — one-shot diagnostics for a failed or
stuck deploy: service status, build logs (40 lines), deploy/runtime
logs (40), error-only logs (20), HTTP 4xx/5xx (10), env vars.
Default service . Requires .
- — post-deploy verification: API checks
via (, , ,
) and optional UI checks via (,
, , ). Default URL
https://onsager-production.up.railway.app
. UI checks skip
gracefully if is not on PATH.
These scripts demonstrate the wrapping pattern; another repo
adopting this skill should fork the directory and re-shape the
script bodies for its own deployment.
- The repo's spec-driven dev-process skill — when the fix is a code change, not just an ops lever; that's where the spec / branch / PR loop lives. This skill picks up at the deploy step.
- — when the operator question is really "what's still blocking the deploy?" rather than "deploy this thing".
- Railway's own AI surfaces —
railway agent -p "<question>"
runs an interactive assistant inside the CLI, and wires Railway's MCP server into Claude Code / Cursor / Codex. Useful as a fallback when this skill's scripted flow isn't enough, but they're not a substitute for the explicit JSON-first loop above — they're for exploratory questions, not for reproducible automation.