railway

Railway

One skill for the full Railway operator loop: status → debug → fix → deploy → verify. It wraps the `railway` CLI in a non-interactive, JSON-first style that an agent can drive without prompts, and it relies on `RAILWAY_TOKEN` from the environment instead of an interactive `railway login`.

This skill is repo-agnostic. It assumes the project is hosted on Railway (railway.com) and that a `RAILWAY_TOKEN` is exported in the environment. It makes no assumptions about the stack (Node, Python, Go, Docker, Nixpacks/Railpack; Railway's builder figures it out).

When this skill triggers

Phrases that should route here:

Deploy / build
- "deploy this to railway"
- "push to railway", "ship to railway", "railway up"
- "build is failing on railway", "why did my build fail"
Logs / debugging
- "show me the railway logs", "tail the logs", "railway logs --since 1h"
- "why is my service crashing on railway"
- "show me the 500s on railway", "show http logs", "show slow requests"
- "find the request id abc123 in railway logs"
Ops
- "redeploy on railway", "restart the api service", "roll back the last deploy"
- "scale my railway service", "remove the latest deployment"
State / discovery
- "list my railway projects", "what services are in this project", "list deployments"
- "what's the status of my railway project", "is my service healthy"
Variables
- "set a railway env var FOO=bar", "list railway variables", "delete a railway var"
Run / connect
- "run this script with railway production env", "open a shell with railway env"
- "ssh into my railway service", "connect to my railway postgres / redis / mongo"
Metrics
- "what's the cpu/memory on railway", "is my service hitting limits"
- "p95 latency on railway", "request rate on /api"

Skip when:

- The host is not Railway (Fly, Render, Vercel, AWS, …). This skill knows the `railway` CLI; it does not generalise.
- The fix is a code change with no operational lever. Let the normal dev-process skills handle the code; come back here once it's time to deploy or read logs.

Prerequisites

- CLI on PATH. `which railway` should resolve. If not, install with `npm install -g @railway/cli` (or use the official installer at https://docs.railway.com/guides/cli). Minimum version: 4.x (this skill assumes the modern subcommand layout: `service list`, `deployment list`, `logs --filter`, and `--json` on most commands).
- Auth via env var. `echo "${RAILWAY_TOKEN:0:8}…"` should print a non-empty prefix. The CLI reads `RAILWAY_TOKEN` directly; do not run `railway login` in agent sessions. Two token shapes exist:
  - Account / personal token (created at https://railway.com/account/tokens): works across every workspace, project, and environment the user has access to. Required for `railway list`, `railway link --workspace`, and any cross-project view.
  - Project token (created in a project's Settings → Tokens, scoped to one project+environment): works for that single project/env. `railway list` and other workspace-level commands return `Unauthorized` with this kind; `railway status`, `railway logs`, `railway up`, and `railway variable …` all work. The first call that returns `Unauthorized` / `Invalid RAILWAY_TOKEN` is the signal to ask the user which shape they configured and whether they need to widen scope.
- No interactive prompts. Always pass explicit `--project / --service / --environment` flags (and `--json`, `-y`, `--ci` where they exist) instead of relying on linked state. Linked state is a `.railway/` directory and survives across CLI calls, but in fresh agent sessions there is no link yet, so set the scope on every call until the user explicitly asks to link.
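
The token and scope checks above can be sketched as two small guards. This is a minimal illustration, assuming the scope is kept in three shell variables (`PROJECT_ID`, `SERVICE`, `ENVIRONMENT`); those names and both function names are hypothetical and not part of the CLI.

```shell
# Hypothetical helpers; function and variable names are assumptions.

# Print the first 8 characters of RAILWAY_TOKEN, or fail if it is unset/empty.
token_prefix() {
  if [ -z "${RAILWAY_TOKEN:-}" ]; then
    echo "RAILWAY_TOKEN is not set" >&2
    return 1
  fi
  printf '%.8s…\n' "$RAILWAY_TOKEN"
}

# Print the explicit scope flags, or fail if any part of the scope is missing.
railway_scope() {
  if [ -z "${PROJECT_ID:-}" ] || [ -z "${SERVICE:-}" ] || [ -z "${ENVIRONMENT:-}" ]; then
    echo "PROJECT_ID, SERVICE and ENVIRONMENT must all be set" >&2
    return 1
  fi
  printf '%s %s %s %s %s %s\n' \
    --project "$PROJECT_ID" --service "$SERVICE" --environment "$ENVIRONMENT"
}
```

One possible use is `railway status --json $(railway_scope)`. The unquoted expansion relies on word splitting, so it only suits IDs and names that contain no whitespace.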

Operating procedure

The skill is a small state machine. Pick the smallest entry point that answers the user's question; don't run discovery they didn't ask for.

       ┌──────────┐
       │ Discover │  list projects / services / deployments / env / vars
       └────┬─────┘
            ▼
       ┌──────────┐        ┌──────────┐
       │  Debug   │◄──────►│ Metrics  │   logs (build/deploy/http) + cpu/mem/p95
       └────┬─────┘        └──────────┘
            ▼
       ┌──────────┐
       │   Fix    │  variables set / code edit / config change
       └────┬─────┘
            ▼
       ┌──────────┐        ┌──────────┐
       │  Deploy  │───────►│  Verify  │   up / redeploy / restart / down / roll back
       └──────────┘        └──────────┘

Step 1 — discover

Always start by capturing the IDs you need, so subsequent calls are explicit. Prefer JSON output for parsing.

```bash
# Workspace-wide view (requires an account token).
railway list --json

# A single project's structure (project token works here too if you pass --project).
railway status --json --project "$PROJECT_ID"

# Services and environments inside that project.
railway service list --json --project "$PROJECT_ID" --environment production
railway environment list --json --project "$PROJECT_ID"

# Deployments for a specific service (most recent first).
railway deployment list \
  --project "$PROJECT_ID" \
  --service api \
  --environment production \
  --limit 20 --json
```

Capture from the JSON: project id, environment id/name (typically `production`, `staging`, plus PR preview envs), service ids/names, the latest deployment id and its `status` (`SUCCESS`, `FAILED`, `BUILDING`, `DEPLOYING`, `CRASHED`, `REMOVED`). The deployment status is what tells you whether the symptom is a build problem, a startup problem, or a steady-state runtime problem — pick the right log stream accordingly.
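
As a rough illustration of picking the log stream from the deployment status, the mapping can be written as a lookup. The function name is hypothetical. Only the `FAILED` and `CRASHED` rows come from the text; the remaining rows are assumptions, and the HTTP stream is chosen by symptom rather than by status, so it does not appear here.

```shell
# Map a deployment status to the `railway logs` stream worth reading first.
# log_stream_for_status is a hypothetical helper, not a CLI feature.
log_stream_for_status() {
  case "$1" in
    FAILED)            echo "build" ;;    # never reached runtime
    BUILDING)          echo "build" ;;    # assumption: the build is still running
    CRASHED)           echo "deploy" ;;   # died during or after startup
    DEPLOYING|SUCCESS) echo "deploy" ;;   # assumption: runtime logs apply
    REMOVED)           echo "none" ;;     # assumption: nothing live to read
    *)                 echo "unknown" ;;
  esac
}
```

A caller could then add `--build "$DEPLOYMENT_ID"` to `railway logs` only when the function prints `build`.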

Step 2 — debug (logs first)

Railway exposes three log streams. Pick the one that matches the failure mode; mixing them makes the tail unreadable.

| Stream | Flag | Use when |
| --- | --- | --- |
| Deploy / runtime | `railway logs` (default) | The app is up but misbehaving, or it crashed after starting. |
| Build | `railway logs --build [DEPLOYMENT_ID]` | Deployment is `FAILED` and never reached runtime. |
| HTTP | `railway logs --http` | The symptom is a status code, a latency spike, a specific request. |

Default to historical, non-streaming queries in agent sessions, because streaming hangs the shell. Any of `--lines`, `--since`, or `--until` disables streaming.
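
A small guard can catch an accidental streaming call before it hangs the session. The sketch below only inspects the argument list; `is_bounded` is a hypothetical name, and accepting the `--flag=value` spelling is an assumption about how the CLI parses options.

```shell
# Print "yes" if the arguments contain a flag that disables streaming.
# is_bounded is a hypothetical helper; it never calls the CLI itself.
is_bounded() {
  for arg in "$@"; do
    case "$arg" in
      --lines|--since|--until) echo "yes"; return 0 ;;
      --lines=*|--since=*|--until=*) echo "yes"; return 0 ;;  # assumption: = form
    esac
  done
  echo "no"
}
```

An agent wrapper might refuse to run `railway logs` when `is_bounded "$@"` prints `no`, or append a default such as `--lines 200`.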

```bash
# Last 200 deploy log lines, JSON for parsing.
railway logs \
  --project "$PROJECT_ID" --service api --environment production \
  --lines 200 --json

# Build logs for the failed deployment specifically.
railway logs --build "$DEPLOYMENT_ID" \
  --project "$PROJECT_ID" --service api --environment production \
  --lines 500

# Errors only in the last hour.
railway logs --since 1h --filter "@level:error" --lines 100 \
  --project "$PROJECT_ID" --service api --environment production

# All 5xx HTTP responses in the last 30 minutes.
railway logs --http --since 30m --status ">=500" --lines 200 \
  --project "$PROJECT_ID" --service api --environment production --json

# Slow GETs on a specific path.
railway logs --http --method GET --path /api/users \
  --filter "@totalDuration:>=1000" --lines 50 --json \
  --project "$PROJECT_ID" --service api --environment production

# Trace one request end-to-end.
railway logs --http --request-id "$REQUEST_ID" --lines 50 --json \
  --project "$PROJECT_ID" --service api --environment production
```


Filter syntax (Railway query language, also accepted in the dashboard):

- **Text**: bare words → substring match; `"two words"` → phrase.
- **Level** (deploy/build only): `@level:error`, `@level:warn`, `@level:info`.
- **HTTP fields**: `@httpStatus`, `@method`, `@path`, `@host`, `@requestId`, `@clientUa`, `@srcIp`, `@edgeRegion`, `@upstreamAddress`, `@upstreamProto`, `@downstreamProto`, `@responseDetails`, `@deploymentId`, `@deploymentInstanceId`, `@totalDuration`, `@responseTime`, `@upstreamRqDuration`, `@txBytes`, `@rxBytes`, `@upstreamErrors`.
- **Operators**: `> >= < <= ..` (range, e.g. `@httpStatus:200..299`); `AND`, `OR`, `-` (negation), parentheses.
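
To show how the pieces of the filter syntax combine, here is a minimal sketch that joins clauses with `AND`. The helper name `join_and` is hypothetical; the clauses themselves reuse only the fields and operators listed above.

```shell
# Join filter clauses with AND. join_and is a hypothetical helper.
# Clauses that contain OR should be parenthesised by the caller.
join_and() {
  out=""
  for clause in "$@"; do
    if [ -z "$out" ]; then
      out="$clause"
    else
      out="$out AND $clause"
    fi
  done
  printf '%s\n' "$out"
}

# Server errors that were also slow (threshold in the unit of @totalDuration).
filter=$(join_and '@httpStatus:500..599' '@totalDuration:>=1000')
printf '%s\n' "$filter"
```

The resulting string would be passed as `--filter "$filter"` alongside `--http`, `--lines`, and the explicit scope flags.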

If the user asks for "logs from the latest deployment even if it failed", add `--latest` — otherwise `railway logs` walks back to the most recent **successful** deployment, which is usually not what you want when debugging a regression.

Step 3 — metrics (sanity check resource state)

Pair logs with metrics when the symptom could be a resource ceiling (OOM kills, CPU throttling, egress bursts, volume full).

```bash
# Compact summary for the linked service, last hour.
railway metrics --json \
  --project "$PROJECT_ID" --service api --environment production

# Specific dimensions.
railway metrics --cpu --memory --since 6h --json \
  --project "$PROJECT_ID" --service api --environment production

# HTTP percentiles + RPS for a path.
railway metrics --http --path /api/users --json --since 1h \
  --project "$PROJECT_ID" --service api --environment production

# Table across every service in the project.
railway metrics --all --json \
  --project "$PROJECT_ID" --environment production
```

Read these together with the deploy log: a memory line climbing into the service's limit followed by a sudden gap is an OOM; sustained CPU at 100% with growing p95 is a throttle. Don't editorialise beyond what the numbers show.

Step 4 — variables (read-only first, write only on confirmation)

Variables are usually where misconfiguration hides. Listing variables prints secret values: treat the output as confidential, never echo raw values back into the chat, and use `--json` so you can summarise (key names + value lengths) instead of pasting plaintext secrets.

```bash
# Read: JSON includes raw values; redact before surfacing.
railway variable list --json \
  --project "$PROJECT_ID" --service api --environment production

# Write: explicit confirmation required before running.
railway variable set "FEATURE_FLAG=on" \
  --project "$PROJECT_ID" --service api --environment production

# Setting a variable triggers a redeploy by default; add --skip-deploys
# (top-level, before the subcommand) to set without redeploying.

# Delete.
railway variable delete FEATURE_FLAG \
  --project "$PROJECT_ID" --service api --environment production
```

Default to listing first ("here are the keys configured on production; which one do you want to change?") and only run `set` / `delete` after the user picks a target. For new secrets, prefer reading from stdin so the plaintext never enters the agent's argv buffer (visible in `ps`): pipe the value into `railway variable set --stdin KEY` (a top-level option on the legacy `variable` form; the modern flow is `railway variable set "KEY=$(< file)"` from a local file the user controls).
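
The "key names + value lengths" summary can be sketched without any JSON tooling, assuming the variables arrive as `KEY=VALUE` lines on stdin (for example from `railway run env`). The function name is hypothetical, and values that span several lines are not handled.

```shell
# Summarise KEY=VALUE lines as "KEY (N chars)" so values never reach the chat.
# summarise_vars is a hypothetical helper; it reads stdin and writes stdout.
summarise_vars() {
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      *=*) ;;          # looks like KEY=VALUE
      *) continue ;;   # skip anything else
    esac
    key=${line%%=*}    # text before the first '='
    value=${line#*=}   # text after the first '='
    # ${#value} is the length as the shell counts it (bytes in some shells).
    printf '%s (%s chars)\n' "$key" "${#value}"
  done
}
```

Only the summary lines should be surfaced; the raw input stays inside the pipeline.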

Step 5 — fix and deploy

Three deploy verbs, in increasing order of intent:

| Verb | Effect | Use when |
| --- | --- | --- |
| `railway restart` | Restart the latest deployment without rebuilding. | Process is wedged but the build artefact is fine. |
| `railway redeploy` | Re-run the latest deployment (or `--from-source` to pull the newest commit / image). | A transient failure or you want to redeploy the same artefact. Use `--from-source` to pick up new commits without uploading. |
| `railway up` | Upload the current working directory and deploy it. | The fix is a code change in this repo. |

Non-interactive defaults:

```bash
# Restart (no rebuild). -y skips the confirmation dialog.
railway restart -y --json \
  --project "$PROJECT_ID" --service api --environment production

# Redeploy the latest deployment.
railway redeploy -y --json \
  --project "$PROJECT_ID" --service api --environment production

# Redeploy and pull the newest commit / image from the configured source.
railway redeploy -y --from-source --json \
  --project "$PROJECT_ID" --service api --environment production

# Upload and deploy this directory. --ci streams build logs only, then exits;
# perfect for agent sessions (no interactive log attach).
railway up --ci \
  --project "$PROJECT_ID" --service api --environment production \
  --message "fix: bump httpx to 0.27 to pick up TLS bug fix"

# Remove the most recent deployment (rollback to whatever was before it).
railway down -y \
  --project "$PROJECT_ID" --service api --environment production
```


`railway up --ci` is the agent-friendly form: it implies `CI=true`, streams build logs to stdout, and exits with non-zero on build failure. Without `--ci` the CLI tries to attach a live log pager; in an automation context that hangs.

After deploy, **always verify** by sampling the new deployment's logs and a tiny metrics window — don't just trust the exit code. The Railway build can succeed and the runtime can still crashloop on startup.

```bash
# Quick verification loop.
railway deployment list --json --limit 3 \
  --project "$PROJECT_ID" --service api --environment production
railway logs --lines 50 --since 2m \
  --project "$PROJECT_ID" --service api --environment production
```
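
The verification step can be sketched as a bounded polling loop. This is an illustration under assumptions: `wait_for_deploy` is a hypothetical helper, and the command that prints the current status is supplied by the caller, because the JSON layout of `railway deployment list --json` is not shown in this document.

```shell
# Poll a status command until the deployment settles or attempts run out.
# $1 = max attempts, $2 = seconds between attempts, rest = command that
# prints one status word (SUCCESS, FAILED, BUILDING, ...).
wait_for_deploy() {
  attempts=$1
  delay=$2
  shift 2
  status=""
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status=$("$@")
    case "$status" in
      SUCCESS) echo "ok"; return 0 ;;
      FAILED|CRASHED|REMOVED) echo "failed: $status"; return 1 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timeout: last status ${status:-none}"
  return 2
}
```

Even when this prints `ok`, the 50-line log sample is still worth pulling, since a deployment can report success and crash shortly afterwards.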

Step 6 — run, shell, ssh, db connect

For development workflows that need production env vars locally, or a shell on the live container:

```bash
# Run a one-shot command with the linked service's variables injected.
railway run --service api --environment production -- node scripts/migrate.js

# Open a subshell with the same env (interactive; only run when the user is at the terminal).
railway shell --service api --environment production --silent

# SSH into the running container of a service. -i picks an identity file if Railway
# can't find a usable key in ~/.ssh.
railway ssh \
  --project "$PROJECT_ID" --service api --environment production

# One-shot remote command (non-interactive).
railway ssh \
  --project "$PROJECT_ID" --service api --environment production \
  -- ls /app

# Open a database shell against a Railway-managed DB service.
railway connect postgres \
  --project "$PROJECT_ID" --environment production
```


`railway run env` and `railway run printenv` will print every secret variable for that service — treat the output as you would `railway variable list --json` and never paste it back.

Common failure shapes

Unauthorized. Please check that your RAILWAY_TOKEN is valid

Either no token, an expired one, or a project-scoped token being used against a workspace-level command (`railway list`, `railway link --workspace`). Ask the user which token shape they configured; if they need workspace-level commands, they need an account token from https://railway.com/account/tokens.

Build `FAILED`, deploy log empty

The failure is in `--build` logs, not the default deploy stream:

```bash
railway logs --build "$DEPLOYMENT_ID" --lines 500 \
  --project "$PROJECT_ID" --service api --environment production
```

If the deployment id is unknown, `railway deployment list --json --limit 5` lists the recent deployments; pick the most recent one whose `status` is `FAILED`.

`CRASHED` deployment, deploy logs end with the start command

App is dying during startup. Read the tail of `railway logs --lines 200` for the actual exception. Common shapes:

- Missing env var (something like `KeyError: 'DATABASE_URL'` or `panic: required environment variable …`) → `railway variable list --json` to confirm, then `railway variable set …`.
- Port binding wrong: Railway sets `$PORT`; the service must bind to `0.0.0.0:$PORT`, not a hardcoded port.
- DB connection refused: check the linked DB service is in the same environment and the private network DNS (e.g. `postgres.railway.internal`) is what the app expects.

Build succeeds, runtime 502 / Bad Gateway from the edge

The app didn't bind to `$PORT` in time (default healthcheck window). Either the app is slow to start (raise `healthcheckTimeout` in `railway.json` / `railway.toml`, or fix the slow startup), or it's binding to `127.0.0.1` instead of `0.0.0.0`. Cross-check with `railway logs --http --status 502 --lines 50` to confirm the edge is the source.

Sudden OOM (`SIGKILL`, `out of memory`)

Pair the deploy log with `railway metrics --memory --since 30m --json`. If memory climbs into the service limit and the gap aligns with the kill, raise the service's memory cap (dashboard or `resources.memory` in `railway.json`). Don't silently raise it without telling the user; call out that you saw the ceiling hit.

`railway up` hangs in an agent session

You forgot `--ci`. The default mode attaches a live pager that doesn't exit. Kill it, then re-run with `--ci`.

Variable changes "didn't take effect"

`railway variable set` triggers a redeploy by default, but if `--skip-deploys` was passed, the variable is staged and the running deployment still has the old value. Either redeploy explicitly (`railway redeploy -y`) or rerun the set without `--skip-deploys`.

Conventions

- JSON-first. Add `--json` to every command that supports it, and parse with `jq` rather than scraping human-readable output. Layouts change; the JSON keys are stable.
- Explicit scope every call. Pass `--project`, `--service`, `--environment` on every command in an agent session. Don't rely on `.railway/` linked state; it's invisible to the user and confusing when it drifts.
- Non-streaming logs by default. Always combine with `--lines`, `--since`, or `--until`. Streaming is for humans at a terminal, not agents.
- Never paste secrets. `railway variable list`, `railway run env`, and `railway shell` all surface plaintext secrets. Summarise (key names, value lengths) instead. If the user explicitly asks for a value, paste it in a code block and remind them it's a secret.
- Confirm before destructive ops. `railway down`, `railway restart`, `railway redeploy`, `railway variable delete`, `railway environment delete`, `railway volume delete`, `railway delete` (the project!) all change live state. Repeat the scope back to the user ("restart `api` in `production` of project `…`?") and wait for explicit confirmation, even if `-y` is technically available.
- Verify after deploy. Don't end on a `railway up --ci` success line. Pull the latest deployment's status and a 50-line log sample so the user sees the actual runtime state, not just the build outcome.
- One failure mode per investigation. Build vs. crashloop vs. 5xx vs. OOM are distinct shapes with distinct log streams. Don't blend their tails in one report.
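
The confirmation convention can be sketched as a check on the subcommand plus a prompt builder. Both function names are hypothetical. The list of verbs follows the bullet above, with `variable set` added because Step 4 also requires confirmation before writes.

```shell
# Decide whether a railway invocation needs explicit user confirmation.
# Pass the arguments that would follow `railway` on the command line.
needs_confirmation() {
  case "$1 ${2:-}" in
    "down "*|"restart "*|"redeploy "*|"delete "*) echo "yes" ;;
    "variable set"|"variable delete")             echo "yes" ;;
    "environment delete"|"volume delete")         echo "yes" ;;
    *)                                            echo "no" ;;
  esac
}

# Build the question to repeat back to the user.
# $1 = action, $2 = service, $3 = environment, $4 = project
confirm_prompt() {
  printf '%s %s in %s of project %s?\n' "$1" "$2" "$3" "$4"
}
```

The agent would show the output of `confirm_prompt` and run the command only after the user answers, regardless of whether `-y` is passed.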

Onsager-bundled scripts (optional, repo-specific)

The `scripts/` directory ships three shell wrappers tuned for the `onsager-ai/onsager` monorepo. They are pinned to that repo's deployment shape (service name `onsager`, env var `ONSAGER_RAILWAY_TOKEN`, production URL `https://onsager-production.up.railway.app`, `justfile` targets). Repos other than Onsager can ignore these or fork them; the generic operating procedure above covers every project.

When in the Onsager repo:

| Task | Command |
| --- | --- |
| Pre-deploy check | `sh scripts/preflight.sh` |
| Diagnose failure | `sh scripts/debug.sh [service]` |
| Verify live deploy | `sh scripts/smoke.sh [url]` |

- `preflight.sh`: runs before any deploy or while triaging a build failure. Checks that lockfiles (`Cargo.lock`, `pnpm-lock.yaml`) are tracked in git, Dockerfile COPY sources resolve, Railway vars don't leak `localhost`, and `DATABASE_URL` points at the Railway Postgres plugin. Exits non-zero on any failure; skips Railway variable checks if `ONSAGER_RAILWAY_TOKEN` is not set.
- `debug.sh [service]`: one-shot diagnostics for a failed or stuck deploy: service status, build logs (40 lines), deploy/runtime logs (40), error-only logs (20), HTTP 4xx/5xx (10), env vars. Default service `onsager`. Requires `ONSAGER_RAILWAY_TOKEN`.
- `smoke.sh [base_url]`: post-deploy verification: API checks via `curl` (`/api/health`, `/api/auth/me`, `/api/nodes`, `/api/sessions`) and optional UI checks via `agent-browser` (`/`, `/sessions`, `/nodes`, `/settings`). Default URL `https://onsager-production.up.railway.app`. UI checks skip gracefully if `agent-browser` is not on PATH.

These scripts demonstrate the wrapping pattern; another repo adopting this skill should fork the directory and re-shape the script bodies for its own deployment.
