truefoundry-ssh-server

Routing note: For ambiguous user intents, use the shared clarification templates in references/intent-clarification.md.
<objective>
<objective>

SSH Server

Launch an SSH server on TrueFoundry for remote development. Write a YAML manifest and apply it with `tfy apply`, or fall back to the REST API when the CLI is unavailable. Connect with VS Code Remote-SSH or any SSH client, with full GPU access and persistent storage.

When to Use

  • User asks "launch ssh server", "start ssh server", "remote dev environment"
  • User wants to connect VS Code remotely to cloud GPUs
  • User needs SSH access for development/debugging
  • User asks about remote development environments

When NOT to Use

  • User wants Jupyter notebooks → prefer the `notebooks` skill; ask if the user wants another valid path
  • User wants to deploy a service → prefer the `deploy` skill; ask if the user wants another valid path
  • User wants to deploy a model → prefer the `llm-deploy` skill; ask if the user wants another valid path
</objective> <context>
</objective> <context>

Prerequisites

Always verify before launching an SSH server:
  1. Credentials: `TFY_BASE_URL` and `TFY_API_KEY` must be set (env or `.env`).
  2. Workspace: `TFY_WORKSPACE_FQN` is required. Never auto-pick; ask the user if it is missing.
  3. CLI: check `tfy --version`. Install if missing: `pip install 'truefoundry==0.5.0' && tfy login --host "$TFY_BASE_URL"`
For credential check commands and `.env` setup, see references/prerequisites.md.
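A minimal preflight sketch of the three checks above, assuming a bash shell. The function name `tfy_preflight` is hypothetical; it only inspects the environment and `PATH`, does not call the TrueFoundry API, and does not read `.env` itself (source that file first if you keep credentials there).

```bash
# Hypothetical preflight helper for the prerequisites listed above.
# Returns 0 when the required variables are set, 1 otherwise.
tfy_preflight() {
  local missing=0 var
  # Credentials and workspace must come from the environment (or a sourced .env).
  for var in TFY_BASE_URL TFY_API_KEY TFY_WORKSPACE_FQN; do
    if [ -z "${!var:-}" ]; then
      # A missing workspace is reported, never guessed or auto-picked.
      echo "missing: $var" >&2
      missing=1
    fi
  done
  # The CLI is optional: without it, the REST API fallback applies.
  if ! command -v tfy >/dev/null 2>&1; then
    echo "tfy CLI not found; install it or use the REST API fallback" >&2
  fi
  return "$missing"
}
```

Run `tfy_preflight` before generating a manifest; a non-zero status means the user still has to supply a value.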

CLI Detection

```bash
tfy --version
```

| CLI Output | Status | Action |
|---|---|---|
| `tfy version X.Y.Z` (>= 0.5.0) | Current | Use `tfy apply` as documented below. |
| `tfy version X.Y.Z` (0.3.x-0.4.x) | Outdated | Upgrade: install a pinned version (e.g. `pip install 'truefoundry==0.5.0'`). Core `tfy apply` should still work. |
| Command not found | Not installed | Install: `pip install 'truefoundry==0.5.0' && tfy login --host "$TFY_BASE_URL"` |
| CLI unavailable (no pip/Python) | Fallback | Use the REST API via `tfy-api.sh`. See references/cli-fallback.md. |
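The table can be encoded as a small classifier. This sketch assumes `tfy --version` prints exactly `tfy version X.Y.Z`, as the table shows; the function name `classify_tfy_version` is hypothetical, and versions older than 0.3.x are lumped in with "outdated" here, which the table itself does not cover.

```bash
# Hypothetical helper: maps `tfy --version` output to the table's Status column.
# Usage: classify_tfy_version "$(tfy --version 2>/dev/null)"
classify_tfy_version() {
  local out="$1" ver major minor
  # Empty output means the command was not found or failed.
  if [ -z "$out" ]; then
    echo "not-installed"
    return
  fi
  # Take the last word of "tfy version X.Y.Z", then split out X and Y.
  ver="${out##* }"
  major="${ver%%.*}"
  minor="${ver#*.}"; minor="${minor%%.*}"
  if ! [[ "$major" =~ ^[0-9]+$ && "$minor" =~ ^[0-9]+$ ]]; then
    echo "unknown"
    return
  fi
  # >= 0.5.0 is current; anything older should be upgraded to a pinned version.
  if (( 10#$major > 0 || 10#$minor >= 5 )); then
    echo "current"
  else
    echo "outdated"
  fi
}
```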

Launch SSH Server via UI

  1. Go to Deployments → New Deployment → SSH Server
  2. Add your SSH public key
  3. Select workspace and configure resources
  4. Click Deploy
</context> <instructions>
</context> <instructions>

Launch SSH Server via `tfy apply` (CLI, Recommended)

Configuration Questions

Before generating the manifest, ask the user:
  1. Name: what to call the SSH server
  2. GPU needed? CPU server (default) or GPU server (for ML development). If GPU, use the CUDA image variant.
  3. Home directory size: persistent storage in GB (default: 20)
  4. Image variant: CPU (`ssh-server:0.4.5-py3.12.12`) or CUDA (`ssh-server:0.4.5-cu129-py3.12.12`)
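The four answers map directly onto the CPU and GPU manifests in this skill. As a sketch, the hypothetical helper `render_ssh_manifest` prints either manifest from the answers. The resource figures are copied from the example manifests, and `A10_24GB` is only the example GPU type; it must match a GPU actually available on the cluster.

```bash
# Hypothetical helper: renders tfy-manifest.yaml from the configuration answers.
# Args: name, gpu ("yes" or "no"), home_directory_size in GB, workspace_fqn
render_ssh_manifest() {
  local name="$1" gpu="$2" home_gb="${3:-20}" workspace="$4"
  local tag cpu_req cpu_lim mem_req mem_lim eph_req eph_lim devices=""
  # Name and workspace are mandatory; never auto-pick a workspace.
  if [ -z "$name" ] || [ -z "$workspace" ]; then
    echo "name and workspace_fqn are required" >&2
    return 1
  fi
  if [ "$gpu" = "yes" ]; then
    # GPU: CUDA image variant plus the GPU example's resources and one example GPU.
    tag="0.4.5-cu129-py3.12.12"
    cpu_req=4; cpu_lim=8; mem_req=16000; mem_lim=32000; eph_req=10000; eph_lim=20000
    devices=$'  devices:\n    - type: nvidia_gpu\n      name: A10_24GB\n      count: 1\n'
  else
    # CPU (default): plain image and the CPU example's resources.
    tag="0.4.5-py3.12.12"
    cpu_req=1; cpu_lim=3; mem_req=4000; mem_lim=6000; eph_req=5000; eph_lim=10000
  fi
  cat <<EOF
name: ${name}
type: ssh-server
image:
  image_uri: public.ecr.aws/truefoundrycloud/ssh-server:${tag}
home_directory_size: ${home_gb}
resources:
  node:
    type: node_selector
    capacity_type: on_demand
  cpu_request: ${cpu_req}
  cpu_limit: ${cpu_lim}
  memory_request: ${mem_req}
  memory_limit: ${mem_lim}
  ephemeral_storage_request: ${eph_req}
  ephemeral_storage_limit: ${eph_lim}
${devices}workspace_fqn: "${workspace}"
EOF
}
```

For example, `render_ssh_manifest gpu-dev-server yes 20 "$TFY_WORKSPACE_FQN" > tfy-manifest.yaml` writes a file that can then go through the usual dry-run preview.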

CPU SSH Server

**1. Generate the manifest:**

```yaml
# tfy-manifest.yaml: SSH Server (CPU)
name: my-ssh-server
type: ssh-server
image:
  image_uri: public.ecr.aws/truefoundrycloud/ssh-server:0.4.5-py3.12.12
home_directory_size: 20
resources:
  node:
    type: node_selector
    capacity_type: on_demand
  cpu_request: 1
  cpu_limit: 3
  memory_request: 4000
  memory_limit: 6000
  ephemeral_storage_request: 5000
  ephemeral_storage_limit: 10000
workspace_fqn: "YOUR_WORKSPACE_FQN"
```

**2. Preview:**

```bash
tfy apply -f tfy-manifest.yaml --dry-run --show-diff
```

**3. Apply:**

```bash
tfy apply -f tfy-manifest.yaml
```

GPU SSH Server

```yaml
# tfy-manifest.yaml: GPU SSH Server (CUDA)
name: gpu-dev-server
type: ssh-server
image:
  image_uri: public.ecr.aws/truefoundrycloud/ssh-server:0.4.5-cu129-py3.12.12
home_directory_size: 20
resources:
  node:
    type: node_selector
    capacity_type: on_demand
  cpu_request: 4
  cpu_limit: 8
  memory_request: 16000
  memory_limit: 32000
  ephemeral_storage_request: 10000
  ephemeral_storage_limit: 20000
  devices:
    - type: nvidia_gpu
      name: A10_24GB
      count: 1
workspace_fqn: "YOUR_WORKSPACE_FQN"
```

Launch SSH Server via REST API (Fallback)

When the CLI is not available, use `tfy-api.sh`. Set `TFY_API_SH` to the full path of this skill's `scripts/tfy-api.sh`; see references/tfy-api-setup.md for the path used by each agent.

Create CPU SSH Server

```bash
TFY_API_SH=~/.claude/skills/truefoundry-ssh-server/scripts/tfy-api.sh

$TFY_API_SH PUT /api/svc/v1/apps -d '{
  "name": "my-ssh-server",
  "type": "ssh-server",
  "image": {
    "image_uri": "public.ecr.aws/truefoundrycloud/ssh-server:0.4.5-py3.12.12"
  },
  "home_directory_size": 20,
  "resources": {
    "node": {"type": "node_selector", "capacity_type": "on_demand"},
    "cpu_request": 1,
    "cpu_limit": 3,
    "memory_request": 4000,
    "memory_limit": 6000,
    "ephemeral_storage_request": 5000,
    "ephemeral_storage_limit": 10000
  },
  "workspace_fqn": "WORKSPACE_FQN"
}'
```

GPU SSH Server (REST API)

```bash
$TFY_API_SH PUT /api/svc/v1/apps -d '{
  "name": "gpu-dev-server",
  "type": "ssh-server",
  "image": {
    "image_uri": "public.ecr.aws/truefoundrycloud/ssh-server:0.4.5-cu129-py3.12.12"
  },
  "home_directory_size": 20,
  "resources": {
    "node": {"type": "node_selector", "capacity_type": "on_demand"},
    "cpu_request": 4,
    "cpu_limit": 8,
    "memory_request": 16000,
    "memory_limit": 32000,
    "ephemeral_storage_request": 10000,
    "ephemeral_storage_limit": 20000,
    "devices": [
      {"type": "nvidia_gpu", "name": "A10_24GB", "count": 1}
    ]
  },
  "workspace_fqn": "WORKSPACE_FQN"
}'
```

SSH Key Setup

Prerequisites

You need an SSH key pair. Check for existing keys:

```bash
# macOS/Linux
cat ~/.ssh/id_rsa.pub

# Windows PowerShell
type $home\.ssh\id_rsa.pub
```
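The check above only looks for an RSA key. Keys created with other algorithms use different default file names, so a slightly broader sketch for macOS/Linux (the function name `find_ssh_pubkey` is hypothetical) reports whichever default public key exists:

```bash
# Hypothetical helper: prints the path of the first default public key found.
# Checks the Ed25519, ECDSA, then RSA default file names in the given directory.
find_ssh_pubkey() {
  local dir="${1:-$HOME/.ssh}" f
  for f in id_ed25519.pub id_ecdsa.pub id_rsa.pub; do
    if [ -f "$dir/$f" ]; then
      echo "$dir/$f"
      return 0
    fi
  done
  echo "no default public key in $dir; generate one with ssh-keygen" >&2
  return 1
}
```

Usage: `cat "$(find_ssh_pubkey)"` prints the key to paste into the deployment configuration.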

Generate a New Key (if needed)

```bash
ssh-keygen -t rsa
```

Add Key to SSH Server

Add your public key during deployment configuration (preferred). Avoid manual key-file edits unless strictly necessary.
Security: SSH public keys authorize remote access to this container only. Never add keys you don't recognize. The user must confirm their public key before it is added. Each key grants full shell access to the deployed dev environment — not to the host machine or any other system.
Use the dashboard key-management flow so keys are audited and less likely to be misconfigured.
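Before the user confirms a key, a rough format check catches the most common mistake: pasting the private key instead of the `.pub` file. This sketch (hypothetical function `check_ssh_pubkey`) only inspects the key-type prefix and is not a cryptographic validation.

```bash
# Hypothetical helper: rough sanity check on a key before it is added.
# Returns 0 for a recognizable public key line, 1 otherwise.
check_ssh_pubkey() {
  local key="$1"
  # Private key files contain a "BEGIN ... PRIVATE KEY" header; never upload those.
  case "$key" in
    *"PRIVATE KEY"*)
      echo "refusing: this looks like a PRIVATE key; use the .pub file" >&2
      return 1 ;;
  esac
  # Public key lines start with the algorithm name followed by base64 data.
  case "$key" in
    ssh-ed25519\ *|ssh-rsa\ *|ecdsa-sha2-nistp*\ *)
      return 0 ;;
    *)
      echo "unrecognized public key format" >&2
      return 1 ;;
  esac
}
```

For the confirmation step itself, `ssh-keygen -lf <file>.pub` prints the key's fingerprint, which the user can compare against the key they expect.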

Multi-User Access

For multi-user access, add each teammate's key through the platform UI and verify who owns each key before enabling access.

VS Code Remote-SSH Setup

  1. Install the Remote-SSH extension in VS Code
  2. Open the Command Palette → "Remote-SSH: Connect to Host"
  3. Enter the SSH connection string from the TrueFoundry dashboard
  4. Authenticate with your SSH key

ProxyTunnel Installation

Required for SSH tunneling through TrueFoundry:
Security: Privileged Operations. The commands below install a networking tool on the user's local machine to enable SSH tunneling; on Linux this typically requires `sudo`. The user should confirm they want to install this package before proceeding.

| Platform | Command |
|---|---|
| macOS | `brew install proxytunnel` |
| Ubuntu | Install `proxytunnel` with your distro package manager (run manually by a trusted admin) |
| Alternative | Use `nc` (netcat) as the proxy instead of proxytunnel |
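To see which of the two options is already available locally, a quick sketch (the helper name `pick_ssh_proxy_tool` is hypothetical) checks `PATH` only; it installs nothing, so no privileged operation is involved.

```bash
# Hypothetical helper: reports which tunneling tool is on PATH.
# Prefers proxytunnel, falls back to nc (netcat), else prints "none".
pick_ssh_proxy_tool() {
  if command -v proxytunnel >/dev/null 2>&1; then
    echo "proxytunnel"
  elif command -v nc >/dev/null 2>&1; then
    echo "nc"
  else
    echo "none"
    return 1
  fi
}
```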

File Transfer

SCP (Secure Copy)

```bash
# Download from server
scp -r <deploymentName>:<remote-path> <local-path>

# Upload to server
scp -r <local-path> <deploymentName>:<remote-path>
```

rsync (Incremental Sync)

```bash
# Upload
rsync -avz <local-path> <deploymentName>:<remote-path>

# Download
rsync -avz <deploymentName>:<remote-path> <local-path>
```
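Both tools use `<deploymentName>` as the SSH host, which presumes it resolves through your SSH configuration. For repeated uploads, a small wrapper keeps the flags in one place. This is a sketch: the function name `push_project` and the `PRINT_ONLY` switch are hypothetical, while `-a`, `-v`, `-z`, `-n` (`--dry-run`) and `--exclude` are standard rsync options.

```bash
# Hypothetical wrapper around the rsync upload shown above.
# Args: local_path deployment_name remote_path [extra rsync options...]
# PRINT_ONLY=1 prints the command instead of running it.
push_project() {
  if [ $# -lt 3 ]; then
    echo "usage: push_project <local-path> <deploymentName> <remote-path> [rsync opts]" >&2
    return 2
  fi
  local src="$1" host="$2" dest="$3"
  shift 3
  # -a preserves permissions and times, -v lists files, -z compresses in transit.
  local -a cmd=(rsync -avz --exclude '.git' --exclude '__pycache__' "$@" "$src" "${host}:${dest}")
  if [ "${PRINT_ONLY:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

A trailing slash on the source (`./project/`) copies the directory's contents rather than the directory itself, and a relative remote path such as `project/` resolves against the remote home directory. Passing `-n` first (`push_project ./project/ <deploymentName> project/ -n`) lists what would change without copying anything.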

Scale-to-Zero

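SSH servers may auto-stop after a period of inactivity to save costs.
Activity detection: active SSH connections and foreground applications count as activity. Background processes are not detected, so a background job alone will not keep the server running.
Requires SSH server image v0.3.10 or later.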
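Because scale-to-zero depends on the image version, it can help to check the tag in a manifest before relying on auto-stop. A sketch, assuming the tag starts with a `MAJOR.MINOR.PATCH` version as in the tags used in this skill (`0.4.5-py3.12.12`); `supports_scale_to_zero` is a hypothetical name, and the `0.3.10` cutoff is the requirement stated above.

```bash
# Hypothetical helper: returns 0 if the image tag is >= 0.3.10, 1 if older,
# 2 if the tag does not start with a numeric MAJOR.MINOR.PATCH version.
supports_scale_to_zero() {
  local uri="$1" tag ver major minor patch
  tag="${uri##*:}"        # e.g. 0.4.5-cu129-py3.12.12
  ver="${tag%%-*}"        # e.g. 0.4.5
  IFS=. read -r major minor patch <<< "$ver"
  if ! [[ "$major" =~ ^[0-9]+$ && "$minor" =~ ^[0-9]+$ && "$patch" =~ ^[0-9]+$ ]]; then
    return 2
  fi
  # Force base 10 so components like "08" are not read as octal.
  major=$((10#$major)); minor=$((10#$minor)); patch=$((10#$patch))
  if (( major > 0 )); then return 0; fi
  if (( minor != 3 )); then (( minor > 3 )); return; fi
  (( patch >= 10 ))
}
```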

Persistent Storage

  • The home directory (`/home/jovyan/`) persists across restarts
  • APT packages do NOT persist; install system packages via Build Scripts instead
  • Pip packages installed in the home directory persist
  • Conda environments persist

Custom Images

Extend TrueFoundry's SSH server images:
Security: Privileged Operations. Dockerfile instructions run only during the container image build (not on the user's local machine or the host system). When system packages are needed, the standard pattern is to switch to `USER root` to install them and then back to `USER jovyan`; the example below installs only Python packages, so it stays as `jovyan` throughout. Only install packages from trusted sources, and pin package versions where possible to reduce supply-chain risk.

```dockerfile
FROM public.ecr.aws/truefoundrycloud/ssh-server:0.4.5-py3.12.12

USER jovyan

RUN python3 -m pip install --no-cache-dir torch numpy pandas
```

Critical: Do NOT modify ENTRYPOINT or CMD.
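A simple guard for this rule can run before building a custom image: it flags any `ENTRYPOINT` or `CMD` instruction in the Dockerfile. This is a line-based sketch (the name `check_dockerfile_entrypoint` is hypothetical); it does not understand line continuations or multi-stage builds.

```bash
# Hypothetical lint: fails if the Dockerfile sets ENTRYPOINT or CMD.
# Dockerfile instructions are case-insensitive, so grep runs with -i.
check_dockerfile_entrypoint() {
  local file="${1:-Dockerfile}"
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 2
  fi
  # Print offending lines with line numbers so they are easy to remove.
  if grep -inE '^[[:space:]]*(ENTRYPOINT|CMD)([[:space:]]|$)' "$file"; then
    echo "error: $file overrides ENTRYPOINT/CMD; keep the base image defaults" >&2
    return 1
  fi
  return 0
}
```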

Build Scripts

Install system packages through a reviewed image build process handled by your platform team instead of ad-hoc commands.
Security: Privileged Operations. Any `sudo` commands used to install system tooling run inside the deployed container (not on the user's local machine) and do not affect the host system. The user must confirm they want these packages installed.
For example, coordinate package additions through a reviewed Dockerfile PR.

Python Environment Management

Create isolated environments:

```bash
conda create -y -n ml-env python=3.11
conda activate ml-env
pip install torch transformers
```
</instructions>
<success_criteria>
</instructions>
<success_criteria>

Success Criteria

  • The user can launch an SSH server on a specified TrueFoundry workspace
  • The user can connect to the server via VS Code Remote-SSH or a standard SSH client
  • The agent has configured SSH keys and verified connectivity
  • The user can transfer files to and from the remote server
  • The agent has set up auto-shutdown to avoid unnecessary costs
  • The user can access GPU resources from the remote environment if requested
</success_criteria>
<references>
</success_criteria>
<references>

Composability

  • Need workspace: use the `workspaces` skill to find the target workspace
  • Need GPU info: use the `workspaces` skill to check available GPUs on the cluster
  • Need persistent volumes: use the `volumes` skill to create and mount storage
  • Deploy after development: use the `deploy` or `llm-deploy` skill
  • Check status: use the `applications` skill to see the SSH server status
</references> <troubleshooting>
</references> <troubleshooting>

Error Handling

CLI Errors

```
tfy: command not found
Install the TrueFoundry CLI:
  pip install 'truefoundry==0.5.0'
  tfy login --host "$TFY_BASE_URL"
```

```
Manifest validation failed.
Check:
- YAML syntax is valid
- Required fields: name, type, workspace_fqn
- Image URI exists and is accessible
- Resource values use correct units (memory in MB)
```
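A rough pre-check of the required top-level fields can catch the most common validation failure before `tfy apply` runs. This sketch (hypothetical `check_manifest_fields`) only greps for top-level keys, so it does not replace the `--dry-run` preview.

```bash
# Hypothetical pre-check: verifies the required top-level manifest keys exist.
# Returns 0 if all are present, 1 if any is missing, 2 if the file is missing.
check_manifest_fields() {
  local file="$1" field missing=0
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 2; }
  for field in name type workspace_fqn; do
    # Top-level keys start at column 0, e.g. "workspace_fqn:".
    if ! grep -qE "^${field}:" "$file"; then
      echo "missing required field: $field" >&2
      missing=1
    fi
  done
  return "$missing"
}
```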

Cannot Connect

```
SSH connection failed. Check:
- SSH key is correctly configured
- ProxyTunnel is installed (macOS: brew install proxytunnel)
- SSH server is in Running state (check applications skill)
- Network/VPN connectivity
```

GPU Not Available

```
GPU not accessible. Verify:
- Requested GPU type is available on cluster (check workspaces skill)
- Used the correct SSH server image with CUDA support
```

Server Stopped Unexpectedly

```
SSH server stopped. Possible causes:
- Auto-shutdown triggered (no active SSH connections)
- Check if auto-shutdown is configured on the server
- Resource limits exceeded (increase memory/CPU)
```

REST API Fallback Errors

```
401 Unauthorized: check that TFY_API_KEY is valid
404 Not Found: check TFY_BASE_URL and the API endpoint path
422 Validation Error: check that manifest fields match the expected schema
```
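When scripting the REST fallback, the three cases above can be mapped to hints in one place. A sketch with a hypothetical `explain_tfy_http_status` helper; how the status code is obtained depends on `tfy-api.sh`, whose output format this skill does not document.

```bash
# Hypothetical helper: turns an HTTP status code into a troubleshooting hint.
explain_tfy_http_status() {
  case "$1" in
    2??) echo "ok" ;;
    401) echo "Unauthorized: check that TFY_API_KEY is valid" ;;
    404) echo "Not Found: check TFY_BASE_URL and the API endpoint path" ;;
    422) echo "Validation Error: check that manifest fields match the expected schema" ;;
    *)   echo "unexpected HTTP status $1" ;;
  esac
}
```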
</troubleshooting>
</troubleshooting>