truefoundry-jobs

Routing note: For ambiguous user intents, use the shared clarification templates in references/intent-clarification.md.

Jobs

Deploy, schedule, and monitor TrueFoundry job runs. Two paths:

CLI (`tfy apply`) -- Write a YAML manifest and apply it. Works everywhere.
REST API (fallback) -- When the CLI is unavailable, use `tfy-api.sh`.

When to Use

User asks "deploy a job", "create a job", "run a batch task"
User asks "schedule a job", "run a cron job"
User asks "show job runs", "list runs for my job"
User asks "is my job running", "job status"
User wants to check a specific job run
Debugging a failed job run

When NOT to Use

User wants to list job applications -> prefer the `applications` skill; ask whether the user wants another valid path with `application_type: "job"`

</objective> <context>

</objective> <context>

Prerequisites

Always verify before deploying:

Credentials -- `TFY_BASE_URL` and `TFY_API_KEY` must be set (env or `.env`)
Workspace -- `TFY_WORKSPACE_FQN` required. Never auto-pick. Ask the user if missing.
CLI -- Check if the `tfy` CLI is available: `tfy --version`. If not, install a pinned version (`pip install 'truefoundry==0.5.0'`).

For credential check commands and .env setup, see references/prerequisites.md.
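
The credential and workspace checks above can be sketched as a small pre-flight helper. This is a minimal illustration, not part of the skill: the function name `missing_prerequisites` is hypothetical, and it only inspects a mapping of environment variables (it does not read a `.env` file or call the CLI).

```python
import os
from typing import List, Mapping

# Variable names come from the prerequisites list above.
REQUIRED_VARS = ("TFY_BASE_URL", "TFY_API_KEY", "TFY_WORKSPACE_FQN")


def missing_prerequisites(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

If `TFY_WORKSPACE_FQN` shows up in the result, ask the user for it rather than picking a workspace automatically.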

</context> <instructions>

</context> <instructions>

Step 1: Analyze the Job

What does the job do? (training, batch processing, data pipeline, maintenance)
One-time or scheduled?
Resource requirements (CPU/GPU/memory)
Expected duration

Security requirements
Never request or print raw secret values in chat.
For sensitive env vars (tokens/passwords/keys), require `tfy-secret://...` references instead of inline values.
For `build_source.type: git`, use trusted repositories and prefer immutable refs (commit SHA or pinned tag) over floating branches.
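
The security requirements above could be checked before a manifest is written, roughly as follows. This is a sketch under assumptions: the helper names and the name-based heuristic (`TOKEN`, `PASSWORD`, `SECRET`, `KEY`) are hypothetical, and a heuristic like this will miss sensitive variables with unusual names.

```python
import re
from typing import Dict, List

SENSITIVE_HINTS = ("TOKEN", "PASSWORD", "SECRET", "KEY")  # assumed heuristic
SECRET_PREFIX = "tfy-secret://"
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def inline_secret_vars(env: Dict[str, str]) -> List[str]:
    """Names of sensitive-looking variables whose value is not a secret reference."""
    return [
        name
        for name, value in env.items()
        if any(hint in name.upper() for hint in SENSITIVE_HINTS)
        and not str(value).startswith(SECRET_PREFIX)
    ]


def is_commit_sha(ref: str) -> bool:
    """True when ref is a full 40-character hexadecimal commit SHA."""
    return bool(FULL_SHA.match(ref))
```

The helper returns only variable names, so a flagged value is never echoed back into the chat.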

Step 2: Generate YAML Manifest

Based on the job requirements, create a YAML manifest.

Security: Always confirm container image sources and git repository URLs with the user before deploying. Do not pull untrusted container images or clone unverified git repositories. Pin image tags to specific versions; avoid `:latest` in production.
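
One way to enforce the tag-pinning rule is a small check on `image_uri` before applying. The function below is a hypothetical sketch; it treats a digest or any explicit tag other than `latest` as pinned, and it does not verify that the image or registry is trusted.

```python
def is_pinned_image(image_uri: str) -> bool:
    """True when the URI carries a digest or an explicit tag other than 'latest'."""
    if "@sha256:" in image_uri:
        return True
    # Only look at the last path segment so a registry port is not mistaken for a tag.
    last_segment = image_uri.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag: the registry default applies, usually 'latest'
    tag = last_segment.rsplit(":", 1)[1]
    return tag not in ("", "latest")
```

With the examples in this document, `my-registry/my-image:v1.0.0` passes and `my-registry/my-image:latest` does not.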

Option A: Pre-built Image

yaml

name: my-batch-job
type: job
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0  # pin to a specific version
  command: python train.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
  ephemeral_storage_request: 1000
  ephemeral_storage_limit: 2000
env:
  ENVIRONMENT: production
workspace_fqn: cluster-id:workspace-name

Option B: Git Repo + Dockerfile

yaml

name: my-batch-job
type: job
image:
  type: build
  build_source:
    type: git
    repo_url: https://github.com/user/repo
    branch_name: main
    ref: 3f2a1c9b0d7e6f5a4b3c2d1e0f9876543210abcd
  build_spec:
    type: dockerfile
    dockerfile_path: Dockerfile
    build_context_path: "."
    command: python train.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
env:
  ENVIRONMENT: production
workspace_fqn: cluster-id:workspace-name

Option C: Git Repo + PythonBuild (No Dockerfile)

yaml

name: my-batch-job
type: job
image:
  type: build
  build_source:
    type: git
    repo_url: https://github.com/user/repo
    branch_name: main
    ref: 3f2a1c9b0d7e6f5a4b3c2d1e0f9876543210abcd
  build_spec:
    type: tfy-python-buildpack
    command: python train.py
    python_version: "3.11"
    python_dependencies:
      type: pip
      requirements_path: requirements.txt
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
workspace_fqn: cluster-id:workspace-name

Scheduled Jobs (Cron)

Add a `trigger` section for scheduled execution:

yaml

name: nightly-retrain
type: job
trigger:
  type: cron
  schedule: "0 2 * * *"  # 2 AM daily
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0
  command: python train.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
workspace_fqn: cluster-id:workspace-name

Cron format: `minute hour day_of_month month day_of_week`

Common schedules:

| Schedule | Cron | Description |
|----------|------|-------------|
| Every hour | `0 * * * *` | Top of every hour |
| Daily at 2 AM | `0 2 * * *` | Nightly jobs |
| Weekly Monday | `0 9 * * 1` | Weekly Monday 9 AM |
| Monthly 1st | `0 0 1 * *` | First of month midnight |
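
To tell the user when a scheduled job will next run, the five-field format can be evaluated with a brute-force search. This is a simplified sketch, not a full cron parser: it assumes each field is either `*` or a single integer (no ranges, lists, or steps), that day of week uses 0 for Sunday, and that the platform evaluates the schedule in the same timezone as the `after` argument. The function name `next_run` is hypothetical.

```python
from datetime import datetime, timedelta


def _matches(field: str, value: int) -> bool:
    """A field matches when it is '*' or equals the value."""
    return field == "*" or int(field) == value


def next_run(schedule: str, after: datetime) -> datetime:
    """Return the first minute strictly after `after` that matches the schedule."""
    minute, hour, day_of_month, month, day_of_week = schedule.split()
    candidate = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(366 * 24 * 60):  # search at most one year ahead
        cron_weekday = (candidate.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
        dom_ok = _matches(day_of_month, candidate.day)
        dow_ok = _matches(day_of_week, cron_weekday)
        # Classic cron rule: when both day fields are restricted, either may match.
        if day_of_month != "*" and day_of_week != "*":
            day_ok = dom_ok or dow_ok
        else:
            day_ok = dom_ok and dow_ok
        if (day_ok and _matches(month, candidate.month)
                and _matches(hour, candidate.hour)
                and _matches(minute, candidate.minute)):
            return candidate
        candidate += timedelta(minutes=1)
    raise ValueError(f"no run within a year for schedule {schedule!r}")
```

For example, `next_run("0 2 * * *", datetime(2026, 2, 10, 9, 0))` evaluates to 2026-02-11 02:00.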

Manual Trigger with Retries

yaml

name: my-job
type: job
trigger:
  type: manual
  num_retries: 3
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0
  command: python job.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
workspace_fqn: cluster-id:workspace-name

Concurrency Policies

Three options for scheduled jobs when a run overlaps:

Forbid (default): Skip new run if previous still running
Allow: Run in parallel
Replace: Kill current, start new
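
The three policies can be read as a decision on what happens when a scheduled run comes due. The sketch below only models that decision for explanation; the function name and the returned action labels are hypothetical and are not TrueFoundry API values.

```python
def resolve_overlap(policy: str, previous_running: bool) -> str:
    """Describe what happens to a new scheduled run under each policy."""
    if not previous_running:
        return "start"  # no overlap, every policy starts the run
    actions = {
        "Forbid": "skip",  # default: the new run is skipped
        "Allow": "start_in_parallel",
        "Replace": "stop_previous_then_start",
    }
    if policy not in actions:
        raise ValueError(f"unknown concurrency policy: {policy!r}")
    return actions[policy]
```

Long-running jobs on a tight schedule are where the choice matters: under `Forbid` a slow run silently costs the next scheduled run.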

Parameterized Jobs

python

import argparse

# In your job script, use argparse for dynamic params
parser = argparse.ArgumentParser()
parser.add_argument("--epochs", type=int, default=10)
parser.add_argument("--batch-size", type=int, default=32)
args = parser.parse_args()

Then set command: `python train.py --epochs 50 --batch-size 64`

GPU Jobs

yaml

name: gpu-training-job
type: job
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0
  command: python train.py
resources:
  cpu_request: 4
  cpu_limit: 8
  memory_request: 16000
  memory_limit: 32000
  devices:
    - type: nvidia_gpu
      name: A10_24GB
      count: 1
workspace_fqn: cluster-id:workspace-name

Job with Volume Mounts

yaml

name: training-job
type: job
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0
  command: python train.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
mounts:
  - mount_path: /data
    volume_fqn: your-volume-fqn
workspace_fqn: cluster-id:workspace-name

Step 3: Write and Apply Manifest

Write the manifest to `tfy-manifest.yaml`:

bash

# Preview
tfy apply -f tfy-manifest.yaml --dry-run --show-diff

# Apply after user confirms
tfy apply -f tfy-manifest.yaml

Fallback: REST API

If the `tfy` CLI is not available, convert the YAML manifest to JSON and deploy via REST API. See references/cli-fallback.md for the conversion process.

bash

TFY_API_SH=~/.claude/skills/truefoundry-jobs/scripts/tfy-api.sh

$TFY_API_SH PUT /api/svc/v1/apps '{
  "manifest": { ... JSON version of the YAML manifest ... },
  "workspaceId": "WORKSPACE_ID"
}'
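
The request body shown above can be assembled from an already-parsed manifest. This is a minimal sketch assuming the manifest is available as a Python dict (loading it from YAML would need a YAML parser such as PyYAML, not shown here); the function name `build_deploy_payload` is hypothetical, and references/cli-fallback.md remains the authority on the conversion.

```python
import json
from typing import Any, Dict


def build_deploy_payload(manifest: Dict[str, Any], workspace_id: str) -> str:
    """Serialize the manifest and workspace ID into the JSON body for PUT /api/svc/v1/apps."""
    return json.dumps({"manifest": manifest, "workspaceId": workspace_id}, indent=2)
```

The resulting string is what gets passed as the last argument to `$TFY_API_SH PUT /api/svc/v1/apps`.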

Step 4: Trigger the Job

After deployment, trigger manually via API:

bash

TFY_API_SH=~/.claude/skills/truefoundry-jobs/scripts/tfy-api.sh
$TFY_API_SH POST /api/svc/v1/jobs/JOB_ID/runs '{}'
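
For agents that cannot run the shell wrapper, the same trigger call could be prepared with the Python standard library. This sketch only builds the request object and does not send it. The bearer-token `Authorization` header is an assumption about how `tfy-api.sh` authenticates, and `build_trigger_request` is a hypothetical name.

```python
import urllib.request


def build_trigger_request(base_url: str, api_key: str, job_id: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST that starts a run for the given job."""
    url = f"{base_url.rstrip('/')}/api/svc/v1/jobs/{job_id}/runs"
    return urllib.request.Request(
        url,
        data=b"{}",  # same empty JSON body as the shell example
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; read `api_key` from `TFY_API_KEY` and never print it.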

After Deploy -- Report Status

CRITICAL: Always report the deployment status and job details to the user. Do this automatically after deploy, without asking an extra verification prompt.

Check Job Status

text

# Preferred (MCP tool call)
tfy_applications_list(filters={"workspace_fqn": "WORKSPACE_FQN", "application_name": "JOB_NAME"})

If MCP tool calls are unavailable, use API fallback:

bash

TFY_API_SH=~/.claude/skills/truefoundry-jobs/scripts/tfy-api.sh

# Get job application details
$TFY_API_SH GET '/api/svc/v1/apps?workspaceFqn=WORKSPACE_FQN&applicationName=JOB_NAME'

Report to User

Always present this summary after deployment:

Job deployed successfully!

Job: {job-name}
Workspace: {workspace-fqn}
Status: Suspended (deployed, ready to trigger)
Schedule: {cron expression if scheduled, or "Manual trigger"}

To trigger the job:
  - Dashboard: Click "Run Job" on the job page
  - API: POST /api/svc/v1/jobs/{JOB_ID}/runs

To monitor runs:
  - Use the job monitoring commands below
  - Or check the TrueFoundry dashboard

For scheduled jobs, also show when the next run will execute. For manually triggered jobs, remind the user how to trigger them.
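
The summary template above can be filled in programmatically so the report is consistent across deployments. This is an illustrative sketch: `deployment_summary` and its parameters are hypothetical, and the status line is copied from the template rather than read from the API.

```python
from typing import Optional


def deployment_summary(job_name: str, workspace_fqn: str, job_id: str,
                       schedule: Optional[str] = None) -> str:
    """Render the post-deploy report; `schedule` is the cron expression, if any."""
    trigger = schedule if schedule else "Manual trigger"
    return "\n".join([
        "Job deployed successfully!",
        "",
        f"Job: {job_name}",
        f"Workspace: {workspace_fqn}",
        "Status: Suspended (deployed, ready to trigger)",
        f"Schedule: {trigger}",
        "",
        "To trigger the job:",
        '  - Dashboard: Click "Run Job" on the job page',
        f"  - API: POST /api/svc/v1/jobs/{job_id}/runs",
    ])
```

For a scheduled job, append the next run time to this output before presenting it.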

.tfyignore

Create a `.tfyignore` file (follows `.gitignore` syntax) to exclude files from the Docker build:

.git/
__pycache__/
*.pyc
.env
data/

List Job Runs

When using direct API, set `TFY_API_SH` to the full path of this skill's `scripts/tfy-api.sh`. See references/tfy-api-setup.md for paths per agent.

Via Tool Call

tfy_jobs_list_runs(job_id="job-id")
tfy_jobs_list_runs(job_id="job-id", job_run_name="run-name")  # get specific run
tfy_jobs_list_runs(job_id="job-id", filters={"sort_by": "createdAt"})

Via Direct API

bash

# Set the path to tfy-api.sh for your agent (example for Claude Code):
TFY_API_SH=~/.claude/skills/truefoundry-jobs/scripts/tfy-api.sh

# List runs for a job
$TFY_API_SH GET /api/svc/v1/jobs/JOB_ID/runs

# Get specific run
$TFY_API_SH GET /api/svc/v1/jobs/JOB_ID/runs/RUN_NAME

# With filters
$TFY_API_SH GET '/api/svc/v1/jobs/JOB_ID/runs?sortBy=createdAt&searchPrefix=my-run'

Filter Parameters

| Parameter | API Key | Description |
|-----------|---------|-------------|
| `search_prefix` | `searchPrefix` | Filter runs by name prefix |
| `sort_by` | `sortBy` | Sort field (e.g. `createdAt`) |
| `triggered_by` | `triggeredBy` | Filter by who triggered |
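
The mapping from tool-call parameter names to API query keys can be sketched as a helper that builds the request path. The function name `runs_path` is hypothetical; the three keys are taken from the table above, and other filters are rejected rather than guessed.

```python
from urllib.parse import urlencode

# snake_case tool parameter -> camelCase API query key (from the table above)
API_KEYS = {
    "search_prefix": "searchPrefix",
    "sort_by": "sortBy",
    "triggered_by": "triggeredBy",
}


def runs_path(job_id: str, **filters: str) -> str:
    """Build the list-runs path with any supported filters as a query string."""
    unknown = set(filters) - set(API_KEYS)
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    path = f"/api/svc/v1/jobs/{job_id}/runs"
    query = urlencode({API_KEYS[name]: value for name, value in filters.items()})
    return f"{path}?{query}" if query else path
```

Calling `runs_path("JOB_ID", sort_by="createdAt", search_prefix="my-run")` reproduces the filtered path used in the direct API example.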

Presenting Job Runs

Job Runs for data-pipeline:
| Run Name       | Status    | Started            | Duration |
|----------------|-----------|--------------------|---------|
| run-20260210-1 | SUCCEEDED | 2026-02-10 09:00   | 5m 32s  |
| run-20260210-2 | FAILED    | 2026-02-10 10:00   | 1m 05s  |
| run-20260210-3 | RUNNING   | 2026-02-10 11:00   | --       |
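
A table like the one above can be rendered from a list of run records. This is a sketch only: the record keys used here match the column headings and are hypothetical, since the actual field names in the API response are not shown in this document. Missing values render as `--`.

```python
from typing import Dict, List

COLUMNS = ("Run Name", "Status", "Started", "Duration")


def render_runs(job_name: str, runs: List[Dict[str, str]]) -> str:
    """Render run records as an aligned pipe table under a title line."""
    rows = [[run.get(col, "--") for col in COLUMNS] for run in runs]
    # Each column is as wide as its longest cell, header included.
    widths = [
        max([len(col)] + [len(row[i]) for row in rows])
        for i, col in enumerate(COLUMNS)
    ]

    def line(cells) -> str:
        return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

    separator = "|" + "|".join("-" * (w + 2) for w in widths) + "|"
    return "\n".join(
        [f"Job Runs for {job_name}:", line(COLUMNS), separator]
        + [line(row) for row in rows]
    )
```

A run that is still in progress has no duration yet, so leaving out its `Duration` key produces the `--` placeholder.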

</instructions>

<success_criteria>

</instructions>

<success_criteria>

Success Criteria

The job has been deployed to the target workspace and the user can see it in the TrueFoundry dashboard
The user has been provided the job ID and knows how to trigger runs (manually or via cron schedule)
The agent has reported the deployment status including job name, workspace, and trigger type
Deployment status is verified automatically immediately after apply/deploy (no extra prompt)
Job logs are accessible for monitoring via the `logs` skill or the dashboard
For scheduled jobs, the cron expression is confirmed and the user knows when the next run will execute

</success_criteria>

</success_criteria>

Composability

Schedule jobs: Use cron trigger for automated scheduling
Monitor runs: Use the List Job Runs section above
Find job first: Use `applications` skill with `application_type: "job"` to get job app ID
Check logs: Use `logs` skill with `job_run_name` to see run output

</references> <troubleshooting>

</references> <troubleshooting>

Error Handling

Job Not Found

Job ID not found. Use applications skill to list jobs:
tfy_applications_list(filters={"application_type": "job"})

No Runs Found

No runs found for this job. The job may not have been triggered yet.

CLI Errors

`tfy: command not found` -- Install with `pip install 'truefoundry==0.5.0'`
`tfy apply` validation errors -- Check YAML syntax, ensure required fields (name, type, image, resources, workspace_fqn) are present
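
The required-field check can be run locally before `tfy apply` to catch the most common validation error early. This is a minimal sketch over an already-parsed manifest dict; `missing_fields` is a hypothetical helper, and it checks only that the top-level fields listed above are present and non-empty, not that their contents are valid.

```python
from typing import Any, Dict, List

# Field list taken from the validation-error guidance above.
REQUIRED_FIELDS = ("name", "type", "image", "resources", "workspace_fqn")


def missing_fields(manifest: Dict[str, Any]) -> List[str]:
    """Return required top-level fields that are absent or empty."""
    return [
        field
        for field in REQUIRED_FIELDS
        if manifest.get(field) in (None, "", {}, [])
    ]
```

An empty result does not guarantee the manifest will apply cleanly; the `--dry-run` preview is still the real validation step.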

</troubleshooting> </output>

</troubleshooting> </output>