jobs

Routing note: For ambiguous user intents, use the shared clarification templates in references/intent-clarification.md.

Jobs

Deploy, schedule, and monitor TrueFoundry job runs. Two paths:

CLI (`tfy apply`) -- Write a YAML manifest and apply it. Works everywhere.
REST API (fallback) -- When the CLI is unavailable, use `tfy-api.sh`.

When to Use

User asks "deploy a job", "create a job", "run a batch task"
User asks "schedule a job", "run a cron job"
User asks "show job runs", "list runs for my job"
User asks "is my job running", "job status"
User wants to check a specific job run
Debugging a failed job run

When NOT to Use

User wants to list job applications -> prefer the `applications` skill; ask if the user wants another valid path with `application_type: "job"`

</objective> <context>

</objective> <context>

Prerequisites

Always verify before deploying:

Credentials -- `TFY_BASE_URL` and `TFY_API_KEY` must be set (env or `.env`)
Workspace -- `TFY_WORKSPACE_FQN` required. Never auto-pick. Ask the user if missing.
CLI -- Check if the `tfy` CLI is available: `tfy --version`. If not, install a pinned version (`pip install 'truefoundry==0.5.0'`).

For credential check commands and .env setup, see references/prerequisites.md
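
The three checks above can be sketched in Python as a rough pre-flight helper. The function name `missing_prerequisites` is hypothetical and not part of any TrueFoundry tooling; the sketch only inspects environment variables and `PATH`, and does not read a `.env` file.

```python
import os
import shutil

REQUIRED_VARS = ("TFY_BASE_URL", "TFY_API_KEY", "TFY_WORKSPACE_FQN")


def missing_prerequisites(env=None, cli_name="tfy"):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(cli_name) is None:
        problems.append(f"{cli_name} CLI not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("MISSING:", problem)
```

A missing `TFY_WORKSPACE_FQN` should lead to a question for the user, not to a guessed value.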

</context> <instructions>

</context> <instructions>

Step 1: Analyze the Job

What does the job do? (training, batch processing, data pipeline, maintenance)
One-time or scheduled?
Resource requirements (CPU/GPU/memory)
Expected duration

Security requirements
Never request or print raw secret values in chat.
For sensitive env vars (tokens/passwords/keys), require `tfy-secret://...` references instead of inline values.
For `build_source.type: git`, use trusted repositories and prefer immutable refs (commit SHA or pinned tag) over floating branches.
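
One possible way to enforce the security requirements before a manifest is written is sketched below. Both helpers and the `SENSITIVE_HINTS` name list are assumptions made for illustration: the name matching is a heuristic, and only full 40-character commit SHAs are recognised as immutable (pinned tags would need a separate check).

```python
import re

# Heuristic substrings that suggest a variable holds a secret (assumption).
SENSITIVE_HINTS = ("token", "password", "secret", "key")
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def insecure_env_vars(env):
    """Names of sensitive-looking vars whose value is not a tfy-secret:// reference."""
    return [
        name
        for name, value in env.items()
        if any(hint in name.lower() for hint in SENSITIVE_HINTS)
        and not str(value).startswith("tfy-secret://")
    ]


def is_immutable_ref(ref):
    """True only for a full 40-character lowercase commit SHA."""
    return bool(FULL_SHA.match(ref or ""))
```

If `insecure_env_vars` returns any names, the agent would ask the user for a `tfy-secret://...` reference rather than the raw value.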

Step 2: Generate YAML Manifest

Based on the job requirements, create a YAML manifest.

Security: Always confirm container image sources and git repository URLs with the user before deploying. Do not pull untrusted container images or clone unverified git repositories. Pin image tags to specific versions; avoid `:latest` in production.
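
The pinning rule can be checked mechanically. The following is a minimal sketch with hypothetical helper names; it treats a digest reference or any explicit tag other than `latest` as pinned, and an untagged image as unpinned.

```python
def image_tag(image_uri):
    """Return the tag part of an image URI, or None when no tag is given."""
    last_segment = image_uri.rsplit("/", 1)[-1]  # skip any registry host:port
    if "@" in last_segment:  # digest reference such as name@sha256:...
        return None
    _name, sep, tag = last_segment.partition(":")
    return tag if sep else None


def is_pinned(image_uri):
    """Pinned means a digest, or an explicit tag other than 'latest'."""
    if "@sha256:" in image_uri:
        return True
    tag = image_tag(image_uri)
    return tag is not None and tag != "latest"
```

A check like this only catches floating tags; whether the registry itself is trusted still has to be confirmed with the user.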

Option A: Pre-built Image

```yaml
name: my-batch-job
type: job
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0  # pin to a specific version
  command: python train.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
  ephemeral_storage_request: 1000
  ephemeral_storage_limit: 2000
env:
  ENVIRONMENT: production
workspace_fqn: cluster-id:workspace-name
```

Option B: Git Repo + Dockerfile

```yaml
name: my-batch-job
type: job
image:
  type: build
  build_source:
    type: git
    repo_url: https://github.com/user/repo
    branch_name: main
    ref: 3f2a1c9b0d7e6f5a4b3c2d1e0f9876543210abcd
  build_spec:
    type: dockerfile
    dockerfile_path: Dockerfile
    build_context_path: "."
    command: python train.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
env:
  ENVIRONMENT: production
workspace_fqn: cluster-id:workspace-name
```

Option C: Git Repo + PythonBuild (No Dockerfile)

```yaml
name: my-batch-job
type: job
image:
  type: build
  build_source:
    type: git
    repo_url: https://github.com/user/repo
    branch_name: main
    ref: 3f2a1c9b0d7e6f5a4b3c2d1e0f9876543210abcd
  build_spec:
    type: tfy-python-buildpack
    command: python train.py
    python_version: "3.11"
    python_dependencies:
      type: pip
      requirements_path: requirements.txt
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
workspace_fqn: cluster-id:workspace-name
```

Scheduled Jobs (Cron)

Add a `trigger` section for scheduled execution:

```yaml
name: nightly-retrain
type: job
trigger:
  type: cron
  schedule: "0 2 * * *"  # 2 AM daily
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0
  command: python train.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
workspace_fqn: cluster-id:workspace-name
```

Cron format: `minute hour day_of_month month day_of_week`

Common schedules:

| Schedule      | Cron        | Description             |
|---------------|-------------|-------------------------|
| Every hour    | `0 * * * *` | Top of every hour       |
| Daily at 2 AM | `0 2 * * *` | Nightly jobs            |
| Weekly Monday | `0 9 * * 1` | Weekly Monday 9 AM      |
| Monthly 1st   | `0 0 1 * *` | First of month midnight |
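
To confirm a cron expression with the user, it can help to split it into the five named fields. The sketch below is an illustrative helper (the name `parse_cron` is hypothetical); it range-checks only plain integers, passes lists, ranges and steps through unvalidated, and assumes day_of_week runs 0-6 with 0 as Sunday, although some cron dialects also accept 7.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")
FIELD_RANGES = {
    "minute": (0, 59),
    "hour": (0, 23),
    "day_of_month": (1, 31),
    "month": (1, 12),
    "day_of_week": (0, 6),  # assumption: 0 = Sunday
}


def parse_cron(expression):
    """Split a five-field cron expression into a dict keyed by field name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    fields = dict(zip(CRON_FIELDS, parts))
    for name, value in fields.items():
        if value.isdigit():  # '*', '1-5', '*/15' etc. are not checked here
            low, high = FIELD_RANGES[name]
            if not low <= int(value) <= high:
                raise ValueError(f"{name}={value} outside {low}-{high}")
    return fields


print(parse_cron("0 2 * * *"))
```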

Manual Trigger with Retries

```yaml
name: my-job
type: job
trigger:
  type: manual
  num_retries: 3
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0
  command: python job.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
workspace_fqn: cluster-id:workspace-name
```

Concurrency Policies

Three options for scheduled jobs when a run overlaps:

Forbid (default): Skip new run if previous still running
Allow: Run in parallel
Replace: Kill current, start new
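
The behaviour of the three policies can be modelled as a small decision function. This is only an illustration of the semantics listed above, not platform code, and the function name `on_overlap` is hypothetical.

```python
def on_overlap(policy, previous_running):
    """Return the actions taken when a scheduled run becomes due."""
    if not previous_running:
        return ["start new run"]  # no overlap, so every policy behaves the same
    actions = {
        "Forbid": ["skip new run"],
        "Allow": ["start new run"],  # runs alongside the previous one
        "Replace": ["kill current run", "start new run"],
    }
    if policy not in actions:
        raise ValueError(f"unknown policy: {policy}")
    return actions[policy]
```

For a job whose runs can exceed the schedule interval, this makes the trade-off explicit: Forbid loses runs, Allow consumes extra resources, and Replace discards partial work.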

Parameterized Jobs

```python
import argparse

# In your job script, use argparse for dynamic params
parser = argparse.ArgumentParser()
parser.add_argument("--epochs", type=int, default=10)
parser.add_argument("--batch-size", type=int, default=32)
args = parser.parse_args()
```

Then set command: `python train.py --epochs 50 --batch-size 64`
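
When the parameters come from a dictionary, the command string for the manifest can be assembled programmatically. The helper below is a hypothetical sketch; it assumes the `command` value is interpreted like a shell command line, which is why values are quoted with `shlex.join`.

```python
import shlex


def build_command(script, params):
    """Render a params dict as a 'python <script> --flag value ...' string."""
    args = ["python", script]
    for name, value in params.items():
        # argparse flags conventionally use dashes where Python names use underscores.
        args += [f"--{name.replace('_', '-')}", str(value)]
    return shlex.join(args)  # quotes values containing spaces or metacharacters


print(build_command("train.py", {"epochs": 50, "batch_size": 64}))
```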

GPU Jobs

```yaml
name: gpu-training-job
type: job
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0
  command: python train.py
resources:
  cpu_request: 4
  cpu_limit: 8
  memory_request: 16000
  memory_limit: 32000
  devices:
    - type: nvidia_gpu
      name: A10_24GB
      count: 1
workspace_fqn: cluster-id:workspace-name
```

Job with Volume Mounts

```yaml
name: training-job
type: job
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0
  command: python train.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
mounts:
  - mount_path: /data
    volume_fqn: your-volume-fqn
workspace_fqn: cluster-id:workspace-name
```

Step 3: Write and Apply Manifest

Write the manifest to `tfy-manifest.yaml`:

```bash
# Preview
tfy apply -f tfy-manifest.yaml --dry-run --show-diff

# Apply after user confirms
tfy apply -f tfy-manifest.yaml
```

Fallback: REST API

If the `tfy` CLI is not available, convert the YAML manifest to JSON and deploy via REST API. See references/cli-fallback.md for the conversion process.

```bash
TFY_API_SH=~/.claude/skills/truefoundry-jobs/scripts/tfy-api.sh

$TFY_API_SH PUT /api/svc/v1/apps '{
  "manifest": { ... JSON version of the YAML manifest ... },
  "workspaceId": "WORKSPACE_ID"
}'
```
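
The request body can also be assembled in Python when the manifest is already available as a dictionary. This is a minimal sketch using only the standard library; `build_deploy_payload` is a hypothetical helper, and the authoritative conversion process remains the one in references/cli-fallback.md.

```python
import json


def build_deploy_payload(manifest, workspace_id):
    """Wrap a manifest dict in the body shape used by the PUT request above."""
    return json.dumps({"manifest": manifest, "workspaceId": workspace_id}, indent=2)


# Same fields as the pre-built image example, expressed as a dict.
manifest = {
    "name": "my-batch-job",
    "type": "job",
    "image": {
        "type": "image",
        "image_uri": "my-registry/my-image:v1.0.0",
        "command": "python train.py",
    },
    "resources": {
        "cpu_request": 2,
        "cpu_limit": 4,
        "memory_request": 4000,
        "memory_limit": 8000,
    },
    "workspace_fqn": "cluster-id:workspace-name",
}

print(build_deploy_payload(manifest, "WORKSPACE_ID"))
```

The printed JSON can be passed as the third argument to `tfy-api.sh`.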

Step 4: Trigger the Job

After deployment, trigger manually via API:

```bash
TFY_API_SH=~/.claude/skills/truefoundry-jobs/scripts/tfy-api.sh
$TFY_API_SH POST /api/svc/v1/jobs/JOB_ID/runs '{}'
```

After Deploy -- Report Status

CRITICAL: Always report the deployment status and job details to the user. Do this automatically after deploy, without asking an extra verification prompt.

Check Job Status

```text
# Preferred (MCP tool call)
tfy_applications_list(filters={"workspace_fqn": "WORKSPACE_FQN", "application_name": "JOB_NAME"})
```

If MCP tool calls are unavailable, use API fallback:

```bash
TFY_API_SH=~/.claude/skills/truefoundry-jobs/scripts/tfy-api.sh

# Get job application details
$TFY_API_SH GET '/api/svc/v1/apps?workspaceFqn=WORKSPACE_FQN&applicationName=JOB_NAME'
```

Report to User

Always present this summary after deployment:

Job deployed successfully!

Job: {job-name}
Workspace: {workspace-fqn}
Status: Suspended (deployed, ready to trigger)
Schedule: {cron expression if scheduled, or "Manual trigger"}

To trigger the job:
  - Dashboard: Click "Run Job" on the job page
  - API: POST /api/svc/v1/jobs/{JOB_ID}/runs

To monitor runs:
  - Use the job monitoring commands below
  - Or check the TrueFoundry dashboard

For scheduled jobs, also show when the next run will execute. For manually triggered jobs, remind the user how to trigger them.
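
A small formatter keeps the summary consistent across deployments. The sketch below reproduces the template above; the function name and its parameters are hypothetical, and the job ID is assumed to come from the status check.

```python
def deployment_summary(job_name, workspace_fqn, job_id, cron=None):
    """Fill the post-deploy summary template; cron=None means a manual job."""
    schedule = cron if cron else "Manual trigger"
    return "\n".join([
        "Job deployed successfully!",
        "",
        f"Job: {job_name}",
        f"Workspace: {workspace_fqn}",
        "Status: Suspended (deployed, ready to trigger)",
        f"Schedule: {schedule}",
        "",
        "To trigger the job:",
        '  - Dashboard: Click "Run Job" on the job page',
        f"  - API: POST /api/svc/v1/jobs/{job_id}/runs",
    ])


print(deployment_summary("nightly-retrain", "cluster-id:workspace-name", "JOB_ID", "0 2 * * *"))
```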

.tfyignore

Create a `.tfyignore` file (follows `.gitignore` syntax) to exclude files from the Docker build:

.git/
__pycache__/
*.pyc
.env
data/
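
To preview which files these patterns would exclude, a rough approximation of the matching can be written with `fnmatch`. This sketch handles only the simple pattern shapes shown above (directory names with a trailing slash and basename globs); it does not implement negation, `**`, or anchored patterns, so it is not a full `.gitignore` matcher.

```python
from fnmatch import fnmatch

PATTERNS = [".git/", "__pycache__/", "*.pyc", ".env", "data/"]


def is_ignored(path, patterns=PATTERNS):
    """Approximate .gitignore-style matching for a '/'-separated relative path."""
    parts = path.split("/")
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: match any parent directory with that name.
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif any(fnmatch(part, pattern) for part in parts):
            return True
    return False


for candidate in ["train.py", "data/train.csv", "src/__pycache__/mod.pyc", ".env"]:
    print(candidate, "ignored" if is_ignored(candidate) else "kept")
```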

List Job Runs

When using the direct API, set `TFY_API_SH` to the full path of this skill's `scripts/tfy-api.sh`. See references/tfy-api-setup.md for paths per agent.

Via Tool Call

tfy_jobs_list_runs(job_id="job-id")
tfy_jobs_list_runs(job_id="job-id", job_run_name="run-name")  # get specific run
tfy_jobs_list_runs(job_id="job-id", filters={"sort_by": "createdAt"})

Via Direct API

```bash
# Set the path to tfy-api.sh for your agent (example for Claude Code):
TFY_API_SH=~/.claude/skills/truefoundry-jobs/scripts/tfy-api.sh

# List runs for a job
$TFY_API_SH GET /api/svc/v1/jobs/JOB_ID/runs

# Get specific run
$TFY_API_SH GET /api/svc/v1/jobs/JOB_ID/runs/RUN_NAME

# With filters
$TFY_API_SH GET '/api/svc/v1/jobs/JOB_ID/runs?sortBy=createdAt&searchPrefix=my-run'
```

Filter Parameters

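| Parameter       | API Key        | Description                    |
|-----------------|----------------|--------------------------------|
| `search_prefix` | `searchPrefix` | Filter runs by name prefix     |
| `sort_by`       | `sortBy`       | Sort field (e.g. `createdAt`)  |
| `triggered_by`  | `triggeredBy`  | Filter by who triggered        |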
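
The mapping from tool-call parameter names to API query keys can be captured in a small helper that builds the request path. The function `runs_path` is hypothetical; only the three filters from the table are accepted.

```python
from urllib.parse import urlencode

# Tool-call parameter name -> API query key, as listed in the table above.
API_KEYS = {
    "search_prefix": "searchPrefix",
    "sort_by": "sortBy",
    "triggered_by": "triggeredBy",
}


def runs_path(job_id, **filters):
    """Build the list-runs path with URL-encoded query parameters."""
    unknown = set(filters) - set(API_KEYS)
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    query = urlencode({API_KEYS[name]: value for name, value in filters.items()})
    path = f"/api/svc/v1/jobs/{job_id}/runs"
    return f"{path}?{query}" if query else path


print(runs_path("JOB_ID", sort_by="createdAt", search_prefix="my-run"))
```

The resulting path can be passed to `tfy-api.sh` in single quotes so the shell does not interpret `&`.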

Presenting Job Runs

Job Runs for data-pipeline:
| Run Name       | Status    | Started            | Duration |
|----------------|-----------|--------------------|---------|
| run-20260210-1 | SUCCEEDED | 2026-02-10 09:00   | 5m 32s  |
| run-20260210-2 | FAILED    | 2026-02-10 10:00   | 1m 05s  |
| run-20260210-3 | RUNNING   | 2026-02-10 11:00   | --       |
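
A table like the one above could be rendered from run records as follows. The record keys (`name`, `status`, `started`, `duration_s`) are assumptions for illustration; the actual API response fields may differ and would need to be mapped first.

```python
def format_duration(seconds):
    """Render seconds as e.g. '5m 32s'; None (still running) becomes '--'."""
    if seconds is None:
        return "--"
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m {secs:02d}s"


def runs_table(job_name, runs):
    """Format run records (dicts with hypothetical keys) as a pipe table."""
    lines = [
        f"Job Runs for {job_name}:",
        "| Run Name | Status | Started | Duration |",
        "|----------|--------|---------|----------|",
    ]
    for run in runs:
        lines.append(
            f"| {run['name']} | {run['status']} | {run['started']} "
            f"| {format_duration(run.get('duration_s'))} |"
        )
    return "\n".join(lines)


print(runs_table("data-pipeline", [
    {"name": "run-20260210-1", "status": "SUCCEEDED", "started": "2026-02-10 09:00", "duration_s": 332},
    {"name": "run-20260210-3", "status": "RUNNING", "started": "2026-02-10 11:00"},
]))
```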

</instructions>

<success_criteria>

</instructions>

<success_criteria>

Success Criteria

The job has been deployed to the target workspace and the user can see it in the TrueFoundry dashboard
The user has been provided the job ID and knows how to trigger runs (manually or via cron schedule)
The agent has reported the deployment status including job name, workspace, and trigger type
Deployment status is verified automatically immediately after apply/deploy (no extra prompt)
Job logs are accessible for monitoring via the `logs` skill or the dashboard
For scheduled jobs, the cron expression is confirmed and the user knows when the next run will execute

</success_criteria>

</success_criteria>

Composability

Schedule jobs: Use cron trigger for automated scheduling
Monitor runs: Use the List Job Runs section above
Find job first: Use the `applications` skill with `application_type: "job"` to get the job app ID
Check logs: Use the `logs` skill with `job_run_name` to see run output

</references> <troubleshooting>

</references> <troubleshooting>

Error Handling

Job Not Found

Job ID not found. Use applications skill to list jobs:
tfy_applications_list(filters={"application_type": "job"})

No Runs Found

No runs found for this job. The job may not have been triggered yet.

CLI Errors

`tfy: command not found` -- Install with `pip install 'truefoundry==0.5.0'`
`tfy apply` validation errors -- Check YAML syntax, ensure required fields (name, type, image, resources, workspace_fqn) are present
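
The required-field check can be run locally before calling `tfy apply`. This is a minimal sketch over an already-parsed manifest dictionary; `missing_fields` is a hypothetical helper and only covers the top-level fields named above, not the full schema.

```python
REQUIRED_FIELDS = ("name", "type", "image", "resources", "workspace_fqn")


def missing_fields(manifest):
    """Return required top-level keys that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not manifest.get(field)]


print(missing_fields({"name": "my-batch-job", "type": "job"}))
```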

</troubleshooting> </output>

</troubleshooting> </output>