<objective>Routing note: For ambiguous user intents, use the shared clarification templates in references/intent-clarification.md.
Jobs
Deploy, schedule, and monitor TrueFoundry job runs. Two paths:
- CLI (`tfy apply`) -- Write a YAML manifest and apply it. Works everywhere.
- REST API (fallback) -- When the CLI is unavailable, use `tfy-api.sh`.
When to Use
- User asks "deploy a job", "create a job", "run a batch task"
- User asks "schedule a job", "run a cron job"
- User asks "show job runs", "list runs for my job"
- User asks "is my job running", "job status"
- User wants to check a specific job run
- Debugging a failed job run
When NOT to Use
- User wants to list job applications -> prefer the `applications` skill; ask if the user wants another valid path with `application_type: "job"`
Prerequisites
Always verify before deploying:
- Credentials -- `TFY_BASE_URL` and `TFY_API_KEY` must be set (env or `.env`)
- Workspace -- `TFY_WORKSPACE_FQN` required. Never auto-pick. Ask the user if missing.
- CLI -- Check if the `tfy` CLI is available: `tfy --version`. If not, install a pinned version (`pip install 'truefoundry==0.5.0'`).
For credential check commands and .env setup, see `references/prerequisites.md`.
</context>
<instructions>
Step 1: Analyze the Job
- What does the job do? (training, batch processing, data pipeline, maintenance)
- One-time or scheduled?
- Resource requirements (CPU/GPU/memory)
- Expected duration
Security requirements
- Never request or print raw secret values in chat.
- For sensitive env vars (tokens/passwords/keys), require `tfy-secret://...` references instead of inline values.
- For `build_source.type: git`, use trusted repositories and prefer immutable refs (commit SHA or pinned tag) over floating branches.
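The security requirements above can be checked mechanically before a manifest is written. The following is a minimal sketch, assuming the manifest's `env` section has already been loaded into a plain dict. The helper names `find_inline_secrets` and `is_immutable_ref` and the keyword heuristic are hypothetical illustrations, not part of the TrueFoundry tooling.

```python
import re

# Hypothetical heuristic: env var names that usually hold sensitive values.
SENSITIVE_NAME = re.compile(r"(TOKEN|PASSWORD|SECRET|KEY)", re.IGNORECASE)
# A full 40-character hexadecimal git commit SHA.
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def find_inline_secrets(env: dict) -> list:
    """Return names of sensitive-looking env vars not using a tfy-secret:// reference."""
    return sorted(
        name
        for name, value in env.items()
        if SENSITIVE_NAME.search(name)
        and not str(value).startswith("tfy-secret://")
    )


def is_immutable_ref(ref: str) -> bool:
    """Return True when ref is a full commit SHA (pinned tags are not checked here)."""
    return bool(FULL_SHA.match(ref))
```

Any name returned by `find_inline_secrets` is a prompt to ask the user for a secret reference, without ever echoing the value itself.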
Step 2: Generate YAML Manifest
Based on the job requirements, create a YAML manifest.
Security: Always confirm container image sources and git repository URLs with the user before deploying. Do not pull untrusted container images or clone unverified git repositories. Pin image tags to specific versions; avoid `:latest` in production.
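The tag-pinning rule lends itself to a quick pre-deploy check. This is a rough sketch only: `is_pinned_image` is a hypothetical helper, and it treats a digest or any explicit tag other than `latest` as pinned.

```python
def is_pinned_image(image_uri: str) -> bool:
    """Return True when the image URI has a digest or an explicit tag other than 'latest'."""
    if "@sha256:" in image_uri:
        return True  # digest references are immutable
    # Only the last path segment can carry a tag; an earlier colon may be a registry port.
    last_segment = image_uri.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag usually resolves to 'latest'
    tag = last_segment.rsplit(":", 1)[1]
    return tag != "" and tag != "latest"
```

For example, `is_pinned_image("my-registry/my-image:v1.0.0")` is true, while an untagged URI or one ending in `:latest` is rejected.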
Option A: Pre-built Image
```yaml
name: my-batch-job
type: job
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0  # pin to a specific version
  command: python train.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
  ephemeral_storage_request: 1000
  ephemeral_storage_limit: 2000
env:
  ENVIRONMENT: production
workspace_fqn: cluster-id:workspace-name
```
Option B: Git Repo + Dockerfile
```yaml
name: my-batch-job
type: job
image:
  type: build
  build_source:
    type: git
    repo_url: https://github.com/user/repo
    branch_name: main
    ref: 3f2a1c9b0d7e6f5a4b3c2d1e0f9876543210abcd
  build_spec:
    type: dockerfile
    dockerfile_path: Dockerfile
    build_context_path: "."
    command: python train.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
env:
  ENVIRONMENT: production
workspace_fqn: cluster-id:workspace-name
```
Option C: Git Repo + PythonBuild (No Dockerfile)
```yaml
name: my-batch-job
type: job
image:
  type: build
  build_source:
    type: git
    repo_url: https://github.com/user/repo
    branch_name: main
    ref: 3f2a1c9b0d7e6f5a4b3c2d1e0f9876543210abcd
  build_spec:
    type: tfy-python-buildpack
    command: python train.py
    python_version: "3.11"
    python_dependencies:
      type: pip
      requirements_path: requirements.txt
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
workspace_fqn: cluster-id:workspace-name
```
Scheduled Jobs (Cron)
Add a `trigger` section for scheduled execution:
```yaml
name: nightly-retrain
type: job
trigger:
  type: cron
  schedule: "0 2 * * *"  # 2 AM daily
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0
  command: python train.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
workspace_fqn: cluster-id:workspace-name
```
Cron format: `minute hour day_of_month month day_of_week`
Common schedules:
| Schedule | Cron | Description |
|---|---|---|
| Every hour | `0 * * * *` | Top of every hour |
| Daily at 2 AM | `0 2 * * *` | Nightly jobs |
| Weekly Monday | `0 9 * * 1` | Weekly Monday 9 AM |
| Monthly 1st | `0 0 1 * *` | First of month midnight |
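Before a schedule goes into the manifest, it can help to confirm that it really has the five fields of the cron format. A minimal sketch, assuming a plain whitespace-separated expression: `split_cron` is a hypothetical helper that only counts and labels fields and does not validate ranges or special syntax.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression: str) -> dict:
    """Map each of the five cron fields to its raw value, or raise ValueError."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expression!r}")
    return dict(zip(CRON_FIELDS, parts))


# Label the fields of the nightly schedule used in this document.
fields = split_cron("0 2 * * *")
print(fields["hour"], fields["minute"])
```

Showing the labeled fields back to the user is a cheap way to confirm the intended schedule before deploying.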
Manual Trigger with Retries
```yaml
name: my-job
type: job
trigger:
  type: manual
  num_retries: 3
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0
  command: python job.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
workspace_fqn: cluster-id:workspace-name
```
Concurrency Policies
Three options for scheduled jobs when a run overlaps:
- Forbid (default): Skip new run if previous still running
- Allow: Run in parallel
- Replace: Kill current, start new
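The three policies can be read as a small decision table. The sketch below is an illustrative model of the behavior described above, not platform code; `on_overlap` and its return shape are hypothetical.

```python
def on_overlap(policy: str, previous_running: bool) -> dict:
    """Describe what happens to the previous and new runs when a schedule fires."""
    if not previous_running:
        # No overlap: every policy simply starts the new run.
        return {"start_new": True, "stop_previous": False}
    if policy == "Forbid":
        return {"start_new": False, "stop_previous": False}  # skip the new run
    if policy == "Allow":
        return {"start_new": True, "stop_previous": False}  # run in parallel
    if policy == "Replace":
        return {"start_new": True, "stop_previous": True}  # kill current, start new
    raise ValueError(f"unknown concurrency policy: {policy!r}")
```

The policies differ only when the previous run is still active, which is why long-running scheduled jobs are where this choice matters.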
Parameterized Jobs
```python
import argparse

# In your job script, use argparse for dynamic params
parser = argparse.ArgumentParser()
parser.add_argument("--epochs", type=int, default=10)
parser.add_argument("--batch-size", type=int, default=32)
args = parser.parse_args()

# Then set command: `python train.py --epochs 50 --batch-size 64`
```
GPU Jobs
```yaml
name: gpu-training-job
type: job
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0
  command: python train.py
resources:
  cpu_request: 4
  cpu_limit: 8
  memory_request: 16000
  memory_limit: 32000
  devices:
    - type: nvidia_gpu
      name: A10_24GB
      count: 1
workspace_fqn: cluster-id:workspace-name
```
Job with Volume Mounts
```yaml
name: training-job
type: job
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0
  command: python train.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
mounts:
  - mount_path: /data
    volume_fqn: your-volume-fqn
workspace_fqn: cluster-id:workspace-name
```
Step 3: Write and Apply Manifest
Write the manifest to `tfy-manifest.yaml`:
```bash
# Preview
tfy apply -f tfy-manifest.yaml --dry-run --show-diff

# Apply after user confirms
tfy apply -f tfy-manifest.yaml
```
Fallback: REST API
If the `tfy` CLI is not available, convert the YAML manifest to JSON and deploy via REST API. See `references/cli-fallback.md` for the conversion process.
```bash
TFY_API_SH=~/.claude/skills/truefoundry-jobs/scripts/tfy-api.sh
$TFY_API_SH PUT /api/svc/v1/apps '{
  "manifest": { ... JSON version of the YAML manifest ... },
  "workspaceId": "WORKSPACE_ID"
}'
```
Step 4: Trigger the Job
After deployment, trigger manually via API:
```bash
TFY_API_SH=~/.claude/skills/truefoundry-jobs/scripts/tfy-api.sh
$TFY_API_SH POST /api/svc/v1/jobs/JOB_ID/runs '{}'
```
After Deploy -- Report Status
CRITICAL: Always report the deployment status and job details to the user.
Do this automatically after deploy, without asking an extra verification prompt.
Check Job Status
```text
# Preferred (MCP tool call)
tfy_applications_list(filters={"workspace_fqn": "WORKSPACE_FQN", "application_name": "JOB_NAME"})
```
If MCP tool calls are unavailable, use API fallback:
```bash
TFY_API_SH=~/.claude/skills/truefoundry-jobs/scripts/tfy-api.sh

# Get job application details
$TFY_API_SH GET '/api/svc/v1/apps?workspaceFqn=WORKSPACE_FQN&applicationName=JOB_NAME'
```
Report to User
Always present this summary after deployment:
```
Job deployed successfully!

Job: {job-name}
Workspace: {workspace-fqn}
Status: Suspended (deployed, ready to trigger)
Schedule: {cron expression if scheduled, or "Manual trigger"}

To trigger the job:
- Dashboard: Click "Run Job" on the job page
- API: POST /api/svc/v1/jobs/{JOB_ID}/runs

To monitor runs:
- Use the job monitoring commands below
- Or check the TrueFoundry dashboard
```
For scheduled jobs, also show when the next run will execute.
For manually triggered jobs, remind the user how to trigger them.
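Filling the summary template by hand is error-prone, so it can be generated from the deployed job's details. A minimal sketch: `deployment_summary` is a hypothetical helper, and its arguments are assumed to come from the status check performed after deploy.

```python
def deployment_summary(job_name, workspace_fqn, job_id, schedule=None):
    """Render the post-deploy summary; schedule is a cron expression or None."""
    trigger = schedule if schedule else "Manual trigger"
    lines = [
        "Job deployed successfully!",
        f"Job: {job_name}",
        f"Workspace: {workspace_fqn}",
        "Status: Suspended (deployed, ready to trigger)",
        f"Schedule: {trigger}",
        "To trigger the job:",
        '- Dashboard: Click "Run Job" on the job page',
        f"- API: POST /api/svc/v1/jobs/{job_id}/runs",
    ]
    return "\n".join(lines)


print(deployment_summary("nightly-retrain", "cluster-id:workspace-name", "JOB_ID", "0 2 * * *"))
```

Passing no schedule yields the "Manual trigger" wording, which matches the reminder for manually triggered jobs.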
.tfyignore
Create a `.tfyignore` file (follows `.gitignore` syntax) to exclude files from the Docker build:
```
.git/
__pycache__/
*.pyc
.env
data/
```
List Job Runs
When using direct API, set `TFY_API_SH` to the full path of this skill's `scripts/tfy-api.sh`. See `references/tfy-api-setup.md` for paths per agent.
Via Tool Call
```
tfy_jobs_list_runs(job_id="job-id")
tfy_jobs_list_runs(job_id="job-id", job_run_name="run-name")  # get specific run
tfy_jobs_list_runs(job_id="job-id", filters={"sort_by": "createdAt"})
```
Via Direct API
```bash
# Set the path to tfy-api.sh for your agent (example for Claude Code):
TFY_API_SH=~/.claude/skills/truefoundry-jobs/scripts/tfy-api.sh

# List runs for a job
$TFY_API_SH GET /api/svc/v1/jobs/JOB_ID/runs

# Get specific run
$TFY_API_SH GET /api/svc/v1/jobs/JOB_ID/runs/RUN_NAME

# With filters
$TFY_API_SH GET '/api/svc/v1/jobs/JOB_ID/runs?sortBy=createdAt&searchPrefix=my-run'
```
Filter Parameters
| Parameter | API Key | Description |
|---|---|---|
| | `searchPrefix` | Filter runs by name prefix |
| `sort_by` | `sortBy` | Sort field (e.g. `createdAt`) |
| | | Filter by who triggered |
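The filtered request path can be assembled programmatically instead of concatenating strings by hand. A minimal sketch using only the `sortBy` and `searchPrefix` keys that appear in the direct API example; `runs_path` is a hypothetical helper, not part of the skill's scripts.

```python
from urllib.parse import urlencode


def runs_path(job_id, sort_by=None, search_prefix=None):
    """Build the job-runs API path, adding only the filters that were provided."""
    params = {}
    if sort_by:
        params["sortBy"] = sort_by
    if search_prefix:
        params["searchPrefix"] = search_prefix
    path = f"/api/svc/v1/jobs/{job_id}/runs"
    return f"{path}?{urlencode(params)}" if params else path


print(runs_path("JOB_ID", sort_by="createdAt", search_prefix="my-run"))
```

`urlencode` also escapes characters that are unsafe in a query string, which matters once run-name prefixes come from user input.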
Presenting Job Runs
Job Runs for data-pipeline:
| Run Name       | Status    | Started          | Duration |
|----------------|-----------|------------------|----------|
| run-20260210-1 | SUCCEEDED | 2026-02-10 09:00 | 5m 32s   |
| run-20260210-2 | FAILED    | 2026-02-10 10:00 | 1m 05s   |
| run-20260210-3 | RUNNING   | 2026-02-10 11:00 | --       |
<success_criteria>
Success Criteria
- The job has been deployed to the target workspace and the user can see it in the TrueFoundry dashboard
- The user has been provided the job ID and knows how to trigger runs (manually or via cron schedule)
- The agent has reported the deployment status including job name, workspace, and trigger type
- Deployment status is verified automatically immediately after apply/deploy (no extra prompt)
- Job logs are accessible for monitoring via the `logs` skill or the dashboard
- For scheduled jobs, the cron expression is confirmed and the user knows when the next run will execute
</success_criteria>
<references>
Composability
- Schedule jobs: Use cron trigger for automated scheduling
- Monitor runs: Use the List Job Runs section in this document
- Find job first: Use `applications` skill with `application_type: "job"` to get job app ID
- Check logs: Use `logs` skill with `job_run_name` to see run output
Error Handling
Job Not Found
```
Job ID not found. Use applications skill to list jobs:
tfy_applications_list(filters={"application_type": "job"})
```
No Runs Found
```
No runs found for this job. The job may not have been triggered yet.
```
CLI Errors
- `tfy: command not found` -- Install with `pip install 'truefoundry==0.5.0'`
- `tfy apply` validation errors -- Check YAML syntax, ensure required fields (name, type, image, resources, workspace_fqn) are present
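Missing required fields can be caught before `tfy apply` reports a validation error. A minimal sketch, assuming the manifest has already been parsed into a dict; `missing_fields` is a hypothetical helper and checks only the five fields named above, not the full schema.

```python
REQUIRED_FIELDS = ("name", "type", "image", "resources", "workspace_fqn")


def missing_fields(manifest: dict) -> list:
    """Return required top-level fields that are absent or empty in the manifest."""
    return [
        field
        for field in REQUIRED_FIELDS
        if field not in manifest or manifest[field] in (None, "", {})
    ]


manifest = {"name": "my-batch-job", "type": "job", "image": {"type": "image"}}
print(missing_fields(manifest))
```

An empty result only means these fields are present; the CLI remains the authority on whether the manifest is valid.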