truefoundry-jobs
<objective>
Routing note: For ambiguous user intents, use the shared clarification templates in `references/intent-clarification.md`.
Jobs
Deploy, schedule, and monitor TrueFoundry job runs. Two paths:
- CLI (`tfy apply`) -- Write a YAML manifest and apply it. Works everywhere.
- REST API (fallback) -- When the CLI is unavailable, use `tfy-api.sh`.
When to Use
- User asks "deploy a job", "create a job", "run a batch task"
- User asks "schedule a job", "run a cron job"
- User asks "show job runs", "list runs for my job"
- User asks "is my job running", "job status"
- User wants to check a specific job run
- Debugging a failed job run
When NOT to Use
- User wants to list job applications -> prefer the `applications` skill with `application_type: "job"`; ask whether the user wants another valid path.
</objective>

<context>
Prerequisites
Always verify before deploying:
- Credentials -- `TFY_BASE_URL` and `TFY_API_KEY` must be set (environment or `.env`).
- Workspace -- `TFY_WORKSPACE_FQN` is required. Never auto-pick it; ask the user if it is missing.
- CLI -- Check whether the `tfy` CLI is available with `tfy --version`. If it is not, install a pinned version: `pip install 'truefoundry==0.5.0'`.

For credential check commands and `.env` setup, see `references/prerequisites.md`.
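If the agent can run Python, the checks above can be scripted. This is a minimal sketch, not part of the skill: `missing_prerequisites` is a hypothetical helper, it only inspects the process environment (it does not parse `.env`), and it only confirms that `TFY_WORKSPACE_FQN` is set; choosing the workspace is still the user's call.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional

# Variables this skill expects before any deploy.
REQUIRED_VARS = ("TFY_BASE_URL", "TFY_API_KEY", "TFY_WORKSPACE_FQN")


def missing_prerequisites(
    env: Mapping[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of problems; an empty list means it is safe to continue.

    Only variable names are reported, never their values.
    """
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    if which("tfy") is None:
        problems.append("tfy CLI not found; install with: pip install 'truefoundry==0.5.0'")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites(os.environ):
        print("-", problem)
```

Passing `which` as a parameter keeps the function testable without touching the real `PATH`.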
</context>
<instructions>
Step 1: Analyze the Job
- What does the job do? (training, batch processing, data pipeline, maintenance)
- One-time or scheduled?
- Resource requirements (CPU/GPU/memory)
- Expected duration
Security requirements
- Never request or print raw secret values in chat.
- For sensitive env vars (tokens/passwords/keys), require `tfy-secret://...` references instead of inline values.
- For `build_source.type: git`, use trusted repositories and prefer immutable refs (commit SHA or pinned tag) over floating branches.
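Before writing the manifest, the agent can screen the `env` block for values that look like inline credentials. This is a rough heuristic sketch: the name pattern is an assumption (it will miss secrets with unusual names), only the `tfy-secret://` prefix comes from this document, and the reference value in the example is a made-up placeholder.

```python
import re
from typing import List, Mapping

# Heuristic (assumption): env var names that usually hold credentials.
SENSITIVE_NAME = re.compile(r"TOKEN|PASSWORD|PASSWD|SECRET|(^|_)KEY($|_)", re.IGNORECASE)
SECRET_PREFIX = "tfy-secret://"


def inline_secret_names(env: Mapping[str, str]) -> List[str]:
    """Names of sensitive-looking env vars whose values are not tfy-secret:// references.

    Only names are returned, so raw values never have to be shown in chat.
    """
    return sorted(
        name
        for name, value in env.items()
        if SENSITIVE_NAME.search(name) and not str(value).startswith(SECRET_PREFIX)
    )


env = {
    "ENVIRONMENT": "production",
    "HF_TOKEN": "tfy-secret://my-tenant:my-group:hf-token",  # placeholder reference
    "DB_PASSWORD": "hunter2",
    "OPENAI_API_KEY": "sk-placeholder",
}
print(inline_secret_names(env))  # → ['DB_PASSWORD', 'OPENAI_API_KEY']
```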
Step 2: Generate YAML Manifest
Based on the job requirements, create a YAML manifest.
Security: Always confirm container image sources and git repository URLs with the user before deploying. Do not pull untrusted container images or clone unverified git repositories. Pin image tags to specific versions; avoid `:latest` in production.
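One way to enforce the pinning rule before applying is to reject untagged and `:latest` image references. A minimal sketch with a hypothetical helper name: it treats a `@sha256:` digest as pinned, and it does not check whether the registry itself is trusted, which still needs the user's confirmation.

```python
def is_pinned_image(image_uri: str) -> bool:
    """True if the reference has an explicit non-'latest' tag or an immutable digest."""
    if "@sha256:" in image_uri:
        return True  # digest references cannot move
    last_segment = image_uri.rsplit("/", 1)[-1]  # ignore registry ports such as host:5000/
    if ":" not in last_segment:
        return False  # untagged references resolve to the registry default, usually latest
    tag = last_segment.rsplit(":", 1)[1]
    return tag not in ("", "latest")


for uri in ("my-registry/my-image:v1.0.0", "my-registry/my-image:latest", "my-registry/my-image"):
    print(uri, is_pinned_image(uri))
```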
Option A: Pre-built Image
```yaml
name: my-batch-job
type: job
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0 # pin to a specific version
  command: python train.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
  ephemeral_storage_request: 1000
  ephemeral_storage_limit: 2000
env:
  ENVIRONMENT: production
workspace_fqn: cluster-id:workspace-name
```
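Before writing `tfy-manifest.yaml`, a few cheap checks catch the most common `tfy apply` failures. This sketch uses a hypothetical `manifest_problems` helper on the Option A manifest expressed as a Python dict. The required-field list comes from the CLI Errors section of this document; the request-not-above-limit rule is an assumption borrowed from Kubernetes resource validation.

```python
from typing import List

# Required top-level fields, as listed under CLI Errors.
REQUIRED_FIELDS = ("name", "type", "image", "resources", "workspace_fqn")


def manifest_problems(manifest: dict) -> List[str]:
    """Cheap pre-flight checks on a job manifest before writing tfy-manifest.yaml."""
    problems = [f"missing required field: {f}" for f in REQUIRED_FIELDS if f not in manifest]
    resources = manifest.get("resources") or {}
    for kind in ("cpu", "memory", "ephemeral_storage"):
        request = resources.get(f"{kind}_request")
        limit = resources.get(f"{kind}_limit")
        # Assumption: a request above its limit is rejected, as in Kubernetes.
        if request is not None and limit is not None and request > limit:
            problems.append(f"{kind}_request ({request}) exceeds {kind}_limit ({limit})")
    return problems


option_a = {
    "name": "my-batch-job",
    "type": "job",
    "image": {
        "type": "image",
        "image_uri": "my-registry/my-image:v1.0.0",
        "command": "python train.py",
    },
    "resources": {
        "cpu_request": 2, "cpu_limit": 4,
        "memory_request": 4000, "memory_limit": 8000,
        "ephemeral_storage_request": 1000, "ephemeral_storage_limit": 2000,
    },
    "env": {"ENVIRONMENT": "production"},
    "workspace_fqn": "cluster-id:workspace-name",
}
print(manifest_problems(option_a))  # → []
```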
Option B: Git Repo + Dockerfile
```yaml
name: my-batch-job
type: job
image:
  type: build
  build_source:
    type: git
    repo_url: https://github.com/user/repo
    branch_name: main
    ref: 3f2a1c9b0d7e6f5a4b3c2d1e0f9876543210abcd
  build_spec:
    type: dockerfile
    dockerfile_path: Dockerfile
    build_context_path: "."
    command: python train.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
env:
  ENVIRONMENT: production
workspace_fqn: cluster-id:workspace-name
```
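Option B sets both `branch_name` and a commit `ref`. To follow the security rule about immutable refs, the agent can confirm that `ref` is a full commit SHA rather than a branch name before deploying. A sketch with a hypothetical helper: it deliberately rejects abbreviated SHAs and tag names, so a pinned tag (which the security requirements also allow) has to be confirmed with the user separately.

```python
import re
from typing import Optional

FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_immutable_ref(ref: Optional[str]) -> bool:
    """True only for a full 40-character lowercase hex commit SHA."""
    return ref is not None and FULL_SHA.fullmatch(ref) is not None


print(is_immutable_ref("3f2a1c9b0d7e6f5a4b3c2d1e0f9876543210abcd"))  # → True
print(is_immutable_ref("main"))  # → False
```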
Option C: Git Repo + PythonBuild (No Dockerfile)
```yaml
name: my-batch-job
type: job
image:
  type: build
  build_source:
    type: git
    repo_url: https://github.com/user/repo
    branch_name: main
    ref: 3f2a1c9b0d7e6f5a4b3c2d1e0f9876543210abcd
  build_spec:
    type: tfy-python-buildpack
    command: python train.py
    python_version: "3.11"
    python_dependencies:
      type: pip
      requirements_path: requirements.txt
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
workspace_fqn: cluster-id:workspace-name
```
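Option C quotes `python_version` for a reason: YAML loaders such as PyYAML's `safe_load` read an unquoted `3.10` as the float `3.1`. The sketch below (hypothetical helper) flags that case on an already-parsed value. The accepted `3.<minor>` pattern is an assumption; loosen it if TrueFoundry accepts patch versions.

```python
import re
from typing import Optional


def python_version_problem(value: object) -> Optional[str]:
    """Describe what is wrong with a parsed python_version value, or return None."""
    if isinstance(value, float):
        # Unquoted `python_version: 3.10` is read by YAML loaders as the float 3.1.
        return f"python_version parsed as the number {value!r}; quote it, e.g. \"3.11\""
    if not isinstance(value, str) or re.fullmatch(r"3\.\d+", value) is None:
        # Assumption: only "3.<minor>" strings are expected here.
        return f"unexpected python_version {value!r}; expected a string like \"3.11\""
    return None


print(python_version_problem("3.11"))  # → None
print(python_version_problem(3.10))
```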
Scheduled Jobs (Cron)
Add a `trigger` section for scheduled execution:
```yaml
name: nightly-retrain
type: job
trigger:
  type: cron
  schedule: "0 2 * * *" # 2 AM daily
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0
  command: python train.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
workspace_fqn: cluster-id:workspace-name
```
Cron format: `minute hour day_of_month month day_of_week`

Common schedules:

| Schedule | Cron | Description |
|---|---|---|
| Every hour | `0 * * * *` | Top of every hour |
| Daily at 2 AM | `0 2 * * *` | Nightly jobs |
| Weekly Monday | `0 9 * * 1` | Weekly Monday 9 AM |
| Monthly 1st | `0 0 1 * *` | First of month midnight |
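The Report to User step asks for the next run time of a scheduled job. The helper below is a small sketch, not TrueFoundry's scheduler: it supports only `*`, `*/n`, and comma-separated numbers (no ranges, names, or `7` for Sunday), works on naive datetimes, and assumes the schedule is evaluated in the same timezone as `after`, which this document does not specify.

```python
from datetime import datetime, timedelta
from typing import Set


def _parse_field(field: str, lo: int, hi: int) -> Set[int]:
    """Expand one cron field supporting only '*', '*/n', and comma-separated integers."""
    values: Set[int] = set()
    for part in field.split(","):
        if part == "*":
            values.update(range(lo, hi + 1))
        elif part.startswith("*/"):
            values.update(range(lo, hi + 1, int(part[2:])))
        else:
            number = int(part)
            if not lo <= number <= hi:
                raise ValueError(f"{number} is outside {lo}-{hi}")
            values.add(number)
    return values


def next_run(expr: str, after: datetime) -> datetime:
    """First minute strictly after `after` that matches the 5-field cron expression."""
    minute, hour, dom, month, dow = expr.split()
    minutes = _parse_field(minute, 0, 59)
    hours = _parse_field(hour, 0, 23)
    days = _parse_field(dom, 1, 31)
    months = _parse_field(month, 1, 12)
    weekdays = _parse_field(dow, 0, 6)  # 0 = Sunday, as in standard cron

    def day_matches(t: datetime) -> bool:
        cron_weekday = (t.weekday() + 1) % 7  # Python: Monday = 0; cron: Sunday = 0
        if dom == "*":
            return cron_weekday in weekdays
        if dow == "*":
            return t.day in days
        return t.day in days or cron_weekday in weekdays  # cron ORs two restricted day fields

    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    deadline = t + timedelta(days=366 * 5)
    while t < deadline:
        if t.month not in months or not day_matches(t):
            t = (t + timedelta(days=1)).replace(hour=0, minute=0)  # skip to next day
        elif t.hour not in hours:
            t = (t + timedelta(hours=1)).replace(minute=0)  # skip to next hour
        elif t.minute not in minutes:
            t += timedelta(minutes=1)
        else:
            return t
    raise ValueError(f"no run within five years for {expr!r}")


print(next_run("0 2 * * *", datetime(2026, 2, 10, 3, 0)))  # → 2026-02-11 02:00:00
```

Skipping whole days and hours when they cannot match keeps the search short even for monthly schedules.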
Manual Trigger with Retries
```yaml
name: my-job
type: job
trigger:
  type: manual
  num_retries: 3
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0
  command: python job.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
workspace_fqn: cluster-id:workspace-name
```
Concurrency Policies
Scheduled jobs support three policies for when a new run is due while the previous run is still active:
- Forbid (default): Skip the new run if the previous one is still running
- Allow: Run both in parallel
- Replace: Kill the current run and start the new one
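To make the three policies concrete, here is a toy model of what each one does when a scheduled tick fires while runs are active. It only simulates the decision with run names; it is not how TrueFoundry implements scheduling, and the manifest key for choosing a policy is not shown in this document.

```python
from enum import Enum
from typing import List


class ConcurrencyPolicy(Enum):
    FORBID = "Forbid"
    ALLOW = "Allow"
    REPLACE = "Replace"


def on_schedule_tick(policy: ConcurrencyPolicy, running: List[str], new_run: str) -> List[str]:
    """Return the runs that are active after a scheduled tick under the given policy."""
    if running and policy is ConcurrencyPolicy.FORBID:
        return running  # new run is skipped
    if running and policy is ConcurrencyPolicy.REPLACE:
        return [new_run]  # current runs are killed
    return running + [new_run]  # ALLOW, or nothing was running


print(on_schedule_tick(ConcurrencyPolicy.FORBID, ["run-1"], "run-2"))  # → ['run-1']
```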
Parameterized Jobs
```python
import argparse

# In your job script, use argparse for dynamic params
parser = argparse.ArgumentParser()
parser.add_argument("--epochs", type=int, default=10)
parser.add_argument("--batch-size", type=int, default=32)
args = parser.parse_args()  # values are available as args.epochs and args.batch_size
```
Then set the manifest's `command` to `python train.py --epochs 50 --batch-size 64`.
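To check how a `command` string maps onto the script's arguments before deploying, the same parser can be fed an explicit argument list. Wrapping it in `build_parser` (a name chosen here for illustration) lets the snippet run without real command-line input. Note that argparse turns `--batch-size` into the attribute `batch_size`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Training job")
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch-size", type=int, default=32)
    return parser


# Passing a list mimics the manifest's command without launching the job.
args = build_parser().parse_args(["--epochs", "50", "--batch-size", "64"])
print(args.epochs, args.batch_size)  # → 50 64

defaults = build_parser().parse_args([])
print(defaults.epochs, defaults.batch_size)  # → 10 32
```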
GPU Jobs
```yaml
name: gpu-training-job
type: job
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0
  command: python train.py
resources:
  cpu_request: 4
  cpu_limit: 8
  memory_request: 16000
  memory_limit: 32000
  devices:
    - type: nvidia_gpu
      name: A10_24GB
      count: 1
workspace_fqn: cluster-id:workspace-name
```
Job with Volume Mounts
```yaml
name: training-job
type: job
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0
  command: python train.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
mounts:
  - mount_path: /data
    volume_fqn: your-volume-fqn
workspace_fqn: cluster-id:workspace-name
```
Step 3: Write and Apply Manifest
Write the manifest to `tfy-manifest.yaml`:
```bash
# Preview
tfy apply -f tfy-manifest.yaml --dry-run --show-diff

# Apply after user confirms
tfy apply -f tfy-manifest.yaml
```
Fallback: REST API
If the `tfy` CLI is not available, convert the YAML manifest to JSON and deploy via the REST API. See `references/cli-fallback.md` for the conversion process.
```bash
TFY_API_SH=~/.claude/skills/truefoundry-jobs/scripts/tfy-api.sh

$TFY_API_SH PUT /api/svc/v1/apps '{
  "manifest": { ... JSON version of the YAML manifest ... },
  "workspaceId": "WORKSPACE_ID"
}'
```
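A minimal sketch of the YAML-to-JSON step. It stays within the standard library by writing the Option A manifest as a Python dict; with PyYAML installed, `yaml.safe_load` on `tfy-manifest.yaml` would produce the same dict. `build_apply_body` is a hypothetical helper, `WORKSPACE_ID` stays a placeholder, and `references/cli-fallback.md` remains the authority on any extra conversion rules.

```python
import json


def build_apply_body(manifest: dict, workspace_id: str) -> str:
    """Wrap a manifest dict in the PUT /api/svc/v1/apps body used by the fallback."""
    return json.dumps({"manifest": manifest, "workspaceId": workspace_id}, indent=2)


manifest = {
    "name": "my-batch-job",
    "type": "job",
    "image": {
        "type": "image",
        "image_uri": "my-registry/my-image:v1.0.0",
        "command": "python train.py",
    },
    "resources": {"cpu_request": 2, "cpu_limit": 4, "memory_request": 4000, "memory_limit": 8000},
    "env": {"ENVIRONMENT": "production"},
    "workspace_fqn": "cluster-id:workspace-name",
}
body = build_apply_body(manifest, "WORKSPACE_ID")
print(body)
```

Because the shell command wraps the body in single quotes, a manifest value containing `'` would need escaping before it is pasted in.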
Step 4: Trigger the Job
After deployment, trigger the job manually via the API:
```bash
TFY_API_SH=~/.claude/skills/truefoundry-jobs/scripts/tfy-api.sh
$TFY_API_SH POST /api/svc/v1/jobs/JOB_ID/runs '{}'
```
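For agents that cannot run `tfy-api.sh`, the same call can be prepared with Python's standard library. This sketch assumes, without confirmation from this document, that the script prefixes `TFY_BASE_URL` to the path and authenticates with the API key as a bearer token. The host and key in the usage line are placeholders; never print a real key.

```python
import json
import urllib.request


def build_trigger_request(base_url: str, api_key: str, job_id: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST that starts a new run of a deployed job."""
    url = f"{base_url.rstrip('/')}/api/svc/v1/jobs/{job_id}/runs"
    return urllib.request.Request(
        url,
        data=json.dumps({}).encode("utf-8"),  # same empty body as '{}' in the shell call
        headers={
            "Authorization": f"Bearer {api_key}",  # assumption: bearer-token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_trigger_request("https://tfy.example.com/", "<api-key>", "JOB_ID")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```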
After Deploy -- Report Status
CRITICAL: Always report the deployment status and job details to the user.
Do this automatically after deploying, without an extra verification prompt.
Check Job Status
Preferred (MCP tool call):
```text
tfy_applications_list(filters={"workspace_fqn": "WORKSPACE_FQN", "application_name": "JOB_NAME"})
```
If MCP tool calls are unavailable, use the API fallback:
```bash
TFY_API_SH=~/.claude/skills/truefoundry-jobs/scripts/tfy-api.sh

# Get job application details
$TFY_API_SH GET '/api/svc/v1/apps?workspaceFqn=WORKSPACE_FQN&applicationName=JOB_NAME'
```
Report to User
Always present this summary after deployment:
```text
Job deployed successfully!

Job: {job-name}
Workspace: {workspace-fqn}
Status: Suspended (deployed, ready to trigger)
Schedule: {cron expression if scheduled, or "Manual trigger"}

To trigger the job:
  - Dashboard: Click "Run Job" on the job page
  - API: POST /api/svc/v1/jobs/{JOB_ID}/runs

To monitor runs:
  - Use the job monitoring commands below
  - Or check the TrueFoundry dashboard
```
For scheduled jobs, also show when the next run will execute.
For manually triggered jobs, remind the user how to trigger them.
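If the agent assembles this message programmatically, a small formatter keeps every field present. This is a sketch with a hypothetical `deploy_summary` helper; the template text follows the summary above, and the next-run time for scheduled jobs has to be computed separately and appended.

```python
from typing import Optional


def deploy_summary(job_name: str, workspace_fqn: str, job_id: str,
                   schedule: Optional[str] = None) -> str:
    """Fill in the post-deploy summary template; schedule=None means a manual job."""
    lines = [
        "Job deployed successfully!",
        "",
        f"Job: {job_name}",
        f"Workspace: {workspace_fqn}",
        "Status: Suspended (deployed, ready to trigger)",
        f"Schedule: {schedule or 'Manual trigger'}",
        "",
        "To trigger the job:",
        '  - Dashboard: Click "Run Job" on the job page',
        f"  - API: POST /api/svc/v1/jobs/{job_id}/runs",
        "",
        "To monitor runs:",
        "  - Use the job monitoring commands below",
        "  - Or check the TrueFoundry dashboard",
    ]
    return "\n".join(lines)


print(deploy_summary("nightly-retrain", "cluster-id:workspace-name", "JOB_ID", "0 2 * * *"))
```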
.tfyignore
Create a `.tfyignore` file (follows `.gitignore` syntax) to exclude files from the Docker build:
```text
.git/
__pycache__/
*.pyc
.env
data/
```
List Job Runs
When using the direct API, set `TFY_API_SH` to the full path of this skill's `scripts/tfy-api.sh`. See `references/tfy-api-setup.md` for paths per agent.
Via Tool Call
```text
tfy_jobs_list_runs(job_id="job-id")
tfy_jobs_list_runs(job_id="job-id", job_run_name="run-name")  # get specific run
tfy_jobs_list_runs(job_id="job-id", filters={"sort_by": "createdAt"})
```
Via Direct API
```bash
# Set the path to tfy-api.sh for your agent (example for Claude Code):
TFY_API_SH=~/.claude/skills/truefoundry-jobs/scripts/tfy-api.sh

# List runs for a job
$TFY_API_SH GET /api/svc/v1/jobs/JOB_ID/runs

# Get specific run
$TFY_API_SH GET /api/svc/v1/jobs/JOB_ID/runs/RUN_NAME

# With filters
$TFY_API_SH GET '/api/svc/v1/jobs/JOB_ID/runs?sortBy=createdAt&searchPrefix=my-run'
```
Filter Parameters
| Parameter | API Key | Description |
|---|---|---|
| `search_prefix` | `searchPrefix` | Filter runs by name prefix |
| `sort_by` | `sortBy` | Sort field (e.g. `createdAt`) |
| | | Filter by who triggered |
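When building the filtered URL by hand, URL-encoding the query avoids breakage if a prefix contains spaces or `&`. A sketch with a hypothetical `runs_path` helper: only the `sortBy`/`searchPrefix` query keys and the `sort_by` filter name appear in this document, the `search_prefix` name is inferred, and the "who triggered" filter is left out because its key is not given here. An unknown filter name raises `KeyError`.

```python
from urllib.parse import urlencode

# Hypothetical mapping from tool-side filter names to the API query keys.
PARAM_TO_API_KEY = {"search_prefix": "searchPrefix", "sort_by": "sortBy"}


def runs_path(job_id: str, **filters: str) -> str:
    """Build the GET path for listing a job's runs, URL-encoding any filters."""
    path = f"/api/svc/v1/jobs/{job_id}/runs"
    query = {PARAM_TO_API_KEY[name]: value for name, value in filters.items()}
    return f"{path}?{urlencode(query)}" if query else path


print(runs_path("JOB_ID", sort_by="createdAt", search_prefix="my-run"))
# → /api/svc/v1/jobs/JOB_ID/runs?sortBy=createdAt&searchPrefix=my-run
```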
Presenting Job Runs
Job Runs for data-pipeline:
| Run Name | Status | Started | Duration |
|----------------|-----------|--------------------|---------|
| run-20260210-1 | SUCCEEDED | 2026-02-10 09:00 | 5m 32s |
| run-20260210-2 | FAILED | 2026-02-10 10:00 | 1m 05s |
| run-20260210-3 | RUNNING | 2026-02-10 11:00 | -- |
</instructions>
<success_criteria>
Success Criteria
- The job has been deployed to the target workspace and the user can see it in the TrueFoundry dashboard
- The user has been provided the job ID and knows how to trigger runs (manually or via cron schedule)
- The agent has reported the deployment status including job name, workspace, and trigger type
- Deployment status is verified automatically immediately after apply/deploy (no extra prompt)
- Job logs are accessible for monitoring via the `logs` skill or the dashboard
- For scheduled jobs, the cron expression is confirmed and the user knows when the next run will execute
</success_criteria>
<references>
Composability
- Schedule jobs: Use a cron trigger for automated scheduling
- Monitor runs: Use the job run monitoring sections above
- Find job first: Use the `applications` skill with `application_type: "job"` to get the job app ID
- Check logs: Use the `logs` skill with `job_run_name` to see run output
Error Handling
Job Not Found
```text
Job ID not found. Use the applications skill to list jobs:
tfy_applications_list(filters={"application_type": "job"})
```
No Runs Found
```text
No runs found for this job. The job may not have been triggered yet.
```
CLI Errors
- `tfy: command not found` -- Install with `pip install 'truefoundry==0.5.0'`
- `tfy apply` validation errors -- Check YAML syntax and ensure the required fields (`name`, `type`, `image`, `resources`, `workspace_fqn`) are present
</references>