# Running Workloads on Hugging Face Jobs

## Overview
Run any workload on fully managed Hugging Face infrastructure. No local setup required—jobs run on cloud CPUs, GPUs, or TPUs and can persist results to the Hugging Face Hub.
Common use cases:
- Data Processing - Transform, filter, or analyze large datasets
- Batch Inference - Run inference on thousands of samples
- Experiments & Benchmarks - Reproducible ML experiments
- Model Training - Fine-tune models (see the model-trainer skill for TRL-specific training)
- Synthetic Data Generation - Generate datasets using LLMs
- Development & Testing - Test code without local GPU setup
- Scheduled Jobs - Automate recurring tasks
For model training specifically: See the model-trainer skill for TRL-based training workflows.
## When to Use This Skill
Use this skill when users want to:
- Run Python workloads on cloud infrastructure
- Execute jobs without local GPU/TPU setup
- Process data at scale
- Run batch inference or experiments
- Schedule recurring tasks
- Use GPUs/TPUs for any workload
- Persist results to the Hugging Face Hub
## Key Directives
When assisting with jobs:
1. **ALWAYS use the `hf_jobs()` MCP tool** - Submit jobs with `hf_jobs("uv", {...})` or `hf_jobs("run", {...})`. The `script` parameter accepts Python code directly. Do NOT save to local files unless the user explicitly requests it; pass the script content as a string to `script`.
2. **Always handle authentication** - Jobs that interact with the Hub require `HF_TOKEN` via secrets. See the Token Usage section below.
3. **Provide job details after submission** - After submitting, provide the job ID, monitoring URL, and estimated time, and note that the user can request status checks later.
4. **Set appropriate timeouts** - The default 30 minutes may be insufficient for long-running tasks.
## Prerequisites Checklist
Before starting any job, verify:
### ✅ Account & Authentication
- Hugging Face account with a Pro, Team, or Enterprise plan (Jobs require a paid plan)
- Authenticated login: check with `hf_whoami()`
- `HF_TOKEN` for Hub access ⚠️ CRITICAL - required for any Hub operations (pushing models/datasets, downloading private repos, etc.)
- Token must have appropriate permissions (read for downloads, write for uploads)
### ✅ Token Usage (See Token Usage section for details)
When tokens are required:
- Pushing models/datasets to Hub
- Accessing private repositories
- Using Hub APIs in scripts
- Any authenticated Hub operations
How to provide tokens:

```python
{
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # Recommended: automatic token
}
```

⚠️ CRITICAL: The `$HF_TOKEN` placeholder is automatically replaced with your logged-in token. Never hardcode tokens in scripts.

## Token Usage Guide
### Understanding Tokens
What are HF Tokens?
- Authentication credentials for the Hugging Face Hub
- Required for authenticated operations (push, private repos, API access)
- Stored securely on your machine after `hf auth login`

Token Types:
- Read Token - can download models/datasets and read private repos
- Write Token - can push models/datasets, create repos, and modify content
- Organization Token - can act on behalf of an organization
### When Tokens Are Required
Always Required:
- Pushing models/datasets to Hub
- Accessing private repositories
- Creating new repositories
- Modifying existing repositories
- Using Hub APIs programmatically
Not Required:
- Downloading public models/datasets
- Running jobs that don't interact with Hub
- Reading public repository information
### How to Provide Tokens to Jobs
#### Method 1: Automatic Token (Recommended)
```python
hf_jobs("uv", {
    "script": "your_script.py",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # ✅ Automatic replacement
})
```

How it works:
- `$HF_TOKEN` is a placeholder that gets replaced with your actual token
- Uses the token from your logged-in session (`hf auth login`)
- Most secure and convenient method
- Token is encrypted server-side when passed as a secret

Benefits:
- No token exposure in code
- Uses your current login session
- Automatically updated if you re-login
- Works seamlessly with MCP tools
#### Method 2: Explicit Token (Not Recommended)
```python
hf_jobs("uv", {
    "script": "your_script.py",
    "secrets": {"HF_TOKEN": "hf_abc123..."}  # ⚠️ Hardcoded token
})
```

When to use:
- Only if the automatic token doesn't work
- Testing with a specific token
- Organization tokens (use with caution)

Security concerns:
- Token visible in code/logs
- Must be manually updated if the token rotates
- Risk of token exposure
#### Method 3: Environment Variable (Less Secure)
```python
hf_jobs("uv", {
    "script": "your_script.py",
    "env": {"HF_TOKEN": "hf_abc123..."}  # ⚠️ Less secure than secrets
})
```

Difference from secrets:
- `env` variables are visible in job logs
- `secrets` are encrypted server-side
- Always prefer `secrets` for tokens
### Using Tokens in Scripts
In your Python script, tokens are available as environment variables:

```python
# /// script
# dependencies = ["huggingface-hub"]
# ///
import os
from huggingface_hub import HfApi

# Token is automatically available if passed via secrets
token = os.environ.get("HF_TOKEN")

# Use with the Hub API
api = HfApi(token=token)

# Or let huggingface_hub auto-detect
api = HfApi()  # Automatically uses the HF_TOKEN env var
```

**Best practices:**
- Don't hardcode tokens in scripts
- Use `os.environ.get("HF_TOKEN")` to access
- Let `huggingface_hub` auto-detect when possible
- Verify the token exists before Hub operations

### Token Verification
Check if you're logged in:

```python
from huggingface_hub import whoami
user_info = whoami()  # Returns your username if authenticated
```

Verify the token in a job:

```python
import os
assert "HF_TOKEN" in os.environ, "HF_TOKEN not found!"
token = os.environ["HF_TOKEN"]
print(f"Token starts with: {token[:7]}...")  # Should start with "hf_"
```

### Common Token Issues
Error: 401 Unauthorized
- Cause: Token missing or invalid
- Fix: Add `secrets={"HF_TOKEN": "$HF_TOKEN"}` to the job config
- Verify: Check that `hf_whoami()` works locally

Error: 403 Forbidden
- Cause: Token lacks required permissions
- Fix: Ensure the token has write permissions for push operations
- Check: Token type at https://huggingface.co/settings/tokens

Error: Token not found in environment
- Cause: `secrets` not passed, or wrong key name
- Fix: Use `secrets={"HF_TOKEN": "$HF_TOKEN"}` (not `env`)
- Verify: Script checks `os.environ.get("HF_TOKEN")`

Error: Repository access denied
- Cause: Token doesn't have access to the private repo
- Fix: Use a token from an account with access
- Check: Verify repo visibility and your permissions
### Token Security Best Practices
- **Never commit tokens** - Use the `$HF_TOKEN` placeholder or environment variables
- **Use secrets, not env** - Secrets are encrypted server-side
- **Rotate tokens regularly** - Generate new tokens periodically
- **Use minimal permissions** - Create tokens with only the needed permissions
- **Don't share tokens** - Each user should use their own token
- **Monitor token usage** - Check token activity in Hub settings
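These practices can be bundled into a small helper that job scripts call before any Hub operation. This is a sketch: `require_hf_token` is a hypothetical name, and the `hf_` prefix check is a sanity heuristic, not an API guarantee.

```python
import os

def require_hf_token() -> str:
    """Read HF_TOKEN from the environment, failing fast with a clear message.

    Applies the practices above: never hardcode, read from env, verify
    before use. The 'hf_' prefix check is only a heuristic.
    """
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_TOKEN not found - pass it via secrets={'HF_TOKEN': '$HF_TOKEN'}"
        )
    if not token.startswith("hf_"):
        raise RuntimeError("HF_TOKEN looks malformed (expected 'hf_' prefix)")
    return token
```

Calling this at the top of a script turns a confusing mid-job `401` into an immediate, explicit failure.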
### Complete Token Example
```python
# Example: Push results to the Hub
hf_jobs("uv", {
    "script": """
# /// script
# dependencies = ["huggingface-hub", "datasets"]
# ///
import os
from huggingface_hub import HfApi
from datasets import Dataset

# Verify the token is available
assert "HF_TOKEN" in os.environ, "HF_TOKEN required!"

# Use the token for Hub operations
api = HfApi(token=os.environ["HF_TOKEN"])

# Create and push a dataset
data = {"text": ["Hello", "World"]}
dataset = Dataset.from_dict(data)
dataset.push_to_hub("username/my-dataset", token=os.environ["HF_TOKEN"])
print("✅ Dataset pushed successfully!")
""",
    "flavor": "cpu-basic",
    "timeout": "30m",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # ✅ Token provided securely
})
```

## Quick Start: Two Approaches
### Approach 1: UV Scripts (Recommended)
UV scripts use PEP 723 inline dependencies for clean, self-contained workloads.

MCP Tool:

```python
hf_jobs("uv", {
    "script": """
# /// script
# dependencies = ["transformers", "torch"]
# ///
from transformers import pipeline
import torch

# Your workload here
classifier = pipeline("sentiment-analysis")
result = classifier("I love Hugging Face!")
print(result)
""",
    "flavor": "cpu-basic",
    "timeout": "30m"
})
```

**CLI Equivalent:**
```bash
hf jobs uv run my_script.py --flavor cpu-basic --timeout 30m
```

**Python API:**
```python
from huggingface_hub import run_uv_job
run_uv_job("my_script.py", flavor="cpu-basic", timeout="30m")
```

Benefits: Direct MCP tool usage, clean code, dependencies declared inline, no file saving required

When to use: Default choice for all workloads, custom logic, any scenario requiring `hf_jobs()`

### Custom Docker Images for UV Scripts
By default, UV scripts use `ghcr.io/astral-sh/uv:python3.12-bookworm-slim`. For ML workloads with complex dependencies, use pre-built images:

```python
hf_jobs("uv", {
    "script": "inference.py",
    "image": "vllm/vllm-openai:latest",  # Pre-built image with vLLM
    "flavor": "a10g-large"
})
```

CLI:
```bash
hf jobs uv run --image vllm/vllm-openai:latest --flavor a10g-large inference.py
```

Benefits: Faster startup, pre-installed dependencies, optimized for specific frameworks
### Python Version
By default, UV scripts use Python 3.12. Specify a different version:

```python
hf_jobs("uv", {
    "script": "my_script.py",
    "python": "3.11",  # Use Python 3.11
    "flavor": "cpu-basic"
})
```

Python API:
```python
from huggingface_hub import run_uv_job
run_uv_job("my_script.py", python="3.11")
```

### Working with Scripts
⚠️ **Important:** There are two "script path" stories depending on how you run Jobs:
- Using the `hf_jobs()` MCP tool (recommended in this repo): the `script` value must be inline code (a string) or a URL. A local filesystem path (like `"./scripts/foo.py"`) won't exist inside the remote container.
- Using the `hf jobs uv run` CLI: local file paths do work (the CLI uploads your script).

Common mistake with the `hf_jobs()` MCP tool:

```python
# ❌ Will fail (remote container can't see your local path)
hf_jobs("uv", {"script": "./scripts/foo.py"})
```

**Correct patterns with the `hf_jobs()` MCP tool:**

```python
# ✅ Inline: read the local script file and pass its contents
from pathlib import Path
script = Path("hf-jobs/scripts/foo.py").read_text()
hf_jobs("uv", {"script": script})

# ✅ URL: host the script somewhere reachable
hf_jobs("uv", {"script": "https://huggingface.co/datasets/uv-scripts/.../raw/main/foo.py"})

# ✅ URL from GitHub
hf_jobs("uv", {"script": "https://raw.githubusercontent.com/huggingface/trl/main/trl/scripts/sft.py"})
```

**CLI equivalent (local paths supported):**
```bash
hf jobs uv run ./scripts/foo.py -- --your --args
```

### Adding Dependencies at Runtime
Add extra dependencies beyond what's in the PEP 723 header:

```python
hf_jobs("uv", {
    "script": "inference.py",
    "dependencies": ["transformers", "torch>=2.0"],  # Extra deps
    "flavor": "a10g-small"
})
```

Python API:
```python
from huggingface_hub import run_uv_job
run_uv_job("inference.py", dependencies=["transformers", "torch>=2.0"])
```

### Approach 2: Docker-Based Jobs
Run jobs with custom Docker images and commands.

MCP Tool:
```python
hf_jobs("run", {
    "image": "python:3.12",
    "command": ["python", "-c", "print('Hello from HF Jobs!')"],
    "flavor": "cpu-basic",
    "timeout": "30m"
})
```

CLI Equivalent:
```bash
hf jobs run python:3.12 python -c "print('Hello from HF Jobs!')"
```

Python API:
```python
from huggingface_hub import run_job
run_job(image="python:3.12", command=["python", "-c", "print('Hello!')"], flavor="cpu-basic")
```

Benefits: Full Docker control, use pre-built images, run any command

When to use: Need specific Docker images, non-Python workloads, complex environments

Example with GPU:
```python
hf_jobs("run", {
    "image": "pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel",
    "command": ["python", "-c", "import torch; print(torch.cuda.get_device_name())"],
    "flavor": "a10g-small",
    "timeout": "1h"
})
```

Using Hugging Face Spaces as Images:

You can use Docker images from HF Spaces:
```python
hf_jobs("run", {
    "image": "hf.co/spaces/lhoestq/duckdb",  # Space as Docker image
    "command": ["duckdb", "-c", "SELECT 'Hello from DuckDB!'"],
    "flavor": "cpu-basic"
})
```

CLI:
```bash
hf jobs run hf.co/spaces/lhoestq/duckdb duckdb -c "SELECT 'Hello!'"
```

### Finding More UV Scripts on Hub
The `uv-scripts` organization provides ready-to-use UV scripts stored as datasets on the Hugging Face Hub:

```python
# Discover available UV script collections
dataset_search({"author": "uv-scripts", "sort": "downloads", "limit": 20})

# Explore a specific collection
hub_repo_details(["uv-scripts/classification"], repo_type="dataset", include_readme=True)
```

**Popular collections:** OCR, classification, synthetic-data, vLLM, dataset-creation

## Hardware Selection
Reference: HF Jobs Hardware Docs (updated 07/2025)

| Workload Type | Recommended Hardware | Use Case |
|---|---|---|
| Data processing, testing | `cpu-basic`, `cpu-upgrade` | Lightweight tasks |
| Small models, demos | `t4-small` | <1B models, quick tests |
| Medium models | `t4-medium`, `l4x1` | 1-7B models |
| Large models, production | `a10g-large` | 7-13B models |
| Very large models | `a100-large` | 13B+ models |
| Batch inference | `l4x4` | High-throughput |
| Multi-GPU workloads | `a10g-largex2`, `a10g-largex4` | Parallel/large models |
| TPU workloads | `v5e-1x1`, `v5e-2x2` | JAX/Flax, TPU-optimized |
All Available Flavors:
- CPU: `cpu-basic`, `cpu-upgrade`
- GPU: `t4-small`, `t4-medium`, `l4x1`, `l4x4`, `a10g-small`, `a10g-large`, `a10g-largex2`, `a10g-largex4`, `a100-large`
- TPU: `v5e-1x1`, `v5e-2x2`, `v5e-2x4`

Guidelines:
- Start with smaller hardware for testing
- Scale up based on actual needs
- Use multi-GPU for parallel workloads or large models
- Use TPUs for JAX/Flax workloads
- See `references/hardware_guide.md` for detailed specifications
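As a starting point, the guidelines above can be encoded as a rough heuristic. This is a sketch: `suggest_flavor` is a hypothetical helper, and the parameter-count thresholds mirror the table's rows rather than any official sizing rule.

```python
def suggest_flavor(model_params_b=None, needs_gpu=True):
    """Pick a starting flavor from a model's size in billions of parameters.

    Thresholds follow the hardware table above; treat the result as a
    first guess and scale up or down based on actual runs.
    """
    if not needs_gpu:
        return "cpu-basic"       # data processing, testing
    if model_params_b is None or model_params_b < 1:
        return "t4-small"        # <1B models, quick tests
    if model_params_b <= 7:
        return "l4x1"            # 1-7B models
    if model_params_b <= 13:
        return "a10g-large"      # 7-13B models
    return "a100-large"          # 13B+ models
```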
## Critical: Saving Results
⚠️ EPHEMERAL ENVIRONMENT—MUST PERSIST RESULTS
The Jobs environment is temporary. All files are deleted when the job ends. If results aren't persisted, ALL WORK IS LOST.
### Persistence Options
**1. Push to the Hugging Face Hub (Recommended)**

```python
# Push models
model.push_to_hub("username/model-name", token=os.environ["HF_TOKEN"])

# Push datasets
dataset.push_to_hub("username/dataset-name", token=os.environ["HF_TOKEN"])

# Push artifacts
api.upload_file(
    path_or_fileobj="results.json",
    path_in_repo="results.json",
    repo_id="username/results",
    token=os.environ["HF_TOKEN"]
)
```

**2. Use External Storage**

```python
# Upload to S3, GCS, etc.
import boto3
s3 = boto3.client('s3')
s3.upload_file('results.json', 'my-bucket', 'results.json')
```

**3. Send Results via API**

```python
# POST results to your API
import requests
requests.post("https://your-api.com/results", json=results)
```
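Because the environment is torn down when the job ends, a transient failure in the final upload loses everything. A small retry wrapper adds cheap insurance; this is a sketch, with the hypothetical `upload_fn` standing in for whichever persistence call you use (e.g. a lambda wrapping `api.upload_file(...)` or `dataset.push_to_hub(...)`).

```python
import time

def upload_with_retry(upload_fn, attempts=3, backoff_s=5.0):
    """Retry a flaky upload before the ephemeral environment is discarded.

    upload_fn: zero-argument callable performing the upload.
    Waits backoff_s * attempt between tries; re-raises after the last one.
    """
    for attempt in range(1, attempts + 1):
        try:
            return upload_fn()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(backoff_s * attempt)
```

Usage: `upload_with_retry(lambda: api.upload_file(...))` at the end of the job script.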
undefinedRequired Configuration for Hub Push
Hub推送的必要配置
In the job submission:

```python
{
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # Enables authentication
}
```

In the script:

```python
import os
from huggingface_hub import HfApi

# Token automatically available from secrets
api = HfApi(token=os.environ.get("HF_TOKEN"))

# Push your results
api.upload_file(...)
```
undefinedVerification Checklist
验证清单
Before submitting:
- Results persistence method chosen
- `secrets={"HF_TOKEN": "$HF_TOKEN"}` if using the Hub
- Script handles a missing token gracefully
- Test that the persistence path works

See `references/hub_saving.md` for a detailed Hub persistence guide.

## Timeout Management
⚠️ DEFAULT: 30 MINUTES
Jobs automatically stop after the timeout. For long-running tasks like training, always set a custom timeout.
### Setting Timeouts
MCP Tool:

```python
{
    "timeout": "2h"  # 2 hours
}
```

Supported formats:
- Integer/float: seconds (e.g., `300` = 5 minutes)
- String with suffix: `"5m"` (minutes), `"2h"` (hours), `"1d"` (days)
- Examples: `"90m"`, `"2h"`, `"1.5h"`, `300`, `"1d"`

Python API:
```python
from huggingface_hub import run_job, run_uv_job
run_job(image="python:3.12", command=[...], timeout="2h")
run_uv_job("script.py", timeout=7200)  # 2 hours in seconds
```

### Timeout Guidelines
| Scenario | Recommended | Notes |
|---|---|---|
| Quick test | 10-30 min | Verify setup |
| Data processing | 1-2 hours | Depends on data size |
| Batch inference | 2-4 hours | Large batches |
| Experiments | 4-8 hours | Multiple runs |
| Long-running | 8-24 hours | Production workloads |
Always add 20-30% buffer for setup, network delays, and cleanup.
On timeout: Job killed immediately, all unsaved progress lost
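The formats and buffer rule above can be sketched as two small helpers for sanity-checking your own configs before submission. These are hypothetical names (the Jobs API does its own parsing); the `s`/`m`/`h`/`d` unit table matches the suffixes documented above.

```python
def timeout_to_seconds(timeout):
    """Normalize a timeout ("5m", "2h", "1d", or bare seconds) to seconds."""
    if isinstance(timeout, (int, float)):
        return float(timeout)
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    suffix = timeout[-1]
    if suffix in units:
        return float(timeout[:-1]) * units[suffix]
    return float(timeout)  # plain numeric string

def with_buffer(seconds, buffer=0.25):
    """Add the 20-30% safety margin recommended above (default 25%)."""
    return seconds * (1 + buffer)
```

For example, `with_buffer(timeout_to_seconds("2h"))` yields a padded budget you can round up to the next supported value.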
## Cost Estimation
General guidelines:

```
Total Cost = (Hours of runtime) × (Cost per hour)
```

Example calculations:

Quick test:
- Hardware: cpu-basic ($0.10/hour)
- Time: 15 minutes (0.25 hours)
- Cost: $0.03

Data processing:
- Hardware: l4x1 ($2.50/hour)
- Time: 2 hours
- Cost: $5.00

Batch inference:
- Hardware: a10g-large ($5/hour)
- Time: 4 hours
- Cost: $20.00

Cost optimization tips:
- Start small - test on cpu-basic or t4-small
- Monitor runtime - set appropriate timeouts
- Use checkpoints - resume if a job fails
- Optimize code - reduce unnecessary compute
- Choose the right hardware - don't over-provision
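The formula above reduces to a one-liner; `estimate_cost` is a hypothetical helper, and any rates you plug in are the illustrative ones from the examples, not live pricing.

```python
def estimate_cost(hours, rate_per_hour):
    """Total Cost = (Hours of runtime) × (Cost per hour), rounded to cents."""
    return round(hours * rate_per_hour, 2)

# Data-processing example from above: 2 hours on l4x1 at $2.50/hour
print(estimate_cost(2, 2.50))
```

Pairing this with the 20-30% timeout buffer gives a worst-case budget before submitting.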
## Monitoring and Tracking
### Check Job Status
MCP Tool:

```python
# List all jobs
hf_jobs("ps")

# Inspect a specific job
hf_jobs("inspect", {"job_id": "your-job-id"})

# View logs
hf_jobs("logs", {"job_id": "your-job-id"})

# Cancel a job
hf_jobs("cancel", {"job_id": "your-job-id"})
```

**Python API:**
```python
from huggingface_hub import list_jobs, inspect_job, fetch_job_logs, cancel_job

# List your jobs
jobs = list_jobs()

# List running jobs only
running = [j for j in list_jobs() if j.status.stage == "RUNNING"]

# Inspect a specific job
job_info = inspect_job(job_id="your-job-id")

# View logs
for log in fetch_job_logs(job_id="your-job-id"):
    print(log)

# Cancel a job
cancel_job(job_id="your-job-id")
```

**CLI:**
```bash
hf jobs ps               # List jobs
hf jobs logs <job-id>    # View logs
hf jobs cancel <job-id>  # Cancel job
```

Remember: Wait for the user to request status checks. Avoid polling repeatedly.
### Job URLs
After submission, jobs have monitoring URLs:

```
https://huggingface.co/jobs/username/job-id
```

View logs, status, and details in the browser.
### Wait for Multiple Jobs
```python
import time
from huggingface_hub import inspect_job, run_job

# Run multiple jobs
jobs = [run_job(image=img, command=cmd) for img, cmd in workloads]

# Wait for all to complete
for job in jobs:
    while inspect_job(job_id=job.id).status.stage not in ("COMPLETED", "ERROR"):
        time.sleep(10)
## Scheduled Jobs
Run jobs on a schedule using CRON expressions or predefined schedules.

MCP Tool:

```python
# Schedule a UV script that runs every hour
hf_jobs("scheduled uv", {
    "script": "your_script.py",
    "schedule": "@hourly",
    "flavor": "cpu-basic"
})

# Schedule with CRON syntax
hf_jobs("scheduled uv", {
    "script": "your_script.py",
    "schedule": "0 9 * * 1",  # 9 AM every Monday
    "flavor": "cpu-basic"
})

# Schedule a Docker-based job
hf_jobs("scheduled run", {
    "image": "python:3.12",
    "command": ["python", "-c", "print('Scheduled!')"],
    "schedule": "@daily",
    "flavor": "cpu-basic"
})
```

**Python API:**
```python
from huggingface_hub import create_scheduled_job, create_scheduled_uv_job

# Schedule a Docker job
create_scheduled_job(
    image="python:3.12",
    command=["python", "-c", "print('Running on schedule!')"],
    schedule="@hourly"
)

# Schedule a UV script
create_scheduled_uv_job("my_script.py", schedule="@daily", flavor="cpu-basic")

# Schedule with GPU
create_scheduled_uv_job(
    "ml_inference.py",
    schedule="0 */6 * * *",  # Every 6 hours
    flavor="a10g-small"
)
```

**Available schedules:**
- `@annually`, `@yearly` - Once per year
- `@monthly` - Once per month
- `@weekly` - Once per week
- `@daily` - Once per day
- `@hourly` - Once per hour
- CRON expression - Custom schedule (e.g., `"*/5 * * * *"` for every 5 minutes)

**Manage scheduled jobs:**

```python
# MCP Tool
hf_jobs("scheduled ps")                          # List scheduled jobs
hf_jobs("scheduled inspect", {"job_id": "..."})  # Inspect details
hf_jobs("scheduled suspend", {"job_id": "..."})  # Pause
hf_jobs("scheduled resume", {"job_id": "..."})   # Resume
hf_jobs("scheduled delete", {"job_id": "..."})   # Delete
```
**Python API for management:**
```python
from huggingface_hub import (
    list_scheduled_jobs,
    inspect_scheduled_job,
    suspend_scheduled_job,
    resume_scheduled_job,
    delete_scheduled_job
)

# List all scheduled jobs
scheduled = list_scheduled_jobs()

# Inspect a scheduled job
info = inspect_scheduled_job(scheduled_job_id)

# Suspend (pause) a scheduled job
suspend_scheduled_job(scheduled_job_id)

# Resume a scheduled job
resume_scheduled_job(scheduled_job_id)

# Delete a scheduled job
delete_scheduled_job(scheduled_job_id)
```
Webhooks: Trigger Jobs on Events
Trigger jobs automatically when changes happen in Hugging Face repositories.
Python API:
```python
from huggingface_hub import create_webhook

# Create webhook that triggers a job when a repo changes
webhook = create_webhook(
job_id=job.id,
watched=[
{"type": "user", "name": "your-username"},
{"type": "org", "name": "your-org-name"}
],
domains=["repo", "discussion"],
secret="your-secret"
)
```

**How it works:**
1. Webhook listens for changes in watched repositories
2. When triggered, the job runs with the `WEBHOOK_PAYLOAD` environment variable set
3. Your script can parse the payload to understand what changed
**Use cases:**
- Auto-process new datasets when uploaded
- Trigger inference when models are updated
- Run tests when code changes
- Generate reports on repository activity
**Access webhook payload in script:**
```python
import os
import json
payload = json.loads(os.environ.get("WEBHOOK_PAYLOAD", "{}"))
print(f"Event type: {payload.get('event', {}).get('action')}")
```

See Webhooks Documentation for more details.
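As a sketch of the parsing step, a job script might reduce the payload to a short summary before deciding what to do. The `event.action` and `repo.name` fields are assumptions about the payload shape, so check what your webhook actually delivers:

```python
import json
import os

def summarize_webhook(payload_json: str) -> str:
    """Condense a webhook payload into 'action on repo' (field names assumed)."""
    payload = json.loads(payload_json or "{}")
    action = payload.get("event", {}).get("action", "unknown")
    repo = payload.get("repo", {}).get("name", "unknown")
    return f"{action} on {repo}"

# Inside a job, the payload arrives via the WEBHOOK_PAYLOAD environment variable
print(summarize_webhook(os.environ.get("WEBHOOK_PAYLOAD", "{}")))
```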
Common Workload Patterns

This repository ships ready-to-run UV scripts in `hf-jobs/scripts/`. Prefer using them instead of inventing new templates.

Pattern 1: Dataset → Model Responses (vLLM) — `scripts/generate-responses.py`

**What it does:** loads a Hub dataset (chat `messages` or a `prompt` column), applies a model chat template, generates responses with vLLM, and pushes the output dataset + dataset card back to the Hub.

**Requires:** GPU + write token (it pushes a dataset).
```python
from pathlib import Path
script = Path("hf-jobs/scripts/generate-responses.py").read_text()
hf_jobs("uv", {
"script": script,
"script_args": [
"username/input-dataset",
"username/output-dataset",
"--messages-column", "messages",
"--model-id", "Qwen/Qwen3-30B-A3B-Instruct-2507",
"--temperature", "0.7",
"--top-p", "0.8",
"--max-tokens", "2048",
],
"flavor": "a10g-large",
"timeout": "4h",
"secrets": {"HF_TOKEN": "$HF_TOKEN"},
})
```

Pattern 2: CoT Self-Instruct Synthetic Data — `scripts/cot-self-instruct.py`

**What it does:** generates synthetic prompts/answers via CoT Self-Instruct, optionally filters outputs (answer-consistency / RIP), then pushes the generated dataset + dataset card to the Hub.

**Requires:** GPU + write token (it pushes a dataset).
```python
from pathlib import Path
script = Path("hf-jobs/scripts/cot-self-instruct.py").read_text()
hf_jobs("uv", {
"script": script,
"script_args": [
"--seed-dataset", "davanstrien/s1k-reasoning",
"--output-dataset", "username/synthetic-math",
"--task-type", "reasoning",
"--num-samples", "5000",
"--filter-method", "answer-consistency",
],
"flavor": "l4x4",
"timeout": "8h",
"secrets": {"HF_TOKEN": "$HF_TOKEN"},
})
```

Pattern 3: Streaming Dataset Stats (Polars + HF Hub) — `scripts/finepdfs-stats.py`

**What it does:** scans parquet directly from the Hub (no 300GB download), computes temporal stats, and optionally uploads results to a Hub dataset repo.

**Requires:** CPU is often enough; a token is needed only if you pass `--output-repo` (upload).

```python
from pathlib import Path
script = Path("hf-jobs/scripts/finepdfs-stats.py").read_text()
hf_jobs("uv", {
"script": script,
"script_args": [
"--limit", "10000",
"--show-plan",
"--output-repo", "username/finepdfs-temporal-stats",
],
"flavor": "cpu-upgrade",
"timeout": "2h",
"env": {"HF_XET_HIGH_PERFORMANCE": "1"},
"secrets": {"HF_TOKEN": "$HF_TOKEN"},
})
```

Common Failure Modes
Out of Memory (OOM)
Fix:
- Reduce batch size or data chunk size
- Process data in smaller batches
- Upgrade hardware: cpu → t4 → a10g → a100
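A generic way to apply the first two fixes is to stream the input through fixed-size batches instead of materializing everything at once. The sketch below uses a plain iterator (with the `datasets` library you would similarly pass `streaming=True` to `load_dataset`); the `sum` call is a stand-in for real per-batch work:

```python
from itertools import islice

def batched(iterable, batch_size):
    """Yield lists of at most batch_size items, bounding peak memory."""
    it = iter(iterable)
    while batch := list(islice(it, batch_size)):
        yield batch

# Process a potentially huge stream in small, memory-bounded batches
results = [sum(batch) for batch in batched(range(10), 4)]
print(results)  # → [6, 22, 17]
```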
Job Timeout
Fix:
- Check logs for actual runtime
- Increase timeout with buffer: `"timeout": "3h"`
- Optimize code for faster execution
- Process data in chunks
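One way to pick a buffered timeout is to time a small sample and extrapolate. This helper is a hypothetical sketch (not part of the Jobs API) that rounds up to whole hours after applying a 1.5x safety margin:

```python
def recommend_timeout(sample_seconds: float, total_items: int,
                      sample_items: int, buffer: float = 1.5) -> str:
    """Extrapolate total runtime from a timed sample, add a safety buffer,
    and round up to a whole-hour timeout string."""
    estimate = sample_seconds * total_items / sample_items * buffer
    hours = int(estimate // 3600) + 1
    return f"{hours}h"

# 100 samples took 60s; extrapolate to 10,000 items with a 1.5x buffer
print(recommend_timeout(60, 10_000, 100))  # → "3h"
```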
Hub Push Failures
Fix:
- Add `secrets={"HF_TOKEN": "$HF_TOKEN"}` to the job
- Verify the token in the script: `assert "HF_TOKEN" in os.environ`
- Check token permissions
- Verify the repo exists or can be created
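To fail fast rather than discover a missing token after hours of compute, a script can verify the token up front. `require_hf_token` is a hypothetical helper name for illustration, not a huggingface_hub API:

```python
import os

def require_hf_token(env=None) -> str:
    """Raise a clear error early if the job was submitted without a token."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            'HF_TOKEN is missing - submit the job with secrets={"HF_TOKEN": "$HF_TOKEN"}'
        )
    return token
```

Call it at the top of the script, before any expensive work begins.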
Missing Dependencies
Fix:
Add to the PEP 723 header:
```python
# /// script
# dependencies = ["package1", "package2>=1.0.0"]
# ///
```
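Putting the header in context, a minimal runnable UV script looks like this; `dependencies` is empty here because only the standard library is used, but real jobs list their packages there:

```python
# /// script
# requires-python = ">=3.10"
# dependencies = []
# ///
# `uv run script.py` reads the header above, builds an isolated environment,
# installs anything listed in dependencies, then executes the script.
import sys

print(f"Hello from Python {sys.version_info.major}.{sys.version_info.minor}")
```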
Authentication Errors
Fix:
- Check `hf_whoami()` works locally
- Verify `secrets={"HF_TOKEN": "$HF_TOKEN"}` in the job config
- Re-login: `hf auth login`
- Check the token has required permissions
Troubleshooting
Common issues:
- Job times out → Increase timeout, optimize code
- Results not saved → Check persistence method, verify HF_TOKEN
- Out of Memory → Reduce batch size, upgrade hardware
- Import errors → Add dependencies to PEP 723 header
- Authentication errors → Check token, verify secrets parameter
See: `references/troubleshooting.md` for the complete troubleshooting guide

Resources
References (In This Skill)
- `references/token_usage.md` - Complete token usage guide
- `references/hardware_guide.md` - Hardware specs and selection
- `references/hub_saving.md` - Hub persistence guide
- `references/troubleshooting.md` - Common issues and solutions
Scripts (In This Skill)
- `scripts/generate-responses.py` - vLLM batch generation: dataset → responses → push to Hub
- `scripts/cot-self-instruct.py` - CoT Self-Instruct synthetic data generation + filtering → push to Hub
- `scripts/finepdfs-stats.py` - Polars streaming stats over `finepdfs-edu` parquet on the Hub (optional push)
External Links
Official Documentation:
- HF Jobs Guide - Main documentation
- HF Jobs CLI Reference - Command line interface
- HF Jobs API Reference - Python API details
- Hardware Flavors Reference - Available hardware
Related Tools:
- UV Scripts Guide - PEP 723 inline dependencies
- UV Scripts Organization - Community UV script collection
- HF Hub Authentication - Token setup
- Webhooks Documentation - Event triggers
Key Takeaways
- **Submit scripts inline** - The `script` parameter accepts Python code directly; no file saving required unless the user requests it
- **Jobs are asynchronous** - Don't wait/poll; let the user check when ready
- **Always set a timeout** - The default 30 min may be insufficient; set an appropriate timeout
- **Always persist results** - The environment is ephemeral; without persistence, all work is lost
- **Use tokens securely** - Always use `secrets={"HF_TOKEN": "$HF_TOKEN"}` for Hub operations
- **Choose appropriate hardware** - Start small, scale up based on needs (see the hardware guide)
- **Use UV scripts** - Default to `hf_jobs("uv", {...})` with inline scripts for Python workloads
- **Handle authentication** - Verify tokens are available before Hub operations
- **Monitor jobs** - Provide job URLs and status check commands
- **Optimize costs** - Choose the right hardware, set appropriate timeouts
Quick Reference: MCP Tool vs CLI vs Python API

| Operation | MCP Tool | CLI | Python API |
|---|---|---|---|
| Run UV script | `hf_jobs("uv", {...})` | `hf jobs uv run` | `run_uv_job()` |
| Run Docker job | `hf_jobs("run", {...})` | `hf jobs run` | `run_job()` |
| List jobs | `hf_jobs("ps")` | `hf jobs ps` | `list_jobs()` |
| View logs | `hf_jobs("logs", {...})` | `hf jobs logs` | `fetch_job_logs()` |
| Cancel job | `hf_jobs("cancel", {...})` | `hf jobs cancel` | `cancel_job()` |
| Schedule UV | - | - | `create_scheduled_uv_job()` |
| Schedule Docker | - | - | `create_scheduled_job()` |