hugging-face-jobs

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Running Workloads on Hugging Face Jobs

在Hugging Face Jobs上运行工作负载

Overview

概述

Run any workload on fully managed Hugging Face infrastructure. No local setup required—jobs run on cloud CPUs, GPUs, or TPUs and can persist results to the Hugging Face Hub.

Common use cases:

Data Processing - Transform, filter, or analyze large datasets
Batch Inference - Run inference on thousands of samples
Experiments & Benchmarks - Reproducible ML experiments
Model Training - Fine-tune models (see
```
model-trainer
```
skill for TRL-specific training)
Synthetic Data Generation - Generate datasets using LLMs
Development & Testing - Test code without local GPU setup
Scheduled Jobs - Automate recurring tasks

For model training specifically: See the

model-trainer

skill for TRL-based training workflows.

在全托管的Hugging Face基础设施上运行任何工作负载。无需本地设置——任务可在云端CPU、GPU或TPU上运行，且能将结果持久化到Hugging Face Hub。

常见使用场景：

数据处理 - 转换、过滤或分析大型数据集
批处理推理 - 对数千个样本运行推理
实验与基准测试 - 可复现的机器学习实验
模型训练 - 微调模型（针对TRL特定训练，请查看
```
model-trainer
```
技能）
合成数据生成 - 使用大语言模型生成数据集
开发与测试 - 无需本地GPU即可测试代码
定时任务 - 自动化重复任务

针对模型训练的说明： 基于TRL的训练工作流请使用

model-trainer

技能。

When to Use This Skill

何时使用此技能

Use this skill when users want to:

Run Python workloads on cloud infrastructure
Execute jobs without local GPU/TPU setup
Process data at scale
Run batch inference or experiments
Schedule recurring tasks
Use GPUs/TPUs for any workload
Persist results to the Hugging Face Hub

当用户希望执行以下操作时，可使用此技能：

在云端基础设施上运行Python工作负载
无需本地GPU/TPU设置即可执行任务
大规模处理数据
运行批处理推理或实验
定时执行重复任务
为任意工作负载使用GPU/TPU
将结果持久化到Hugging Face Hub

Key Directives

核心指导原则

When assisting with jobs:

ALWAYS use
hf_jobs()
MCP tool - Submit jobs using
```
hf_jobs("uv", {...})
```
or
```
hf_jobs("run", {...})
```
. The
```
script
```
parameter accepts Python code directly. Do NOT save to local files unless the user explicitly requests it. Pass the script content as a string to
```
hf_jobs()
```
.
Always handle authentication - Jobs that interact with the Hub require
```
HF_TOKEN
```
via secrets. See Token Usage section below.
Provide job details after submission - After submitting, provide job ID, monitoring URL, estimated time, and note that the user can request status checks later.
Set appropriate timeouts - Default 30min may be insufficient for long-running tasks.

协助用户处理任务时：

始终使用
hf_jobs()
MCP工具 - 通过
```
hf_jobs("uv", {...})
```
或
```
hf_jobs("run", {...})
```
提交任务。
```
script
```
参数可直接接受Python代码。除非用户明确要求，否则不要保存到本地文件。将脚本内容作为字符串传递给
```
hf_jobs()
```
。
始终处理认证 - 与Hub交互的任务需要通过密钥传入
```
HF_TOKEN
```
。请查看下方的令牌使用部分。
提交后提供任务详情 - 提交任务后，提供任务ID、监控URL、预估时间，并告知用户之后可请求查看状态。
设置合适的超时时间 - 默认30分钟可能不足以处理长时间运行的任务。

Prerequisites Checklist

前置检查清单

Before starting any job, verify:

启动任何任务前，请验证：

✅ Account & Authentication

✅ 账户与认证

Hugging Face Account with Pro, Team, or Enterprise plan (Jobs require paid plan)
Authenticated login: Check with
```
hf_whoami()
```
HF_TOKEN for Hub Access ⚠️ CRITICAL - Required for any Hub operations (push models/datasets, download private repos, etc.)
Token must have appropriate permissions (read for downloads, write for uploads)

拥有Pro、Team或Enterprise计划的Hugging Face账户（Jobs功能需要付费计划）
已完成认证登录：通过
```
hf_whoami()
```
检查
用于Hub访问的HF_TOKEN ⚠️ 至关重要 - 任何Hub操作（推送模型/数据集、下载私有仓库等）都需要
令牌必须拥有合适的权限（下载需要读权限，上传需要写权限）

✅ Token Usage (See Token Usage section for details)

✅ 令牌使用（详情请查看令牌使用部分）

When tokens are required:

Pushing models/datasets to Hub
Accessing private repositories
Using Hub APIs in scripts
Any authenticated Hub operations

How to provide tokens:

python

{
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # Recommended: automatic token
}

⚠️ CRITICAL: The

$HF_TOKEN

placeholder is automatically replaced with your logged-in token. Never hardcode tokens in scripts.

需要令牌的场景：

向Hub推送模型/数据集
访问私有仓库
在脚本中使用Hub API
任何需要认证的Hub操作

提供令牌的方式：

python

{
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # 推荐：自动令牌
}

⚠️ 重要提示：

$HF_TOKEN

是占位符，会自动替换为你登录后的实际令牌。切勿在脚本中硬编码令牌。

Token Usage Guide

令牌使用指南

Understanding Tokens

了解令牌

What are HF Tokens?

Authentication credentials for Hugging Face Hub
Required for authenticated operations (push, private repos, API access)
Stored securely on your machine after
```
hf auth login
```

Token Types:

Read Token - Can download models/datasets, read private repos
Write Token - Can push models/datasets, create repos, modify content
Organization Token - Can act on behalf of an organization

什么是HF令牌？

用于Hugging Face Hub的认证凭据
执行认证操作（推送、私有仓库访问、API调用）时必需
在执行
```
hf auth login
```
后安全存储在你的设备上

令牌类型：

读令牌 - 可下载模型/数据集、读取私有仓库
写令牌 - 可推送模型/数据集、创建仓库、修改内容
组织令牌 - 可代表组织执行操作

When Tokens Are Required

何时需要令牌

Always Required:

Pushing models/datasets to Hub
Accessing private repositories
Creating new repositories
Modifying existing repositories
Using Hub APIs programmatically

Not Required:

Downloading public models/datasets
Running jobs that don't interact with Hub
Reading public repository information

始终需要的场景：

向Hub推送模型/数据集
访问私有仓库
创建新仓库
修改现有仓库
以编程方式使用Hub API

不需要的场景：

下载公开模型/数据集
运行不与Hub交互的任务
读取公开仓库信息

How to Provide Tokens to Jobs

如何为任务提供令牌

Method 1: Automatic Token (Recommended)

方法1：自动令牌（推荐）

python

hf_jobs("uv", {
    "script": "your_script.py",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # ✅ Automatic replacement
})

How it works:

```
$HF_TOKEN
```
is a placeholder that gets replaced with your actual token
Uses the token from your logged-in session (
```
hf auth login
```
)
Most secure and convenient method
Token is encrypted server-side when passed as a secret

Benefits:

No token exposure in code
Uses your current login session
Automatically updated if you re-login
Works seamlessly with MCP tools

python

hf_jobs("uv", {
    "script": "your_script.py",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # ✅ 自动替换
})

工作原理：

```
$HF_TOKEN
```
是占位符，会替换为你的实际令牌
使用你登录会话中的令牌（
```
hf auth login
```
生成的令牌）
最安全且便捷的方式
作为密钥传递时，令牌会在服务器端加密

优势：

代码中不会暴露令牌
使用当前登录会话
重新登录后会自动更新
与MCP工具无缝协作

Method 2: Explicit Token (Not Recommended)

方法2：显式令牌（不推荐）

python

hf_jobs("uv", {
    "script": "your_script.py",
    "secrets": {"HF_TOKEN": "hf_abc123..."}  # ⚠️ Hardcoded token
})

When to use:

Only if automatic token doesn't work
Testing with a specific token
Organization tokens (use with caution)

Security concerns:

Token visible in code/logs
Must manually update if token rotates
Risk of token exposure

python

hf_jobs("uv", {
    "script": "your_script.py",
    "secrets": {"HF_TOKEN": "hf_abc123..."}  # ⚠️ 硬编码令牌
})

使用场景：

仅在自动令牌无法工作时使用
使用特定令牌进行测试
组织令牌（谨慎使用）

安全隐患：

令牌会在代码/日志中可见
令牌轮换时必须手动更新
存在令牌泄露风险

Method 3: Environment Variable (Less Secure)

方法3：环境变量（安全性较低）

python

hf_jobs("uv", {
    "script": "your_script.py",
    "env": {"HF_TOKEN": "hf_abc123..."}  # ⚠️ Less secure than secrets
})

Difference from secrets:

```
env
```
variables are visible in job logs
```
secrets
```
are encrypted server-side
Always prefer
```
secrets
```
for tokens

python

hf_jobs("uv", {
    "script": "your_script.py",
    "env": {"HF_TOKEN": "hf_abc123..."}  # ⚠️ 安全性低于密钥
})

与密钥的区别：

```
env
```
变量会在任务日志中可见
```
secrets
```
会在服务器端加密
始终优先使用
```
secrets
```
存储令牌

Using Tokens in Scripts

在脚本中使用令牌

In your Python script, tokens are available as environment variables:

python

undefined

在Python脚本中，令牌可作为环境变量访问：

python

undefined

/// script

dependencies = ["huggingface-hub"]

///

import os from huggingface_hub import HfApi

Token is automatically available if passed via secrets

如果通过密钥传递，令牌会自动可用

token = os.environ.get("HF_TOKEN")

Use with Hub API

与Hub API一起使用

api = HfApi(token=token)

Or let huggingface_hub auto-detect

或者让huggingface-hub自动检测

api = HfApi() # Automatically uses HF_TOKEN env var


**Best practices:**
- Don't hardcode tokens in scripts
- Use `os.environ.get("HF_TOKEN")` to access
- Let `huggingface_hub` auto-detect when possible
- Verify token exists before Hub operations

api = HfApi() # 自动使用HF_TOKEN环境变量


**最佳实践：**
- 不要在脚本中硬编码令牌
- 使用`os.environ.get("HF_TOKEN")`来获取
- 尽可能让`huggingface-hub`自动检测
- 在执行Hub操作前验证令牌是否存在

Token Verification

令牌验证

Check if you're logged in:

python

from huggingface_hub import whoami
user_info = whoami()  # Returns your username if authenticated

Verify token in job:

python

import os
assert "HF_TOKEN" in os.environ, "HF_TOKEN not found!"
token = os.environ["HF_TOKEN"]
print(f"Token starts with: {token[:7]}...")  # Should start with "hf_"

检查是否已登录：

python

from huggingface_hub import whoami
user_info = whoami()  # 已认证时会返回你的用户名

在任务中验证令牌：

python

import os
assert "HF_TOKEN" in os.environ, "HF_TOKEN未找到！"
token = os.environ["HF_TOKEN"]
print(f"令牌开头为: {token[:7]}...")  # 应该以"hf_"开头

Common Token Issues

常见令牌问题

Error: 401 Unauthorized

Cause: Token missing or invalid
Fix: Add
```
secrets={"HF_TOKEN": "$HF_TOKEN"}
```
to job config
Verify: Check
```
hf_whoami()
```
works locally

Error: 403 Forbidden

Cause: Token lacks required permissions
Fix: Ensure token has write permissions for push operations
Check: Token type at https://huggingface.co/settings/tokens

Error: Token not found in environment

Cause:
```
secrets
```
not passed or wrong key name
Fix: Use
```
secrets={"HF_TOKEN": "$HF_TOKEN"}
```
(not
```
env
```
)
Verify: Script checks
```
os.environ.get("HF_TOKEN")
```

Error: Repository access denied

Cause: Token doesn't have access to private repo
Fix: Use token from account with access
Check: Verify repo visibility and your permissions

错误：401 Unauthorized

原因： 令牌缺失或无效
解决方法： 在任务配置中添加
```
secrets={"HF_TOKEN": "$HF_TOKEN"}
```
验证： 确认本地
```
hf_whoami()
```
可正常工作

错误：403 Forbidden

原因： 令牌缺少所需权限
解决方法： 确保令牌拥有推送操作所需的写权限
检查： 在https://huggingface.co/settings/tokens查看令牌类型

错误：环境中未找到令牌

原因： 未传递
```
secrets
```
或密钥名称错误
解决方法： 使用
```
secrets={"HF_TOKEN": "$HF_TOKEN"}
```
（而非
```
env
```
）
验证： 脚本中检查
```
os.environ.get("HF_TOKEN")
```

错误：仓库访问被拒绝

原因： 令牌无权访问私有仓库
解决方法： 使用拥有访问权限的账户的令牌
检查： 验证仓库可见性和你的权限

Token Security Best Practices

令牌安全最佳实践

Never commit tokens - Use
```
$HF_TOKEN
```
placeholder or environment variables
Use secrets, not env - Secrets are encrypted server-side
Rotate tokens regularly - Generate new tokens periodically
Use minimal permissions - Create tokens with only needed permissions
Don't share tokens - Each user should use their own token
Monitor token usage - Check token activity in Hub settings

切勿提交令牌 - 使用
```
$HF_TOKEN
```
占位符或环境变量
使用密钥而非环境变量 - 密钥会在服务器端加密
定期轮换令牌 - 定期生成新令牌
使用最小权限 - 创建仅拥有所需权限的令牌
不要共享令牌 - 每个用户应使用自己的令牌
监控令牌使用情况 - 在Hub设置中查看令牌活动

Complete Token Example

完整令牌示例

python

undefined

python

undefined

Example: Push results to Hub

示例：将结果推送到Hub

hf_jobs("uv", { "script": """

/// script

dependencies = ["huggingface-hub", "datasets"]

///

import os from huggingface_hub import HfApi from datasets import Dataset

Verify token is available

验证令牌是否可用

assert "HF_TOKEN" in os.environ, "HF_TOKEN required!"

assert "HF_TOKEN" in os.environ, "需要HF_TOKEN！"

Use token for Hub operations

使用令牌执行Hub操作

api = HfApi(token=os.environ["HF_TOKEN"])

Create and push dataset

创建并推送数据集

data = {"text": ["Hello", "World"]} dataset = Dataset.from_dict(data) dataset.push_to_hub("username/my-dataset", token=os.environ["HF_TOKEN"])

print("✅ Dataset pushed successfully!") """, "flavor": "cpu-basic", "timeout": "30m", "secrets": {"HF_TOKEN": "$HF_TOKEN"} # ✅ Token provided securely })

undefined

data = {"text": ["Hello", "World"]} dataset = Dataset.from_dict(data) dataset.push_to_hub("username/my-dataset", token=os.environ["HF_TOKEN"])

print("✅ 数据集推送成功！") """, "flavor": "cpu-basic", "timeout": "30m", "secrets": {"HF_TOKEN": "$HF_TOKEN"} # ✅ 安全提供令牌 })

undefined

Quick Start: Two Approaches

快速开始：两种方式

Approach 1: UV Scripts (Recommended)

方式1：UV脚本（推荐）

UV scripts use PEP 723 inline dependencies for clean, self-contained workloads.

MCP Tool:

python

hf_jobs("uv", {
    "script": """

UV脚本使用PEP 723内联依赖，实现简洁、独立的工作负载。

MCP工具：

python

hf_jobs("uv", {
    "script": """

/// script

dependencies = ["transformers", "torch"]

///

from transformers import pipeline import torch

Your workload here

你的工作负载代码

classifier = pipeline("sentiment-analysis") result = classifier("I love Hugging Face!") print(result) """, "flavor": "cpu-basic", "timeout": "30m" })


**CLI Equivalent:**
```bash
hf jobs uv run my_script.py --flavor cpu-basic --timeout 30m

Python API:

python

from huggingface_hub import run_uv_job
run_uv_job("my_script.py", flavor="cpu-basic", timeout="30m")

Benefits: Direct MCP tool usage, clean code, dependencies declared inline, no file saving required

When to use: Default choice for all workloads, custom logic, any scenario requiring

hf_jobs()

classifier = pipeline("sentiment-analysis") result = classifier("I love Hugging Face!") print(result) """, "flavor": "cpu-basic", "timeout": "30m" })


**CLI等效命令：**
```bash
hf jobs uv run my_script.py --flavor cpu-basic --timeout 30m

Python API：

python

from huggingface_hub import run_uv_job
run_uv_job("my_script.py", flavor="cpu-basic", timeout="30m")

优势： 直接使用MCP工具，代码简洁，内联声明依赖，无需保存文件

适用场景： 所有工作负载的默认选择、自定义逻辑、任何需要

hf_jobs()

的场景

Custom Docker Images for UV Scripts

为UV脚本使用自定义Docker镜像

By default, UV scripts use

ghcr.io/astral-sh/uv:python3.12-bookworm-slim

. For ML workloads with complex dependencies, use pre-built images:

python

hf_jobs("uv", {
    "script": "inference.py",
    "image": "vllm/vllm-openai:latest",  # Pre-built image with vLLM
    "flavor": "a10g-large"
})

CLI:

bash

hf jobs uv run --image vllm/vllm-openai:latest --flavor a10g-large inference.py

Benefits: Faster startup, pre-installed dependencies, optimized for specific frameworks

默认情况下，UV脚本使用

ghcr.io/astral-sh/uv:python3.12-bookworm-slim

。对于具有复杂依赖的机器学习工作负载，可使用预构建镜像：

python

hf_jobs("uv", {
    "script": "inference.py",
    "image": "vllm/vllm-openai:latest",  # 预构建的vLLM镜像
    "flavor": "a10g-large"
})

CLI：

bash

hf jobs uv run --image vllm/vllm-openai:latest --flavor a10g-large inference.py

优势： 启动速度更快，预安装依赖，针对特定框架优化

Python Version

Python版本

By default, UV scripts use Python 3.12. Specify a different version:

python

hf_jobs("uv", {
    "script": "my_script.py",
    "python": "3.11",  # Use Python 3.11
    "flavor": "cpu-basic"
})

Python API:

python

from huggingface_hub import run_uv_job
run_uv_job("my_script.py", python="3.11")

默认情况下，UV脚本使用Python 3.12。可指定其他版本：

python

hf_jobs("uv", {
    "script": "my_script.py",
    "python": "3.11",  # 使用Python 3.11
    "flavor": "cpu-basic"
})

Python API：

python

from huggingface_hub import run_uv_job
run_uv_job("my_script.py", python="3.11")

Working with Scripts

脚本使用注意事项

⚠️ Important: There are two "script path" stories depending on how you run Jobs:

Using the
hf_jobs()
MCP tool (recommended in this repo): the
```
script
```
value must be inline code (a string) or a URL. A local filesystem path (like
```
"./scripts/foo.py"
```
) won't exist inside the remote container.
Using the
hf jobs uv run
CLI: local file paths do work (the CLI uploads your script).

Common mistake with
hf_jobs()
MCP tool:

python

undefined

⚠️ 重要提示： 根据运行Jobs的方式，"脚本路径"有两种不同的处理方式：

使用
hf_jobs()
MCP工具（本仓库推荐方式）：
```
script
```
的值必须是内联代码（字符串）或URL。本地文件系统路径（如
```
"./scripts/foo.py"
```
）在远程容器中不存在。
使用
hf jobs uv run
CLI：本地文件路径可正常使用（CLI会上传你的脚本）。

使用
hf_jobs()
MCP工具的常见错误：

python

undefined

❌ Will fail (remote container can't see your local path)

❌ 会失败（远程容器无法访问你的本地路径）

hf_jobs("uv", {"script": "./scripts/foo.py"})


**Correct patterns with `hf_jobs()` MCP tool:**

```python

hf_jobs("uv", {"script": "./scripts/foo.py"})


**使用`hf_jobs()` MCP工具的正确方式：**

```python

✅ Inline: read the local script file and pass its contents

✅ 内联：读取本地脚本文件并传递其内容

from pathlib import Path script = Path("hf-jobs/scripts/foo.py").read_text() hf_jobs("uv", {"script": script})

✅ URL: host the script somewhere reachable

✅ URL：将脚本托管在可访问的位置

hf_jobs("uv", {"script": "https://huggingface.co/datasets/uv-scripts/.../raw/main/foo.py"})

✅ URL from GitHub

✅ GitHub URL

hf_jobs("uv", {"script": "https://raw.githubusercontent.com/huggingface/trl/main/trl/scripts/sft.py"})


**CLI equivalent (local paths supported):**

```bash
hf jobs uv run ./scripts/foo.py -- --your --args

hf_jobs("uv", {"script": "https://raw.githubusercontent.com/huggingface/trl/main/trl/scripts/sft.py"})


**等效CLI命令（支持本地路径）：**

```bash
hf jobs uv run ./scripts/foo.py -- --your --args

Adding Dependencies at Runtime

在运行时添加依赖

Add extra dependencies beyond what's in the PEP 723 header:

python

hf_jobs("uv", {
    "script": "inference.py",
    "dependencies": ["transformers", "torch>=2.0"],  # Extra deps
    "flavor": "a10g-small"
})

Python API:

python

from huggingface_hub import run_uv_job
run_uv_job("inference.py", dependencies=["transformers", "torch>=2.0"])

可添加PEP 723标头之外的额外依赖：

python

hf_jobs("uv", {
    "script": "inference.py",
    "dependencies": ["transformers", "torch>=2.0"],  # 额外依赖
    "flavor": "a10g-small"
})

Python API：

python

from huggingface_hub import run_uv_job
run_uv_job("inference.py", dependencies=["transformers", "torch>=2.0"])

Approach 2: Docker-Based Jobs

方式2：基于Docker的任务

Run jobs with custom Docker images and commands.

MCP Tool:

python

hf_jobs("run", {
    "image": "python:3.12",
    "command": ["python", "-c", "print('Hello from HF Jobs!')"],
    "flavor": "cpu-basic",
    "timeout": "30m"
})

CLI Equivalent:

bash

hf jobs run python:3.12 python -c "print('Hello from HF Jobs!')"

Python API:

python

from huggingface_hub import run_job
run_job(image="python:3.12", command=["python", "-c", "print('Hello!')"], flavor="cpu-basic")

Benefits: Full Docker control, use pre-built images, run any command When to use: Need specific Docker images, non-Python workloads, complex environments

Example with GPU:

python

hf_jobs("run", {
    "image": "pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel",
    "command": ["python", "-c", "import torch; print(torch.cuda.get_device_name())"],
    "flavor": "a10g-small",
    "timeout": "1h"
})

Using Hugging Face Spaces as Images:

You can use Docker images from HF Spaces:

python

hf_jobs("run", {
    "image": "hf.co/spaces/lhoestq/duckdb",  # Space as Docker image
    "command": ["duckdb", "-c", "SELECT 'Hello from DuckDB!'"],
    "flavor": "cpu-basic"
})

CLI:

bash

hf jobs run hf.co/spaces/lhoestq/duckdb duckdb -c "SELECT 'Hello!'"

使用自定义Docker镜像和命令运行任务。

MCP工具：

python

hf_jobs("run", {
    "image": "python:3.12",
    "command": ["python", "-c", "print('Hello from HF Jobs!')"],
    "flavor": "cpu-basic",
    "timeout": "30m"
})

CLI等效命令：

bash

hf jobs run python:3.12 python -c "print('Hello from HF Jobs!')"

Python API：

python

from huggingface_hub import run_job
run_job(image="python:3.12", command=["python", "-c", "print('Hello!')"], flavor="cpu-basic")

优势： 完全控制Docker，可使用预构建镜像，运行任意命令 适用场景： 需要特定Docker镜像、非Python工作负载、复杂环境

GPU示例：

python

hf_jobs("run", {
    "image": "pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel",
    "command": ["python", "-c", "import torch; print(torch.cuda.get_device_name())"],
    "flavor": "a10g-small",
    "timeout": "1h"
})

使用Hugging Face Spaces作为镜像：

可使用HF Spaces中的Docker镜像：

python

hf_jobs("run", {
    "image": "hf.co/spaces/lhoestq/duckdb",  # 将Space作为Docker镜像
    "command": ["duckdb", "-c", "SELECT 'Hello from DuckDB!'"],
    "flavor": "cpu-basic"
})

CLI：

bash

hf jobs run hf.co/spaces/lhoestq/duckdb duckdb -c "SELECT 'Hello!'"

Finding More UV Scripts on Hub

在Hub上查找更多UV脚本

The

uv-scripts

organization provides ready-to-use UV scripts stored as datasets on Hugging Face Hub:

python

undefined

uv-scripts

组织在Hugging Face Hub上提供了可直接使用的UV脚本，以数据集形式存储：

python

undefined

Discover available UV script collections

发现可用的UV脚本集合

dataset_search({"author": "uv-scripts", "sort": "downloads", "limit": 20})

Explore a specific collection

浏览特定集合

hub_repo_details(["uv-scripts/classification"], repo_type="dataset", include_readme=True)


**Popular collections:** OCR, classification, synthetic-data, vLLM, dataset-creation

hub_repo_details(["uv-scripts/classification"], repo_type="dataset", include_readme=True)


**热门集合：** OCR、分类、合成数据、vLLM、数据集创建

Hardware Selection

硬件选择

Reference: HF Jobs Hardware Docs (updated 07/2025)

Workload Type	Recommended Hardware	Use Case
Data processing, testing	`cpu-basic` , `cpu-upgrade`	Lightweight tasks
Small models, demos	`t4-small`	<1B models, quick tests
Medium models	`t4-medium` , `l4x1`	1-7B models
Large models, production	`a10g-small` , `a10g-large`	7-13B models
Very large models	`a100-large`	13B+ models
Batch inference	`a10g-large` , `a100-large`	High-throughput
Multi-GPU workloads	`l4x4` , `a10g-largex2` , `a10g-largex4`	Parallel/large models
TPU workloads	`v5e-1x1` , `v5e-2x2` , `v5e-2x4`	JAX/Flax, TPU-optimized

All Available Flavors:

CPU:
```
cpu-basic
```
,
```
cpu-upgrade
```

GPU:

t4-small

t4-medium

l4x1

l4x4

a10g-small

a10g-large

a10g-largex2

a10g-largex4

a100-large

TPU:
```
v5e-1x1
```
,
```
v5e-2x2
```
,
```
v5e-2x4
```

Guidelines:

Start with smaller hardware for testing
Scale up based on actual needs
Use multi-GPU for parallel workloads or large models
Use TPUs for JAX/Flax workloads
See
```
references/hardware_guide.md
```
for detailed specifications

参考： HF Jobs硬件文档（2025年7月更新）

工作负载类型	推荐硬件	使用场景
数据处理、测试	`cpu-basic` 、 `cpu-upgrade`	轻量级任务
小型模型、演示	`t4-small`	小于10亿参数的模型、快速测试
中型模型	`t4-medium` 、 `l4x1`	10亿-70亿参数的模型
大型模型、生产环境	`a10g-small` 、 `a10g-large`	70亿-130亿参数的模型
超大型模型	`a100-large`	130亿参数以上的模型
批处理推理	`a10g-large` 、 `a100-large`	高吞吐量
多GPU工作负载	`l4x4` 、 `a10g-largex2` 、 `a10g-largex4`	并行任务或大型模型
TPU工作负载	`v5e-1x1` 、 `v5e-2x2` 、 `v5e-2x4`	JAX/Flax、TPU优化任务

所有可用硬件规格：

CPU：
```
cpu-basic
```
、
```
cpu-upgrade
```

GPU：

t4-small

、

t4-medium

、

l4x1

、

l4x4

、

a10g-small

、

a10g-large

、

a10g-largex2

、

a10g-largex4

、

a100-large

TPU：
```
v5e-1x1
```
、
```
v5e-2x2
```
、
```
v5e-2x4
```

选择指南：

测试时先使用较小的硬件
根据实际需求扩容
并行工作负载或大型模型使用多GPU
JAX/Flax工作负载使用TPU
详细规格请查看
```
references/hardware_guide.md
```

Critical: Saving Results

关键：保存结果

⚠️ EPHEMERAL ENVIRONMENT—MUST PERSIST RESULTS

The Jobs environment is temporary. All files are deleted when the job ends. If results aren't persisted, ALL WORK IS LOST.

⚠️ 临时环境——必须持久化结果

Jobs环境是临时的。任务结束后所有文件都会被删除。如果不持久化结果，所有工作都会丢失。

Persistence Options

持久化选项

1. Push to Hugging Face Hub (Recommended)

python

undefined

1. 推送到Hugging Face Hub（推荐）

python

undefined

Push models

推送模型

model.push_to_hub("username/model-name", token=os.environ["HF_TOKEN"])

Push datasets

推送数据集

dataset.push_to_hub("username/dataset-name", token=os.environ["HF_TOKEN"])

Push artifacts

推送制品

api.upload_file( path_or_fileobj="results.json", path_in_repo="results.json", repo_id="username/results", token=os.environ["HF_TOKEN"] )


**2. Use External Storage**

```python

api.upload_file( path_or_fileobj="results.json", path_in_repo="results.json", repo_id="username/results", token=os.environ["HF_TOKEN"] )


**2. 使用外部存储**

```python

Upload to S3, GCS, etc.

上传到S3、GCS等

import boto3 s3 = boto3.client('s3') s3.upload_file('results.json', 'my-bucket', 'results.json')


**3. Send Results via API**

```python

import boto3 s3 = boto3.client('s3') s3.upload_file('results.json', 'my-bucket', 'results.json')


**3. 通过API发送结果**

```python

POST results to your API

将结果POST到你的API

import requests requests.post("https://your-api.com/results", json=results)

undefined

import requests requests.post("https://your-api.com/results", json=results)

undefined

Required Configuration for Hub Push

Hub推送的必要配置

In job submission:

python

{
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # Enables authentication
}

In script:

python

import os
from huggingface_hub import HfApi

任务提交时：

python

{
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # 启用认证
}

脚本中：

python

import os
from huggingface_hub import HfApi

Token automatically available from secrets

令牌会从密钥中自动获取

api = HfApi(token=os.environ.get("HF_TOKEN"))

Push your results

推送你的结果

api.upload_file(...)

undefined

api.upload_file(...)

undefined

Verification Checklist

验证清单

Before submitting:

Results persistence method chosen
```
secrets={"HF_TOKEN": "$HF_TOKEN"}
```
if using Hub
Script handles missing token gracefully
Test persistence path works

See:

references/hub_saving.md

for detailed Hub persistence guide

提交前：

已选择结果持久化方式
如果使用Hub，已添加
```
secrets={"HF_TOKEN": "$HF_TOKEN"}
```
脚本可优雅处理令牌缺失的情况
测试持久化路径可正常工作

查看：

references/hub_saving.md

获取详细的Hub持久化指南

Timeout Management

超时管理

⚠️ DEFAULT: 30 MINUTES

Jobs automatically stop after the timeout. For long-running tasks like training, always set a custom timeout.

⚠️ 默认：30分钟

任务会在超时后自动停止。对于训练等长时间运行的任务，务必设置自定义超时。

Setting Timeouts

设置超时时间

MCP Tool:

python

{
    "timeout": "2h"   # 2 hours
}

Supported formats:

Integer/float: seconds (e.g.,
```
300
```
= 5 minutes)
String with suffix:
```
"5m"
```
(minutes),
```
"2h"
```
(hours),
```
"1d"
```
(days)
Examples:
```
"90m"
```
,
```
"2h"
```
,
```
"1.5h"
```
,
```
300
```
,
```
"1d"
```

Python API:

python

from huggingface_hub import run_job, run_uv_job

run_job(image="python:3.12", command=[...], timeout="2h")
run_uv_job("script.py", timeout=7200)  # 2 hours in seconds

MCP工具：

python

{
    "timeout": "2h"   # 2小时
}

支持的格式：

整数/浮点数：秒（如
```
300
```
= 5分钟）
带后缀的字符串：
```
"5m"
```
（分钟）、
```
"2h"
```
（小时）、
```
"1d"
```
（天）
示例：
```
"90m"
```
、
```
"2h"
```
、
```
"1.5h"
```
、
```
300
```
、
```
"1d"
```

Python API：

python

from huggingface_hub import run_job, run_uv_job

run_job(image="python:3.12", command=[...], timeout="2h")
run_uv_job("script.py", timeout=7200)  # 2小时（秒）

Timeout Guidelines

超时设置指南

Scenario	Recommended	Notes
Quick test	10-30 min	Verify setup
Data processing	1-2 hours	Depends on data size
Batch inference	2-4 hours	Large batches
Experiments	4-8 hours	Multiple runs
Long-running	8-24 hours	Production workloads

Always add 20-30% buffer for setup, network delays, and cleanup.

On timeout: Job killed immediately, all unsaved progress lost

场景	推荐设置	说明
快速测试	10-30分钟	验证设置
数据处理	1-2小时	取决于数据大小
批处理推理	2-4小时	大型批处理
实验	4-8小时	多次运行
长时间运行任务	8-24小时	生产环境工作负载

始终预留20-30%的缓冲时间，用于启动、网络延迟和清理工作。

超时后： 任务会立即终止，所有未保存的进度都会丢失

Cost Estimation

成本估算

General guidelines:

Total Cost = (Hours of runtime) × (Cost per hour)

Example calculations:

Quick test:

Hardware: cpu-basic ($0.10/hour)
Time: 15 minutes (0.25 hours)
Cost: $0.03

Data processing:

Hardware: l4x1 ($2.50/hour)
Time: 2 hours
Cost: $5.00

Batch inference:

Hardware: a10g-large ($5/hour)
Time: 4 hours
Cost: $20.00

Cost optimization tips:

Start small - Test on cpu-basic or t4-small
Monitor runtime - Set appropriate timeouts
Use checkpoints - Resume if job fails
Optimize code - Reduce unnecessary compute
Choose right hardware - Don't over-provision

通用公式：

总成本 = 运行时长（小时） × 每小时成本

示例计算：

快速测试：

硬件：cpu-basic（$0.10/小时）
时间：15分钟（0.25小时）
成本：$0.03

数据处理：

硬件：l4x1（$2.50/小时）
时间：2小时
成本：$5.00

批处理推理：

硬件：a10g-large（$5/小时）
时间：4小时
成本：$20.00

成本优化技巧：

从小规模开始 - 在cpu-basic或t4-small上测试
监控运行时间 - 设置合适的超时
使用检查点 - 任务失败后可恢复
优化代码 - 减少不必要的计算
选择合适的硬件 - 不要过度配置

Monitoring and Tracking

监控与追踪

Check Job Status

检查任务状态

MCP Tool:

python

undefined

MCP工具：

python

undefined

List all jobs

列出所有任务

hf_jobs("ps")

Inspect specific job

查看特定任务详情

hf_jobs("inspect", {"job_id": "your-job-id"})

View logs

查看日志

hf_jobs("logs", {"job_id": "your-job-id"})

Cancel a job

取消任务

hf_jobs("cancel", {"job_id": "your-job-id"})


**Python API:**
```python
from huggingface_hub import list_jobs, inspect_job, fetch_job_logs, cancel_job

hf_jobs("cancel", {"job_id": "your-job-id"})


**Python API：**
```python
from huggingface_hub import list_jobs, inspect_job, fetch_job_logs, cancel_job

List your jobs

列出你的任务

jobs = list_jobs()

List running jobs only

仅列出运行中的任务

running = [j for j in list_jobs() if j.status.stage == "RUNNING"]

Inspect specific job

查看特定任务详情

job_info = inspect_job(job_id="your-job-id")

View logs

查看日志

for log in fetch_job_logs(job_id="your-job-id"): print(log)

Cancel a job

取消任务

cancel_job(job_id="your-job-id")


**CLI:**
```bash
hf jobs ps                    # List jobs
hf jobs logs <job-id>         # View logs
hf jobs cancel <job-id>       # Cancel job

Remember: Wait for user to request status checks. Avoid polling repeatedly.

cancel_job(job_id="your-job-id")


**CLI：**
```bash
hf jobs ps                    # 列出任务
hf jobs logs <job-id>         # 查看日志
hf jobs cancel <job-id>       # 取消任务

注意： 等待用户请求状态检查。避免重复轮询。

Job URLs

任务URL

After submission, jobs have monitoring URLs:

https://huggingface.co/jobs/username/job-id

View logs, status, and details in the browser.

提交任务后，任务会有监控URL：

https://huggingface.co/jobs/username/job-id

可在浏览器中查看日志、状态和详情。

Wait for Multiple Jobs

等待多个任务完成

python

import time
from huggingface_hub import inspect_job, run_job

python

import time
from huggingface_hub import inspect_job, run_job

Run multiple jobs

运行多个任务

jobs = [run_job(image=img, command=cmd) for img, cmd in workloads]

Wait for all to complete

等待所有任务完成

for job in jobs: while inspect_job(job_id=job.id).status.stage not in ("COMPLETED", "ERROR"): time.sleep(10)

undefined

for job in jobs: while inspect_job(job_id=job.id).status.stage not in ("COMPLETED", "ERROR"): time.sleep(10)

undefined

Scheduled Jobs

定时任务

Run jobs on a schedule using CRON expressions or predefined schedules.

MCP Tool:

python

undefined

使用CRON表达式或预定义计划定时运行任务。

MCP工具：

python

undefined

Schedule a UV script that runs every hour

定时运行每小时执行一次的UV脚本

hf_jobs("scheduled uv", { "script": "your_script.py", "schedule": "@hourly", "flavor": "cpu-basic" })

Schedule with CRON syntax

使用CRON语法定时运行

hf_jobs("scheduled uv", { "script": "your_script.py", "schedule": "0 9 * * 1", # 9 AM every Monday "flavor": "cpu-basic" })

hf_jobs("scheduled uv", { "script": "your_script.py", "schedule": "0 9 * * 1", # 每周一上午9点 "flavor": "cpu-basic" })

Schedule a Docker-based job

定时运行基于Docker的任务

hf_jobs("scheduled run", { "image": "python:3.12", "command": ["python", "-c", "print('Scheduled!')"], "schedule": "@daily", "flavor": "cpu-basic" })


**Python API:**
```python
from huggingface_hub import create_scheduled_job, create_scheduled_uv_job

hf_jobs("scheduled run", { "image": "python:3.12", "command": ["python", "-c", "print('Scheduled!')"], "schedule": "@daily", "flavor": "cpu-basic" })


**Python API：**
```python
from huggingface_hub import create_scheduled_job, create_scheduled_uv_job

Schedule a Docker job

定时运行Docker任务

create_scheduled_job( image="python:3.12", command=["python", "-c", "print('Running on schedule!')"], schedule="@hourly" )

Schedule a UV script

定时运行UV脚本

create_scheduled_uv_job("my_script.py", schedule="@daily", flavor="cpu-basic")

Schedule with GPU

定时运行GPU任务

create_scheduled_uv_job( "ml_inference.py", schedule="0 */6 * * *", # Every 6 hours flavor="a10g-small" )


**Available schedules:**
- `@annually`, `@yearly` - Once per year
- `@monthly` - Once per month
- `@weekly` - Once per week
- `@daily` - Once per day
- `@hourly` - Once per hour
- CRON expression - Custom schedule (e.g., `"*/5 * * * *"` for every 5 minutes)

**Manage scheduled jobs:**
```python

create_scheduled_uv_job( "ml_inference.py", schedule="0 */6 * * *", # 每6小时一次 flavor="a10g-small" )


**可用计划：**
- `@annually`、`@yearly` - 每年一次
- `@monthly` - 每月一次
- `@weekly` - 每周一次
- `@daily` - 每天一次
- `@hourly` - 每小时一次
- CRON表达式 - 自定义计划（如`"*/5 * * * *"`表示每5分钟一次）

**管理定时任务：**
```python

MCP Tool

MCP工具

hf_jobs("scheduled ps") # List scheduled jobs hf_jobs("scheduled inspect", {"job_id": "..."}) # Inspect details hf_jobs("scheduled suspend", {"job_id": "..."}) # Pause hf_jobs("scheduled resume", {"job_id": "..."}) # Resume hf_jobs("scheduled delete", {"job_id": "..."}) # Delete


**Python API for management:**
```python
from huggingface_hub import (
    list_scheduled_jobs,
    inspect_scheduled_job,
    suspend_scheduled_job,
    resume_scheduled_job,
    delete_scheduled_job
)

hf_jobs("scheduled ps") # 列出定时任务 hf_jobs("scheduled inspect", {"job_id": "..."}) # 查看详情 hf_jobs("scheduled suspend", {"job_id": "..."}) # 暂停 hf_jobs("scheduled resume", {"job_id": "..."}) # 恢复 hf_jobs("scheduled delete", {"job_id": "..."}) # 删除


**用于管理的Python API：**
```python
from huggingface_hub import (
    list_scheduled_jobs,
    inspect_scheduled_job,
    suspend_scheduled_job,
    resume_scheduled_job,
    delete_scheduled_job
)

List all scheduled jobs

列出所有定时任务

scheduled = list_scheduled_jobs()

Inspect a scheduled job

查看定时任务详情

info = inspect_scheduled_job(scheduled_job_id)

Suspend (pause) a scheduled job

暂停定时任务

suspend_scheduled_job(scheduled_job_id)

Resume a scheduled job

恢复定时任务

resume_scheduled_job(scheduled_job_id)

Delete a scheduled job

删除定时任务

delete_scheduled_job(scheduled_job_id)

undefined

delete_scheduled_job(scheduled_job_id)

undefined

Webhooks: Trigger Jobs on Events

Webhooks：事件触发任务

Trigger jobs automatically when changes happen in Hugging Face repositories.

Python API:

python

from huggingface_hub import create_webhook

当Hugging Face仓库发生变化时，自动触发任务。

Python API：

python

from huggingface_hub import create_webhook

Create webhook that triggers a job when a repo changes

创建Webhook，当仓库变化时触发任务

webhook = create_webhook( job_id=job.id, watched=[ {"type": "user", "name": "your-username"}, {"type": "org", "name": "your-org-name"} ], domains=["repo", "discussion"], secret="your-secret" )


**How it works:**
1. Webhook listens for changes in watched repositories
2. When triggered, the job runs with `WEBHOOK_PAYLOAD` environment variable
3. Your script can parse the payload to understand what changed

**Use cases:**
- Auto-process new datasets when uploaded
- Trigger inference when models are updated
- Run tests when code changes
- Generate reports on repository activity

**Access webhook payload in script:**
```python
import os
import json

payload = json.loads(os.environ.get("WEBHOOK_PAYLOAD", "{}"))
print(f"Event type: {payload.get('event', {}).get('action')}")

See Webhooks Documentation for more details.

webhook = create_webhook( job_id=job.id, watched=[ {"type": "user", "name": "your-username"}, {"type": "org", "name": "your-org-name"} ], domains=["repo", "discussion"], secret="your-secret" )


**工作原理：**
1. Webhook监听被关注仓库的变化
2. 触发时，任务会在`WEBHOOK_PAYLOAD`环境变量中获取相关信息
3. 你的脚本可解析该负载以了解发生了哪些变化

**使用场景：**
- 上传新数据集时自动处理
- 模型更新时触发推理
- 代码变化时运行测试
- 生成仓库活动报告

**在脚本中访问Webhook负载：**
```python
import os
import json

payload = json.loads(os.environ.get("WEBHOOK_PAYLOAD", "{}"))
print(f"事件类型: {payload.get('event', {}).get('action')}")

查看Webhooks文档获取更多详情。

Common Workload Patterns

常见工作负载模式

This repository ships ready-to-run UV scripts in

hf-jobs/scripts/

. Prefer using them instead of inventing new templates.

本仓库在

hf-jobs/scripts/

中提供了可直接运行的UV脚本。优先使用这些脚本，而非自行编写模板。

Pattern 1: Dataset → Model Responses (vLLM) —

scripts/generate-responses.py

模式1：数据集→模型响应（vLLM）——

scripts/generate-responses.py

What it does: loads a Hub dataset (chat

messages

or a

prompt

column), applies a model chat template, generates responses with vLLM, and pushes the output dataset + dataset card back to the Hub.

Requires: GPU + write token (it pushes a dataset).

python

from pathlib import Path

script = Path("hf-jobs/scripts/generate-responses.py").read_text()
hf_jobs("uv", {
    "script": script,
    "script_args": [
        "username/input-dataset",
        "username/output-dataset",
        "--messages-column", "messages",
        "--model-id", "Qwen/Qwen3-30B-A3B-Instruct-2507",
        "--temperature", "0.7",
        "--top-p", "0.8",
        "--max-tokens", "2048",
    ],
    "flavor": "a10g-large",
    "timeout": "4h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
})

功能： 加载Hub数据集（聊天

messages

或

prompt

列），应用模型聊天模板，使用vLLM生成响应，并将输出数据集和数据集卡片推送回Hub。

要求： GPU + 写令牌（需要推送数据集）。

python

from pathlib import Path

script = Path("hf-jobs/scripts/generate-responses.py").read_text()
hf_jobs("uv", {
    "script": script,
    "script_args": [
        "username/input-dataset",
        "username/output-dataset",
        "--messages-column", "messages",
        "--model-id", "Qwen/Qwen3-30B-A3B-Instruct-2507",
        "--temperature", "0.7",
        "--top-p", "0.8",
        "--max-tokens", "2048",
    ],
    "flavor": "a10g-large",
    "timeout": "4h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
})

Pattern 2: CoT Self-Instruct Synthetic Data —

scripts/cot-self-instruct.py

模式2：CoT自指令合成数据——

scripts/cot-self-instruct.py

What it does: generates synthetic prompts/answers via CoT Self-Instruct, optionally filters outputs (answer-consistency / RIP), then pushes the generated dataset + dataset card to the Hub.

Requires: GPU + write token (it pushes a dataset).

python

from pathlib import Path

script = Path("hf-jobs/scripts/cot-self-instruct.py").read_text()
hf_jobs("uv", {
    "script": script,
    "script_args": [
        "--seed-dataset", "davanstrien/s1k-reasoning",
        "--output-dataset", "username/synthetic-math",
        "--task-type", "reasoning",
        "--num-samples", "5000",
        "--filter-method", "answer-consistency",
    ],
    "flavor": "l4x4",
    "timeout": "8h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
})

功能： 通过CoT自指令生成合成提示/答案，可选择过滤输出（答案一致性/RIP），然后将生成的数据集和数据集卡片推送到Hub。

要求： GPU + 写令牌（需要推送数据集）。

python

from pathlib import Path

script = Path("hf-jobs/scripts/cot-self-instruct.py").read_text()
hf_jobs("uv", {
    "script": script,
    "script_args": [
        "--seed-dataset", "davanstrien/s1k-reasoning",
        "--output-dataset", "username/synthetic-math",
        "--task-type", "reasoning",
        "--num-samples", "5000",
        "--filter-method", "answer-consistency",
    ],
    "flavor": "l4x4",
    "timeout": "8h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
})

Pattern 3: Streaming Dataset Stats (Polars + HF Hub) —

scripts/finepdfs-stats.py

模式3：流式数据集统计（Polars + HF Hub）——

scripts/finepdfs-stats.py

What it does: scans parquet directly from Hub (no 300GB download), computes temporal stats, and (optionally) uploads results to a Hub dataset repo.

Requires: CPU is often enough; token needed only if you pass

--output-repo

(upload).

python

from pathlib import Path

script = Path("hf-jobs/scripts/finepdfs-stats.py").read_text()
hf_jobs("uv", {
    "script": script,
    "script_args": [
        "--limit", "10000",
        "--show-plan",
        "--output-repo", "username/finepdfs-temporal-stats",
    ],
    "flavor": "cpu-upgrade",
    "timeout": "2h",
    "env": {"HF_XET_HIGH_PERFORMANCE": "1"},
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
})

功能： 直接从Hub扫描parquet文件（无需下载300GB数据），计算时间统计信息，并（可选）将结果上传到Hub数据集仓库。

要求： 通常CPU即可；仅当传递

--output-repo

（上传）时需要令牌。

python

from pathlib import Path

script = Path("hf-jobs/scripts/finepdfs-stats.py").read_text()
hf_jobs("uv", {
    "script": script,
    "script_args": [
        "--limit", "10000",
        "--show-plan",
        "--output-repo", "username/finepdfs-temporal-stats",
    ],
    "flavor": "cpu-upgrade",
    "timeout": "2h",
    "env": {"HF_XET_HIGH_PERFORMANCE": "1"},
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
})

Common Failure Modes

常见失败模式

Out of Memory (OOM)

内存不足（OOM）

Fix:

Reduce batch size or data chunk size
Process data in smaller batches
Upgrade hardware: cpu → t4 → a10g → a100

解决方法：

减小批处理大小或数据块大小
分小批处理数据
升级硬件：cpu → t4 → a10g → a100

Job Timeout

任务超时

Fix:

Check logs for actual runtime
Increase timeout with buffer:
```
"timeout": "3h"
```
Optimize code for faster execution
Process data in chunks

解决方法：

查看日志了解实际运行时间
增加超时并预留缓冲：
```
"timeout": "3h"
```
优化代码以加快执行速度
分块处理数据

Hub Push Failures

Hub推送失败

Fix:

Add to job:
```
secrets={"HF_TOKEN": "$HF_TOKEN"}
```
Verify token in script:
```
assert "HF_TOKEN" in os.environ
```
Check token permissions
Verify repo exists or can be created

解决方法：

在任务中添加：
```
secrets={"HF_TOKEN": "$HF_TOKEN"}
```
在脚本中验证令牌：
```
assert "HF_TOKEN" in os.environ
```
检查令牌权限
验证仓库是否存在或可创建

Missing Dependencies

依赖缺失

Fix: Add to PEP 723 header:

python

undefined

解决方法： 在PEP 723标头中添加：

python

undefined

/// script

dependencies = ["package1", "package2>=1.0.0"]

///

undefined

undefined

Authentication Errors

认证错误

Fix:

Check
```
hf_whoami()
```
works locally
Verify
```
secrets={"HF_TOKEN": "$HF_TOKEN"}
```
in job config
Re-login:
```
hf auth login
```
Check token has required permissions

解决方法：

确认本地
```
hf_whoami()
```
可正常工作
验证任务配置中包含
```
secrets={"HF_TOKEN": "$HF_TOKEN"}
```
重新登录：
```
hf auth login
```
检查令牌是否拥有所需权限

Troubleshooting

故障排除

Common issues:

Job times out → Increase timeout, optimize code
Results not saved → Check persistence method, verify HF_TOKEN
Out of Memory → Reduce batch size, upgrade hardware
Import errors → Add dependencies to PEP 723 header
Authentication errors → Check token, verify secrets parameter

See:

references/troubleshooting.md

for complete troubleshooting guide

常见问题：

任务超时 → 增加超时时间，优化代码
结果未保存 → 检查持久化方式，验证HF_TOKEN
内存不足 → 减小批处理大小，升级硬件
导入错误 → 在PEP 723标头中添加依赖
认证错误 → 检查令牌，验证secrets参数

查看：

references/troubleshooting.md

获取完整的故障排除指南

Resources

资源

References (In This Skill)

本技能中的参考文档

```
references/token_usage.md
```
- Complete token usage guide
```
references/hardware_guide.md
```
- Hardware specs and selection
```
references/hub_saving.md
```
- Hub persistence guide
```
references/troubleshooting.md
```
- Common issues and solutions

```
references/token_usage.md
```
- 完整的令牌使用指南
```
references/hardware_guide.md
```
- 硬件规格与选择
```
references/hub_saving.md
```
- Hub持久化指南
```
references/troubleshooting.md
```
- 常见问题与解决方案

Scripts (In This Skill)

本技能中的脚本

```
scripts/generate-responses.py
```
- vLLM batch generation: dataset → responses → push to Hub
```
scripts/cot-self-instruct.py
```
- CoT Self-Instruct synthetic data generation + filtering → push to Hub
```
scripts/finepdfs-stats.py
```
- Polars streaming stats over
```
finepdfs-edu
```
parquet on Hub (optional push)

```
scripts/generate-responses.py
```
- vLLM批处理生成：数据集→响应→推送到Hub
```
scripts/cot-self-instruct.py
```
- CoT自指令合成数据生成+过滤→推送到Hub
```
scripts/finepdfs-stats.py
```
- 对Hub上的
```
finepdfs-edu
```
parquet文件进行Polars流式统计（可选推送）

External Links

外部链接

Official Documentation:

HF Jobs Guide - Main documentation
HF Jobs CLI Reference - Command line interface
HF Jobs API Reference - Python API details
Hardware Flavors Reference - Available hardware

Related Tools:

UV Scripts Guide - PEP 723 inline dependencies
UV Scripts Organization - Community UV script collection
HF Hub Authentication - Token setup
Webhooks Documentation - Event triggers

官方文档：

HF Jobs指南 - 主文档
HF Jobs CLI参考 - 命令行接口
HF Jobs API参考 - Python API详情
硬件规格参考 - 可用硬件

相关工具：

UV脚本指南 - PEP 723内联依赖
UV脚本组织 - 社区UV脚本集合
HF Hub认证 - 令牌设置
Webhooks文档 - 事件触发

Key Takeaways

核心要点

Submit scripts inline - The
```
script
```
parameter accepts Python code directly; no file saving required unless user requests
Jobs are asynchronous - Don't wait/poll; let user check when ready
Always set timeout - Default 30 min may be insufficient; set appropriate timeout
Always persist results - Environment is ephemeral; without persistence, all work is lost
Use tokens securely - Always use
```
secrets={"HF_TOKEN": "$HF_TOKEN"}
```
for Hub operations
Choose appropriate hardware - Start small, scale up based on needs (see hardware guide)
Use UV scripts - Default to
```
hf_jobs("uv", {...})
```
with inline scripts for Python workloads
Handle authentication - Verify tokens are available before Hub operations
Monitor jobs - Provide job URLs and status check commands
Optimize costs - Choose right hardware, set appropriate timeouts

内联提交脚本 -
```
script
```
参数可直接接受Python代码；除非用户要求，否则无需保存文件
任务是异步的 - 不要等待/轮询；让用户在需要时检查
始终设置超时 - 默认30分钟可能不足；设置合适的超时时间
始终持久化结果 - 环境是临时的；不持久化的话所有工作都会丢失
安全使用令牌 - 执行Hub操作时始终使用
```
secrets={"HF_TOKEN": "$HF_TOKEN"}
```
选择合适的硬件 - 从小规模开始，根据需求扩容（查看硬件指南）
使用UV脚本 - 对于Python工作负载，默认使用
```
hf_jobs("uv", {...})
```
和内联脚本
处理认证 - 在执行Hub操作前验证令牌是否可用
监控任务 - 提供任务URL和状态检查命令
优化成本 - 选择合适的硬件，设置合适的超时时间

Quick Reference: MCP Tool vs CLI vs Python API

快速参考：MCP工具 vs CLI vs Python API

Operation	MCP Tool	CLI	Python API
Run UV script	`hf_jobs("uv", {...})`	`hf jobs uv run script.py`	`run_uv_job("script.py")`
Run Docker job	`hf_jobs("run", {...})`	`hf jobs run image cmd`	`run_job(image, command)`
List jobs	`hf_jobs("ps")`	`hf jobs ps`	`list_jobs()`
View logs	`hf_jobs("logs", {...})`	`hf jobs logs <id>`	`fetch_job_logs(job_id)`
Cancel job	`hf_jobs("cancel", {...})`	`hf jobs cancel <id>`	`cancel_job(job_id)`
Schedule UV	`hf_jobs("scheduled uv", {...})`	-	`create_scheduled_uv_job()`
Schedule Docker	`hf_jobs("scheduled run", {...})`	-	`create_scheduled_job()`

操作	MCP工具	CLI	Python API
运行UV脚本	`hf_jobs("uv", {...})`	`hf jobs uv run script.py`	`run_uv_job("script.py")`
运行Docker任务	`hf_jobs("run", {...})`	`hf jobs run image cmd`	`run_job(image, command)`
列出任务	`hf_jobs("ps")`	`hf jobs ps`	`list_jobs()`
查看日志	`hf_jobs("logs", {...})`	`hf jobs logs <id>`	`fetch_job_logs(job_id)`
取消任务	`hf_jobs("cancel", {...})`	`hf jobs cancel <id>`	`cancel_job(job_id)`
定时运行UV脚本	`hf_jobs("scheduled uv", {...})`	-	`create_scheduled_uv_job()`
定时运行Docker任务	`hf_jobs("scheduled run", {...})`	-	`create_scheduled_job()`
",