hugging-face-jobs


Running Workloads on Hugging Face Jobs


Overview


Run any workload on fully managed Hugging Face infrastructure. No local setup required—jobs run on cloud CPUs, GPUs, or TPUs and can persist results to the Hugging Face Hub.
Common use cases:
  • Data Processing - Transform, filter, or analyze large datasets
  • Batch Inference - Run inference on thousands of samples
  • Experiments & Benchmarks - Reproducible ML experiments
  • Model Training - Fine-tune models (see the model-trainer skill for TRL-specific training)
  • Synthetic Data Generation - Generate datasets using LLMs
  • Development & Testing - Test code without local GPU setup
  • Scheduled Jobs - Automate recurring tasks

For model training specifically: See the model-trainer skill for TRL-based training workflows.

When to Use This Skill


Use this skill when users want to:
  • Run Python workloads on cloud infrastructure
  • Execute jobs without local GPU/TPU setup
  • Process data at scale
  • Run batch inference or experiments
  • Schedule recurring tasks
  • Use GPUs/TPUs for any workload
  • Persist results to the Hugging Face Hub

Key Directives


When assisting with jobs:
  1. ALWAYS use the `hf_jobs()` MCP tool - Submit jobs using `hf_jobs("uv", {...})` or `hf_jobs("run", {...})`. The `script` parameter accepts Python code directly. Do NOT save to local files unless the user explicitly requests it. Pass the script content as a string to `hf_jobs()`.
  2. Always handle authentication - Jobs that interact with the Hub require `HF_TOKEN` via secrets. See the Token Usage section below.
  3. Provide job details after submission - After submitting, provide the job ID, monitoring URL, and estimated time, and note that the user can request status checks later.
  4. Set appropriate timeouts - The default 30 minutes may be insufficient for long-running tasks.
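Taken together, directives 1, 2, and 4 describe a single submission payload. A minimal sketch (the script body and the values chosen here are illustrative placeholders, not part of the skill):

```python
# Sketch of an hf_jobs("uv", ...) payload following the directives above:
# inline script string (1), HF_TOKEN via secrets (2), explicit timeout (4).
script = """\
# /// script
# dependencies = ["huggingface-hub"]
# ///
print("hello from HF Jobs")
"""

payload = {
    "script": script,                      # passed as a string, not a file path
    "flavor": "cpu-basic",
    "timeout": "1h",                       # override the 30-minute default
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},  # placeholder, never a real token
}

# hf_jobs("uv", payload)  # submitted via the MCP tool
```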

Prerequisites Checklist


Before starting any job, verify:

Account & Authentication


  • Hugging Face account with a Pro, Team, or Enterprise plan (Jobs require a paid plan)
  • Authenticated login: check with `hf_whoami()`
  • `HF_TOKEN` for Hub access ⚠️ CRITICAL - Required for any Hub operations (push models/datasets, download private repos, etc.)
  • Token must have appropriate permissions (read for downloads, write for uploads)

Token Usage (See Token Usage section for details)


When tokens are required:
  • Pushing models/datasets to the Hub
  • Accessing private repositories
  • Using Hub APIs in scripts
  • Any authenticated Hub operations

How to provide tokens:

```python
{
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # Recommended: automatic token
}
```

⚠️ CRITICAL: The `$HF_TOKEN` placeholder is automatically replaced with your logged-in token. Never hardcode tokens in scripts.

Token Usage Guide


Understanding Tokens


What are HF Tokens?
  • Authentication credentials for the Hugging Face Hub
  • Required for authenticated operations (push, private repos, API access)
  • Stored securely on your machine after `hf auth login`

Token Types:
  • Read Token - Can download models/datasets, read private repos
  • Write Token - Can push models/datasets, create repos, modify content
  • Organization Token - Can act on behalf of an organization

When Tokens Are Required


Always Required:
  • Pushing models/datasets to the Hub
  • Accessing private repositories
  • Creating new repositories
  • Modifying existing repositories
  • Using Hub APIs programmatically

Not Required:
  • Downloading public models/datasets
  • Running jobs that don't interact with the Hub
  • Reading public repository information

How to Provide Tokens to Jobs


Method 1: Automatic Token (Recommended)


```python
hf_jobs("uv", {
    "script": "your_script.py",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # ✅ Automatic replacement
})
```

How it works:
  • `$HF_TOKEN` is a placeholder that gets replaced with your actual token
  • Uses the token from your logged-in session (`hf auth login`)
  • Most secure and convenient method
  • Token is encrypted server-side when passed as a secret
Benefits:
  • No token exposure in code
  • Uses your current login session
  • Automatically updated if you re-login
  • Works seamlessly with MCP tools

Method 2: Explicit Token (Not Recommended)


```python
hf_jobs("uv", {
    "script": "your_script.py",
    "secrets": {"HF_TOKEN": "hf_abc123..."}  # ⚠️ Hardcoded token
})
```

When to use:
  • Only if the automatic token doesn't work
  • Testing with a specific token
  • Organization tokens (use with caution)

Security concerns:
  • Token visible in code/logs
  • Must be manually updated if the token rotates
  • Risk of token exposure

Method 3: Environment Variable (Less Secure)


```python
hf_jobs("uv", {
    "script": "your_script.py",
    "env": {"HF_TOKEN": "hf_abc123..."}  # ⚠️ Less secure than secrets
})
```

Difference from secrets:
  • `env` variables are visible in job logs
  • `secrets` are encrypted server-side
  • Always prefer `secrets` for tokens

Using Tokens in Scripts


In your Python script, tokens are available as environment variables:

```python
# /// script
# dependencies = ["huggingface-hub"]
# ///

import os
from huggingface_hub import HfApi

# Token is automatically available if passed via secrets
token = os.environ.get("HF_TOKEN")

# Use with Hub API
api = HfApi(token=token)

# Or let huggingface_hub auto-detect
api = HfApi()  # Automatically uses HF_TOKEN env var
```

**Best practices:**
- Don't hardcode tokens in scripts
- Use `os.environ.get("HF_TOKEN")` to access the token
- Let `huggingface_hub` auto-detect when possible
- Verify the token exists before Hub operations

Token Verification


Check if you're logged in:

```python
from huggingface_hub import whoami
user_info = whoami()  # Returns your username if authenticated
```

Verify the token in a job:

```python
import os
assert "HF_TOKEN" in os.environ, "HF_TOKEN not found!"
token = os.environ["HF_TOKEN"]
print(f"Token starts with: {token[:7]}...")  # Should start with "hf_"
```

Common Token Issues


Error: 401 Unauthorized
  • Cause: Token missing or invalid
  • Fix: Add `secrets={"HF_TOKEN": "$HF_TOKEN"}` to the job config
  • Verify: Check that `hf_whoami()` works locally

Error: 403 Forbidden

Error: Token not found in environment
  • Cause: `secrets` not passed or wrong key name
  • Fix: Use `secrets={"HF_TOKEN": "$HF_TOKEN"}` (not `env`)
  • Verify: Script checks `os.environ.get("HF_TOKEN")`

Error: Repository access denied
  • Cause: Token doesn't have access to the private repo
  • Fix: Use a token from an account with access
  • Check: Verify the repo visibility and your permissions
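A small preflight check at the top of a job script turns the first two failures above into immediate, actionable errors instead of a 401 mid-run. This helper is illustrative, not part of the Jobs API:

```python
import os

def preflight_token_check(env=None):
    """Fail fast with a clear message if HF_TOKEN is missing or malformed."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            'HF_TOKEN not found: pass secrets={"HF_TOKEN": "$HF_TOKEN"} (not env).'
        )
    if not token.startswith("hf_"):
        raise RuntimeError("HF_TOKEN looks malformed: expected an 'hf_' prefix.")
    return token
```

Calling it once before any Hub operation makes the job's logs point straight at the misconfiguration.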

Token Security Best Practices


  1. Never commit tokens - Use the `$HF_TOKEN` placeholder or environment variables
  2. Use secrets, not env - Secrets are encrypted server-side
  3. Rotate tokens regularly - Generate new tokens periodically
  4. Use minimal permissions - Create tokens with only the needed permissions
  5. Don't share tokens - Each user should use their own token
  6. Monitor token usage - Check token activity in Hub settings

Complete Token Example


```python
# Example: Push results to Hub
hf_jobs("uv", {
    "script": """
# /// script
# dependencies = ["huggingface-hub", "datasets"]
# ///

import os
from huggingface_hub import HfApi
from datasets import Dataset

# Verify token is available
assert "HF_TOKEN" in os.environ, "HF_TOKEN required!"

# Use token for Hub operations
api = HfApi(token=os.environ["HF_TOKEN"])

# Create and push dataset
data = {"text": ["Hello", "World"]}
dataset = Dataset.from_dict(data)
dataset.push_to_hub("username/my-dataset", token=os.environ["HF_TOKEN"])
print("✅ Dataset pushed successfully!")
""",
    "flavor": "cpu-basic",
    "timeout": "30m",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # ✅ Token provided securely
})
```

Quick Start: Two Approaches


Approach 1: UV Scripts (Recommended)


UV scripts use PEP 723 inline dependencies for clean, self-contained workloads.

MCP Tool:

```python
hf_jobs("uv", {
    "script": """
# /// script
# dependencies = ["transformers", "torch"]
# ///

from transformers import pipeline
import torch

# Your workload here
classifier = pipeline("sentiment-analysis")
result = classifier("I love Hugging Face!")
print(result)
""",
    "flavor": "cpu-basic",
    "timeout": "30m"
})
```

**CLI Equivalent:**

```bash
hf jobs uv run my_script.py --flavor cpu-basic --timeout 30m
```

Python API:

```python
from huggingface_hub import run_uv_job
run_uv_job("my_script.py", flavor="cpu-basic", timeout="30m")
```

Benefits: Direct MCP tool usage, clean code, dependencies declared inline, no file saving required
When to use: Default choice for all workloads, custom logic, any scenario requiring `hf_jobs()`

Custom Docker Images for UV Scripts


By default, UV scripts use `ghcr.io/astral-sh/uv:python3.12-bookworm-slim`. For ML workloads with complex dependencies, use pre-built images:

```python
hf_jobs("uv", {
    "script": "inference.py",
    "image": "vllm/vllm-openai:latest",  # Pre-built image with vLLM
    "flavor": "a10g-large"
})
```

CLI:

```bash
hf jobs uv run --image vllm/vllm-openai:latest --flavor a10g-large inference.py
```

Benefits: Faster startup, pre-installed dependencies, optimized for specific frameworks

Python Version


By default, UV scripts use Python 3.12. Specify a different version:

```python
hf_jobs("uv", {
    "script": "my_script.py",
    "python": "3.11",  # Use Python 3.11
    "flavor": "cpu-basic"
})
```

Python API:

```python
from huggingface_hub import run_uv_job
run_uv_job("my_script.py", python="3.11")
```

Working with Scripts


⚠️ Important: There are two "script path" stories depending on how you run Jobs:
  • Using the `hf_jobs()` MCP tool (recommended in this repo): the `script` value must be inline code (a string) or a URL. A local filesystem path (like `"./scripts/foo.py"`) won't exist inside the remote container.
  • Using the `hf jobs uv run` CLI: local file paths do work (the CLI uploads your script).

Common mistake with the `hf_jobs()` MCP tool:

```python
# ❌ Will fail (remote container can't see your local path)
hf_jobs("uv", {"script": "./scripts/foo.py"})
```

**Correct patterns with `hf_jobs()` MCP tool:**

```python
# ✅ Inline: read the local script file and pass its contents
from pathlib import Path
script = Path("hf-jobs/scripts/foo.py").read_text()
hf_jobs("uv", {"script": script})

# ✅ URL: host the script somewhere reachable (e.g., a URL from GitHub)
```

**CLI equivalent (local paths supported):**

```bash
hf jobs uv run ./scripts/foo.py -- --your --args
```

Adding Dependencies at Runtime


Add extra dependencies beyond what's in the PEP 723 header:

```python
hf_jobs("uv", {
    "script": "inference.py",
    "dependencies": ["transformers", "torch>=2.0"],  # Extra deps
    "flavor": "a10g-small"
})
```

Python API:

```python
from huggingface_hub import run_uv_job
run_uv_job("inference.py", dependencies=["transformers", "torch>=2.0"])
```

Approach 2: Docker-Based Jobs


Run jobs with custom Docker images and commands.

MCP Tool:

```python
hf_jobs("run", {
    "image": "python:3.12",
    "command": ["python", "-c", "print('Hello from HF Jobs!')"],
    "flavor": "cpu-basic",
    "timeout": "30m"
})
```

CLI Equivalent:

```bash
hf jobs run python:3.12 python -c "print('Hello from HF Jobs!')"
```

Python API:

```python
from huggingface_hub import run_job
run_job(image="python:3.12", command=["python", "-c", "print('Hello!')"], flavor="cpu-basic")
```

Benefits: Full Docker control, use pre-built images, run any command
When to use: Need specific Docker images, non-Python workloads, complex environments

Example with GPU:

```python
hf_jobs("run", {
    "image": "pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel",
    "command": ["python", "-c", "import torch; print(torch.cuda.get_device_name())"],
    "flavor": "a10g-small",
    "timeout": "1h"
})
```

Using Hugging Face Spaces as Images:

You can use Docker images from HF Spaces:

```python
hf_jobs("run", {
    "image": "hf.co/spaces/lhoestq/duckdb",  # Space as Docker image
    "command": ["duckdb", "-c", "SELECT 'Hello from DuckDB!'"],
    "flavor": "cpu-basic"
})
```

CLI:

```bash
hf jobs run hf.co/spaces/lhoestq/duckdb duckdb -c "SELECT 'Hello!'"
```

Finding More UV Scripts on Hub


The `uv-scripts` organization provides ready-to-use UV scripts stored as datasets on the Hugging Face Hub:

```python
# Discover available UV script collections
dataset_search({"author": "uv-scripts", "sort": "downloads", "limit": 20})

# Explore a specific collection
hub_repo_details(["uv-scripts/classification"], repo_type="dataset", include_readme=True)
```

**Popular collections:** OCR, classification, synthetic-data, vLLM, dataset-creation

Hardware Selection


Reference: HF Jobs Hardware Docs (updated 07/2025)

| Workload Type | Recommended Hardware | Use Case |
|---|---|---|
| Data processing, testing | `cpu-basic`, `cpu-upgrade` | Lightweight tasks |
| Small models, demos | `t4-small` | <1B models, quick tests |
| Medium models | `t4-medium`, `l4x1` | 1-7B models |
| Large models, production | `a10g-small`, `a10g-large` | 7-13B models |
| Very large models | `a100-large` | 13B+ models |
| Batch inference | `a10g-large`, `a100-large` | High-throughput |
| Multi-GPU workloads | `l4x4`, `a10g-largex2`, `a10g-largex4` | Parallel/large models |
| TPU workloads | `v5e-1x1`, `v5e-2x2`, `v5e-2x4` | JAX/Flax, TPU-optimized |

All Available Flavors:
  • CPU: `cpu-basic`, `cpu-upgrade`
  • GPU: `t4-small`, `t4-medium`, `l4x1`, `l4x4`, `a10g-small`, `a10g-large`, `a10g-largex2`, `a10g-largex4`, `a100-large`
  • TPU: `v5e-1x1`, `v5e-2x2`, `v5e-2x4`

Guidelines:
  • Start with smaller hardware for testing
  • Scale up based on actual needs
  • Use multi-GPU for parallel workloads or large models
  • Use TPUs for JAX/Flax workloads
  • See `references/hardware_guide.md` for detailed specifications
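As a rough sketch of the table above, model size can be mapped to a starting flavor. The thresholds simply mirror the table, and `suggest_flavor` is a hypothetical helper for illustration, not part of any API:

```python
def suggest_flavor(params_billions: float) -> str:
    """Rough starting flavor by model parameter count, per the table above."""
    if params_billions < 1:
        return "t4-small"    # <1B models, quick tests
    if params_billions <= 7:
        return "t4-medium"   # 1-7B models (l4x1 also fits)
    if params_billions <= 13:
        return "a10g-small"  # 7-13B models (a10g-large also fits)
    return "a100-large"      # 13B+ models
```

Treat the result as a starting point for testing, then scale up based on actual memory and throughput needs.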

Critical: Saving Results


⚠️ EPHEMERAL ENVIRONMENT—MUST PERSIST RESULTS
The Jobs environment is temporary. All files are deleted when the job ends. If results aren't persisted, ALL WORK IS LOST.

Persistence Options


**1. Push to Hugging Face Hub (Recommended)**

```python
# Push models
model.push_to_hub("username/model-name", token=os.environ["HF_TOKEN"])

# Push datasets
dataset.push_to_hub("username/dataset-name", token=os.environ["HF_TOKEN"])

# Push artifacts
api.upload_file(
    path_or_fileobj="results.json",
    path_in_repo="results.json",
    repo_id="username/results",
    token=os.environ["HF_TOKEN"]
)
```

**2. Use External Storage**

```python
# Upload to S3, GCS, etc.
import boto3
s3 = boto3.client('s3')
s3.upload_file('results.json', 'my-bucket', 'results.json')
```

**3. Send Results via API**

```python
# POST results to your API
import requests
requests.post("https://your-api.com/results", json=results)
```

Required Configuration for Hub Push


In job submission:

```python
{
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # Enables authentication
}
```

In script:

```python
import os
from huggingface_hub import HfApi

# Token automatically available from secrets
api = HfApi(token=os.environ.get("HF_TOKEN"))

# Push your results
api.upload_file(...)
```

Verification Checklist


Before submitting:
  • Results persistence method chosen
  • `secrets={"HF_TOKEN": "$HF_TOKEN"}` if using the Hub
  • Script handles a missing token gracefully
  • Test that the persistence path works

See: `references/hub_saving.md` for a detailed Hub persistence guide

Timeout Management


⚠️ DEFAULT: 30 MINUTES
Jobs automatically stop after the timeout. For long-running tasks like training, always set a custom timeout.

Setting Timeouts


MCP Tool:

```python
{
    "timeout": "2h"   # 2 hours
}
```

Supported formats:
  • Integer/float: seconds (e.g., `300` = 5 minutes)
  • String with suffix: `"5m"` (minutes), `"2h"` (hours), `"1d"` (days)
  • Examples: `"90m"`, `"2h"`, `"1.5h"`, `300`, `"1d"`

Python API:

```python
from huggingface_hub import run_job, run_uv_job

run_job(image="python:3.12", command=[...], timeout="2h")
run_uv_job("script.py", timeout=7200)  # 2 hours in seconds
```
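The accepted formats can be normalized to seconds in a few lines. This is a sketch that mirrors the suffix rules listed above, not `huggingface_hub`'s own parser:

```python
def timeout_to_seconds(timeout) -> float:
    """Convert a timeout spec ("5m", "2h", "1.5h", "1d", or bare seconds) to seconds."""
    if isinstance(timeout, (int, float)):
        return float(timeout)
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    suffix = timeout[-1].lower()
    if suffix in units:
        return float(timeout[:-1]) * units[suffix]
    return float(timeout)  # plain number passed as a string
```

Useful for sanity-checking a job config before submission, e.g. asserting the timeout exceeds your expected runtime plus buffer.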

Timeout Guidelines


| Scenario | Recommended | Notes |
|---|---|---|
| Quick test | 10-30 min | Verify setup |
| Data processing | 1-2 hours | Depends on data size |
| Batch inference | 2-4 hours | Large batches |
| Experiments | 4-8 hours | Multiple runs |
| Long-running | 8-24 hours | Production workloads |

Always add a 20-30% buffer for setup, network delays, and cleanup.
On timeout: The job is killed immediately and all unsaved progress is lost.

Cost Estimation


General guidelines:
Total Cost = (Hours of runtime) × (Cost per hour)
Example calculations:
Quick test:
  • Hardware: cpu-basic ($0.10/hour)
  • Time: 15 minutes (0.25 hours)
  • Cost: $0.03
Data processing:
  • Hardware: l4x1 ($2.50/hour)
  • Time: 2 hours
  • Cost: $5.00
Batch inference:
  • Hardware: a10g-large ($5/hour)
  • Time: 4 hours
  • Cost: $20.00
Cost optimization tips:
  1. Start small - Test on cpu-basic or t4-small
  2. Monitor runtime - Set appropriate timeouts
  3. Use checkpoints - Resume if job fails
  4. Optimize code - Reduce unnecessary compute
  5. Choose right hardware - Don't over-provision
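The example calculations follow directly from the formula, and the 20-30% timeout buffer from the previous section can be priced in the same way. The hourly rates here are the illustrative figures above, not a current price list:

```python
def job_cost(hours: float, rate_per_hour: float, buffer: float = 0.0) -> float:
    """Estimate job cost: runtime x hourly rate, optionally padded by a buffer."""
    return round(hours * (1 + buffer) * rate_per_hour, 2)

# Data processing example above: 2 hours on l4x1 at $2.50/hour
print(job_cost(2, 2.50))        # 5.0
# Same job with a 25% timeout buffer priced in
print(job_cost(2, 2.50, 0.25))  # 6.25
```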

Monitoring and Tracking


Check Job Status


MCP Tool:

```python
# List all jobs
hf_jobs("ps")

# Inspect specific job
hf_jobs("inspect", {"job_id": "your-job-id"})

# View logs
hf_jobs("logs", {"job_id": "your-job-id"})

# Cancel a job
hf_jobs("cancel", {"job_id": "your-job-id"})
```

**Python API:**

```python
from huggingface_hub import list_jobs, inspect_job, fetch_job_logs, cancel_job

# List your jobs
jobs = list_jobs()

# List running jobs only
running = [j for j in list_jobs() if j.status.stage == "RUNNING"]

# Inspect specific job
job_info = inspect_job(job_id="your-job-id")

# View logs
for log in fetch_job_logs(job_id="your-job-id"):
    print(log)

# Cancel a job
cancel_job(job_id="your-job-id")
```

**CLI:**

```bash
hf jobs ps                    # List jobs
hf jobs logs <job-id>         # View logs
hf jobs cancel <job-id>       # Cancel job
```

Remember: Wait for the user to request status checks. Avoid polling repeatedly.

Job URLs


After submission, jobs have monitoring URLs:
https://huggingface.co/jobs/username/job-id
View logs, status, and details in the browser.

Wait for Multiple Jobs


```python
import time
from huggingface_hub import inspect_job, run_job

# Run multiple jobs
jobs = [run_job(image=img, command=cmd) for img, cmd in workloads]

# Wait for all to complete
for job in jobs:
    while inspect_job(job_id=job.id).status.stage not in ("COMPLETED", "ERROR"):
        time.sleep(10)
```
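The same loop can be factored into a reusable helper with a deadline, so a stuck job doesn't block forever. Here `fetch_stage` is an injected stand-in for `lambda: inspect_job(job_id=...).status.stage`, which also keeps the sketch testable:

```python
import time

def wait_for_job(fetch_stage, poll_seconds=10, deadline_seconds=3600, sleep=time.sleep):
    """Poll fetch_stage() until it reports a terminal stage or the deadline passes."""
    waited = 0.0
    while waited <= deadline_seconds:
        stage = fetch_stage()
        if stage in ("COMPLETED", "ERROR"):
            return stage
        sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError("job did not finish before the deadline")
```

Injecting `sleep` as a parameter is only there so the helper can be exercised without real waiting; in practice the defaults suffice.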

Scheduled Jobs


Run jobs on a schedule using CRON expressions or predefined schedules.
MCP Tool:
python
undefined
使用CRON表达式或预定义计划定时运行任务。
MCP工具:
python
undefined

Schedule a UV script that runs every hour

定时运行每小时执行一次的UV脚本

hf_jobs("scheduled uv", { "script": "your_script.py", "schedule": "@hourly", "flavor": "cpu-basic" })
hf_jobs("scheduled uv", { "script": "your_script.py", "schedule": "@hourly", "flavor": "cpu-basic" })

Schedule with CRON syntax

使用CRON语法定时运行

hf_jobs("scheduled uv", { "script": "your_script.py", "schedule": "0 9 * * 1", # 9 AM every Monday "flavor": "cpu-basic" })
hf_jobs("scheduled uv", { "script": "your_script.py", "schedule": "0 9 * * 1", # 每周一上午9点 "flavor": "cpu-basic" })

Schedule a Docker-based job

定时运行基于Docker的任务

hf_jobs("scheduled run", { "image": "python:3.12", "command": ["python", "-c", "print('Scheduled!')"], "schedule": "@daily", "flavor": "cpu-basic" })

**Python API:**
```python
from huggingface_hub import create_scheduled_job, create_scheduled_uv_job
hf_jobs("scheduled run", { "image": "python:3.12", "command": ["python", "-c", "print('Scheduled!')"], "schedule": "@daily", "flavor": "cpu-basic" })

**Python API:**
```python
from huggingface_hub import create_scheduled_job, create_scheduled_uv_job

Schedule a Docker job

定时运行Docker任务

create_scheduled_job( image="python:3.12", command=["python", "-c", "print('Running on schedule!')"], schedule="@hourly" )
create_scheduled_job( image="python:3.12", command=["python", "-c", "print('Running on schedule!')"], schedule="@hourly" )

Schedule a UV script

定时运行UV脚本

create_scheduled_uv_job("my_script.py", schedule="@daily", flavor="cpu-basic")
create_scheduled_uv_job("my_script.py", schedule="@daily", flavor="cpu-basic")

Schedule with GPU

定时运行GPU任务

create_scheduled_uv_job( "ml_inference.py", schedule="0 */6 * * *", # Every 6 hours flavor="a10g-small" )

**Available schedules:**
- `@annually`, `@yearly` - Once per year
- `@monthly` - Once per month
- `@weekly` - Once per week
- `@daily` - Once per day
- `@hourly` - Once per hour
- CRON expression - Custom schedule (e.g., `"*/5 * * * *"` for every 5 minutes)
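For reference, the predefined schedules are shorthand for standard cron expressions. The mapping below uses the conventional cron aliases, not a Hugging Face-specific API; `resolve_schedule` is a hypothetical helper for illustration:

```python
# Conventional cron equivalents of the predefined schedule aliases.
CRON_ALIASES = {
    "@hourly":  "0 * * * *",   # minute 0 of every hour
    "@daily":   "0 0 * * *",   # 00:00 every day
    "@weekly":  "0 0 * * 0",   # 00:00 every Sunday
    "@monthly": "0 0 1 * *",   # 00:00 on day 1 of each month
    "@yearly":  "0 0 1 1 *",   # 00:00 on January 1 (same as @annually)
}

def resolve_schedule(schedule: str) -> str:
    """Expand a predefined alias; pass a raw cron expression through unchanged."""
    return CRON_ALIASES.get(schedule, schedule)

print(resolve_schedule("@daily"))      # 0 0 * * *
print(resolve_schedule("*/5 * * * *")) # */5 * * * *
```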

**Manage scheduled jobs:**

MCP Tool:
```python
hf_jobs("scheduled ps")                          # List scheduled jobs
hf_jobs("scheduled inspect", {"job_id": "..."})  # Inspect details
hf_jobs("scheduled suspend", {"job_id": "..."})  # Pause
hf_jobs("scheduled resume", {"job_id": "..."})   # Resume
hf_jobs("scheduled delete", {"job_id": "..."})   # Delete
```

**Python API for management:**
```python
from huggingface_hub import (
    list_scheduled_jobs,
    inspect_scheduled_job,
    suspend_scheduled_job,
    resume_scheduled_job,
    delete_scheduled_job
)

# List all scheduled jobs
scheduled = list_scheduled_jobs()

# Inspect a scheduled job
info = inspect_scheduled_job(scheduled_job_id)

# Suspend (pause) a scheduled job
suspend_scheduled_job(scheduled_job_id)

# Resume a scheduled job
resume_scheduled_job(scheduled_job_id)

# Delete a scheduled job
delete_scheduled_job(scheduled_job_id)
```

Webhooks: Trigger Jobs on Events


Trigger jobs automatically when changes happen in Hugging Face repositories.
Python API:
```python
from huggingface_hub import create_webhook

# Create a webhook that triggers a job when a repo changes
webhook = create_webhook(
    job_id=job.id,
    watched=[
        {"type": "user", "name": "your-username"},
        {"type": "org", "name": "your-org-name"}
    ],
    domains=["repo", "discussion"],
    secret="your-secret"
)
```

**How it works:**
1. Webhook listens for changes in watched repositories
2. When triggered, the job runs with the `WEBHOOK_PAYLOAD` environment variable set
3. Your script can parse the payload to understand what changed

**Use cases:**
- Auto-process new datasets when uploaded
- Trigger inference when models are updated
- Run tests when code changes
- Generate reports on repository activity

**Access webhook payload in script:**
```python
import os
import json

payload = json.loads(os.environ.get("WEBHOOK_PAYLOAD", "{}"))
print(f"Event type: {payload.get('event', {}).get('action')}")
```
See the Webhooks documentation for more details.
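A slightly fuller sketch of handling the payload defensively: the field names used here (`event.action`, `event.scope`, `repo.name`) follow the documented webhook payload shape, but treat them as assumptions and check the payload your webhook actually delivers.

```python
import json
import os

def summarize_payload(payload: dict) -> str:
    """Build a one-line summary from a webhook payload, tolerating missing fields."""
    event = payload.get("event", {})
    repo = payload.get("repo", {})
    action = event.get("action", "unknown")
    scope = event.get("scope", "unknown")
    name = repo.get("name", "unknown")
    return f"{action} ({scope}) on {name}"

# In a job, the payload arrives via the WEBHOOK_PAYLOAD environment variable:
payload = json.loads(os.environ.get("WEBHOOK_PAYLOAD", "{}"))
print(summarize_payload(payload))
```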

Common Workload Patterns


This repository ships ready-to-run UV scripts in `hf-jobs/scripts/`. Prefer using them instead of inventing new templates.

Pattern 1: Dataset → Model Responses (vLLM) — `scripts/generate-responses.py`

What it does: loads a Hub dataset (a chat `messages` or a `prompt` column), applies the model's chat template, generates responses with vLLM, and pushes the output dataset + dataset card back to the Hub.
Requires: GPU + write token (it pushes a dataset).
```python
from pathlib import Path

script = Path("hf-jobs/scripts/generate-responses.py").read_text()
hf_jobs("uv", {
    "script": script,
    "script_args": [
        "username/input-dataset",
        "username/output-dataset",
        "--messages-column", "messages",
        "--model-id", "Qwen/Qwen3-30B-A3B-Instruct-2507",
        "--temperature", "0.7",
        "--top-p", "0.8",
        "--max-tokens", "2048",
    ],
    "flavor": "a10g-large",
    "timeout": "4h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
})
```

Pattern 2: CoT Self-Instruct Synthetic Data — `scripts/cot-self-instruct.py`

What it does: generates synthetic prompts and answers via CoT Self-Instruct, optionally filters the outputs (answer-consistency / RIP), then pushes the generated dataset + dataset card to the Hub.
Requires: GPU + write token (it pushes a dataset).
```python
from pathlib import Path

script = Path("hf-jobs/scripts/cot-self-instruct.py").read_text()
hf_jobs("uv", {
    "script": script,
    "script_args": [
        "--seed-dataset", "davanstrien/s1k-reasoning",
        "--output-dataset", "username/synthetic-math",
        "--task-type", "reasoning",
        "--num-samples", "5000",
        "--filter-method", "answer-consistency",
    ],
    "flavor": "l4x4",
    "timeout": "8h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
})
```

Pattern 3: Streaming Dataset Stats (Polars + HF Hub) — `scripts/finepdfs-stats.py`

What it does: scans Parquet directly from the Hub (no 300 GB download), computes temporal stats, and optionally uploads the results to a Hub dataset repo.
Requires: CPU is often enough; a token is needed only if you pass `--output-repo` (upload).
```python
from pathlib import Path

script = Path("hf-jobs/scripts/finepdfs-stats.py").read_text()
hf_jobs("uv", {
    "script": script,
    "script_args": [
        "--limit", "10000",
        "--show-plan",
        "--output-repo", "username/finepdfs-temporal-stats",
    ],
    "flavor": "cpu-upgrade",
    "timeout": "2h",
    "env": {"HF_XET_HIGH_PERFORMANCE": "1"},
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
})
```

Common Failure Modes


Out of Memory (OOM)


Fix:
  1. Reduce batch size or data chunk size
  2. Process data in smaller batches
  3. Upgrade hardware: cpu → t4 → a10g → a100
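Step 2 above can be sketched as a simple batching helper, in plain Python with no job-specific APIs assumed, so only one batch of items is held in memory at a time:

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield fixed-size batches so only one batch is in memory at a time."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # trailing partial batch
        yield batch

# Process a large iterable without materializing it all at once
for batch in batched(range(10), 4):
    print(len(batch))  # 4, 4, 2
```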

Job Timeout


Fix:
  1. Check logs for the actual runtime
  2. Increase the timeout with a buffer: `"timeout": "3h"`
  3. Optimize code for faster execution
  4. Process data in chunks

Hub Push Failures


Fix:
  1. Add to the job: `secrets={"HF_TOKEN": "$HF_TOKEN"}`
  2. Verify the token in the script: `assert "HF_TOKEN" in os.environ`
  3. Check token permissions
  4. Verify the repo exists or can be created
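Steps 1-2 can be combined into a small guard at the top of the script. This is a sketch; `ensure_hf_token` is a hypothetical helper name, not a library API:

```python
import os

def ensure_hf_token() -> str:
    """Fail fast with an actionable message if HF_TOKEN is missing."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Submit the job with "
            'secrets={"HF_TOKEN": "$HF_TOKEN"} so Hub pushes can authenticate.'
        )
    return token
```

Calling it before any `push_to_hub` turns a late, cryptic 401 into an immediate, explicit error.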

Missing Dependencies


Fix: Add to the PEP 723 header:
```python
# /// script
# dependencies = ["package1", "package2>=1.0.0"]
# ///
```

Authentication Errors


Fix:
  1. Check that `hf_whoami()` works locally
  2. Verify `secrets={"HF_TOKEN": "$HF_TOKEN"}` in the job config
  3. Re-login: `hf auth login`
  4. Check the token has the required permissions

Troubleshooting


Common issues:
  • Job times out → Increase the timeout, optimize code
  • Results not saved → Check the persistence method, verify HF_TOKEN
  • Out of memory → Reduce batch size, upgrade hardware
  • Import errors → Add dependencies to the PEP 723 header
  • Authentication errors → Check the token, verify the `secrets` parameter
See `references/troubleshooting.md` for the complete troubleshooting guide.

Resources


References (In This Skill)


  • `references/token_usage.md` - Complete token usage guide
  • `references/hardware_guide.md` - Hardware specs and selection
  • `references/hub_saving.md` - Hub persistence guide
  • `references/troubleshooting.md` - Common issues and solutions

Scripts (In This Skill)


  • `scripts/generate-responses.py` - vLLM batch generation: dataset → responses → push to Hub
  • `scripts/cot-self-instruct.py` - CoT Self-Instruct synthetic data generation + filtering → push to Hub
  • `scripts/finepdfs-stats.py` - Polars streaming stats over `finepdfs-edu` Parquet on the Hub (optional push)

External Links


Official Documentation:
Related Tools:

Key Takeaways


  1. Submit scripts inline - The `script` parameter accepts Python code directly; no file saving required unless the user requests it
  2. Jobs are asynchronous - Don't wait or poll; let the user check when ready
  3. Always set a timeout - The default 30 minutes may be insufficient; set an appropriate timeout
  4. Always persist results - The environment is ephemeral; without persistence, all work is lost
  5. Use tokens securely - Always use `secrets={"HF_TOKEN": "$HF_TOKEN"}` for Hub operations
  6. Choose appropriate hardware - Start small, scale up based on needs (see the hardware guide)
  7. Use UV scripts - Default to `hf_jobs("uv", {...})` with inline scripts for Python workloads
  8. Handle authentication - Verify tokens are available before Hub operations
  9. Monitor jobs - Provide job URLs and status check commands
  10. Optimize costs - Choose the right hardware, set appropriate timeouts

Quick Reference: MCP Tool vs CLI vs Python API


| Operation | MCP Tool | CLI | Python API |
|---|---|---|---|
| Run UV script | `hf_jobs("uv", {...})` | `hf jobs uv run script.py` | `run_uv_job("script.py")` |
| Run Docker job | `hf_jobs("run", {...})` | `hf jobs run image cmd` | `run_job(image, command)` |
| List jobs | `hf_jobs("ps")` | `hf jobs ps` | `list_jobs()` |
| View logs | `hf_jobs("logs", {...})` | `hf jobs logs <id>` | `fetch_job_logs(job_id)` |
| Cancel job | `hf_jobs("cancel", {...})` | `hf jobs cancel <id>` | `cancel_job(job_id)` |
| Schedule UV | `hf_jobs("scheduled uv", {...})` | - | `create_scheduled_uv_job()` |
| Schedule Docker | `hf_jobs("scheduled run", {...})` | - | `create_scheduled_job()` |