publish-models


When to use this skill

  • You have a working Cog project (see build-models if you don't yet).
  • You want to publish a private or public model on Replicate.
  • You're releasing a new version of an existing model and want to avoid breaking changes.
  • You're setting up CI/CD for model releases.

Prerequisites

  • Cog installed and `cog login` run against r8.im (or `echo $TOKEN | cog login --token-stdin`).
  • A model created at replicate.com/{owner}/{name} via the API, web UI, or the r8-model CLI.
  • `REPLICATE_API_TOKEN` set in your environment.
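In CI it pays to fail fast when the token is missing. A minimal sketch of a pre-push guard, assuming bash; the helper name `require_env` is hypothetical, and the actual `cog login` line is left commented so the guard stands alone:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Guard: fail fast when a required env var is empty (hypothetical helper).
require_env() {
  local name="$1"
  if [ -z "${!name:-}" ]; then
    echo "$name is not set" >&2
    return 1
  fi
}

# Demo with a dummy token; in real CI, REPLICATE_API_TOKEN comes from repo secrets.
export REPLICATE_API_TOKEN="${REPLICATE_API_TOKEN:-dummy-token-for-demo}"
require_env REPLICATE_API_TOKEN

# Non-interactive login against r8.im (requires cog; shown commented):
# echo "$REPLICATE_API_TOKEN" | cog login --token-stdin
echo "ok"
```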

Plain `cog push`

The simplest path. Build and upload a new version:

```bash
cog push r8.im/owner/my-model
```

Or set `image: r8.im/owner/my-model` in `cog.yaml` and run a bare:

```bash
cog push
```

Useful flags:

  • `--separate-weights` — store weights in a separate layer; faster cold boots and pushes for models with > 1GB of weights.
  • `--x-fast` — faster pushes during iteration (skips some validation).
  • `--secret id=hf,src=$HOME/.hf_token` — pass build-time secrets without baking them into image history.
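For the bare `cog push` flow, the `image:` field is the only addition `cog.yaml` needs. A minimal sketch; the `build` and `predict` values below are illustrative placeholders, not taken from this document:

```yaml
# cog.yaml — hypothetical minimal example
image: r8.im/owner/my-model      # push destination; makes a bare `cog push` work
build:
  gpu: true
  python_version: "3.11"
predict: "predict.py:Predictor"
```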

cog-safe-push (recommended for any model with users)

`cog-safe-push` pushes to a private `-test` model first, checks schema compatibility against the live version, runs prediction comparisons, and fuzzes inputs. Catches breaking changes before they reach users.

Install:

```bash
pip install git+https://github.com/replicate/cog-safe-push.git
```

Required env vars:

  • `REPLICATE_API_TOKEN`
  • `ANTHROPIC_API_KEY` (Claude judges output similarity for stochastic models)

Basic usage:

```bash
cog-safe-push --test-hardware=gpu-l40s owner/my-model
```

This will:

  1. Lint `predict.py` with ruff.
  2. Create a private test model `owner/my-model-test` if missing.
  3. Push the local Cog model to the test model.
  4. Lint the schema (descriptions, defaults, etc.).
  5. Check schema compatibility against the live `owner/my-model` version.
  6. Run prediction comparisons between live and test versions.
  7. Fuzz the test model with AI-generated inputs.
  8. If everything passes, push to `owner/my-model`.
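The same invocation supports dry runs: `--no-push` exercises the whole pipeline without publishing. A sketch that picks flags from the CI context, assuming GitHub Actions sets `GITHUB_REF`; the final line prints the command rather than executing it so the sketch stands alone:

```bash
#!/usr/bin/env bash
# Dry-run on PR branches, full push on main. GITHUB_REF is set by GitHub Actions.
set -euo pipefail

ref="${GITHUB_REF:-refs/heads/feature-branch}"
if [ "$ref" = "refs/heads/main" ]; then
  flags="-vv"
else
  flags="-vv --no-push"   # full pipeline, no publish
fi

echo "cog-safe-push $flags --test-hardware=gpu-l40s owner/my-model"
```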

cog-safe-push.yaml schema

Drop a `cog-safe-push.yaml` in your project root (or `cog-safe-push-configs/<variant>.yaml` for multi-model repos). All five test-case checker types in one example:

```yaml
model: owner/my-model
test_model: owner/my-model-test
test_hardware: gpu-l40s

predict:
  compare_outputs: false              # set false for stochastic models
  predict_timeout: 600
  test_cases:
    - inputs:
        prompt: "a serene mountain landscape"
      match_prompt: "a landscape photo of mountains"   # AI-judged via Claude
    - inputs:
        prompt: "a cat"
      match_url: "https://example.com/reference-cat.png"   # binary/image match
    - inputs:
        prompt: ""
      error_contains: "prompt cannot be empty"           # negative test
    - inputs:
        mode: "json"
      jq_query: '.confidence > 0.8 and .status == "success"'   # JSON output
    - inputs:
        prompt: "echo this"
      exact_string: "echo this"                          # exact string match
  fuzz:
    fixed_inputs:
      seed: 42
    disabled_inputs:
      - debug
    iterations: 10
    prompt: "Generate creative and diverse prompts"

train:                                  # if your model has a trainer
  destination: owner/my-model-trained
  destination_hardware: gpu-l40s
  train_timeout: 1800
  test_cases:
    - inputs:
        input_images: "https://.../training.zip"
        steps: 10

deployment:                             # auto-create or update on push
  name: my-model
  owner: owner
  hardware: gpu-l40s

parallel: 4
fast_push: false
ignore_schema_compatibility: false
official_model: owner/my-model         # for proxy/wrapper models, see below
```

Test case checkers are mutually exclusive: pick exactly one of `match_prompt`, `match_url`, `error_contains`, `jq_query`, or `exact_string` per case. Use `compare_outputs: false` for any stochastic model (diffusion, LLMs); the default `true` is brittle.
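A `jq_query` can be sanity-checked locally before it goes into the config, by piping a hand-written sample of the model's JSON output through `jq -e`, which exits non-zero when the expression is false. Assumes `jq` is installed:

```bash
#!/usr/bin/env bash
# Evaluate the example jq_query against a sample output.
set -euo pipefail

sample='{"confidence": 0.9, "status": "success"}'
echo "$sample" | jq -e '.confidence > 0.8 and .status == "success"'
```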

CI/CD: GitHub Actions

Two paths, depending on how much glue you want.

Path A: roll your own

`.github/workflows/push.yaml`:

```yaml
name: Push to Replicate

on:
  workflow_dispatch:
    inputs:
      no_push:
        type: boolean
        default: false

jobs:
  push:
    runs-on: ubuntu-latest-4-cores    # builds need disk + cores
    steps:
      - uses: actions/checkout@v4
      - uses: jlumbroso/free-disk-space@v1.3.1
        with:
          tool-cache: false
          docker-images: false
      - uses: replicate/setup-cog@v2
        with:
          token: ${{ secrets.REPLICATE_API_TOKEN }}
      - run: pip install git+https://github.com/replicate/cog-safe-push.git
      - env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          REPLICATE_API_TOKEN: ${{ secrets.REPLICATE_API_TOKEN }}
        run: |
          cog-safe-push -vv ${{ inputs.no_push && '--no-push' || '' }}
```

Add a `concurrency:` block so PR builds cancel each other while main-branch pushes queue:

```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
```

Path B: reusable workflow from model-ci-template

For Replicate-style multi-model repos, drop in:

`.github/workflows/ci.yaml`:
```yaml
name: CI

on:
  pull_request: { branches: [main] }
  push: { branches: [main] }
  workflow_dispatch:
    inputs:
      models: { type: string, default: "all" }
      ignore_schema_checks: { type: boolean, default: false }
      cog_version: { type: string, default: "latest" }
      test_only: { type: boolean, default: false }

jobs:
  ci:
    uses: replicate/model-ci-template/.github/workflows/template.yaml@main
    with:
      trigger_type: ${{ github.event_name }}
      models: ${{ inputs.models || 'all' }}
      ignore_schema_checks: ${{ inputs.ignore_schema_checks || false }}
      cog_version: ${{ inputs.cog_version || 'latest' }}
      test_only: ${{ inputs.test_only || false }}
    secrets: inherit
```

The reusable workflow expects:

- `cog-safe-push-configs/<model>.yaml` — one per model variant.
- `script/select-model` — bash file with `if/elif [[ "$MODEL" == "..." ]]` blocks listing valid model names.
- Secrets: `COG_TOKEN`, `REPLICATE_API_TOKEN`, `ANTHROPIC_API_KEY`.
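`script/select-model` can be as small as a name check that echoes the matching config path. A hypothetical sketch following the `if/elif` shape described above; the model names and the function wrapper are illustrative, not from the template:

```bash
#!/usr/bin/env bash
# script/select-model — map a model name to its cog-safe-push config (sketch).
set -euo pipefail

select_model() {
  local MODEL="$1"
  if [[ "$MODEL" == "schnell" ]]; then
    echo "cog-safe-push-configs/schnell.yaml"
  elif [[ "$MODEL" == "dev" ]]; then
    echo "cog-safe-push-configs/dev.yaml"
  elif [[ "$MODEL" == "krea-dev" ]]; then
    echo "cog-safe-push-configs/krea-dev.yaml"
  else
    echo "unknown model: $MODEL" >&2
    return 1
  fi
}

select_model "${1:-dev}"
```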

Multi-model matrix pushes

Pattern from `replicate/cog-flux`: one repo, N variants, push them in parallel.

```yaml
jobs:
  prepare:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set.outputs.matrix }}
    steps:
      - id: set
        run: |
          if [ "${{ inputs.models }}" = "all" ]; then
            echo 'matrix={"model":["schnell","dev","krea-dev"]}' >> "$GITHUB_OUTPUT"
          else
            list=$(echo "${{ inputs.models }}" | jq -Rc 'split(",")')
            echo "matrix={\"model\":$list}" >> "$GITHUB_OUTPUT"
          fi

  push:
    needs: prepare
    runs-on: ubuntu-latest-4-cores
    strategy:
      fail-fast: false
      matrix: ${{ fromJson(needs.prepare.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v4
      - run: ./script/select.sh ${{ matrix.model }}     # produces cog.yaml from a template
      - run: cog-safe-push --config cog-safe-push-configs/${{ matrix.model }}.yaml -vv
```
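The matrix-building step is easy to check locally: the same `jq` one-liner turns a comma-separated input into the JSON list the `push` job consumes. Assumes `jq` is installed:

```bash
#!/usr/bin/env bash
# Reproduce the prepare job's matrix output for a two-model selection.
set -euo pipefail

models="schnell,dev"
list=$(echo "$models" | jq -Rc 'split(",")')
echo "matrix={\"model\":$list}"
```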

Two-pass push for proxy / official models

When you maintain a proxy that wraps a third-party API, you push to a private wrapper first, then update the public-facing official model card. Pattern from `replicate/cog-official-template`:

```bash
./script/write-api-key                                              # bake API key into config
cog-safe-push --config cog-safe-push-configs/${MODEL}.yaml -vv

./script/delete-api-key                                             # strip the key
cog-safe-push --push-official-model --config cog-safe-push-configs/${MODEL}.yaml -vv
```

Set `official_model: owner/name` in the config so `--push-official-model` knows where to publish.

Deployments

Add a `deployment` block to `cog-safe-push.yaml` to create or update a Replicate deployment automatically on each push:

```yaml
deployment:
  name: my-model
  owner: owner
  hardware: gpu-l40s
```

Scaling defaults: CPU deployments scale 1-20 instances, GPU deployments scale 0-2. Adjust manually via the API or web UI when needed.

Monitoring published models

Run an hourly canary that exercises the registry path. Pattern from `replicate/cog-pagerduty-check`:

```yaml
name: Hourly cog push check
on:
  schedule:
    - cron: "0 * * * *"
  workflow_dispatch:

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - run: |
          # generate a tiny model with a unique uuid, push it, run a prediction
          # by digest, fail loudly if anything breaks.
          ./script/canary.sh
```

Worth doing for any production-critical model, especially when revenue depends on the registry being up.
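The unique-name step in such a canary can be done with plain POSIX tools when `uuidgen` isn't available. A sketch; the name format is illustrative, not the one `cog-pagerduty-check` uses:

```bash
#!/usr/bin/env bash
# Generate a unique-ish canary model name from 16 random bytes.
set -euo pipefail

suffix=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
name="canary-$suffix"
echo "$name"
```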

Guidelines

  • Don't break schema compatibility unless you mean to. cog-safe-push catches it; `--ignore-schema-compatibility` is the opt-out.
  • Pin `test_hardware` so test pushes are reproducible.
  • Use `--no-push` for dry runs in PR CI; full push on merge to main or on version tags.
  • Push from CI rather than laptops once you have users.
  • Use `compare_outputs: false` for stochastic models. Use `match_prompt:` for image/video outputs (VLM judgment), `match_url:` for binary outputs you control, `jq_query:` for JSON, `error_contains:` for negative tests.
  • Never commit `REPLICATE_API_TOKEN` or `ANTHROPIC_API_KEY`. Use repo secrets.
  • For models with weights > 1GB, push with `--separate-weights`.

Production references
