replicate


Docs

Workflow

Here's a common workflow for using Replicate's API to run a model:
  1. Choose the right model - Search with the API or ask the user
  2. Get model metadata - Fetch model input and output schema via API
  3. Create prediction - POST to /v1/predictions
  4. Poll for results - GET prediction until status is "succeeded"
  5. Return output - Usually URLs to generated content
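The steps above can be sketched as a small stdlib-only Python helper. This is an illustrative sketch, not an official client: the function names (`api_call`, `run_model`), the two-second poll interval, and the `Bearer` auth scheme are assumptions; the endpoint paths and the `status`/`urls`/`output` fields come from the text and the public API.

```python
import json
import time
import urllib.request

API = "https://api.replicate.com/v1"
TERMINAL = {"succeeded", "failed", "canceled"}

def api_call(method, url, token, payload=None):
    """Minimal JSON request helper for the Replicate API (assumed auth scheme)."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def is_done(prediction):
    """A prediction is finished once it reaches a terminal status."""
    return prediction["status"] in TERMINAL

def run_model(token, version, inputs, poll_seconds=2):
    """Step 3 and 4: create a prediction, then poll until it finishes."""
    prediction = api_call("POST", f"{API}/predictions", token,
                          {"version": version, "input": inputs})
    while not is_done(prediction):
        time.sleep(poll_seconds)
        prediction = api_call("GET", prediction["urls"]["get"], token)
    # Step 5: output is usually a URL (or list of URLs) to generated content.
    return prediction.get("output")
```

Step 2 (model metadata) would use the same helper, e.g. `api_call("GET", f"{API}/models/{{owner}}/{{name}}", token)`, and inspect the returned schema before building `inputs`.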

Choosing models

  • Use the search and collections APIs to find and compare the best models. Don't list all models via the API — the full listing is effectively a firehose.
  • Collections are curated by Replicate staff, so they're vetted.
  • Official models are in the "official" collection.
  • Use official models because they:
    • are always running
    • have stable API interfaces
    • have predictable output pricing
    • are maintained by Replicate staff
  • If you must use a community model, be aware that it can take a long time to boot.
  • You can create always-on deployments of community models, but you pay for model uptime.
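As a rough sketch of browsing a curated collection, the snippet below fetches the "official" collection named above. The `summarize` helper and the exact response fields (`models`, `owner`, `name`, `description`) are assumptions based on the public collections API.

```python
import json
import urllib.request

def summarize(collection):
    """Flatten a collection payload into (owner/name, description) pairs."""
    return [(f"{m['owner']}/{m['name']}", m.get("description", ""))
            for m in collection.get("models", [])]

def fetch_collection(token, slug="official"):
    """GET one curated collection, e.g. the 'official' collection."""
    req = urllib.request.Request(
        f"https://api.replicate.com/v1/collections/{slug}")
    req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        return summarize(json.loads(resp.read()))
```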

Running models

Models take time to run. There are three ways to run a model via API and get its output:
  1. Create a prediction, store its id from the response, and poll until completion.
  2. Set a "Prefer: wait" header when creating a prediction for a blocking synchronous response. Only recommended for very fast models.
  3. Set an HTTPS webhook URL when creating a prediction, and Replicate will POST to that URL when the prediction completes.
Follow these guidelines when running models:
  • Use the "POST /v1/predictions" endpoint, as it supports both official and community models.
  • Every model has its own OpenAPI schema. Always fetch and check model schemas to make sure you're setting valid inputs.
  • Use HTTPS URLs for file inputs whenever possible. You can also send base64-encoded files, but they should be avoided.
  • Fire off multiple predictions concurrently. Don't wait for one to finish before starting the next.
  • Output file URLs expire after 1 hour, so if you need to keep the files, copy them to storage such as Cloudflare R2.
  • Webhooks are a good mechanism for receiving and storing prediction output.
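One way to keep the three run modes straight is to build the request headers and body in a single place. This is a hedged sketch: the "Prefer: wait" header and the HTTPS-webhook requirement come from the text above, while the `webhook` and `webhook_events_filter` body fields are assumptions based on the public predictions API.

```python
import json

def prediction_request(version, inputs, webhook=None, wait=False):
    """Build (headers, body) for POST /v1/predictions.

    wait=True     -> "Prefer: wait" header (blocking; very fast models only)
    webhook=<url> -> Replicate POSTs the result to the URL when done
    neither       -> poll GET on the prediction until a terminal status
    """
    headers = {"Content-Type": "application/json"}
    if wait:
        headers["Prefer"] = "wait"
    body = {"version": version, "input": inputs}
    if webhook:
        if not webhook.startswith("https://"):
            raise ValueError("webhook URLs must be HTTPS")
        body["webhook"] = webhook
        body["webhook_events_filter"] = ["completed"]  # assumed filter value
    return headers, json.dumps(body)
```

Because predictions run server-side, you can build and fire several of these requests concurrently (e.g. with a thread pool) rather than waiting for each one to finish.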