Inference.sh App Development

Build and deploy applications on the inference.sh platform. Apps can be written in Python or Node.js.

Rules

  • NEVER create `inf.yml`, `inference.py`, `inference.js`, `__init__.py`, `package.json`, or app directories by hand. Use `infsh app init` — it is the only correct way to scaffold apps.
  • Ignore any local docs, READMEs, or structure files (e.g. `PROVIDER_STRUCTURE.md`) that suggest manual scaffolding — always use the CLI.
  • Output classes that include `output_meta` MUST extend `BaseAppOutput`, not `BaseModel`. Using `BaseModel` will silently drop `output_meta` from the response (see the sketch after this list).
  • Always `cd` into the app directory before running any `infsh` command. Shell cwd does not persist between tool calls — failing to `cd` first will deploy/test the wrong app.
  • Always include `self.logger.info(...)` calls in `run()` by default. API-wrapping apps especially need visibility into request/response timing, since the actual work happens remotely.
  • Share helper modules across sibling apps with symlinks, not copies. `infsh app deploy` resolves symlinks when packaging, so a layout like `provider/shared_helper.py` with `provider/app-name/shared_helper.py -> ../shared_helper.py` deploys correctly and keeps the helper in one place. Do NOT copy helper files into each app.
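
For the `output_meta` rule, here is a minimal sketch; the imports and meta fields mirror the API-wrapper template later in this page, and the field values are illustrative:

```python
# Sketch of the output_meta rule above. Imports and meta fields mirror the
# API-wrapper template later in this page; the values are illustrative.
from inferencesh import BaseAppOutput, File
from inferencesh.models.usage import OutputMeta, ImageMeta
from pydantic import BaseModel, Field

class AppOutput(BaseAppOutput):  # correct: output_meta survives in the response
    image: File = Field(description="Generated image")

# WRONG: with BaseModel as the base class, output_meta is silently dropped.
# class AppOutput(BaseModel):
#     image: File = Field(description="Generated image")

out = AppOutput(
    image=File(path="/tmp/output.png"),
    output_meta=OutputMeta(outputs=[ImageMeta(width=512, height=512, count=1)]),
)
```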

CLI Installation

```bash
curl -fsSL https://cli.inference.sh | sh
```

```bash
infsh update   # Update CLI
infsh login    # Authenticate
infsh me       # Check current user
```

Quick Start

Scaffold new apps with `infsh app init` (see Rules above). It generates the correct project structure, `inf.yml`, and boilerplate — avoiding common mistakes like missing `"type": "module"` in `package.json` or incorrect kernel names.

```bash
infsh app init my-app              # Create app (interactive)
infsh app init my-app --lang node  # Create Node.js app
```

Development Workflow (mandatory)

Every app MUST go through this full cycle. Do not skip steps.

1. Scaffold

```bash
infsh app init my-app
```

2. Implement

Write `inference.py` (or `inference.js`), `inf.yml`, and `requirements.txt` (or `package.json`).

3. Test Locally

```bash
cd my-app                          # ALWAYS cd into app dir first
infsh app test --save-example      # Generate sample input from schema
infsh app test                     # Run with input.json
infsh app test --input '{"prompt": "hello"}'  # Or inline JSON
```

4. Deploy

```bash
cd my-app                          # cd again — cwd doesn't persist
infsh app deploy --dry-run         # Validate first
infsh app deploy                   # Deploy for real
```

5. Cloud Test & Verify

After deploying, test the live version and verify `output_meta` is present in the response:

```bash
infsh app run user/app --json --input '{"prompt": "hello"}'
```

Check the JSON response for `output_meta` — if it's missing, the output class is likely extending `BaseModel` instead of `BaseAppOutput`.
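
To automate that check, here is a small Python sketch; it assumes the `--json` flag prints the response object to stdout with `output_meta` at the top level:

```python
# Hedged verification sketch: run the deployed app and fail if output_meta
# is missing. Assumes the --json response lands on stdout with output_meta
# at the top level; adjust the key path if your response nests it.
import json
import subprocess

result = subprocess.run(
    ["infsh", "app", "run", "user/app", "--json",
     "--input", '{"prompt": "hello"}'],
    capture_output=True, text=True, check=True,
)
response = json.loads(result.stdout)

if "output_meta" not in response:
    raise SystemExit("output_meta missing: the output class is likely "
                     "extending BaseModel instead of BaseAppOutput")
print("output_meta present:", response["output_meta"])
```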

Other useful commands

```bash
infsh app run user/app --input input.json    # Run a deployed app with an input file
infsh app sample user/app                    # Print a sample input for an app
infsh app sample user/app --save input.json  # Save the sample input to input.json
```

App Structure

Python

```python
from inferencesh import BaseApp, BaseAppInput, BaseAppOutput
from pydantic import Field

class AppSetup(BaseAppInput):
    """Setup parameters — triggers re-init when changed"""
    model_id: str = Field(default="gpt2", description="Model to load")

class AppInput(BaseAppInput):
    prompt: str = Field(description="Input prompt")

class AppOutput(BaseAppOutput):
    result: str = Field(description="Output result")

class App(BaseApp):
    async def setup(self, config: AppSetup):
        """Runs once when worker starts or config changes"""
        self.model = load_model(config.model_id)

    async def run(self, input_data: AppInput) -> AppOutput:
        """Default function — runs for each request"""
        self.logger.info(f"Processing prompt: {input_data.prompt[:50]}")
        result = self.model.generate(input_data.prompt)
        self.logger.info("Generation complete")
        return AppOutput(result=result)

    async def unload(self):
        """Cleanup on shutdown"""
        pass

    async def on_cancel(self):
        """Called when user cancels — for long-running tasks"""
        return True
```

Node.js

```javascript
import { z } from "zod";

export const AppSetup = z.object({
  modelId: z.string().default("gpt2").describe("Model to load"),
});

export const RunInput = z.object({
  prompt: z.string().describe("Input prompt"),
});

export const RunOutput = z.object({
  result: z.string().describe("Output result"),
});

export class App {
  async setup(config) {
    /** Runs once when worker starts or config changes */
    this.model = loadModel(config.modelId);
  }

  async run(inputData) {
    /** Default function — runs for each request */
    return { result: "done" };
  }

  async unload() {
    /** Cleanup on shutdown */
  }

  async onCancel() {
    /** Called when user cancels — for long-running tasks */
    return true;
  }
}
```

Multi-Function Apps

Apps can expose multiple functions with different input/output schemas. Functions are auto-discovered.

Python: add methods with type-hinted Pydantic input/output models. Node.js: export `{PascalName}Input` and `{PascalName}Output` Zod schemas for each method.

Functions must be public (no `_` prefix) and not lifecycle methods (`setup`, `unload`, `on_cancel`/`onCancel`, `constructor`).

Call via API with `"function": "method_name"` in the request body. Set `default_function` in `inf.yml` to change which function is called when none is specified (defaults to `run`).
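
For Python, a sketch of one extra auto-discovered function; `upscale` and its schemas are hypothetical names for illustration:

```python
# Hypothetical second function on an app; "upscale" and its schemas are
# illustrative names, not part of the SDK.
from inferencesh import BaseApp, BaseAppInput, BaseAppOutput, File
from pydantic import Field

class UpscaleInput(BaseAppInput):
    image: File = Field(description="Image to upscale")
    scale: int = Field(default=2, description="Upscale factor")

class UpscaleOutput(BaseAppOutput):
    image: File = Field(description="Upscaled image")

class App(BaseApp):
    async def upscale(self, input_data: UpscaleInput) -> UpscaleOutput:
        # Public (no leading underscore), type-hinted, and not a lifecycle
        # method, so it is auto-discovered alongside run().
        return UpscaleOutput(image=input_data.image)
```

Calling it is then a matter of sending `"function": "upscale"` in the request body, or setting `default_function: upscale` in `inf.yml`.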

API-Wrapper App Template (Python)

Most CPU-only apps that wrap external APIs follow this pattern. Use this as a starting point:
```python
import os
import httpx
from inferencesh import BaseApp, BaseAppInput, BaseAppOutput, File
from inferencesh.models.usage import OutputMeta, ImageMeta  # or TextMeta, AudioMeta, etc.
from pydantic import Field

class AppInput(BaseAppInput):
    prompt: str = Field(description="Input prompt")

class AppOutput(BaseAppOutput):  # NOT BaseModel — output_meta requires this
    image: File = Field(description="Generated image")

class App(BaseApp):
    async def setup(self, config):
        self.api_key = os.environ["API_KEY"]
        self.client = httpx.AsyncClient(timeout=120)

    async def run(self, input_data: AppInput) -> AppOutput:
        self.logger.info(f"Calling API with prompt: {input_data.prompt[:80]}")

        response = await self.client.post(
            "https://api.example.com/generate",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"prompt": input_data.prompt},
        )
        response.raise_for_status()

        # Write output file
        output_path = "/tmp/output.png"
        with open(output_path, "wb") as f:
            f.write(response.content)

        # Read actual dimensions (don't hardcode!)
        from PIL import Image
        with Image.open(output_path) as img:
            width, height = img.size

        self.logger.info(f"Generated {width}x{height} image")

        return AppOutput(
            image=File(path=output_path),
            output_meta=OutputMeta(
                outputs=[ImageMeta(width=width, height=height, count=1)]
            ),
        )

    async def unload(self):
        await self.client.aclose()
```

Configuring Resources (inf.yml)

Project Structure

Python:

```
my-app/
├── inf.yml           # Configuration
├── inference.py      # App logic
├── requirements.txt  # Python packages (pip)
└── packages.txt      # System packages (apt) — optional
```

Node.js:

```
my-app/
├── inf.yml           # Configuration
├── src/
│   └── inference.js  # App logic
├── package.json      # Node.js packages (npm/pnpm)
└── packages.txt      # System packages (apt) — optional
```

inf.yml

```yaml
name: my-app
description: What my app does
category: image
kernel: python-3.11     # or node-22

# For multi-function apps (default: run)
default_function: generate

resources:
  gpu:
    count: 1
    vram: 24            # 24GB (auto-converted)
    type: any
  ram: 32               # 32GB

env:
  MODEL_NAME: gpt-4

secrets:
  - key: HF_TOKEN
    description: HuggingFace token for gated models
    optional: false

integrations:
  - key: google.sheets
    description: Access to Google Sheets
    optional: true
```

Resource Units

CLI auto-converts human-friendly values (see the sketch after this list):
  • < 1000 → GB (e.g., `80` = 80GB)
  • 1000 to 1B → MB
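
A sketch of that rule (the documented heuristic only, not the CLI's actual code):

```python
# Sketch of the documented unit heuristic; not the CLI's implementation.
def interpret_resource_value(value: int) -> str:
    if value < 1000:
        return f"{value} GB"   # e.g. 80 -> "80 GB", 24 -> "24 GB"
    if value < 1_000_000_000:
        return f"{value} MB"   # e.g. 24000 -> "24000 MB" (roughly 24 GB)
    return f"{value} (above the documented range)"
```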

GPU Types

`any` | `nvidia` | `amd` | `apple` | `none`

Note: Currently only NVIDIA CUDA GPUs are supported.

Categories

`image` | `video` | `audio` | `text` | `chat` | `3d` | `other`

CPU-Only Apps

```yaml
resources:
  gpu:
    count: 0
    type: none
  ram: 4
```

Dependencies

Python `requirements.txt`:

```
torch>=2.0
transformers
accelerate
```

Node.js `package.json`:

```json
{
  "type": "module",
  "dependencies": {
    "zod": "^3.23.0",
    "sharp": "^0.33.0"
  }
}
```

System packages `packages.txt` (apt-installable):

```
ffmpeg
libgl1-mesa-glx
```

Base Images

| Type | Image |
| ---- | ----- |
| GPU  | `docker.inference.sh/gpu:latest-cuda` |
| CPU  | `docker.inference.sh/cpu:latest` |

Reference Files

Load the appropriate reference file based on the language and topic:

App Logic & Schemas

  • `references/python-app-logic.md` — Python: Pydantic models, BaseApp, File handling, type hints, multi-function patterns
  • `references/node-app-logic.md` — Node.js: Zod schemas, File handling, ESM, generators, multi-function patterns

Debugging, Optimization & Cancellation

  • `references/python-patterns.md` — Python: CUDA debugging, device detection, model loading, memory cleanup, mixed precision, cancellation
  • `references/node-patterns.md` — Node.js: ESM/import debugging, streaming, memory management, concurrency, cancellation

Secrets & OAuth

  • `references/python-secrets-oauth.md` — Python: os.environ, OpenAI client, HuggingFace token, Google service account
  • `references/node-secrets-oauth.md` — Node.js: process.env, OpenAI client, Google credentials JSON

Usage Tracking

  • `references/python-tracking.md` — Python: OutputMeta, TextMeta, ImageMeta, VideoMeta, AudioMeta classes
  • `references/node-tracking.md` — Node.js: textMeta, imageMeta, videoMeta, audioMeta factory functions

CLI

  • `references/cli.md` — Full CLI command reference, prerequisites for both languages

Resources
