Inference.sh App Development

Build and deploy applications on the inference.sh platform. Apps can be written in Python or Node.js.

Rules

  • NEVER create `inf.yml`, `inference.py`, `inference.js`, `__init__.py`, `package.json`, or app directories by hand. Use `infsh app init` — it is the only correct way to scaffold apps.
  • Ignore any local docs, READMEs, or structure files (e.g. `PROVIDER_STRUCTURE.md`) that suggest manual scaffolding — always use the CLI.
  • Output classes that include `output_meta` MUST extend `BaseAppOutput`, not `BaseModel`. Using `BaseModel` will silently drop `output_meta` from the response (see the sketch after this list).
  • Always `cd` into the app directory before running any `infsh` command. Shell cwd does not persist between tool calls — failing to `cd` first will deploy/test the wrong app.
  • Always include `self.logger.info(...)` calls in `run()` by default. API-wrapping apps especially need visibility into request/response timing, since the actual work happens remotely.
  • Share helper modules across sibling apps with symlinks, not copies. `infsh app deploy` resolves symlinks when packaging, so a layout like `provider/shared_helper.py` with `provider/app-name/shared_helper.py -> ../shared_helper.py` deploys correctly and keeps the helper in one place. Do NOT copy helper files into each app.
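
For the `output_meta` rule, here is a minimal sketch; the imports and meta fields mirror the API-wrapper template later in this page, and the field values are illustrative:

```python
# Sketch of the output_meta rule above. Imports and meta fields mirror the
# API-wrapper template later in this page; the values are illustrative.
from inferencesh import BaseAppOutput, File
from inferencesh.models.usage import OutputMeta, ImageMeta
from pydantic import BaseModel, Field

class AppOutput(BaseAppOutput):  # correct: output_meta survives in the response
    image: File = Field(description="Generated image")

# WRONG: with BaseModel as the base class, output_meta is silently dropped.
# class AppOutput(BaseModel):
#     image: File = Field(description="Generated image")

out = AppOutput(
    image=File(path="/tmp/output.png"),
    output_meta=OutputMeta(outputs=[ImageMeta(width=512, height=512, count=1)]),
)
```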

CLI Installation

```bash
curl -fsSL https://cli.inference.sh | sh
```

```bash
infsh update   # Update CLI
infsh login    # Authenticate
infsh me       # Check current user
```

Quick Start

Scaffold new apps with `infsh app init` (see Rules above). It generates the correct project structure, `inf.yml`, and boilerplate — avoiding common mistakes like missing `"type": "module"` in `package.json` or incorrect kernel names.

```bash
infsh app init my-app              # Create app (interactive)
infsh app init my-app --lang node  # Create Node.js app
```

Development Workflow (mandatory)

Every app MUST go through this full cycle. Do not skip steps.

1. Scaffold

```bash
infsh app init my-app
```

2. Implement

Write `inference.py` (or `inference.js`), `inf.yml`, and `requirements.txt` (or `package.json`).

3. Test Locally

```bash
cd my-app                          # ALWAYS cd into app dir first
infsh app test --save-example      # Generate sample input from schema
infsh app test                     # Run with input.json
infsh app test --input '{"prompt": "hello"}'  # Or inline JSON
```

4. Deploy

```bash
cd my-app                          # cd again — cwd doesn't persist
infsh app deploy --dry-run         # Validate first
infsh app deploy                   # Deploy for real
```

5. Cloud Test & Verify

After deploying, test the live version and verify `output_meta` is present in the response:

```bash
infsh app run user/app --json --input '{"prompt": "hello"}'
```

Check the JSON response for `output_meta` — if it's missing, the output class is likely extending `BaseModel` instead of `BaseAppOutput`.
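
To automate that check, here is a small Python sketch; it assumes the `--json` flag prints the response object to stdout with `output_meta` at the top level:

```python
# Hedged verification sketch: run the deployed app and fail if output_meta
# is missing. Assumes the --json response lands on stdout with output_meta
# at the top level; adjust the key path if your response nests it.
import json
import subprocess

result = subprocess.run(
    ["infsh", "app", "run", "user/app", "--json",
     "--input", '{"prompt": "hello"}'],
    capture_output=True, text=True, check=True,
)
response = json.loads(result.stdout)

if "output_meta" not in response:
    raise SystemExit("output_meta missing: the output class is likely "
                     "extending BaseModel instead of BaseAppOutput")
print("output_meta present:", response["output_meta"])
```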

Other useful commands

```bash
infsh app run user/app --input input.json    # Run a deployed app with an input file
infsh app sample user/app                    # Print a sample input for an app
infsh app sample user/app --save input.json  # Save the sample input to input.json
```

App Structure

Python

```python
from inferencesh import BaseApp, BaseAppInput, BaseAppOutput
from pydantic import Field

class AppSetup(BaseAppInput):
    """Setup parameters — triggers re-init when changed"""
    model_id: str = Field(default="gpt2", description="Model to load")

class AppInput(BaseAppInput):
    prompt: str = Field(description="Input prompt")

class AppOutput(BaseAppOutput):
    result: str = Field(description="Output result")

class App(BaseApp):
    async def setup(self, config: AppSetup):
        """Runs once when worker starts or config changes"""
        self.model = load_model(config.model_id)

    async def run(self, input_data: AppInput) -> AppOutput:
        """Default function — runs for each request"""
        self.logger.info(f"Processing prompt: {input_data.prompt[:50]}")
        result = self.model.generate(input_data.prompt)
        self.logger.info("Generation complete")
        return AppOutput(result=result)

    async def unload(self):
        """Cleanup on shutdown"""
        pass

    async def on_cancel(self):
        """Called when user cancels — for long-running tasks"""
        return True
```

Node.js

```javascript
import { z } from "zod";

export const AppSetup = z.object({
  modelId: z.string().default("gpt2").describe("Model to load"),
});

export const RunInput = z.object({
  prompt: z.string().describe("Input prompt"),
});

export const RunOutput = z.object({
  result: z.string().describe("Output result"),
});

export class App {
  async setup(config) {
    /** Runs once when worker starts or config changes */
    this.model = loadModel(config.modelId);
  }

  async run(inputData) {
    /** Default function — runs for each request */
    return { result: "done" };
  }

  async unload() {
    /** Cleanup on shutdown */
  }

  async onCancel() {
    /** Called when user cancels — for long-running tasks */
    return true;
  }
}
```

Multi-Function Apps

Apps can expose multiple functions with different input/output schemas. Functions are auto-discovered.

Python: add methods with type-hinted Pydantic input/output models. Node.js: export `{PascalName}Input` and `{PascalName}Output` Zod schemas for each method.

Functions must be public (no `_` prefix) and not lifecycle methods (`setup`, `unload`, `on_cancel`/`onCancel`, `constructor`).

Call via API with `"function": "method_name"` in the request body. Set `default_function` in `inf.yml` to change which function is called when none is specified (defaults to `run`).
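
For Python, a sketch of one extra auto-discovered function; `upscale` and its schemas are hypothetical names for illustration:

```python
# Hypothetical second function on an app; "upscale" and its schemas are
# illustrative names, not part of the SDK.
from inferencesh import BaseApp, BaseAppInput, BaseAppOutput, File
from pydantic import Field

class UpscaleInput(BaseAppInput):
    image: File = Field(description="Image to upscale")
    scale: int = Field(default=2, description="Upscale factor")

class UpscaleOutput(BaseAppOutput):
    image: File = Field(description="Upscaled image")

class App(BaseApp):
    async def upscale(self, input_data: UpscaleInput) -> UpscaleOutput:
        # Public (no leading underscore), type-hinted, and not a lifecycle
        # method, so it is auto-discovered alongside run().
        return UpscaleOutput(image=input_data.image)
```

Calling it is then a matter of sending `"function": "upscale"` in the request body, or setting `default_function: upscale` in `inf.yml`.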

API-Wrapper App Template (Python)

Most CPU-only apps that wrap external APIs follow this pattern. Use this as a starting point:
```python
import os
import httpx
from inferencesh import BaseApp, BaseAppInput, BaseAppOutput, File
from inferencesh.models.usage import OutputMeta, ImageMeta  # or TextMeta, AudioMeta, etc.
from pydantic import Field

class AppInput(BaseAppInput):
    prompt: str = Field(description="Input prompt")

class AppOutput(BaseAppOutput):  # NOT BaseModel — output_meta requires this
    image: File = Field(description="Generated image")

class App(BaseApp):
    async def setup(self, config):
        self.api_key = os.environ["API_KEY"]
        self.client = httpx.AsyncClient(timeout=120)

    async def run(self, input_data: AppInput) -> AppOutput:
        self.logger.info(f"Calling API with prompt: {input_data.prompt[:80]}")

        response = await self.client.post(
            "https://api.example.com/generate",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"prompt": input_data.prompt},
        )
        response.raise_for_status()

        # Write output file
        output_path = "/tmp/output.png"
        with open(output_path, "wb") as f:
            f.write(response.content)

        # Read actual dimensions (don't hardcode!)
        from PIL import Image
        with Image.open(output_path) as img:
            width, height = img.size

        self.logger.info(f"Generated {width}x{height} image")

        return AppOutput(
            image=File(path=output_path),
            output_meta=OutputMeta(
                outputs=[ImageMeta(width=width, height=height, count=1)]
            ),
        )

    async def unload(self):
        await self.client.aclose()
```

Configuring Resources (inf.yml)

Project Structure

Python:

```
my-app/
├── inf.yml           # Configuration
├── inference.py      # App logic
├── requirements.txt  # Python packages (pip)
└── packages.txt      # System packages (apt) — optional
```

Node.js:

```
my-app/
├── inf.yml           # Configuration
├── src/
│   └── inference.js  # App logic
├── package.json      # Node.js packages (npm/pnpm)
└── packages.txt      # System packages (apt) — optional
```

inf.yml

```yaml
name: my-app
description: What my app does
category: image
kernel: python-3.11     # or node-22

# For multi-function apps (default: run)
default_function: generate

resources:
  gpu:
    count: 1
    vram: 24            # 24GB (auto-converted)
    type: any
  ram: 32               # 32GB

env:
  MODEL_NAME: gpt-4

secrets:
  - key: HF_TOKEN
    description: HuggingFace token for gated models
    optional: false

integrations:
  - key: google.sheets
    description: Access to Google Sheets
    optional: true
```

Resource Units

CLI auto-converts human-friendly values (see the sketch after this list):
  • < 1000 → GB (e.g., `80` = 80GB)
  • 1000 to 1B → MB
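
A sketch of that rule (the documented heuristic only, not the CLI's actual code):

```python
# Sketch of the documented unit heuristic; not the CLI's implementation.
def interpret_resource_value(value: int) -> str:
    if value < 1000:
        return f"{value} GB"   # e.g. 80 -> "80 GB", 24 -> "24 GB"
    if value < 1_000_000_000:
        return f"{value} MB"   # e.g. 24000 -> "24000 MB" (roughly 24 GB)
    return f"{value} (above the documented range)"
```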

GPU Types

`any` | `nvidia` | `amd` | `apple` | `none`

Note: Currently only NVIDIA CUDA GPUs are supported.

Categories

`image` | `video` | `audio` | `text` | `chat` | `3d` | `other`

CPU-Only Apps

```yaml
resources:
  gpu:
    count: 0
    type: none
  ram: 4
```

Dependencies

Python `requirements.txt`:

```
torch>=2.0
transformers
accelerate
```

Node.js `package.json`:

```json
{
  "type": "module",
  "dependencies": {
    "zod": "^3.23.0",
    "sharp": "^0.33.0"
  }
}
```

System packages `packages.txt` (apt-installable):

```
ffmpeg
libgl1-mesa-glx
```

Base Images

| Type | Image |
| ---- | ----- |
| GPU  | `docker.inference.sh/gpu:latest-cuda` |
| CPU  | `docker.inference.sh/cpu:latest` |

Reference Files

Load the appropriate reference file based on the language and topic:

App Logic & Schemas

  • `references/python-app-logic.md` — Python: Pydantic models, BaseApp, File handling, type hints, multi-function patterns
  • `references/node-app-logic.md` — Node.js: Zod schemas, File handling, ESM, generators, multi-function patterns

Debugging, Optimization & Cancellation

  • `references/python-patterns.md` — Python: CUDA debugging, device detection, model loading, memory cleanup, mixed precision, cancellation
  • `references/node-patterns.md` — Node.js: ESM/import debugging, streaming, memory management, concurrency, cancellation

Secrets & OAuth

  • `references/python-secrets-oauth.md` — Python: os.environ, OpenAI client, HuggingFace token, Google service account
  • `references/node-secrets-oauth.md` — Node.js: process.env, OpenAI client, Google credentials JSON

Usage Tracking

  • `references/python-tracking.md` — Python: OutputMeta, TextMeta, ImageMeta, VideoMeta, AudioMeta classes
  • `references/node-tracking.md` — Node.js: textMeta, imageMeta, videoMeta, audioMeta factory functions

CLI

  • `references/cli.md` — Full CLI command reference, prerequisites for both languages

Resources
