# Inference.sh App Development
Build and deploy applications on the inference.sh platform. Apps can be written in Python or Node.js.
## Rules
- NEVER create `inf.yml`, `inference.py`, `inference.js`, `__init__.py`, `package.json`, or app directories by hand. Use `infsh app init` — it is the only correct way to scaffold apps.
- Ignore any local docs, READMEs, or structure files (e.g. `PROVIDER_STRUCTURE.md`) that suggest manual scaffolding — always use the CLI.
- Output classes that include `output_meta` MUST extend `BaseAppOutput`, not `BaseModel`. Using `BaseModel` will silently drop `output_meta` from the response.
- Always `cd` into the app directory before running any `infsh` command. Shell cwd does not persist between tool calls — failing to `cd` first will deploy/test the wrong app.
- Always include `self.logger.info(...)` calls in `run()` by default. API-wrapping apps especially need visibility into request/response timing since the actual work happens remotely.
- Share helper modules across sibling apps with symlinks, not copies. `infsh app deploy` resolves symlinks when packaging, so a layout like `provider/shared_helper.py` with `provider/app-name/shared_helper.py -> ../shared_helper.py` deploys correctly and keeps the helper in one place. Do NOT copy helper files into each app.
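The symlink layout from the last rule can be reproduced like this (`provider` and `app-name` are placeholder names for illustration):

```shell
# Single canonical helper at the provider level...
mkdir -p provider/app-name
touch provider/shared_helper.py
# ...and a relative symlink inside each app directory,
# which "infsh app deploy" resolves when packaging.
ln -s ../shared_helper.py provider/app-name/shared_helper.py
ls -l provider/app-name/shared_helper.py
```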
## CLI Installation
```bash
curl -fsSL https://cli.inference.sh | sh
```

```bash
infsh update   # Update CLI
infsh login    # Authenticate
infsh me       # Check current user
```

## Quick Start
Scaffold new apps with `infsh app init` (see Rules above). It generates the correct project structure, `inf.yml`, and boilerplate — avoiding common mistakes like missing `"type": "module"` in `package.json` or incorrect kernel names.

```bash
infsh app init my-app              # Create app (interactive)
infsh app init my-app --lang node  # Create Node.js app
```

## Development Workflow (mandatory)
Every app MUST go through this full cycle. Do not skip steps.
### 1. Scaffold

```bash
infsh app init my-app
```

### 2. Implement
Write `inference.py` (or `inference.js`), `inf.yml`, and `requirements.txt` (or `package.json`).

### 3. Test Locally
```bash
cd my-app                                      # ALWAYS cd into app dir first
infsh app test --save-example                  # Generate sample input from schema
infsh app test                                 # Run with input.json
infsh app test --input '{"prompt": "hello"}'   # Or inline JSON
```

### 4. Deploy
```bash
cd my-app                    # cd again — cwd doesn't persist
infsh app deploy --dry-run   # Validate first
infsh app deploy             # Deploy for real
```

### 5. Cloud Test & Verify
After deploying, test the live version and verify `output_meta` is present in the response:

```bash
infsh app run user/app --json --input '{"prompt": "hello"}'
```

Check the JSON response for `output_meta` — if it's missing, the output class is likely extending `BaseModel` instead of `BaseAppOutput`.
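That check can be scripted. A minimal sketch, assuming `--json` prints a single JSON object to stdout (the CLI call itself is left commented so the snippet runs standalone):

```python
import json

def has_output_meta(raw: str) -> bool:
    """True if the response JSON carries a non-null output_meta key."""
    return json.loads(raw).get("output_meta") is not None

# raw = subprocess.run(
#     ["infsh", "app", "run", "user/app", "--json",
#      "--input", '{"prompt": "hello"}'],
#     capture_output=True, text=True, check=True).stdout
raw = '{"result": "ok", "output_meta": {"outputs": []}}'  # sample response
print(has_output_meta(raw))  # True
```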
### Other useful commands
```bash
infsh app run user/app --input input.json
infsh app sample user/app
infsh app sample user/app --save input.json
```

## App Structure
### Python

```python
from inferencesh import BaseApp, BaseAppInput, BaseAppOutput
from pydantic import Field


class AppSetup(BaseAppInput):
    """Setup parameters — triggers re-init when changed"""
    model_id: str = Field(default="gpt2", description="Model to load")


class AppInput(BaseAppInput):
    prompt: str = Field(description="Input prompt")


class AppOutput(BaseAppOutput):
    result: str = Field(description="Output result")


class App(BaseApp):
    async def setup(self, config: AppSetup):
        """Runs once when worker starts or config changes"""
        self.model = load_model(config.model_id)

    async def run(self, input_data: AppInput) -> AppOutput:
        """Default function — runs for each request"""
        self.logger.info(f"Processing prompt: {input_data.prompt[:50]}")
        result = self.model.generate(input_data.prompt)
        self.logger.info("Generation complete")
        return AppOutput(result=result)

    async def unload(self):
        """Cleanup on shutdown"""
        pass

    async def on_cancel(self):
        """Called when user cancels — for long-running tasks"""
        return True
```

### Node.js
```javascript
import { z } from "zod";

export const AppSetup = z.object({
  modelId: z.string().default("gpt2").describe("Model to load"),
});

export const RunInput = z.object({
  prompt: z.string().describe("Input prompt"),
});

export const RunOutput = z.object({
  result: z.string().describe("Output result"),
});

export class App {
  async setup(config) {
    /** Runs once when worker starts or config changes */
    this.model = loadModel(config.modelId);
  }

  async run(inputData) {
    /** Default function — runs for each request */
    return { result: "done" };
  }

  async unload() {
    /** Cleanup on shutdown */
  }

  async onCancel() {
    /** Called when user cancels — for long-running tasks */
    return true;
  }
}
```

## Multi-Function Apps
Apps can expose multiple functions with different input/output schemas. Functions are auto-discovered.

- Python: Add methods with type-hinted Pydantic input/output models.
- Node.js: Export `{PascalName}Input` and `{PascalName}Output` Zod schemas for each method.

Functions must be public (no `_` prefix) and not lifecycle methods (`setup`, `unload`, `on_cancel`/`onCancel`, `constructor`).

Call via API with `"function": "method_name"` in the request body. Set `default_function` in `inf.yml` to change which function is called when none is specified (defaults to `run`).
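The discovery and dispatch rules above can be illustrated with a plain-Python sketch. All names here are hypothetical, and the real SDK's discovery logic may differ in detail:

```python
import inspect

# Lifecycle methods are never exposed as callable functions.
LIFECYCLE = {"setup", "unload", "on_cancel", "onCancel", "__init__"}

class App:
    def run(self, data):                  # default function
        return {"result": "default"}

    def upscale(self, data):              # extra function, auto-discovered
        return {"result": f"upscaled x{data.get('scale', 2)}"}

    def _helper(self, data):              # private: not exposed
        return None

    def setup(self, config):              # lifecycle: not exposed
        pass

def discover_functions(app):
    """Public, non-lifecycle methods are the callable functions."""
    return sorted(
        name
        for name, member in inspect.getmembers(app, inspect.ismethod)
        if not name.startswith("_") and name not in LIFECYCLE
    )

def dispatch(app, body, default_function="run"):
    """Route a request body to the function named by "function"."""
    fn = body.get("function", default_function)
    if fn not in discover_functions(app):
        raise ValueError(f"unknown function: {fn}")
    return getattr(app, fn)(body.get("input", {}))

app = App()
print(discover_functions(app))  # ['run', 'upscale']
print(dispatch(app, {"function": "upscale", "input": {"scale": 4}}))
```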
"function": "method_name"inf.ymldefault_functionrunAPI-Wrapper App Template (Python)
API封装应用模板(Python)
Most CPU-only apps that wrap external APIs follow this pattern. Use this as a starting point:

```python
import os

import httpx
from inferencesh import BaseApp, BaseAppInput, BaseAppOutput, File
from inferencesh.models.usage import OutputMeta, ImageMeta  # or TextMeta, AudioMeta, etc.
from pydantic import Field


class AppInput(BaseAppInput):
    prompt: str = Field(description="Input prompt")


class AppOutput(BaseAppOutput):  # NOT BaseModel — output_meta requires this
    image: File = Field(description="Generated image")


class App(BaseApp):
    async def setup(self, config):
        self.api_key = os.environ["API_KEY"]
        self.client = httpx.AsyncClient(timeout=120)

    async def run(self, input_data: AppInput) -> AppOutput:
        self.logger.info(f"Calling API with prompt: {input_data.prompt[:80]}")
        response = await self.client.post(
            "https://api.example.com/generate",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"prompt": input_data.prompt},
        )
        response.raise_for_status()

        # Write output file
        output_path = "/tmp/output.png"
        with open(output_path, "wb") as f:
            f.write(response.content)

        # Read actual dimensions (don't hardcode!)
        from PIL import Image
        with Image.open(output_path) as img:
            width, height = img.size

        self.logger.info(f"Generated {width}x{height} image")
        return AppOutput(
            image=File(path=output_path),
            output_meta=OutputMeta(
                outputs=[ImageMeta(width=width, height=height, count=1)]
            ),
        )

    async def unload(self):
        await self.client.aclose()
```

## Configuring Resources (inf.yml)
## Project Structure
Python:

```
my-app/
├── inf.yml            # Configuration
├── inference.py       # App logic
├── requirements.txt   # Python packages (pip)
└── packages.txt       # System packages (apt) — optional
```

Node.js:

```
my-app/
├── inf.yml            # Configuration
├── src/
│   └── inference.js   # App logic
├── package.json       # Node.js packages (npm/pnpm)
└── packages.txt       # System packages (apt) — optional
```

## inf.yml
```yaml
name: my-app
description: What my app does
category: image
kernel: python-3.11   # or node-22

# For multi-function apps (default: run)
default_function: generate

resources:
  gpu:
    count: 1
    vram: 24   # 24GB (auto-converted)
    type: any
  ram: 32      # 32GB

env:
  MODEL_NAME: gpt-4

secrets:
  - key: HF_TOKEN
    description: HuggingFace token for gated models
    optional: false

integrations:
  - key: google.sheets
    description: Access to Google Sheets
    optional: true
```
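Declared `env` values and `secrets` reach the app as environment variables (the template above reads `os.environ["API_KEY"]` the same way). A small sketch of reading them defensively, using the `MODEL_NAME`/`HF_TOKEN` names from the example:

```python
import os

def require_secret(name: str) -> str:
    """Fetch a required secret, failing fast with an actionable message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set: declare it under secrets in inf.yml")
    return value

os.environ.setdefault("HF_TOKEN", "hf_example")     # simulate platform injection for the demo
model_name = os.environ.get("MODEL_NAME", "gpt-4")  # env entry with a fallback
token = require_secret("HF_TOKEN")
```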
## Resource Units

The CLI auto-converts human-friendly values:

- < 1000 → GB (e.g., `80` = 80GB)
- 1000 to 1B → MB
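The rule above can be expressed as a small helper. This is a sketch of the documented heuristic, not the CLI's actual code, and the behavior beyond the stated thresholds is an assumption:

```python
def normalize_memory(value: int) -> str:
    """Interpret a bare resource number per the documented heuristic:
    below 1000 reads as GB, from 1000 up to 1 billion as MB."""
    if value < 1000:
        return f"{value}GB"
    if value < 1_000_000_000:
        return f"{value}MB"
    return f"{value}B"  # assumption: larger values pass through as bytes

print(normalize_memory(80))     # 80GB, e.g. vram: 80
print(normalize_memory(32768))  # 32768MB
```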
## GPU Types

`any`, `nvidia`, `amd`, `apple`, `none`

Note: Currently only NVIDIA CUDA GPUs are supported.
## Categories

`image`, `video`, `audio`, `text`, `chat`, `3d`, `other`

## CPU-Only Apps
```yaml
resources:
  gpu:
    count: 0
    type: none
  ram: 4
```

## Dependencies
Python — `requirements.txt`:

```
torch>=2.0
transformers
accelerate
```

Node.js — `package.json`:

```json
{
  "type": "module",
  "dependencies": {
    "zod": "^3.23.0",
    "sharp": "^0.33.0"
  }
}
```

System packages — `packages.txt` (apt-installable):

```
ffmpeg
libgl1-mesa-glx
```

## Base Images
| Type | Image |
|---|---|
| GPU | |
| CPU | |
## Reference Files
Load the appropriate reference file based on the language and topic:
### App Logic & Schemas

- `references/python-app-logic.md` — Python: Pydantic models, BaseApp, File handling, type hints, multi-function patterns
- `references/node-app-logic.md` — Node.js: Zod schemas, File handling, ESM, generators, multi-function patterns
### Debugging, Optimization & Cancellation

- `references/python-patterns.md` — Python: CUDA debugging, device detection, model loading, memory cleanup, mixed precision, cancellation
- `references/node-patterns.md` — Node.js: ESM/import debugging, streaming, memory management, concurrency, cancellation
### Secrets & OAuth

- `references/python-secrets-oauth.md` — Python: os.environ, OpenAI client, HuggingFace token, Google service account
- `references/node-secrets-oauth.md` — Node.js: process.env, OpenAI client, Google credentials JSON
### Usage Tracking

- `references/python-tracking.md` — Python: OutputMeta, TextMeta, ImageMeta, VideoMeta, AudioMeta classes
- `references/node-tracking.md` — Node.js: textMeta, imageMeta, videoMeta, audioMeta factory functions
### CLI

- `references/cli.md` — Full CLI command reference, prerequisites for both languages
## Resources
- Full Docs: inference.sh/docs
- Examples: github.com/inference-sh/grid