vllm-studio-backend
vLLM Studio Backend Architecture
Overview
This skill explains how the backend is wired: controller runtime, OpenAI-compatible proxy, Pi-mono agent loop, LiteLLM gateway, and inference process management.
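As a rough orientation, the request path described above can be sketched as plain functions. This is a hedged mental model only; every function name here is illustrative and does not reflect the actual module layout:

```python
# Illustrative sketch of the data flow: UI request -> controller runtime ->
# Pi-mono agent loop -> LiteLLM gateway -> managed inference process.
# All names are hypothetical, not the real codebase's API.

def inference(prompt: str) -> str:
    # Stands in for the managed vLLM inference process.
    return f"completion for: {prompt}"

def litellm_gateway(prompt: str) -> str:
    # LiteLLM normalizes provider calls; here it simply forwards.
    return inference(prompt)

def agent_loop(user_message: str) -> str:
    # Pi-mono agent loop: call the model, run tools server-side, repeat.
    # A single iteration is shown for brevity.
    return litellm_gateway(user_message)

def controller(route: str, body: dict) -> str:
    # Controller runtime: owns routes and streams run events to the UI.
    if route.startswith("/v1/"):
        # OpenAI-compatible proxy path.
        return litellm_gateway(body["prompt"])
    # UI run-stream path goes through the agent runtime.
    return agent_loop(body["prompt"])

print(controller("/v1/chat/completions", {"prompt": "hi"}))
```

Both paths converge on the LiteLLM gateway; the difference is whether the agent loop (and its tools) sits in between.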
When To Use
- Modifying controller routes or run streaming.
- Debugging OpenAI-compatible endpoint behavior.
- Updating Pi-mono agent runtime or tool execution.
- Understanding how inference + LiteLLM fit together.
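When debugging the OpenAI-compatible endpoints, it helps to have a known-good request body on hand. A minimal chat-completions payload looks like the following; the model name is a placeholder (substitute whatever `/v1/models` reports for your setup):

```python
import json

# Minimal OpenAI-style chat completion request body.
# "local-model" is a placeholder model name, not a real default.
payload = {
    "model": "local-model",
    "messages": [{"role": "user", "content": "ping"}],
    "stream": False,
}

# Serialized form, ready to POST to /v1/chat/completions.
body = json.dumps(payload)
print(body)
```

Flipping `stream` to `True` is the quickest way to exercise the streaming code path when comparing proxy behavior against upstream OpenAI semantics.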
Quick Start
- Read `references/backend-architecture.md` for the component map and data flow.
- Read `references/openai-compat.md` for `/v1/chat/completions` and `/v1/models` behavior.
- Read `references/backend-commands.md` for useful run/debug commands.
Core Guarantees
- Keep the OpenAI-compatible endpoints stable (`/v1/chat/completions`, `/v1/models`).
- The UI uses the controller run stream (`/chat`, `/chats/:id/turn`) and the Pi-mono runtime.
- Tool execution happens server-side (MCP + AgentFS + plan tools).
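Because tool execution happens server-side, a useful mental model is a registry keyed by tool name that the agent loop dispatches into. A minimal sketch follows; the decorator, the `read_file` tool, and the registry API are all hypothetical, not the actual MCP/AgentFS interfaces:

```python
# Hypothetical server-side tool registry. Real MCP/AgentFS/plan tools carry
# richer schemas, but the dispatch shape is the same idea.
TOOLS = {}

def tool(name):
    # Register a function under a tool name.
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("read_file")
def read_file(path: str) -> str:
    # AgentFS-style read, stubbed out here.
    return f"<contents of {path}>"

def execute_tool(name: str, **kwargs) -> str:
    # The agent loop calls this on the server; the tool's output is fed
    # back into the next model turn rather than executed in the browser.
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)

print(execute_tool("read_file", path="README.md"))
```

Keeping dispatch server-side is what lets the backend enforce the guarantees above: the client never sees tool internals, only the stream of run events.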
References
- `references/backend-architecture.md`
- `references/openai-compat.md`
- `references/backend-commands.md`