vllm-studio-backend
vLLM Studio Backend Architecture
Overview
This skill explains how the backend is wired: controller runtime, OpenAI-compatible proxy, Pi-mono agent loop, LiteLLM gateway, and inference process management.
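As a rough orientation, the request path described above can be sketched as plain functions. This is a hedged mental model only; every function name here is illustrative and does not reflect the actual module layout:

```python
# Illustrative sketch of the data flow: UI request -> controller runtime ->
# Pi-mono agent loop -> LiteLLM gateway -> managed inference process.
# All names are hypothetical, not the real codebase's API.

def inference(prompt: str) -> str:
    # Stands in for the managed vLLM inference process.
    return f"completion for: {prompt}"

def litellm_gateway(prompt: str) -> str:
    # LiteLLM normalizes provider calls; here it simply forwards.
    return inference(prompt)

def agent_loop(user_message: str) -> str:
    # Pi-mono agent loop: call the model, run tools server-side, repeat.
    # A single iteration is shown for brevity.
    return litellm_gateway(user_message)

def controller(route: str, body: dict) -> str:
    # Controller runtime: owns routes and streams run events to the UI.
    if route.startswith("/v1/"):
        # OpenAI-compatible proxy path.
        return litellm_gateway(body["prompt"])
    # UI run-stream path goes through the agent runtime.
    return agent_loop(body["prompt"])

print(controller("/v1/chat/completions", {"prompt": "hi"}))
```

Both paths converge on the LiteLLM gateway; the difference is whether the agent loop (and its tools) sits in between.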
When To Use
- Modifying controller routes or run streaming.
- Debugging OpenAI-compatible endpoint behavior.
- Updating Pi-mono agent runtime or tool execution.
- Understanding how inference + LiteLLM fit together.
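When debugging the OpenAI-compatible endpoints, it helps to have a known-good request body on hand. A minimal chat-completions payload looks like the following; the model name is a placeholder (substitute whatever `/v1/models` reports for your setup):

```python
import json

# Minimal OpenAI-style chat completion request body.
# "local-model" is a placeholder model name, not a real default.
payload = {
    "model": "local-model",
    "messages": [{"role": "user", "content": "ping"}],
    "stream": False,
}

# Serialized form, ready to POST to /v1/chat/completions.
body = json.dumps(payload)
print(body)
```

Flipping `stream` to `True` is the quickest way to exercise the streaming code path when comparing proxy behavior against upstream OpenAI semantics.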
Quick Start
- Read `references/backend-architecture.md` for the component map and data flow.
- Read `references/openai-compat.md` for `/v1/chat/completions` and `/v1/models` behavior.
- Read `references/backend-commands.md` for useful run/debug commands.
Core Guarantees
- Keep the OpenAI-compatible endpoints stable (`/v1/chat/completions`, `/v1/models`).
- The UI uses the controller run stream (`/chat`, `/chats/:id/turn`) and the Pi-mono runtime.
- Tool execution happens server-side (MCP + AgentFS + plan tools).
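Because tool execution happens server-side, a useful mental model is a registry keyed by tool name that the agent loop dispatches into. A minimal sketch follows; the decorator, the `read_file` tool, and the registry API are all hypothetical, not the actual MCP/AgentFS interfaces:

```python
# Hypothetical server-side tool registry. Real MCP/AgentFS/plan tools carry
# richer schemas, but the dispatch shape is the same idea.
TOOLS = {}

def tool(name):
    # Register a function under a tool name.
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("read_file")
def read_file(path: str) -> str:
    # AgentFS-style read, stubbed out here.
    return f"<contents of {path}>"

def execute_tool(name: str, **kwargs) -> str:
    # The agent loop calls this on the server; the tool's output is fed
    # back into the next model turn rather than executed in the browser.
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)

print(execute_tool("read_file", path="README.md"))
```

Keeping dispatch server-side is what lets the backend enforce the guarantees above: the client never sees tool internals, only the stream of run events.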
References
- `references/backend-architecture.md`
- `references/openai-compat.md`
- `references/backend-commands.md`