vllm-studio-backend

vLLM Studio Backend Architecture

Overview

This skill explains how the backend is wired: controller runtime, OpenAI-compatible proxy, Pi-mono agent loop, LiteLLM gateway, and inference process management.
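The component chain above can be sketched as a routing table. This is a minimal illustrative sketch only — every name and path-to-component mapping here is an assumption for orientation; the authoritative component map lives in references/backend-architecture.md.

```python
# Illustrative request-flow sketch. All component chains below are
# assumptions for illustration; see references/backend-architecture.md
# for the real module layout and data flow.
COMPONENT_FOR_PATH = {
    "/v1/models": "openai-compat proxy -> LiteLLM gateway",
    "/v1/chat/completions": "openai-compat proxy -> LiteLLM gateway -> inference process",
    "/chats/:id/turn": "controller run stream -> Pi-mono agent loop",
}

def route(path: str) -> str:
    """Return the (hypothetical) component chain that handles a path."""
    return COMPONENT_FOR_PATH.get(path, "controller (404 if unknown)")

print(route("/v1/chat/completions"))
```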

When To Use

  • Modifying controller routes or run streaming.
  • Debugging OpenAI-compatible endpoint behavior.
  • Updating Pi-mono agent runtime or tool execution.
  • Understanding how inference + LiteLLM fit together.

Quick Start

  • Read references/backend-architecture.md for the component map and data flow.
  • Read references/openai-compat.md for /v1/models and /v1/chat/completions behavior.
  • Read references/backend-commands.md for useful run/debug commands.
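As a quick orientation for the OpenAI-compatible endpoints, the sketch below builds a standard chat-completions request body. The model id and the localhost URL in the comment are assumptions; references/openai-compat.md documents the behavior this backend actually guarantees.

```python
import json

# Minimal OpenAI-compatible payload for /v1/chat/completions.
# "my-local-model" is a placeholder; real model ids come from /v1/models.
payload = {
    "model": "my-local-model",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,  # SSE streaming, as in the OpenAI API
}

# With a running backend you would POST this, e.g. (port is an assumption):
#   curl -N http://localhost:8000/v1/chat/completions \
#        -H "Content-Type: application/json" -d @payload.json
body = json.dumps(payload)
print(body)
```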

Core Guarantees

  • Keep OpenAI-compatible endpoints stable (/v1/models, /v1/chat/completions).
  • The /chat UI uses the controller run stream (/chats/:id/turn) and the Pi-mono runtime.
  • Tool execution happens server-side (MCP + AgentFS + plan tools).
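Both the run stream and the streaming chat endpoint emit Server-Sent Events. The parser below is a generic SSE sketch, not the backend's implementation; the `{"type": ..., "text": ...}` event shape and the OpenAI-style `[DONE]` terminator are assumptions — the controller run stream defines its own event schema.

```python
import json

def parse_sse_chunks(raw: str) -> list:
    """Parse SSE 'data:' lines into JSON objects (event shape assumed)."""
    events = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments, blank keep-alives, other fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # OpenAI-style stream terminator
            break
        events.append(json.loads(data))
    return events

sample = (
    'data: {"type": "token", "text": "Hel"}\n'
    'data: {"type": "token", "text": "lo"}\n'
    "data: [DONE]\n"
)
print("".join(e["text"] for e in parse_sse_chunks(sample)))  # → Hello
```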

References

  • references/backend-architecture.md
  • references/openai-compat.md
  • references/backend-commands.md