python-skills

Python Skills for LlamaFarm

Shared Python best practices and code review checklists for all Python components in the LlamaFarm monorepo.

Applicable Components

| Component | Path | Python | Key Dependencies |
|---|---|---|---|
| Server | server/ | 3.12+ | FastAPI, Celery, Pydantic, structlog |
| RAG | rag/ | 3.11+ | LlamaIndex, ChromaDB, Celery |
| Universal Runtime | runtimes/universal/ | 3.11+ | PyTorch, transformers, FastAPI |
| Config | config/ | 3.11+ | Pydantic, JSONSchema |
| Common | common/ | 3.10+ | HuggingFace Hub |

Quick Reference

| Topic | File | Key Points |
|---|---|---|
| Patterns | patterns.md | Dataclasses, Pydantic, comprehensions, imports |
| Async | async.md | async/await, asyncio, concurrent execution |
| Typing | typing.md | Type hints, generics, protocols, Pydantic |
| Testing | testing.md | Pytest fixtures, mocking, async tests |
| Errors | error-handling.md | Custom exceptions, logging, context managers |
| Security | security.md | Path traversal, injection, secrets, deserialization |

Code Style

LlamaFarm uses ruff with a shared configuration in ruff.toml:
```toml
line-length = 88
target-version = "py311"

[lint]
select = ["E", "F", "I", "B", "UP", "SIM"]
```
Key rules:
  • E, F: pycodestyle errors (E) and Pyflakes (F)
  • I: Import sorting (isort)
  • B: flake8-bugbear (common pitfalls)
  • UP: pyupgrade (upgrade syntax to modern Python)
  • SIM: flake8-simplify (simplify code patterns)

Architecture Patterns

Settings with pydantic-settings

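```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # Values come from environment variables or .env, falling back to these defaults
    model_config = SettingsConfigDict(env_file=".env")

    LOG_LEVEL: str = "INFO"
    HOST: str = "0.0.0.0"
    PORT: int = 14345


settings = Settings()  # Singleton at module level
```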

Structured Logging with structlog

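```python
# Each component ships its own logger class in its core/logging.py:
#   server/              -> FastAPIStructLogger
#   rag/                 -> RAGStructLogger
#   runtimes/universal/  -> UniversalRuntimeLogger
from core.logging import FastAPIStructLogger  # e.g. inside server/

logger = FastAPIStructLogger(__name__)
# Put structured fields in the extra dict instead of the message string
logger.info("Operation completed", extra={"count": 10, "duration_ms": 150})
```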
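The component logger classes are internal to LlamaFarm and their implementation is not shown here. To illustrate what the extra dict buys you, the sketch below is a stdlib-only stand-in (logging plus json, not structlog and not LlamaFarm's actual setup); JSONFormatter is a hypothetical name. It renders each record as one JSON object in which the extra fields become top-level keys that log tooling can filter on.

```python
import json
import logging

# Attributes present on every LogRecord; anything else was supplied via extra=...
_STANDARD_ATTRS = set(vars(logging.makeLogRecord({}))) | {"message", "asctime"}


class JSONFormatter(logging.Formatter):
    """Hypothetical formatter: one JSON object per record, extra fields included."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "event": record.getMessage(),
            "level": record.levelname,
            "logger": record.name,
        }
        # Copy only the caller-supplied key/value pairs from the extra dict
        payload.update(
            {k: v for k, v in vars(record).items() if k not in _STANDARD_ATTRS}
        )
        # default=str keeps non-JSON values (paths, datetimes) from crashing logging
        return json.dumps(payload, default=str)
```

Attach it with handler.setFormatter(JSONFormatter()); a call such as logger.info("Operation completed", extra={"count": 10, "duration_ms": 150}) then produces a single JSON line with count and duration_ms as their own keys.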

Abstract Base Classes for Extensibility

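```python
from __future__ import annotations  # Document/ProcessingResult only used in hints

from abc import ABC, abstractmethod
from typing import Any

# Document and ProcessingResult are the component's own types, defined elsewhere.


class Component(ABC):
    def __init__(self, name: str | None = None, config: dict[str, Any] | None = None):
        # Default the name to the concrete subclass's class name
        self.name = name or self.__class__.__name__
        self.config = config or {}

    @abstractmethod
    def process(self, documents: list[Document]) -> ProcessingResult:
        """Process a batch of documents; subclasses must implement this."""
```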

Dataclasses for Internal Data

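```python
import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    content: str
    # default_factory builds a fresh dict / UUID for every instance
    metadata: dict[str, Any] = field(default_factory=dict)
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
```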
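The reason for field(default_factory=...) is that each instance must get its own mutable metadata dict and its own id; dataclasses actually refuse a plain `= {}` default with a ValueError at class definition. A short usage sketch (the document contents and metadata values are illustrative):

```python
import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


first = Document("first page")
second = Document("second page")
first.metadata["source"] = "upload.pdf"  # mutates only first's own dict

# Keyword arguments still override the factories when a caller has real values
restored = Document("cached", metadata={"lang": "en"}, id="doc-1")
```

Each call to Document() runs both factories again, so second.metadata stays empty and the two generated ids differ.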

Pydantic Models for API Boundaries

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict


class EmbeddingRequest(BaseModel):
    # Strip leading/trailing whitespace from string values during validation
    model_config = ConfigDict(str_strip_whitespace=True)

    model: str
    input: str | list[str]
    encoding_format: Literal["float", "base64"] | None = "float"
```

Directory Structure

Each Python component follows this structure:
```
component/
├── pyproject.toml     # UV-managed dependencies
├── core/              # Core functionality
│   ├── __init__.py
│   ├── settings.py    # Pydantic Settings
│   └── logging.py     # structlog setup
├── services/          # Business logic (server)
├── models/            # ML models (runtime)
├── tasks/             # Celery tasks (rag)
├── utils/             # Utility functions
└── tests/
    ├── conftest.py    # Shared fixtures
    └── test_*.py
```

Review Checklist Summary

When reviewing Python code in LlamaFarm:
  1. Patterns (Medium priority)
    • Modern Python syntax (3.10+ type hints)
    • Dataclass vs Pydantic used appropriately
    • No mutable default arguments
  2. Async (High priority)
    • No blocking calls in async functions
    • Proper asyncio.Lock usage
    • Cancellation handled correctly
  3. Typing (Medium priority)
    • Complete return type hints
    • Generic types parameterized
    • Pydantic v2 patterns
  4. Testing (Medium priority)
    • Fixtures properly scoped
    • Async tests use pytest-asyncio
    • Mocks cleaned up
  5. Errors (High priority)
    • Custom exceptions with context
    • Structured logging with extra dict
    • Proper exception chaining
  6. Security (Critical priority)
    • Path traversal prevention
    • Input sanitization
    • Safe deserialization
See individual topic files for detailed checklists with grep patterns.
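The three Async checklist items can be seen together in one sketch. EmbeddingCache, worker, and main are hypothetical names, and time.sleep stands in for a blocking call such as model inference: asyncio.to_thread moves that call off the event loop, asyncio.Lock guards the shared dict, and the worker does its cleanup before re-raising asyncio.CancelledError instead of swallowing it.

```python
import asyncio
import time


class EmbeddingCache:
    """Hypothetical cache used to illustrate the async review points."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()  # guards _data across concurrent tasks
        self._data: dict[str, list[float]] = {}

    @staticmethod
    def _compute(text: str) -> list[float]:
        time.sleep(0.01)  # stand-in for blocking work such as model inference
        return [float(len(text))]

    async def get(self, text: str) -> list[float]:
        async with self._lock:
            if text in self._data:
                return self._data[text]
        # Run the blocking call in a worker thread so the event loop stays responsive
        vector = await asyncio.to_thread(self._compute, text)
        async with self._lock:
            # Another task may have filled the key meanwhile; keep the first result
            return self._data.setdefault(text, vector)


async def worker(queue: asyncio.Queue[str], results: list[str]) -> None:
    try:
        while True:
            item = await queue.get()
            results.append(item.upper())
            queue.task_done()
    except asyncio.CancelledError:
        results.append("<stopped>")  # cleanup goes here
        raise  # re-raise so the canceller sees the task as cancelled


async def main() -> tuple[list[float], list[str]]:
    cache = EmbeddingCache()
    vectors = await asyncio.gather(*(cache.get("hello") for _ in range(3)))

    queue: asyncio.Queue[str] = asyncio.Queue()
    results: list[str] = []
    task = asyncio.create_task(worker(queue, results))
    await queue.put("a")
    await queue.join()  # wait until the worker has processed the item
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected: main cancelled this task itself
    return vectors[0], results
```

Holding the lock only around the dict access, not across the await on to_thread, keeps one slow computation from serializing every caller; the trade-off is that concurrent misses for the same key may compute it twice, and setdefault keeps whichever result landed first.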
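For the Errors items, here is a minimal sketch that uses the standard logging module so it runs standalone (in LlamaFarm code the component's own structlog-based logger would take its place). ConfigLoadError and parse_config are hypothetical: the exception stores the path and reason as attributes, the log call puts its fields in the extra dict, and raise ... from exc preserves the original JSONDecodeError as __cause__.

```python
import json
import logging
from typing import Any

logger = logging.getLogger(__name__)


class ConfigLoadError(Exception):
    """Hypothetical custom exception that carries context for callers and logs."""

    def __init__(self, path: str, reason: str) -> None:
        super().__init__(f"Failed to load config {path!r}: {reason}")
        self.path = path
        self.reason = reason


def parse_config(path: str, raw: str) -> dict[str, Any]:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Structured fields go in the extra dict, not into the message string
        logger.error("Config parse failed", extra={"path": path, "line": exc.lineno})
        # "from exc" chains the original error as __cause__ in the traceback
        raise ConfigLoadError(path, exc.msg) from exc
    if not isinstance(data, dict):
        raise ConfigLoadError(path, "top-level JSON value must be an object")
    return data
```

Callers can branch on err.path or err.reason without parsing the message, and the chained traceback still shows the exact JSON position that failed.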
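For the Security items, a sketch of two common guards. resolve_within and load_payload are hypothetical helpers: the first resolves a user-supplied path against a base directory and rejects anything that lands outside it (Path.is_relative_to needs Python 3.9+, which every component here exceeds), and the second parses untrusted bytes with json rather than pickle, since unpickling can run arbitrary code.

```python
import json
from pathlib import Path
from typing import Any


class UnsafePathError(ValueError):
    """Raised when a user-supplied path escapes its allowed base directory."""


def resolve_within(base_dir: Path, user_path: str) -> Path:
    base = base_dir.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check;
    # an absolute user_path replaces base entirely and is then rejected below
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise UnsafePathError(f"{user_path!r} resolves outside {base}")
    return candidate


def load_payload(raw: bytes) -> dict[str, Any]:
    # Untrusted bytes are parsed as JSON, never unpickled: pickle.loads can
    # execute arbitrary code, while json.loads only builds plain data
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data
```

Checking the resolved path, rather than searching the raw string for "..", also catches absolute paths and symlinks that point outside the base directory.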