# phoenix-observability


## Phoenix - AI Observability Platform

Open-source AI observability and evaluation platform for LLM applications with tracing, evaluation, datasets, experiments, and real-time monitoring.

## When to use Phoenix

Use Phoenix when:

- Debugging LLM application issues with detailed traces
- Running systematic evaluations on datasets
- Monitoring production LLM systems in real time
- Building experiment pipelines for prompt/model comparison
- Self-hosting observability without vendor lock-in

Key features:

- **Tracing**: OpenTelemetry-based trace collection for any LLM framework
- **Evaluation**: LLM-as-judge evaluators for quality assessment
- **Datasets**: Versioned test sets for regression testing
- **Experiments**: Compare prompts, models, and configurations
- **Playground**: Interactive prompt testing with multiple models
- **Open source**: Self-hosted with PostgreSQL or SQLite

Use alternatives instead when you need:

- **LangSmith**: Managed platform with LangChain-first integration
- **Weights & Biases**: Deep-learning experiment tracking focus
- **Arize Cloud**: Managed Phoenix with enterprise features
- **MLflow**: General ML lifecycle and model registry focus

## Quick start

### Installation

```bash
pip install arize-phoenix
```

With specific backends:

```bash
pip install "arize-phoenix[embeddings]"  # Embedding analysis
pip install arize-phoenix-otel           # OpenTelemetry config
pip install arize-phoenix-evals          # Evaluation framework
pip install arize-phoenix-client         # Lightweight REST client
```

### Launch Phoenix server

```python
import phoenix as px

# Launch in notebook (ThreadServer mode)
session = px.launch_app()

# View UI
session.view()      # Embedded iframe
print(session.url)  # http://localhost:6006
```

### Command-line server (production)

```bash
# Start Phoenix server
phoenix serve

# With PostgreSQL
export PHOENIX_SQL_DATABASE_URL="postgresql://user:pass@host/db"
phoenix serve --port 6006
```

### Basic tracing

```python
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

# Configure OpenTelemetry with Phoenix
tracer_provider = register(
    project_name="my-llm-app",
    endpoint="http://localhost:6006/v1/traces",
)

# Instrument the OpenAI SDK
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# All OpenAI calls are now traced
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

## Core concepts

### Traces and spans

A **trace** represents a complete execution flow, while **spans** are individual operations within that trace.

```python
from phoenix.otel import register
from opentelemetry import trace

# Set up tracing
tracer_provider = register(project_name="my-app")
tracer = trace.get_tracer(__name__)

# Create custom spans
with tracer.start_as_current_span("process_query") as span:
    span.set_attribute("input.value", query)

    # Child spans are automatically nested
    with tracer.start_as_current_span("retrieve_context"):
        context = retriever.search(query)

    with tracer.start_as_current_span("generate_response"):
        response = llm.generate(query, context)

    span.set_attribute("output.value", response)
```

### Projects

Projects organize related traces:

```python
import os

os.environ["PHOENIX_PROJECT_NAME"] = "production-chatbot"

# Or per tracer provider
from phoenix.otel import register

tracer_provider = register(project_name="experiment-v2")
```

## Framework instrumentation

### OpenAI

```python
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

tracer_provider = register()
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
```

### LangChain

```python
from phoenix.otel import register
from openinference.instrumentation.langchain import LangChainInstrumentor

tracer_provider = register()
LangChainInstrumentor().instrument(tracer_provider=tracer_provider)

# All LangChain operations are now traced
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")
response = llm.invoke("Hello!")
```

### LlamaIndex

```python
from phoenix.otel import register
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

tracer_provider = register()
LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)
```

### Anthropic

```python
from phoenix.otel import register
from openinference.instrumentation.anthropic import AnthropicInstrumentor

tracer_provider = register()
AnthropicInstrumentor().instrument(tracer_provider=tracer_provider)
```

## Evaluation framework

### Built-in evaluators

```python
from phoenix.evals import (
    OpenAIModel,
    HallucinationEvaluator,
    RelevanceEvaluator,
    ToxicityEvaluator,
    llm_classify,
)

# Set up the model used for evaluation
eval_model = OpenAIModel(model="gpt-4o")

# Evaluate hallucination
hallucination_eval = HallucinationEvaluator(eval_model)
results = hallucination_eval.evaluate(
    input="What is the capital of France?",
    output="The capital of France is Paris.",
    reference="Paris is the capital of France.",
)
```

### Custom evaluators

```python
from phoenix.evals import llm_classify

# Define a custom evaluation
def evaluate_helpfulness(input_text, output_text):
    template = """
    Evaluate if the response is helpful for the given question.

    Question: {input}
    Response: {output}

    Is this response helpful? Answer 'helpful' or 'not_helpful'.
    """
    result = llm_classify(
        model=eval_model,
        template=template,
        input=input_text,
        output=output_text,
        rails=["helpful", "not_helpful"],
    )
    return result
```

### Run evaluations on a dataset

```python
from phoenix import Client
from phoenix.evals import run_evals

client = Client()

# Get spans to evaluate
spans_df = client.get_spans_dataframe(
    project_name="my-app",
    filter_condition="span_kind == 'LLM'",
)

# Run evaluations
eval_results = run_evals(
    dataframe=spans_df,
    evaluators=[
        HallucinationEvaluator(eval_model),
        RelevanceEvaluator(eval_model),
    ],
    provide_explanation=True,
)

# Log results back to Phoenix
client.log_evaluations(eval_results)
```

## Datasets and experiments

### Create a dataset

```python
from phoenix import Client

client = Client()

# Create the dataset
dataset = client.create_dataset(
    name="qa-test-set",
    description="QA evaluation dataset",
)

# Add examples
client.add_examples_to_dataset(
    dataset_name="qa-test-set",
    examples=[
        {
            "input": {"question": "What is Python?"},
            "output": {"answer": "A programming language"},
        },
        {
            "input": {"question": "What is ML?"},
            "output": {"answer": "Machine learning"},
        },
    ],
)
```

### Run an experiment

```python
from phoenix import Client
from phoenix.experiments import run_experiment

client = Client()

def my_model(input_data):
    """Your model function."""
    question = input_data["question"]
    return {"answer": generate_answer(question)}

def accuracy_evaluator(input_data, output, expected):
    """Custom evaluator."""
    correct = expected["answer"].lower() in output["answer"].lower()
    return {
        "score": 1.0 if correct else 0.0,
        "label": "correct" if correct else "incorrect",
    }

# Run the experiment
results = run_experiment(
    dataset_name="qa-test-set",
    task=my_model,
    evaluators=[accuracy_evaluator],
    experiment_name="baseline-v1",
)
print(f"Average accuracy: {results.aggregate_metrics['accuracy']}")
```

## Client API

### Query traces and spans

```python
from phoenix import Client

client = Client(endpoint="http://localhost:6006")

# Get spans as a DataFrame
spans_df = client.get_spans_dataframe(
    project_name="my-app",
    filter_condition="span_kind == 'LLM'",
    limit=1000,
)

# Get a specific span
span = client.get_span(span_id="abc123")

# Get a trace
trace = client.get_trace(trace_id="xyz789")
```

### Log feedback

```python
from phoenix import Client

client = Client()

# Log user feedback
client.log_annotation(
    span_id="abc123",
    name="user_rating",
    annotator_kind="HUMAN",
    score=0.8,
    label="helpful",
    metadata={"comment": "Good response"},
)
```

### Export data

```python
# Export to pandas
df = client.get_spans_dataframe(project_name="my-app")

# Export traces
traces = client.list_traces(project_name="my-app")
```
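Once spans are in a DataFrame, ordinary pandas works for ad-hoc analysis such as latency percentiles or token spend. A minimal sketch, using a hand-built frame in place of an export; the column names here (`latency_ms`, `token_count_total`) are illustrative, so inspect `df.columns` on a real export for the actual schema:

```python
import pandas as pd

# Illustrative frame standing in for client.get_spans_dataframe(...)
df = pd.DataFrame({
    "name": ["chat", "chat", "retrieve"],
    "latency_ms": [820, 1430, 95],
    "token_count_total": [512, 1024, 0],
})

# Latency p95 per span name
p95 = df.groupby("name")["latency_ms"].quantile(0.95)

# Rough token spend across all spans
total_tokens = int(df["token_count_total"].sum())
print(total_tokens)  # 1536
```

The same pattern scales to a real export: filter to LLM spans first, then aggregate on whichever latency and token-count columns your instrumentation records.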

## Production deployment

### Docker

```bash
docker run -p 6006:6006 arizephoenix/phoenix:latest
```

### With PostgreSQL

```bash
# Set the database URL
export PHOENIX_SQL_DATABASE_URL="postgresql://user:pass@host:5432/phoenix"

# Start the server
phoenix serve --host 0.0.0.0 --port 6006
```

Environment variables

环境变量

VariableDescriptionDefault
PHOENIX_PORT
HTTP server port
6006
PHOENIX_HOST
Server bind address
127.0.0.1
PHOENIX_GRPC_PORT
gRPC/OTLP port
4317
PHOENIX_SQL_DATABASE_URL
Database connectionSQLite temp
PHOENIX_WORKING_DIR
Data storage directoryOS temp
PHOENIX_ENABLE_AUTH
Enable authentication
false
PHOENIX_SECRET
JWT signing secretRequired if auth enabled
变量名描述默认值
PHOENIX_PORT
HTTP服务器端口
6006
PHOENIX_HOST
服务器绑定地址
127.0.0.1
PHOENIX_GRPC_PORT
gRPC/OTLP端口
4317
PHOENIX_SQL_DATABASE_URL
数据库连接地址SQLite临时数据库
PHOENIX_WORKING_DIR
数据存储目录系统临时目录
PHOENIX_ENABLE_AUTH
启用认证
false
PHOENIX_SECRET
JWT签名密钥启用认证时必填

### With authentication

```bash
export PHOENIX_ENABLE_AUTH=true
export PHOENIX_SECRET="your-secret-key-min-32-chars"
export PHOENIX_ADMIN_SECRET="admin-bootstrap-token"

phoenix serve
```

## Best practices

1. **Use projects**: Separate traces by environment (dev/staging/prod)
2. **Add metadata**: Include user IDs and session IDs for debugging
3. **Evaluate regularly**: Run automated evaluations in CI/CD
4. **Version datasets**: Track test-set changes over time
5. **Monitor costs**: Track token usage via Phoenix dashboards
6. **Self-host**: Use PostgreSQL for production deployments

## Common issues

**Traces not appearing:**

```python
from phoenix.otel import register

# Verify the endpoint
tracer_provider = register(
    project_name="my-app",
    endpoint="http://localhost:6006/v1/traces",  # Correct endpoint
)

# Force flush pending spans
from opentelemetry import trace

trace.get_tracer_provider().force_flush()
```

**High memory in notebook:**

```python
# Close the session when done
session = px.launch_app()
# ... do work ...
session.close()
px.close_app()
```

**Database connection issues:**

```bash
# Verify the PostgreSQL connection
psql $PHOENIX_SQL_DATABASE_URL -c "SELECT 1"

# Check Phoenix logs
phoenix serve --log-level debug
```

## References

- Advanced Usage - Custom evaluators, experiments, production setup
- Troubleshooting - Common issues, debugging, performance

## Resources