phoenix-observability

# Phoenix - AI Observability Platform

Open-source AI observability and evaluation platform for LLM applications with tracing, evaluation, datasets, experiments, and real-time monitoring.
## When to use Phoenix
Use Phoenix when:
- Debugging LLM application issues with detailed traces
- Running systematic evaluations on datasets
- Monitoring production LLM systems in real-time
- Building experiment pipelines for prompt/model comparison
- Self-hosting observability without vendor lock-in
Key features:
- Tracing: OpenTelemetry-based trace collection for any LLM framework
- Evaluation: LLM-as-judge evaluators for quality assessment
- Datasets: Versioned test sets for regression testing
- Experiments: Compare prompts, models, and configurations
- Playground: Interactive prompt testing with multiple models
- Open-source: Self-hosted with PostgreSQL or SQLite
Use alternatives instead:
- LangSmith: Managed platform with LangChain-first integration
- Weights & Biases: Deep learning experiment tracking focus
- Arize Cloud: Managed Phoenix with enterprise features
- MLflow: General ML lifecycle, model registry focus
## Quick start
### Installation
```bash
pip install arize-phoenix

# With specific backends
pip install arize-phoenix[embeddings]   # Embedding analysis
pip install arize-phoenix-otel          # OpenTelemetry config
pip install arize-phoenix-evals         # Evaluation framework
pip install arize-phoenix-client        # Lightweight REST client
```
### Launch Phoenix server
```python
import phoenix as px

# Launch in notebook (ThreadServer mode)
session = px.launch_app()

# View UI
session.view()       # Embedded iframe
print(session.url)   # http://localhost:6006
```
### Command-line server (production)
```bash
# Start Phoenix server
phoenix serve

# With PostgreSQL
export PHOENIX_SQL_DATABASE_URL="postgresql://user:pass@host/db"
phoenix serve --port 6006
```
### Basic tracing
```python
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

# Configure OpenTelemetry with Phoenix
tracer_provider = register(
    project_name="my-llm-app",
    endpoint="http://localhost:6006/v1/traces"
)

# Instrument OpenAI SDK
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# All OpenAI calls are now traced
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
```
## Core concepts
### Traces and spans
A trace represents a complete execution flow, while spans are individual operations within that trace.

```python
from phoenix.otel import register
from opentelemetry import trace

# Setup tracing
tracer_provider = register(project_name="my-app")
tracer = trace.get_tracer(__name__)

# Create custom spans (assumes query, retriever, and llm are defined)
with tracer.start_as_current_span("process_query") as span:
    span.set_attribute("input.value", query)
    # Child spans are automatically nested
    with tracer.start_as_current_span("retrieve_context"):
        context = retriever.search(query)
    with tracer.start_as_current_span("generate_response"):
        response = llm.generate(query, context)
    span.set_attribute("output.value", response)
```

### Projects
Projects organize related traces:

```python
import os
os.environ["PHOENIX_PROJECT_NAME"] = "production-chatbot"

# Or per-trace
from phoenix.otel import register
tracer_provider = register(project_name="experiment-v2")
```
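One convenient convention (the `DEPLOY_ENV` variable and the naming scheme are illustrative assumptions, not a Phoenix API) is to derive the project name from the deployment environment, so dev, staging, and prod traces stay separate:

```python
import os

def project_for_environment(app_name: str) -> str:
    """Build a Phoenix project name from the deployment environment."""
    env = os.environ.get("DEPLOY_ENV", "dev")  # e.g. dev / staging / prod
    return f"{app_name}-{env}"

# At startup:
#   tracer_provider = register(project_name=project_for_environment("chatbot"))
```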
## Framework instrumentation

### OpenAI
```python
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

tracer_provider = register()
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
```

### LangChain
```python
from phoenix.otel import register
from openinference.instrumentation.langchain import LangChainInstrumentor

tracer_provider = register()
LangChainInstrumentor().instrument(tracer_provider=tracer_provider)

# All LangChain operations traced
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o")
response = llm.invoke("Hello!")
```

### LlamaIndex
```python
from phoenix.otel import register
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

tracer_provider = register()
LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)
```

### Anthropic
```python
from phoenix.otel import register
from openinference.instrumentation.anthropic import AnthropicInstrumentor

tracer_provider = register()
AnthropicInstrumentor().instrument(tracer_provider=tracer_provider)
```

## Evaluation framework

### Built-in evaluators
```python
from phoenix.evals import (
    OpenAIModel,
    HallucinationEvaluator,
    RelevanceEvaluator,
    ToxicityEvaluator,
    llm_classify
)

# Setup model for evaluation
eval_model = OpenAIModel(model="gpt-4o")

# Evaluate hallucination
hallucination_eval = HallucinationEvaluator(eval_model)
results = hallucination_eval.evaluate(
    input="What is the capital of France?",
    output="The capital of France is Paris.",
    reference="Paris is the capital of France."
)
```

### Custom evaluators
```python
from phoenix.evals import llm_classify

# Define custom evaluation
def evaluate_helpfulness(input_text, output_text):
    template = """
    Evaluate if the response is helpful for the given question.
    Question: {input}
    Response: {output}
    Is this response helpful? Answer 'helpful' or 'not_helpful'.
    """
    result = llm_classify(
        model=eval_model,
        template=template,
        input=input_text,
        output=output_text,
        rails=["helpful", "not_helpful"]
    )
    return result
```

### Run evaluations on dataset
```python
from phoenix import Client
from phoenix.evals import run_evals

client = Client()

# Get spans to evaluate
spans_df = client.get_spans_dataframe(
    project_name="my-app",
    filter_condition="span_kind == 'LLM'"
)

# Run evaluations
eval_results = run_evals(
    dataframe=spans_df,
    evaluators=[
        HallucinationEvaluator(eval_model),
        RelevanceEvaluator(eval_model)
    ],
    provide_explanation=True
)

# Log results back to Phoenix
client.log_evaluations(eval_results)
```

## Datasets and experiments
### Create dataset
```python
from phoenix import Client

client = Client()

# Create dataset
dataset = client.create_dataset(
    name="qa-test-set",
    description="QA evaluation dataset"
)

# Add examples
client.add_examples_to_dataset(
    dataset_name="qa-test-set",
    examples=[
        {
            "input": {"question": "What is Python?"},
            "output": {"answer": "A programming language"}
        },
        {
            "input": {"question": "What is ML?"},
            "output": {"answer": "Machine learning"}
        }
    ]
)
```

### Run experiment
```python
from phoenix import Client
from phoenix.experiments import run_experiment

client = Client()

def my_model(input_data):
    """Your model function (assumes generate_answer is defined)."""
    question = input_data["question"]
    return {"answer": generate_answer(question)}

def accuracy_evaluator(input_data, output, expected):
    """Custom evaluator."""
    correct = expected["answer"].lower() in output["answer"].lower()
    return {
        "score": 1.0 if correct else 0.0,
        "label": "correct" if correct else "incorrect"
    }

# Run experiment
results = run_experiment(
    dataset_name="qa-test-set",
    task=my_model,
    evaluators=[accuracy_evaluator],
    experiment_name="baseline-v1"
)
print(f"Average accuracy: {results.aggregate_metrics['accuracy']}")
```

## Client API
### Query traces and spans
```python
from phoenix import Client

client = Client(endpoint="http://localhost:6006")

# Get spans as DataFrame
spans_df = client.get_spans_dataframe(
    project_name="my-app",
    filter_condition="span_kind == 'LLM'",
    limit=1000
)

# Get specific span
span = client.get_span(span_id="abc123")

# Get trace
trace = client.get_trace(trace_id="xyz789")
```

### Log feedback
```python
from phoenix import Client

client = Client()

# Log user feedback
client.log_annotation(
    span_id="abc123",
    name="user_rating",
    annotator_kind="HUMAN",
    score=0.8,
    label="helpful",
    metadata={"comment": "Good response"}
)
```

### Export data
```python
# Export to pandas
df = client.get_spans_dataframe(project_name="my-app")

# Export traces
traces = client.list_traces(project_name="my-app")
```

## Production deployment

### Docker
```bash
docker run -p 6006:6006 arizephoenix/phoenix:latest
```

### With PostgreSQL
```bash
# Set database URL
export PHOENIX_SQL_DATABASE_URL="postgresql://user:pass@host:5432/phoenix"

# Start server
phoenix serve --host 0.0.0.0 --port 6006
```

### Environment variables
| Variable | Description | Default |
|---|---|---|
| `PHOENIX_PORT` | HTTP server port | `6006` |
| `PHOENIX_HOST` | Server bind address | `0.0.0.0` |
| `PHOENIX_GRPC_PORT` | gRPC/OTLP port | `4317` |
| `PHOENIX_SQL_DATABASE_URL` | Database connection | SQLite temp |
| `PHOENIX_WORKING_DIR` | Data storage directory | OS temp |
| `PHOENIX_ENABLE_AUTH` | Enable authentication | `false` |
| `PHOENIX_SECRET` | JWT signing secret | Required if auth enabled |
| `PHOENIX_ADMIN_SECRET` | Admin bootstrap token | Required if auth enabled |
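For a production launch, several of these variables are typically set together; a sketch along these lines (all values are illustrative placeholders — adjust credentials, paths, and ports for your deployment):

```bash
# Production configuration (values are illustrative placeholders)
export PHOENIX_SQL_DATABASE_URL="postgresql://phoenix:secret@db:5432/phoenix"
export PHOENIX_WORKING_DIR="/var/lib/phoenix"
export PHOENIX_PORT=6006
export PHOENIX_HOST=0.0.0.0

# then: phoenix serve
```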
### With authentication
```bash
export PHOENIX_ENABLE_AUTH=true
export PHOENIX_SECRET="your-secret-key-min-32-chars"
export PHOENIX_ADMIN_SECRET="admin-bootstrap-token"
phoenix serve
```

## Best practices
- Use projects: Separate traces by environment (dev/staging/prod)
- Add metadata: Include user IDs, session IDs for debugging
- Evaluate regularly: Run automated evaluations in CI/CD
- Version datasets: Track test set changes over time
- Monitor costs: Track token usage via Phoenix dashboards
- Self-host: Use PostgreSQL for production deployments
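The "evaluate regularly" practice can be enforced as a simple quality gate in CI; a stdlib-only sketch (the threshold and the score format are assumptions — in a real pipeline the scores would come from `run_evals` results):

```python
def evaluation_gate(scores, threshold=0.9):
    """Return True when the mean evaluation score meets the threshold."""
    if not scores:
        raise ValueError("no evaluation scores to gate on")
    return sum(scores) / len(scores) >= threshold

# In a CI step, fail the build when the gate fails, e.g.:
#   raise SystemExit(0 if evaluation_gate(scores) else 1)
```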
## Common issues

**Traces not appearing:**
```python
from phoenix.otel import register

# Verify endpoint
tracer_provider = register(
    project_name="my-app",
    endpoint="http://localhost:6006/v1/traces"  # Correct endpoint
)

# Force flush pending spans
from opentelemetry import trace
trace.get_tracer_provider().force_flush()
```

**High memory in notebook:**
```python
# Close session when done
session = px.launch_app()
# ... do work ...
session.close()
px.close_app()
```

**Database connection issues:**
```bash
# Verify PostgreSQL connection
psql "$PHOENIX_SQL_DATABASE_URL" -c "SELECT 1"

# Check Phoenix logs
phoenix serve --log-level debug
```
## References

- Advanced Usage - Custom evaluators, experiments, production setup
- Troubleshooting - Common issues, debugging, performance
## Resources

- Documentation: https://docs.arize.com/phoenix
- Repository: https://github.com/Arize-ai/phoenix
- Docker Hub: https://hub.docker.com/r/arizephoenix/phoenix
- Version: 12.0.0+
- License: Apache 2.0