# Phoenix - AI Observability Platform
Open-source AI observability and evaluation platform for LLM applications with tracing, evaluation, datasets, experiments, and real-time monitoring.
## When to use Phoenix
Use Phoenix when:
- Debugging LLM application issues with detailed traces
- Running systematic evaluations on datasets
- Monitoring production LLM systems in real-time
- Building experiment pipelines for prompt/model comparison
- Self-hosted observability without vendor lock-in
Key features:
- Tracing: OpenTelemetry-based trace collection for any LLM framework
- Evaluation: LLM-as-judge evaluators for quality assessment
- Datasets: Versioned test sets for regression testing
- Experiments: Compare prompts, models, and configurations
- Playground: Interactive prompt testing with multiple models
- Open-source: Self-hosted with PostgreSQL or SQLite
Use alternatives instead:
- LangSmith: Managed platform with LangChain-first integration
- Weights & Biases: Deep learning experiment tracking focus
- Arize Cloud: Managed Phoenix with enterprise features
- MLflow: General ML lifecycle, model registry focus
## Quick start
Installation
安装
```bash
pip install arize-phoenix

# With specific backends
pip install 'arize-phoenix[embeddings]'  # Embedding analysis
pip install arize-phoenix-otel           # OpenTelemetry config
pip install arize-phoenix-evals          # Evaluation framework
pip install arize-phoenix-client         # Lightweight REST client
```
### Launch Phoenix server
```python
import phoenix as px

# Launch in notebook (ThreadServer mode)
session = px.launch_app()

# View UI
session.view()     # Embedded iframe
print(session.url) # http://localhost:6006
```
### Command-line server (production)
```bash
# Start Phoenix server
phoenix serve

# With PostgreSQL
export PHOENIX_SQL_DATABASE_URL="postgresql://user:pass@host/db"
phoenix serve --port 6006
```
### Basic tracing
```python
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

# Configure OpenTelemetry with Phoenix
tracer_provider = register(
    project_name="my-llm-app",
    endpoint="http://localhost:6006/v1/traces"
)

# Instrument OpenAI SDK
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# All OpenAI calls are now traced
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
```
## Core concepts
### Traces and spans
A **trace** represents a complete execution flow, while **spans** are individual operations within that trace.
```python
from phoenix.otel import register
from opentelemetry import trace

# Setup tracing
tracer_provider = register(project_name="my-app")
tracer = trace.get_tracer(__name__)

# Create custom spans
with tracer.start_as_current_span("process_query") as span:
    span.set_attribute("input.value", query)
    # Child spans are automatically nested
    with tracer.start_as_current_span("retrieve_context"):
        context = retriever.search(query)
    with tracer.start_as_current_span("generate_response"):
        response = llm.generate(query, context)
    span.set_attribute("output.value", response)
```

### Projects
Projects organize related traces:
```python
import os
os.environ["PHOENIX_PROJECT_NAME"] = "production-chatbot"

# Or per-trace
from phoenix.otel import register
tracer_provider = register(project_name="experiment-v2")
```
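One lightweight way to keep per-environment project names consistent is to derive them from a deployment variable. A sketch — the `DEPLOY_ENV` variable name is an assumption; adapt it to your own configuration:

```python
import os

def phoenix_project(app: str = "chatbot") -> str:
    """Derive a project name like 'chatbot-production' from the environment."""
    env = os.environ.get("DEPLOY_ENV", "dev")
    return f"{app}-{env}"

os.environ["DEPLOY_ENV"] = "production"
print(phoenix_project())  # chatbot-production
```

Pass the result to `register(project_name=phoenix_project())` so dev, staging, and prod traces never mix.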
## Framework instrumentation
### OpenAI
```python
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

tracer_provider = register()
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
```

### LangChain
```python
from phoenix.otel import register
from openinference.instrumentation.langchain import LangChainInstrumentor

tracer_provider = register()
LangChainInstrumentor().instrument(tracer_provider=tracer_provider)

# All LangChain operations traced
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o")
response = llm.invoke("Hello!")
```

### LlamaIndex
```python
from phoenix.otel import register
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

tracer_provider = register()
LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)
```

### Anthropic
```python
from phoenix.otel import register
from openinference.instrumentation.anthropic import AnthropicInstrumentor

tracer_provider = register()
AnthropicInstrumentor().instrument(tracer_provider=tracer_provider)
```

## Evaluation framework
### Built-in evaluators
```python
from phoenix.evals import (
    OpenAIModel,
    HallucinationEvaluator,
    RelevanceEvaluator,
    ToxicityEvaluator,
    llm_classify
)

# Setup model for evaluation
eval_model = OpenAIModel(model="gpt-4o")

# Evaluate hallucination
hallucination_eval = HallucinationEvaluator(eval_model)
results = hallucination_eval.evaluate(
    input="What is the capital of France?",
    output="The capital of France is Paris.",
    reference="Paris is the capital of France."
)
```
### Custom evaluators
```python
from phoenix.evals import llm_classify

# Define custom evaluation
def evaluate_helpfulness(input_text, output_text):
    template = """
    Evaluate if the response is helpful for the given question.
    Question: {input}
    Response: {output}
    Is this response helpful? Answer 'helpful' or 'not_helpful'.
    """
    result = llm_classify(
        model=eval_model,
        template=template,
        input=input_text,
        output=output_text,
        rails=["helpful", "not_helpful"]
    )
    return result
```

### Run evaluations on dataset
```python
from phoenix import Client
from phoenix.evals import run_evals

client = Client()

# Get spans to evaluate
spans_df = client.get_spans_dataframe(
    project_name="my-app",
    filter_condition="span_kind == 'LLM'"
)

# Run evaluations
eval_results = run_evals(
    dataframe=spans_df,
    evaluators=[
        HallucinationEvaluator(eval_model),
        RelevanceEvaluator(eval_model)
    ],
    provide_explanation=True
)

# Log results back to Phoenix
client.log_evaluations(eval_results)
```
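LLM-as-judge evaluators spend tokens on every run, so for routine regression checks it can help to pair them with a cheap deterministic evaluator. A keyword-overlap sketch in plain Python (not a Phoenix API):

```python
def keyword_overlap_eval(output: str, reference: str) -> dict:
    """Score the output by the fraction of reference words it contains."""
    strip = ".,!?\"'"
    out_words = {w.strip(strip) for w in output.lower().split()}
    ref_words = {w.strip(strip) for w in reference.lower().split()}
    if not ref_words:
        return {"score": 0.0, "label": "no_reference"}
    score = len(out_words & ref_words) / len(ref_words)
    return {"score": score, "label": "match" if score >= 0.5 else "mismatch"}

print(keyword_overlap_eval(
    "The capital of France is Paris.",
    "Paris is the capital of France.",
))  # {'score': 1.0, 'label': 'match'}
```

Run a scorer like this first and reserve the LLM judge for the examples it flags.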
## Datasets and experiments
### Create dataset
```python
from phoenix import Client

client = Client()

# Create dataset
dataset = client.create_dataset(
    name="qa-test-set",
    description="QA evaluation dataset"
)

# Add examples
client.add_examples_to_dataset(
    dataset_name="qa-test-set",
    examples=[
        {
            "input": {"question": "What is Python?"},
            "output": {"answer": "A programming language"}
        },
        {
            "input": {"question": "What is ML?"},
            "output": {"answer": "Machine learning"}
        }
    ]
)
```
### Run experiment
```python
from phoenix import Client
from phoenix.experiments import run_experiment

client = Client()

def my_model(input_data):
    """Your model function."""
    question = input_data["question"]
    return {"answer": generate_answer(question)}

def accuracy_evaluator(input_data, output, expected):
    """Custom evaluator."""
    correct = expected["answer"].lower() in output["answer"].lower()
    return {
        "score": 1.0 if correct else 0.0,
        "label": "correct" if correct else "incorrect"
    }

# Run experiment
results = run_experiment(
    dataset_name="qa-test-set",
    task=my_model,
    evaluators=[accuracy_evaluator],
    experiment_name="baseline-v1"
)
print(f"Average accuracy: {results.aggregate_metrics['accuracy']}")
```
## Client API
### Query traces and spans
```python
from phoenix import Client

client = Client(endpoint="http://localhost:6006")

# Get spans as DataFrame
spans_df = client.get_spans_dataframe(
    project_name="my-app",
    filter_condition="span_kind == 'LLM'",
    limit=1000
)

# Get specific span
span = client.get_span(span_id="abc123")

# Get trace
trace = client.get_trace(trace_id="xyz789")
```

### Log feedback
```python
from phoenix import Client

client = Client()

# Log user feedback
client.log_annotation(
    span_id="abc123",
    name="user_rating",
    annotator_kind="HUMAN",
    score=0.8,
    label="helpful",
    metadata={"comment": "Good response"}
)
```

### Export data
```python
# Export to pandas
df = client.get_spans_dataframe(project_name="my-app")

# Export traces
traces = client.list_traces(project_name="my-app")
```
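Once exported, span data is an ordinary DataFrame, so aggregate analysis (token spend, latency) can happen offline. The column names below follow OpenInference's flattened-attribute convention but are assumptions here; inspect `df.columns` on your own export before relying on them:

```python
import pandas as pd

# Stand-in for the output of client.get_spans_dataframe(...)
spans_df = pd.DataFrame({
    "span_kind": ["LLM", "LLM", "RETRIEVER"],
    "attributes.llm.token_count.total": [120, 80, None],
})

# Total token usage across LLM spans
llm_spans = spans_df[spans_df["span_kind"] == "LLM"]
total_tokens = llm_spans["attributes.llm.token_count.total"].sum()
print(f"LLM spans: {len(llm_spans)}, total tokens: {int(total_tokens)}")
```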
## Production deployment
### Docker
```bash
docker run -p 6006:6006 arizephoenix/phoenix:latest
```

With PostgreSQL:

```bash
# Set database URL
export PHOENIX_SQL_DATABASE_URL="postgresql://user:pass@host:5432/phoenix"

# Start server
phoenix serve --host 0.0.0.0 --port 6006
```
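For a reproducible setup, Phoenix and PostgreSQL can be wired together with docker-compose. A sketch with placeholder credentials, service names, and the standard 4317 OTLP/gRPC port — adjust all of these for a real deployment:

```yaml
services:
  phoenix:
    image: arizephoenix/phoenix:latest
    ports:
      - "6006:6006"   # HTTP UI and REST API
      - "4317:4317"   # OTLP gRPC ingest
    environment:
      - PHOENIX_SQL_DATABASE_URL=postgresql://phoenix:phoenix@db:5432/phoenix
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      - POSTGRES_USER=phoenix
      - POSTGRES_PASSWORD=phoenix
      - POSTGRES_DB=phoenix
    volumes:
      - phoenix-data:/var/lib/postgresql/data
volumes:
  phoenix-data:
```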
### Environment variables
| Variable | Description | Default |
|---|---|---|
| `PHOENIX_PORT` | HTTP server port | `6006` |
| `PHOENIX_HOST` | Server bind address | `0.0.0.0` |
| `PHOENIX_GRPC_PORT` | gRPC/OTLP port | `4317` |
| `PHOENIX_SQL_DATABASE_URL` | Database connection | SQLite temp |
| `PHOENIX_WORKING_DIR` | Data storage directory | OS temp |
| `PHOENIX_ENABLE_AUTH` | Enable authentication | `false` |
| `PHOENIX_SECRET` | JWT signing secret | Required if auth enabled |
### With authentication
```bash
export PHOENIX_ENABLE_AUTH=true
export PHOENIX_SECRET="your-secret-key-min-32-chars"
export PHOENIX_ADMIN_SECRET="admin-bootstrap-token"
phoenix serve
```

## Best practices
- Use projects: Separate traces by environment (dev/staging/prod)
- Add metadata: Include user IDs, session IDs for debugging
- Evaluate regularly: Run automated evaluations in CI/CD
- Version datasets: Track test set changes over time
- Monitor costs: Track token usage via Phoenix dashboards
- Self-host: Use PostgreSQL for production deployments
## Common issues
**Traces not appearing:**
```python
from phoenix.otel import register

# Verify endpoint
tracer_provider = register(
    project_name="my-app",
    endpoint="http://localhost:6006/v1/traces"  # Correct endpoint
)

# Force flush
from opentelemetry import trace
trace.get_tracer_provider().force_flush()
```

**High memory in notebook:**
```python
# Close session when done
session = px.launch_app()
# ... do work ...
session.close()
px.close_app()
```

**Database connection issues:**
```bash
# Verify PostgreSQL connection
psql $PHOENIX_SQL_DATABASE_URL -c "SELECT 1"

# Check Phoenix logs
phoenix serve --log-level debug
```
## References
- Advanced Usage - Custom evaluators, experiments, production setup
- Troubleshooting - Common issues, debugging, performance
## Resources
- Documentation: https://docs.arize.com/phoenix
- Repository: https://github.com/Arize-ai/phoenix
- Docker Hub: https://hub.docker.com/r/arizephoenix/phoenix
- Version: 12.0.0+
- License: Apache 2.0