Langfuse Observability


Instrument LLM applications with Langfuse tracing, following best practices and tailored to your use case.

When to Use


  • Setting up Langfuse in a new project
  • Auditing existing Langfuse instrumentation
  • Adding observability to LLM calls

Workflow


1. Assess Current State


Check the project:
  • Is Langfuse SDK installed?
  • What LLM frameworks are used? (OpenAI SDK, LangChain, LlamaIndex, Vercel AI SDK, etc.)
  • Is there existing instrumentation?
No integration yet: Set up Langfuse using a framework integration if available. Integrations capture more context automatically and require less code than manual instrumentation.
Integration exists: Audit against baseline requirements below.
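Whichever path applies, the SDK needs credentials before anything is traced. A minimal setup sketch — the key values below are placeholders; copy the real ones from your Langfuse project settings:

```shell
# Install the SDK, plus the OpenAI client if you use the drop-in integration
pip install langfuse openai

# Langfuse reads these when it initializes
export LANGFUSE_PUBLIC_KEY="pk-lf-..."   # placeholder; see Project Settings
export LANGFUSE_SECRET_KEY="sk-lf-..."   # placeholder
export LANGFUSE_HOST="https://cloud.langfuse.com"  # or your self-hosted URL
```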

2. Verify Baseline Requirements


Every trace should have these fundamentals:
| Requirement | Check | Why |
| --- | --- | --- |
| Model name | Is the LLM model captured? | Enables model comparison and filtering |
| Token usage | Are input/output tokens tracked? | Enables automatic cost calculation |
| Good trace names | Are names descriptive? (`chat-response`, not `trace-1`) | Makes traces findable and filterable |
| Span hierarchy | Are multi-step operations nested properly? | Shows which step is slow or failing |
| Correct observation types | Are generations marked as generations? | Enables model-specific analytics |
| Sensitive data masked | Is PII/confidential data excluded or masked? | Prevents data leakage |
| Trace input/output | Does the trace capture the full input data and the final result as output? | Enables debugging and understanding what was processed |
Framework integrations (OpenAI, LangChain, etc.) handle model name, tokens, and observation types automatically. Prefer integrations over manual instrumentation.
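One baseline item, sensitive-data masking, is framework-independent: data must be scrubbed before it reaches the tracer. A minimal sketch of a regex-based masker you could apply to inputs and outputs before tracing — the patterns and the `mask_pii` helper are illustrative, not part of the Langfuse SDK:

```python
import re

# Illustrative patterns only; real deployments need patterns tuned to their data.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def mask_pii(text: str) -> str:
    """Replace recognized PII with a typed placeholder before tracing."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

print(mask_pii("Contact jane.doe@example.com, SSN 123-45-6789"))
# → Contact <email>, SSN <ssn>
```

The Langfuse SDK also supports registering a masking callback at client initialization so masking is applied consistently; see the masking docs for the exact parameter.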

3. Explore Traces First


Once baseline instrumentation is working, encourage the user to explore their traces in the Langfuse UI before adding more context:
"Your traces are now appearing in Langfuse. Take a look at a few of them—see what data is being captured, what's useful, and what's missing. This will help us decide what additional context to add."
This helps the user:
  • Understand what they're already getting
  • Form opinions about what's missing
  • Ask better questions about what they need

4. Discover Additional Context Needs


Determine what additional instrumentation would be valuable. Infer from the code when possible; ask the user only when it's unclear.
Infer from code:
| If you see in code... | Infer | Suggest |
| --- | --- | --- |
| Conversation history, chat endpoints, message arrays | Multi-turn app | `session_id` |
| User authentication, `user_id` variables | User-aware app | `user_id` on traces |
| Multiple distinct endpoints/features | Multi-feature app | `feature` tag |
| Customer/tenant identifiers | Multi-tenant app | `customer_id` or tier tag |
| Feedback collection, ratings | Has user feedback | Capture as scores |
Only ask when not obvious from code:
  • "How do you know when a response is good vs bad?" → Determines scoring approach
  • "What would you want to filter by in a dashboard?" → Surfaces non-obvious tags
  • "Are there different user segments you'd want to compare?" → Customer tiers, plans, etc.
Additions and their value:
| Addition | Why | Docs |
| --- | --- | --- |
| `session_id` | Groups conversations together | https://langfuse.com/docs/tracing-features/sessions |
| `user_id` | Enables user filtering and cost attribution | https://langfuse.com/docs/tracing-features/users |
| User feedback score | Enables quality filtering and trends | https://langfuse.com/docs/scores/overview |
| `feature` tag | Per-feature analytics | https://langfuse.com/docs/tracing-features/tags |
| `customer_tier` tag | Cost/quality breakdown by segment | https://langfuse.com/docs/tracing-features/tags |
These are NOT baseline requirements—only add what's relevant based on inference or user input.

5. Guide to UI


After adding context, point users to relevant UI features:
  • Traces view: See individual requests
  • Sessions view: See grouped conversations (if session_id added)
  • Dashboard: Build filtered views using tags
  • Scores: Filter by quality metrics

Framework Integrations


Prefer these over manual instrumentation:
| Framework | Integration | Docs |
| --- | --- | --- |
| OpenAI SDK | Drop-in replacement | https://langfuse.com/docs/integrations/openai |
| LangChain | Callback handler | https://langfuse.com/docs/integrations/langchain |
| LlamaIndex | Callback handler | https://langfuse.com/docs/integrations/llama-index |
| Vercel AI SDK | OpenTelemetry exporter | https://langfuse.com/docs/integrations/vercel-ai-sdk |
| LiteLLM | Callback or proxy | https://langfuse.com/docs/integrations/litellm |

Always Explain Why


When suggesting additions, explain the user benefit:
"I recommend adding session_id to your traces.

Why: This groups messages from the same conversation together.
You'll be able to see full conversation flows in the Sessions view,
making it much easier to debug multi-turn interactions.

Learn more: https://langfuse.com/docs/tracing-features/sessions"

Common Mistakes


| Mistake | Problem | Fix |
| --- | --- | --- |
| No `flush()` in scripts | Traces never sent | Call `langfuse.flush()` before exit |
| Flat traces | Can't see which step failed | Use nested spans for distinct steps |
| Generic trace names | Hard to filter | Use descriptive names: `chat-response`, `doc-summary` |
| Logging sensitive data | Data leakage risk | Mask PII before tracing |
| Manual instrumentation when integration exists | More code, less context | Use framework integration |
| Langfuse import before env vars loaded | Langfuse initializes with missing/wrong credentials | Import Langfuse AFTER loading environment variables (e.g., after `load_dotenv()`) |
| Wrong import order with OpenAI | Langfuse can't patch the OpenAI client | Import Langfuse and call its setup BEFORE importing the OpenAI client |
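The two import-order mistakes are easy to miss because nothing crashes; you simply get a client initialized without credentials, and traces silently go nowhere. A pure-Python sketch of the failure mode (no Langfuse dependency; `FakeTracingClient` is a stand-in for any SDK that reads credentials at construction time):

```python
import os

os.environ.pop("LANGFUSE_PUBLIC_KEY", None)  # start from a clean slate

class FakeTracingClient:
    """Stand-in for an SDK client that reads credentials when created."""
    def __init__(self):
        self.public_key = os.environ.get("LANGFUSE_PUBLIC_KEY")

too_early = FakeTracingClient()                    # created before env loaded
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-demo"   # what load_dotenv() does
in_order = FakeTracingClient()                     # created after env loaded

print(too_early.public_key)  # None  -> traces silently go nowhere
print(in_order.public_key)   # pk-lf-demo
```

The same ordering logic explains the OpenAI row: the drop-in integration patches the OpenAI client at import time, so Langfuse must be imported and set up first.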