langfuse-observability


Instrument LLM applications with Langfuse tracing. Use when setting up Langfuse, adding observability to LLM calls, or auditing existing instrumentation.


NPX Install

npx skill4agent add langfuse/skills langfuse-observability

Langfuse Observability

Instrument LLM applications with Langfuse tracing, following best practices and tailored to your use case.

When to Use

  • Setting up Langfuse in a new project
  • Auditing existing Langfuse instrumentation
  • Adding observability to LLM calls

Workflow

1. Assess Current State

Check the project:
  • Is Langfuse SDK installed?
  • What LLM frameworks are used? (OpenAI SDK, LangChain, LlamaIndex, Vercel AI SDK, etc.)
  • Is there existing instrumentation?
No integration yet: Set up Langfuse using a framework integration if available. Integrations capture more context automatically and require less code than manual instrumentation.
Integration exists: Audit against baseline requirements below.

2. Verify Baseline Requirements

Every trace should have these fundamentals:
| Requirement | Check | Why |
|---|---|---|
| Model name | Is the LLM model captured? | Enables model comparison and filtering |
| Token usage | Are input/output tokens tracked? | Enables automatic cost calculation |
| Good trace names | Are names descriptive? (`chat-response`, not `trace-1`) | Makes traces findable and filterable |
| Span hierarchy | Are multi-step operations nested properly? | Shows which step is slow or failing |
| Correct observation types | Are generations marked as generations? | Enables model-specific analytics |
| Sensitive data masked | Is PII/confidential data excluded or masked? | Prevents data leakage |
| Trace input/output | Does the trace capture the full data being processed as input, and the result as output? | Enables debugging and understanding what was processed |
Framework integrations (OpenAI, LangChain, etc.) handle model name, tokens, and observation types automatically. Prefer integrations over manual instrumentation.
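When no integration fits and you must instrument manually, the baseline fields can be collected explicitly. The sketch below uses the v2-style low-level Langfuse Python SDK API (`langfuse.trace`, `trace.generation`); v3 renamed these methods, so check your installed SDK version. The names, messages, and token counts are illustrative.

```python
from typing import Any


def baseline_generation_params(
    name: str,
    model: str,
    messages: list[dict[str, str]],
    completion: str,
    input_tokens: int,
    output_tokens: int,
) -> dict[str, Any]:
    """Baseline fields every generation observation should carry,
    shaped like the Langfuse v2 generation parameters."""
    return {
        "name": name,    # descriptive, not "generation-1"
        "model": model,  # enables model comparison and filtering
        "input": messages,
        "output": completion,
        "usage": {       # enables automatic cost calculation
            "input": input_tokens,
            "output": output_tokens,
            "unit": "TOKENS",
        },
    }


params = baseline_generation_params(
    name="answer-generation",
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is Langfuse?"}],
    completion="Langfuse is an LLM observability platform.",
    input_tokens=12,
    output_tokens=9,
)

try:
    from langfuse import Langfuse

    langfuse = Langfuse()  # reads LANGFUSE_* env vars (load them first!)
    trace = langfuse.trace(
        name="chat-response",       # descriptive trace name
        input=params["input"],      # full data being processed
        output=params["output"],    # final result
    )
    trace.generation(**params)      # a generation, not a plain span
    langfuse.flush()                # don't lose events on exit
except Exception:
    pass  # SDK missing or unconfigured; the dict above still shows the shape
```

Nesting additional steps (retrieval, post-processing) as `trace.span(...)` children keeps the hierarchy readable.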

3. Explore Traces First

Once baseline instrumentation is working, encourage the user to explore their traces in the Langfuse UI before adding more context:
"Your traces are now appearing in Langfuse. Take a look at a few of them—see what data is being captured, what's useful, and what's missing. This will help us decide what additional context to add."
This helps the user:
  • Understand what they're already getting
  • Form opinions about what's missing
  • Ask better questions about what they need

4. Discover Additional Context Needs

Determine what additional instrumentation would be valuable. Infer from code when possible; ask only when unclear.
Infer from code:
| If you see in code... | Infer | Suggest |
|---|---|---|
| Conversation history, chat endpoints, message arrays | Multi-turn app | `session_id` |
| User authentication, `user_id` variables | User-aware app | `user_id` on traces |
| Multiple distinct endpoints/features | Multi-feature app | `feature` tag |
| Customer/tenant identifiers | Multi-tenant app | `customer_id` or tier tag |
| Feedback collection, ratings | Has user feedback | Capture as scores |
Only ask when not obvious from code:
  • "How do you know when a response is good vs bad?" → Determines scoring approach
  • "What would you want to filter by in a dashboard?" → Surfaces non-obvious tags
  • "Are there different user segments you'd want to compare?" → Customer tiers, plans, etc.
Additions and their value:
| Addition | Why | Docs |
|---|---|---|
| `session_id` | Groups conversations together | https://langfuse.com/docs/tracing-features/sessions |
| `user_id` | Enables user filtering and cost attribution | https://langfuse.com/docs/tracing-features/users |
| User feedback score | Enables quality filtering and trends | https://langfuse.com/docs/scores/overview |
| `feature` tag | Per-feature analytics | https://langfuse.com/docs/tracing-features/tags |
| `customer_tier` tag | Cost/quality breakdown by segment | https://langfuse.com/docs/tracing-features/tags |
These are NOT baseline requirements—only add what's relevant based on inference or user input.
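For a multi-turn, user-aware app, attaching this context is a few extra keyword arguments. A sketch with the v2-style Langfuse Python SDK; the ids and tag names are invented for illustration:

```python
# Hypothetical context for a multi-turn, multi-tenant chat app.
trace_context = {
    "session_id": "conv-8421",             # groups turns of one conversation
    "user_id": "user-17",                  # per-user filtering and cost attribution
    "tags": ["feature:chat", "tier:pro"],  # per-feature / per-segment analytics
}

try:
    from langfuse import Langfuse  # v2-style API; v3 renamed these methods

    langfuse = Langfuse()
    trace = langfuse.trace(name="chat-response", **trace_context)
    # A thumbs-up from the user, recorded as a score on the trace:
    langfuse.score(trace_id=trace.id, name="user-feedback", value=1)
    langfuse.flush()
except Exception:
    pass  # SDK missing or unconfigured; trace_context still shows the shape
```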

5. Guide to UI

After adding context, point users to relevant UI features:
  • Traces view: See individual requests
  • Sessions view: See grouped conversations (if session_id added)
  • Dashboard: Build filtered views using tags
  • Scores: Filter by quality metrics

Framework Integrations

Prefer these over manual instrumentation:
| Framework | Integration | Docs |
|---|---|---|
| OpenAI SDK | Drop-in replacement | https://langfuse.com/docs/integrations/openai |
| LangChain | Callback handler | https://langfuse.com/docs/integrations/langchain |
| LlamaIndex | Callback handler | https://langfuse.com/docs/integrations/llama-index |
| Vercel AI SDK | OpenTelemetry exporter | https://langfuse.com/docs/integrations/vercel-ai-sdk |
| LiteLLM | Callback or proxy | https://langfuse.com/docs/integrations/litellm |
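The OpenAI integration, for example, is a one-line import swap. A sketch (the extra `name` keyword is a Langfuse addition for descriptive generation names; `doc-summary` is illustrative):

```python
# Drop-in integration: swap the import and keep the rest unchanged.
# Model name, token usage, and the generation observation type are
# then captured automatically.
from langfuse.openai import OpenAI  # instead of: from openai import OpenAI

client = OpenAI()
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize this document."}],
    name="doc-summary",  # Langfuse-specific kwarg: descriptive generation name
)
```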

Always Explain Why

When suggesting additions, explain the user benefit:
"I recommend adding session_id to your traces.

Why: This groups messages from the same conversation together.
You'll be able to see full conversation flows in the Sessions view,
making it much easier to debug multi-turn interactions.

Learn more: https://langfuse.com/docs/tracing-features/sessions"

Common Mistakes

| Mistake | Problem | Fix |
|---|---|---|
| No `flush()` in scripts | Traces never sent | Call `langfuse.flush()` before exit |
| Flat traces | Can't see which step failed | Use nested spans for distinct steps |
| Generic trace names | Hard to filter | Use descriptive names: `chat-response`, `doc-summary` |
| Logging sensitive data | Data leakage risk | Mask PII before tracing |
| Manual instrumentation when integration exists | More code, less context | Use framework integration |
| Langfuse import before env vars loaded | Langfuse initializes with missing/wrong credentials | Import Langfuse AFTER loading environment variables (e.g., after `load_dotenv()`) |
| Wrong import order with OpenAI | Langfuse can't patch the OpenAI client | Import Langfuse and call its setup BEFORE importing the OpenAI client |
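For the "mask PII before tracing" fix, a standalone helper can scrub text before it reaches trace input/output. This is a minimal illustration; the regexes below are simplistic assumptions, not production-grade rules, and recent SDK versions also offer built-in masking hooks worth checking first.

```python
import re

# Hypothetical helper: mask obvious PII (emails, phone-like numbers)
# before passing text into trace input/output. Real deployments
# usually need domain-specific rules or a dedicated masking library.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


masked = mask_pii("Contact jane.doe@example.com or +1 (555) 010-9999")
# → "Contact [EMAIL] or [PHONE]"
```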