langsmith-observability

LangSmith - LLM Observability Platform

Development platform for debugging, evaluating, and monitoring language models and AI applications.

When to use LangSmith

Use LangSmith when:
  • Debugging LLM application issues (prompts, chains, agents)
  • Evaluating model outputs systematically against datasets
  • Monitoring production LLM systems
  • Building regression testing for AI features
  • Analyzing latency, token usage, and costs
  • Collaborating on prompt engineering
Key features:
  • Tracing: Capture inputs, outputs, latency for all LLM calls
  • Evaluation: Systematic testing with built-in and custom evaluators
  • Datasets: Create test sets from production traces or manually
  • Monitoring: Track metrics, errors, and costs in production
  • Integrations: Works with OpenAI, Anthropic, LangChain, LlamaIndex
Use alternatives instead:
  • Weights & Biases: Deep learning experiment tracking, model training
  • MLflow: General ML lifecycle, model registry focus
  • Arize/WhyLabs: ML monitoring, data drift detection

Quick start

Installation

```bash
pip install langsmith
```

Set environment variables

```bash
export LANGSMITH_API_KEY="your-api-key"
export LANGSMITH_TRACING=true
```

Basic tracing with @traceable

```python
from langsmith import traceable
from openai import OpenAI

client = OpenAI()

@traceable
def generate_response(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# Automatically traced to LangSmith
result = generate_response("What is machine learning?")
```

OpenAI wrapper (automatic tracing)

```python
from langsmith.wrappers import wrap_openai
from openai import OpenAI

# Wrap client for automatic tracing
client = wrap_openai(OpenAI())

# All calls automatically traced
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

Core concepts

Runs and traces

A run is a single execution unit (LLM call, chain, tool). Runs form hierarchical traces showing the full execution flow.

```python
from langsmith import traceable

@traceable(run_type="chain")
def process_query(query: str) -> str:
    # Parent run
    context = retrieve_context(query)  # Child run
    response = generate_answer(query, context)  # Child run
    return response

@traceable(run_type="retriever")
def retrieve_context(query: str) -> list:
    return vector_store.search(query)  # assumes a configured vector store

@traceable(run_type="llm")
def generate_answer(query: str, context: list) -> str:
    return llm.invoke(f"Context: {context}\n\nQuestion: {query}")  # assumes an LLM client
```

Projects

Projects organize related runs. Set via environment or code:

```python
import os
os.environ["LANGSMITH_PROJECT"] = "my-project"

# Or per-function
@traceable(project_name="my-project")
def my_function():
    pass
```

Client API

```python
from langsmith import Client

client = Client()

# List runs
runs = list(client.list_runs(
    project_name="my-project",
    filter='eq(status, "success")',
    limit=100
))

# Get run details
run = client.read_run(run_id="...")

# Create feedback
client.create_feedback(
    run_id="...",
    key="correctness",
    score=0.9,
    comment="Good answer"
)
```

Datasets and evaluation

Create dataset

```python
from langsmith import Client

client = Client()

# Create dataset
dataset = client.create_dataset("qa-test-set", description="QA evaluation")

# Add examples
client.create_examples(
    inputs=[
        {"question": "What is Python?"},
        {"question": "What is ML?"}
    ],
    outputs=[
        {"answer": "A programming language"},
        {"answer": "Machine learning"}
    ],
    dataset_id=dataset.id
)
```

Run evaluation

```python
from langsmith import evaluate

def my_model(inputs: dict) -> dict:
    # Your model logic
    return {"answer": generate_answer(inputs["question"])}

def correctness_evaluator(run, example):
    prediction = run.outputs["answer"]
    reference = example.outputs["answer"]
    score = 1.0 if reference.lower() in prediction.lower() else 0.0
    return {"key": "correctness", "score": score}

results = evaluate(
    my_model,
    data="qa-test-set",
    evaluators=[correctness_evaluator],
    experiment_prefix="v1"
)

print(f"Average score: {results.aggregate_metrics['correctness']}")
```

Built-in evaluators

```python
from langsmith import evaluate
from langsmith.evaluation import LangChainStringEvaluator

# Use LangChain evaluators
results = evaluate(
    my_model,
    data="qa-test-set",
    evaluators=[
        LangChainStringEvaluator("qa"),
        LangChainStringEvaluator("cot_qa")
    ]
)
```

Advanced tracing

Tracing context

```python
from langsmith import tracing_context

with tracing_context(
    project_name="experiment-1",
    tags=["production", "v2"],
    metadata={"version": "2.0"}
):
    # All traceable calls inherit context
    result = my_function()
```

Manual runs

```python
from langsmith import trace

with trace(
    name="custom_operation",
    run_type="tool",
    inputs={"query": "test"}
) as run:
    result = do_something()
    run.end(outputs={"result": result})
```

Process inputs/outputs

```python
from langsmith import traceable

def sanitize_inputs(inputs: dict) -> dict:
    if "password" in inputs:
        inputs["password"] = "***"
    return inputs

@traceable(process_inputs=sanitize_inputs)
def login(username: str, password: str):
    return authenticate(username, password)
```
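
The decorator also accepts a symmetric process_outputs hook for redacting return values (verify it exists in your installed langsmith version). A sketch continuing the example above; session_token is an illustrative field name:

```python
def redact_outputs(outputs: dict) -> dict:
    # Strip the raw token before it is written to the trace
    if "session_token" in outputs:
        outputs["session_token"] = "***"
    return outputs

@traceable(process_outputs=redact_outputs)
def login_session(username: str, password: str) -> dict:
    return {"user": username, "session_token": authenticate(username, password)}
```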

Sampling

```python
import os
os.environ["LANGSMITH_TRACING_SAMPLING_RATE"] = "0.1"  # 10% sampling
```

LangChain integration

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Tracing enabled automatically with LANGSMITH_TRACING=true
llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("user", "{input}")
])
chain = prompt | llm

# All chain runs traced automatically
response = chain.invoke({"input": "Hello!"})
```
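
Per-call tags and metadata can ride along on the standard RunnableConfig and show up on the resulting trace. Reusing the chain above; the tag and metadata values are illustrative:

```python
response = chain.invoke(
    {"input": "Hello!"},
    config={"tags": ["demo"], "metadata": {"user_id": "user-123"}},
)
```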

Production monitoring

Hub prompts

```python
from langsmith import Client

client = Client()

# Pull prompt from hub
prompt = client.pull_prompt("my-org/qa-prompt")

# Use in application
result = prompt.invoke({"question": "What is AI?"})
```
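
Prompts can be versioned in the other direction too. Newer langsmith releases expose push_prompt; treat this as a sketch and verify the method against your installed version:

```python
from langchain_core.prompts import ChatPromptTemplate

qa_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise QA assistant."),
    ("user", "{question}"),
])

# Push (or update) the prompt under your workspace handle
url = client.push_prompt("my-org/qa-prompt", object=qa_prompt)
print(url)  # link to the new prompt commit
```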

Async client

```python
import asyncio

from langsmith import AsyncClient

async def main():
    client = AsyncClient()

    runs = []
    async for run in client.list_runs(project_name="my-project"):
        runs.append(run)

    return runs

runs = asyncio.run(main())
```

Feedback collection

```python
from langsmith import Client

client = Client()

# Collect user feedback
def record_feedback(run_id: str, user_rating: int, comment: str = None):
    client.create_feedback(
        run_id=run_id,
        key="user_rating",
        score=user_rating / 5.0,  # Normalize to 0-1
        comment=comment
    )

# In your application
record_feedback(run_id="...", user_rating=4, comment="Helpful response")
```

Testing integration

Pytest integration

```python
from langsmith import test

@test
def test_qa_accuracy():
    result = my_qa_function("What is Python?")
    assert "programming" in result.lower()
```

Evaluation in CI/CD

```python
from langsmith import evaluate

def run_evaluation():
    results = evaluate(
        my_model,
        data="regression-test-set",
        evaluators=[accuracy_evaluator]
    )

    # Fail CI if accuracy drops
    assert results.aggregate_metrics["accuracy"] >= 0.9, \
        f"Accuracy {results.aggregate_metrics['accuracy']} below threshold"
```

Best practices

  1. Structured naming - Use consistent project/run naming conventions
  2. Add metadata - Include version, environment, and user info
  3. Sample in production - Use sampling rate to control volume
  4. Create datasets - Build test sets from interesting production cases
  5. Automate evaluation - Run evaluations in CI/CD pipelines
  6. Monitor costs - Track token usage and latency trends (naming, metadata, and cost tracking are sketched below)
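
A sketch tying items 1, 2, and 6 together: one consistently named project with tags and metadata on every trace, then a rough daily token/latency pull. It assumes @traceable accepts tags and metadata, that list_runs accepts run_type and start_time filters, and that Run objects expose total_tokens and start/end times, as recent SDK schemas do; names and values are illustrative:

```python
from datetime import datetime, timedelta, timezone

from langsmith import Client, traceable

# Items 1-2: consistent project name, tags and metadata on every trace
@traceable(
    project_name="qa-bot-prod",  # illustrative naming convention
    tags=["v2", "production"],
    metadata={"git_sha": "abc123", "env": "prod"},
)
def answer(question: str) -> str:
    ...  # your model logic

# Item 6: rough daily token usage and average latency
client = Client()
since = datetime.now(timezone.utc) - timedelta(days=1)
runs = list(client.list_runs(
    project_name="qa-bot-prod",
    run_type="llm",
    start_time=since,
))
total_tokens = sum(r.total_tokens or 0 for r in runs)
latencies = [(r.end_time - r.start_time).total_seconds() for r in runs if r.end_time]
avg_latency = sum(latencies) / max(len(latencies), 1)
print(f"Tokens: {total_tokens}, avg latency: {avg_latency:.2f}s")
```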

Common issues

**Traces not appearing:**
```python
import os

# Ensure tracing is enabled
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "your-key"

# Verify connection
from langsmith import Client
client = Client()
print(list(client.list_projects()))  # Should print your projects
```

**High latency from tracing:**
```python
import os

# Enable background batching (default)
from langsmith import Client
client = Client(auto_batch_tracing=True)

# Or use sampling
os.environ["LANGSMITH_TRACING_SAMPLING_RATE"] = "0.1"
```

**Large payloads:**
```python
from langsmith import traceable

# Hide sensitive/large fields
@traceable(
    process_inputs=lambda x: {k: v for k, v in x.items() if k != "large_field"}
)
def my_function(data):
    pass
```

References

  • Advanced Usage - Custom evaluators, distributed tracing, hub prompts
  • Troubleshooting - Common issues, debugging, performance

Resources
