LangSmith - LLM Observability Platform
Development platform for debugging, evaluating, and monitoring language models and AI applications.
When to use LangSmith
Use LangSmith when:
- Debugging LLM application issues (prompts, chains, agents)
- Evaluating model outputs systematically against datasets
- Monitoring production LLM systems
- Building regression testing for AI features
- Analyzing latency, token usage, and costs
- Collaborating on prompt engineering
Key features:
- Tracing: Capture inputs, outputs, latency for all LLM calls
- Evaluation: Systematic testing with built-in and custom evaluators
- Datasets: Create test sets from production traces or manually
- Monitoring: Track metrics, errors, and costs in production
- Integrations: Works with OpenAI, Anthropic, LangChain, LlamaIndex
Use alternatives instead:
- Weights & Biases: Deep learning experiment tracking, model training
- MLflow: General ML lifecycle, model registry focus
- Arize/WhyLabs: ML monitoring, data drift detection
Quick start

Installation

```bash
pip install langsmith
```

Set environment variables

```bash
export LANGSMITH_API_KEY="your-api-key"
export LANGSMITH_TRACING=true
```
Basic tracing with @traceable
```python
from langsmith import traceable
from openai import OpenAI

client = OpenAI()

@traceable
def generate_response(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# Automatically traced to LangSmith
result = generate_response("What is machine learning?")
```
OpenAI wrapper (automatic tracing)
```python
from langsmith.wrappers import wrap_openai
from openai import OpenAI

# Wrap client for automatic tracing
client = wrap_openai(OpenAI())

# All calls automatically traced
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

Core concepts
Runs and traces
A run is a single execution unit (LLM call, chain, tool). Runs form hierarchical traces showing the full execution flow.
```python
from langsmith import traceable

@traceable(run_type="chain")
def process_query(query: str) -> str:
    # Parent run
    context = retrieve_context(query)  # Child run
    response = generate_answer(query, context)  # Child run
    return response

@traceable(run_type="retriever")
def retrieve_context(query: str) -> list:
    return vector_store.search(query)

@traceable(run_type="llm")
def generate_answer(query: str, context: list) -> str:
    return llm.invoke(f"Context: {context}\n\nQuestion: {query}")
```

Projects
Projects organize related runs. Set via environment or code:
```python
import os

os.environ["LANGSMITH_PROJECT"] = "my-project"

# Or per-function
@traceable(project_name="my-project")
def my_function():
    pass
```

Client API
```python
from langsmith import Client

client = Client()

# List runs
runs = list(client.list_runs(
    project_name="my-project",
    filter='eq(status, "success")',
    limit=100
))

# Get run details
run = client.read_run(run_id="...")

# Create feedback
client.create_feedback(
    run_id="...",
    key="correctness",
    score=0.9,
    comment="Good answer"
)
```

Datasets and evaluation
Create dataset
```python
from langsmith import Client

client = Client()

# Create dataset
dataset = client.create_dataset("qa-test-set", description="QA evaluation")

# Add examples
client.create_examples(
    inputs=[
        {"question": "What is Python?"},
        {"question": "What is ML?"}
    ],
    outputs=[
        {"answer": "A programming language"},
        {"answer": "Machine learning"}
    ],
    dataset_id=dataset.id
)
```

Run evaluation
```python
from langsmith import evaluate

def my_model(inputs: dict) -> dict:
    # Your model logic
    return {"answer": generate_answer(inputs["question"])}

def correctness_evaluator(run, example):
    prediction = run.outputs["answer"]
    reference = example.outputs["answer"]
    score = 1.0 if reference.lower() in prediction.lower() else 0.0
    return {"key": "correctness", "score": score}

results = evaluate(
    my_model,
    data="qa-test-set",
    evaluators=[correctness_evaluator],
    experiment_prefix="v1"
)
print(f"Average score: {results.aggregate_metrics['correctness']}")
```
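The substring heuristic inside `correctness_evaluator` can be sanity-checked locally before wiring it into `evaluate` — a minimal sketch, with made-up sample strings:

```python
def correctness_score(prediction: str, reference: str) -> float:
    # Same case-insensitive substring check used by correctness_evaluator above.
    return 1.0 if reference.lower() in prediction.lower() else 0.0

assert correctness_score("Python is a Programming Language", "a programming language") == 1.0
assert correctness_score("Python is a snake", "a programming language") == 0.0
```

Testing the scoring logic in isolation like this keeps evaluator bugs from being mistaken for model regressions.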
Built-in evaluators
```python
from langsmith.evaluation import LangChainStringEvaluator

# Use LangChain evaluators
results = evaluate(
    my_model,
    data="qa-test-set",
    evaluators=[
        LangChainStringEvaluator("qa"),
        LangChainStringEvaluator("cot_qa")
    ]
)
```

Advanced tracing
Tracing context
```python
from langsmith import tracing_context

with tracing_context(
    project_name="experiment-1",
    tags=["production", "v2"],
    metadata={"version": "2.0"}
):
    # All traceable calls inherit context
    result = my_function()
```

Manual runs
```python
from langsmith import trace

with trace(
    name="custom_operation",
    run_type="tool",
    inputs={"query": "test"}
) as run:
    result = do_something()
    run.end(outputs={"result": result})
```

Process inputs/outputs
```python
def sanitize_inputs(inputs: dict) -> dict:
    if "password" in inputs:
        inputs["password"] = "***"
    return inputs

@traceable(process_inputs=sanitize_inputs)
def login(username: str, password: str):
    return authenticate(username, password)
```

Sampling
```python
import os

os.environ["LANGSMITH_TRACING_SAMPLING_RATE"] = "0.1"  # 10% sampling
```

LangChain integration
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Tracing enabled automatically with LANGSMITH_TRACING=true
llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("user", "{input}")
])
chain = prompt | llm

# All chain runs traced automatically
response = chain.invoke({"input": "Hello!"})
```

Production monitoring
Hub prompts
```python
from langsmith import Client

client = Client()

# Pull prompt from hub
prompt = client.pull_prompt("my-org/qa-prompt")

# Use in application
result = prompt.invoke({"question": "What is AI?"})
```

Async client
```python
from langsmith import AsyncClient

async def main():
    client = AsyncClient()
    runs = []
    async for run in client.list_runs(project_name="my-project"):
        runs.append(run)
    return runs
```

Feedback collection
```python
from langsmith import Client

client = Client()

# Collect user feedback
def record_feedback(run_id: str, user_rating: int, comment: str = None):
    client.create_feedback(
        run_id=run_id,
        key="user_rating",
        score=user_rating / 5.0,  # Normalize to 0-1
        comment=comment
    )

# In your application
record_feedback(run_id="...", user_rating=4, comment="Helpful response")
```

Testing integration
Pytest integration
```python
from langsmith import test

@test
def test_qa_accuracy():
    result = my_qa_function("What is Python?")
    assert "programming" in result.lower()
```

Evaluation in CI/CD
```python
from langsmith import evaluate

def run_evaluation():
    results = evaluate(
        my_model,
        data="regression-test-set",
        evaluators=[accuracy_evaluator]
    )
    # Fail CI if accuracy drops
    assert results.aggregate_metrics["accuracy"] >= 0.9, \
        f"Accuracy {results.aggregate_metrics['accuracy']} below threshold"
```

Best practices
- **Structured naming** - Use consistent project/run naming conventions
- **Add metadata** - Include version, environment, user info
- **Sample in production** - Use sampling rate to control volume
- **Create datasets** - Build test sets from interesting production cases
- **Automate evaluation** - Run evaluations in CI/CD pipelines
- **Monitor costs** - Track token usage and latency trends
Common issues
**Traces not appearing:**

```python
import os

# Ensure tracing is enabled
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "your-key"

# Verify connection
from langsmith import Client
client = Client()
print(client.list_projects())  # Should work
```
**High latency from tracing:**

```python
# Enable background batching (default)
from langsmith import Client
client = Client(auto_batch_tracing=True)

# Or use sampling
import os
os.environ["LANGSMITH_TRACING_SAMPLING_RATE"] = "0.1"
```
**Large payloads:**

```python
from langsmith import traceable

# Hide sensitive/large fields
@traceable(
    process_inputs=lambda x: {k: v for k, v in x.items() if k != "large_field"}
)
def my_function(data):
    pass
```

References
- Advanced Usage - Custom evaluators, distributed tracing, hub prompts
- Troubleshooting - Common issues, debugging, performance
Resources
- Documentation: https://docs.smith.langchain.com
- Python SDK: https://github.com/langchain-ai/langsmith-sdk
- Web App: https://smith.langchain.com
- Version: 0.2.0+
- License: MIT