# LangSmith - LLM Observability Platform
Development platform for debugging, evaluating, and monitoring language models and AI applications.
## When to use LangSmith
Use LangSmith when:
- Debugging LLM application issues (prompts, chains, agents)
- Evaluating model outputs systematically against datasets
- Monitoring production LLM systems
- Building regression testing for AI features
- Analyzing latency, token usage, and costs
- Collaborating on prompt engineering
Key features:
- Tracing: Capture inputs, outputs, latency for all LLM calls
- Evaluation: Systematic testing with built-in and custom evaluators
- Datasets: Create test sets from production traces or manually
- Monitoring: Track metrics, errors, and costs in production
- Integrations: Works with OpenAI, Anthropic, LangChain, LlamaIndex
Use alternatives instead:
- Weights & Biases: Deep learning experiment tracking, model training
- MLflow: General ML lifecycle, model registry focus
- Arize/WhyLabs: ML monitoring, data drift detection
## Quick start
### Installation
```bash
pip install langsmith
```

### Set environment variables
```bash
export LANGSMITH_API_KEY="your-api-key"
export LANGSMITH_TRACING=true
```

### Basic tracing with @traceable
```python
from langsmith import traceable
from openai import OpenAI

client = OpenAI()

@traceable
def generate_response(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# Automatically traced to LangSmith
result = generate_response("What is machine learning?")
```

### OpenAI wrapper (automatic tracing)
```python
from langsmith.wrappers import wrap_openai
from openai import OpenAI

# Wrap client for automatic tracing
client = wrap_openai(OpenAI())

# All calls automatically traced
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

## Core concepts
### Runs and traces
A run is a single execution unit (LLM call, chain, tool). Runs form hierarchical traces showing the full execution flow.
```python
from langsmith import traceable

@traceable(run_type="chain")
def process_query(query: str) -> str:
    # Parent run
    context = retrieve_context(query)  # Child run
    response = generate_answer(query, context)  # Child run
    return response

@traceable(run_type="retriever")
def retrieve_context(query: str) -> list:
    return vector_store.search(query)

@traceable(run_type="llm")
def generate_answer(query: str, context: list) -> str:
    return llm.invoke(f"Context: {context}\n\nQuestion: {query}")
```

### Projects
Projects organize related runs. Set via environment or code:
```python
import os
from langsmith import traceable

os.environ["LANGSMITH_PROJECT"] = "my-project"

# Or per-function
@traceable(project_name="my-project")
def my_function():
    pass
```

## Client API
```python
from langsmith import Client

client = Client()

# List runs
runs = list(client.list_runs(
    project_name="my-project",
    filter='eq(status, "success")',
    limit=100
))

# Get run details
run = client.read_run(run_id="...")

# Create feedback
client.create_feedback(
    run_id="...",
    key="correctness",
    score=0.9,
    comment="Good answer"
)
```
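The "When to use" list calls out analyzing latency, token usage, and costs. A minimal sketch of that over listed runs, assuming the `Run` objects returned by your SDK version expose `total_tokens`, `start_time`, and `end_time` (field availability varies by release):

```python
from langsmith import Client

client = Client()
runs = list(client.list_runs(project_name="my-project", limit=100))

# Aggregate token usage and wall-clock latency across recent runs
total_tokens = sum(run.total_tokens or 0 for run in runs)
latencies = [
    (run.end_time - run.start_time).total_seconds()
    for run in runs
    if run.end_time and run.start_time
]
if latencies:
    print(f"tokens={total_tokens}, avg latency={sum(latencies) / len(latencies):.2f}s")
```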
## Datasets and evaluation
### Create dataset
```python
from langsmith import Client

client = Client()

# Create dataset
dataset = client.create_dataset("qa-test-set", description="QA evaluation")

# Add examples
client.create_examples(
    inputs=[
        {"question": "What is Python?"},
        {"question": "What is ML?"}
    ],
    outputs=[
        {"answer": "A programming language"},
        {"answer": "Machine learning"}
    ],
    dataset_id=dataset.id
)
```
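Datasets can also be seeded from production traces rather than written by hand, as the feature list above notes. A sketch of that workflow using only the `Client` methods already shown; the project name and filter are placeholders:

```python
from langsmith import Client

client = Client()
dataset = client.create_dataset("prod-cases", description="Curated from production traces")

# Pull recent successful runs from a production project
runs = client.list_runs(
    project_name="my-project",
    filter='eq(status, "success")',
    limit=20
)

# Promote each run's recorded inputs/outputs to dataset examples
for run in runs:
    if not run.outputs:
        continue
    client.create_examples(
        inputs=[run.inputs],
        outputs=[run.outputs],
        dataset_id=dataset.id
    )
```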
### Run evaluation
```python
from langsmith import evaluate

def my_model(inputs: dict) -> dict:
    # Your model logic
    return {"answer": generate_answer(inputs["question"])}

def correctness_evaluator(run, example):
    prediction = run.outputs["answer"]
    reference = example.outputs["answer"]
    score = 1.0 if reference.lower() in prediction.lower() else 0.0
    return {"key": "correctness", "score": score}

results = evaluate(
    my_model,
    data="qa-test-set",
    evaluators=[correctness_evaluator],
    experiment_prefix="v1"
)

print(f"Average score: {results.aggregate_metrics['correctness']}")
```

### Built-in evaluators
```python
from langsmith import evaluate
from langsmith.evaluation import LangChainStringEvaluator

# Use LangChain evaluators
results = evaluate(
    my_model,
    data="qa-test-set",
    evaluators=[
        LangChainStringEvaluator("qa"),
        LangChainStringEvaluator("cot_qa")
    ]
)
```

## Advanced tracing
### Tracing context
```python
from langsmith import tracing_context

with tracing_context(
    project_name="experiment-1",
    tags=["production", "v2"],
    metadata={"version": "2.0"}
):
    # All traceable calls inherit context
    result = my_function()
```

### Manual runs
```python
from langsmith import trace

with trace(
    name="custom_operation",
    run_type="tool",
    inputs={"query": "test"}
) as run:
    result = do_something()
    run.end(outputs={"result": result})
```

### Process inputs/outputs
```python
from langsmith import traceable

def sanitize_inputs(inputs: dict) -> dict:
    if "password" in inputs:
        inputs["password"] = "***"
    return inputs

@traceable(process_inputs=sanitize_inputs)
def login(username: str, password: str):
    return authenticate(username, password)
```

### Sampling
```python
import os
os.environ["LANGSMITH_TRACING_SAMPLING_RATE"] = "0.1"  # 10% sampling
```

## LangChain integration
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Tracing enabled automatically with LANGSMITH_TRACING=true
llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("user", "{input}")
])
chain = prompt | llm

# All chain runs traced automatically
response = chain.invoke({"input": "Hello!"})
```

## Production monitoring
### Hub prompts
```python
from langsmith import Client

client = Client()

# Pull prompt from hub
prompt = client.pull_prompt("my-org/qa-prompt")

# Use in application
result = prompt.invoke({"question": "What is AI?"})
```
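Publishing a revised prompt back to the hub is the companion operation. A sketch, assuming your SDK version exposes `Client.push_prompt` and that `my-org/qa-prompt` is writable by you:

```python
from langchain_core.prompts import ChatPromptTemplate
from langsmith import Client

client = Client()

# Push an updated prompt under the same identifier
updated = ChatPromptTemplate.from_messages([
    ("system", "You are a concise QA assistant."),
    ("user", "{question}")
])
client.push_prompt("my-org/qa-prompt", object=updated)
```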
### Async client
```python
from langsmith import AsyncClient

async def main():
    client = AsyncClient()
    runs = []
    async for run in client.list_runs(project_name="my-project"):
        runs.append(run)
    return runs
```
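A usage note: as with any coroutine, drive it with an event loop:

```python
import asyncio

# Run the async fetch to completion
runs = asyncio.run(main())
print(f"Fetched {len(runs)} runs")
```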
### Feedback collection
```python
from langsmith import Client

client = Client()

# Collect user feedback
def record_feedback(run_id: str, user_rating: int, comment: str = None):
    client.create_feedback(
        run_id=run_id,
        key="user_rating",
        score=user_rating / 5.0,  # Normalize to 0-1
        comment=comment
    )

# In your application
record_feedback(run_id="...", user_rating=4, comment="Helpful response")
```

## Testing integration
### Pytest integration
```python
from langsmith import test

@test
def test_qa_accuracy():
    result = my_qa_function("What is Python?")
    assert "programming" in result.lower()
```
### Evaluation in CI/CD
```python
from langsmith import evaluate

def run_evaluation():
    results = evaluate(
        my_model,
        data="regression-test-set",
        evaluators=[accuracy_evaluator]
    )
    # Fail CI if accuracy drops
    assert results.aggregate_metrics["accuracy"] >= 0.9, \
        f"Accuracy {results.aggregate_metrics['accuracy']} below threshold"
```

## Best practices
- Structured naming - Use consistent project/run naming conventions
- Add metadata - Include version, environment, user info (see the sketch after this list)
- Sample in production - Use sampling rate to control volume
- Create datasets - Build test sets from interesting production cases
- Automate evaluation - Run evaluations in CI/CD pipelines
- Monitor costs - Track token usage and latency trends
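A minimal sketch tying several of these practices together (structured naming, metadata, sampling), built only from the environment variables and `tracing_context` API shown earlier; the project name, tags, and sampling rate are illustrative:

```python
import os
from langsmith import tracing_context

# Structured naming and production sampling via environment
os.environ["LANGSMITH_PROJECT"] = "chatbot-prod"
os.environ["LANGSMITH_TRACING_SAMPLING_RATE"] = "0.25"  # keep 25% of traces

# Version/environment/user metadata inherited by every traceable call inside
with tracing_context(
    tags=["production", "v2.1"],
    metadata={"release": "2.1.0", "env": "production", "user_id": "u-123"}
):
    result = my_function()  # any @traceable function
```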
## Common issues
**Traces not appearing:**

```python
import os

# Ensure tracing is enabled
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "your-key"

# Verify connection
from langsmith import Client
client = Client()
print(list(client.list_projects()))  # Should print your projects
```

**High latency from tracing:**

```python
import os
from langsmith import Client

# Enable background batching (default)
client = Client(auto_batch_tracing=True)

# Or use sampling
os.environ["LANGSMITH_TRACING_SAMPLING_RATE"] = "0.1"
```

**Large payloads:**

```python
from langsmith import traceable

# Hide sensitive/large fields
@traceable(
    process_inputs=lambda x: {k: v for k, v in x.items() if k != "large_field"}
)
def my_function(data):
    pass
```

## References
- Advanced Usage - Custom evaluators, distributed tracing, hub prompts
- Troubleshooting - Common issues, debugging, performance
## Resources
- Documentation: https://docs.smith.langchain.com
- Python SDK: https://github.com/langchain-ai/langsmith-sdk
- Web App: https://smith.langchain.com
- Version: 0.2.0+
- License: MIT