langsmith — LLM Observability, Evaluation & Prompt Management

Keywords: langsmith · llm tracing · llm evaluation · @traceable · evaluate

LangSmith is a framework-agnostic platform for developing, debugging, and deploying LLM applications. It provides end-to-end tracing, quality evaluation, prompt versioning, and production monitoring.

When to use this skill
- Add tracing to any LLM pipeline (OpenAI, Anthropic, LangChain, custom models)
- Run offline evaluations with evaluate() against a curated dataset
- Set up production monitoring and online evaluation
- Manage and version prompts in the Prompt Hub
- Create datasets for regression testing and benchmarking
- Attach human or automated feedback to traces
- Use LLM-as-judge scoring with openevals
- Debug agent failures with end-to-end trace inspection
Instructions

- Install the SDK: pip install -U langsmith (Python) or npm install langsmith (TypeScript)
- Set environment variables: LANGSMITH_TRACING=true, LANGSMITH_API_KEY=lsv2_...
- Instrument with the @traceable decorator or the wrap_openai() wrapper
- View traces at smith.langchain.com
- For evaluation setup, see references/python-sdk.md
- For CLI commands, see references/cli.md
- Run bash scripts/setup.sh to auto-configure the environment

API Key: Get from smith.langchain.com → Settings → API Keys
Docs: https://docs.langchain.com/langsmith
Quick Start

Python

```bash
pip install -U langsmith openai
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="lsv2_..."
export OPENAI_API_KEY="sk-..."
```

```python
from langsmith import traceable
from langsmith.wrappers import wrap_openai
from openai import OpenAI

client = wrap_openai(OpenAI())

@traceable
def rag_pipeline(question: str) -> str:
    """Automatically traced in LangSmith"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}]
    )
    return response.choices[0].message.content

result = rag_pipeline("What is LangSmith?")
```

TypeScript
```bash
npm install langsmith openai
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="lsv2_..."
```

```typescript
import { traceable } from "langsmith/traceable";
import { wrapOpenAI } from "langsmith/wrappers";
import { OpenAI } from "openai";

const client = wrapOpenAI(new OpenAI());

const pipeline = traceable(async (question: string): Promise<string> => {
  const res = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: question }],
  });
  return res.choices[0].message.content ?? "";
}, { name: "RAG Pipeline" });

await pipeline("What is LangSmith?");
```

Core Concepts
| Concept | Description |
|---|---|
| Run | Individual operation (LLM call, tool call, retrieval). The fundamental unit. |
| Trace | All runs from a single user request, linked by a shared trace ID. |
| Thread | Multiple traces in a conversation, linked by a thread/session ID in metadata. |
| Project | Container grouping related traces (set via LANGSMITH_PROJECT). |
| Dataset | Collection of examples (inputs plus reference outputs) for offline evaluation. |
| Experiment | Result set from running evaluate() against a dataset. |
| Feedback | Score/label attached to a run — numeric, categorical, or freeform. |
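To make the hierarchy concrete, here is a deliberately simplified, hypothetical data model (not the SDK's actual classes, which carry far more state) showing how runs nest under a root run and share one trace ID:

```python
from dataclasses import dataclass, field
import uuid

# HYPOTHETICAL sketch of the Run/Trace hierarchy, for illustration only.
@dataclass
class Run:
    name: str
    run_type: str        # "llm", "chain", "tool", "retriever", ...
    trace_id: str        # shared by every run in one trace
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    children: list = field(default_factory=list)

# One user request -> one trace: a root run plus nested child runs
trace_id = str(uuid.uuid4())
root = Run("rag_pipeline", "chain", trace_id)
root.children.append(Run("retrieve_docs", "retriever", trace_id))
root.children.append(Run("generate_answer", "llm", trace_id))
```

A thread would then be a set of such traces tied together by a shared conversation identifier in metadata.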
Tracing

@traceable decorator (Python)
```python
from langsmith import traceable

@traceable(
    run_type="chain",  # llm | chain | tool | retriever | embedding
    name="My Pipeline",
    tags=["production", "v2"],
    metadata={"version": "2.1", "env": "prod"},
    project_name="my-project"
)
def pipeline(question: str) -> str:
    return generate_answer(question)
```
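Conceptually, a tracing decorator captures inputs, outputs, timing, and a run ID around each call. A stdlib-only sketch of that mechanism (this is not the real @traceable implementation; the real SDK uploads the run to LangSmith instead of stashing it on the wrapper):

```python
import functools
import time
import uuid

# Illustrative sketch of what a tracing decorator does under the hood.
def traceable_sketch(run_type="chain", name=None):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            run = {
                "id": str(uuid.uuid4()),
                "name": name or fn.__name__,
                "run_type": run_type,
                "inputs": {"args": args, "kwargs": kwargs},
                "start": time.time(),
            }
            run["outputs"] = fn(*args, **kwargs)
            run["end"] = time.time()
            wrapper.last_run = run  # the real SDK would send this to LangSmith
            return run["outputs"]
        return wrapper
    return decorator

@traceable_sketch(run_type="chain", name="My Pipeline")
def pipeline(question: str) -> str:
    return question.upper()
```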
Selective tracing context
```python
import langsmith as ls

# Enable tracing for this block only
with ls.tracing_context(enabled=True, project_name="debug"):
    result = chain.invoke({"input": "..."})

# Disable tracing despite LANGSMITH_TRACING=true
with ls.tracing_context(enabled=False):
    result = chain.invoke({"input": "..."})
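A toggle like this can be understood as a context variable that instrumented code consults before recording anything. A stdlib-only sketch of the idea (not the SDK's actual implementation):

```python
import contextvars
from contextlib import contextmanager

# Conceptual sketch: a context-scoped tracing switch.
_tracing_enabled = contextvars.ContextVar("tracing_enabled", default=False)

@contextmanager
def tracing_context(enabled: bool):
    token = _tracing_enabled.set(enabled)
    try:
        yield
    finally:
        _tracing_enabled.reset(token)

recorded = []

def instrumented_call(x):
    if _tracing_enabled.get():
        recorded.append(x)  # stand-in for "send run to LangSmith"
    return x * 2

with tracing_context(enabled=True):
    instrumented_call(1)   # recorded
with tracing_context(enabled=False):
    instrumented_call(2)   # not recorded
```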
Wrap provider clients
```python
from langsmith.wrappers import wrap_openai, wrap_anthropic
from openai import OpenAI
import anthropic

openai_client = wrap_openai(OpenAI())  # All calls auto-traced
anthropic_client = wrap_anthropic(anthropic.Anthropic())
```

Distributed tracing (microservices)
```python
from langsmith.run_helpers import get_current_run_tree
import langsmith

@langsmith.traceable
def service_a(inputs):
    rt = get_current_run_tree()
    headers = rt.to_headers()  # Pass to child service
    return call_service_b(headers=headers)

@langsmith.traceable
def service_b(x, headers):
    with langsmith.tracing_context(parent=headers):
        return process(x)
```
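The essential move is that the parent serializes its run context into headers and the child re-attaches to it. A toy stdlib sketch of that hand-off (the header name and run representation here are invented for illustration; to_headers()/tracing_context(parent=...) handle the real wire format):

```python
# Toy sketch of cross-service trace propagation; NOT LangSmith's wire format.
def to_headers(run: dict) -> dict:
    return {"x-parent-run-id": run["id"]}

def service_b(payload: dict, headers: dict) -> dict:
    # Child links its run to the parent run ID carried in the headers
    return {"id": "run-b", "parent_id": headers["x-parent-run-id"], "payload": payload}

parent_run = {"id": "run-a"}
child_run = service_b({"x": 1}, to_headers(parent_run))
```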
Evaluation
Basic evaluation with evaluate()
```python
from langsmith import Client
from langsmith.wrappers import wrap_openai
from openai import OpenAI

client = Client()
oai = wrap_openai(OpenAI())

# 1. Create dataset
dataset = client.create_dataset("Geography QA")
client.create_examples(
    dataset_id=dataset.id,
    examples=[
        {"inputs": {"q": "Capital of France?"}, "outputs": {"a": "Paris"}},
        {"inputs": {"q": "Capital of Germany?"}, "outputs": {"a": "Berlin"}},
    ]
)

# 2. Target function
def target(inputs: dict) -> dict:
    res = oai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": inputs["q"]}]
    )
    return {"a": res.choices[0].message.content}

# 3. Evaluator
def exact_match(inputs, outputs, reference_outputs):
    return outputs["a"].strip().lower() == reference_outputs["a"].strip().lower()

# 4. Run experiment
results = client.evaluate(
    target,
    data="Geography QA",
    evaluators=[exact_match],
    experiment_prefix="gpt-4o-mini-v1",
    max_concurrency=4
)
```
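Evaluators can also return a dict with a feedback key and score rather than a bare boolean. As an example of a softer metric than exact match, here is a token-overlap F1 evaluator; the metric is written from scratch for illustration, not imported from the SDK:

```python
# Heuristic token-overlap F1 evaluator returning {"key": ..., "score": ...}.
def f1_token_overlap(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    pred = set(outputs["a"].lower().split())
    ref = set(reference_outputs["a"].lower().split())
    overlap = len(pred & ref)
    if overlap == 0:
        return {"key": "f1", "score": 0.0}
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return {"key": "f1", "score": 2 * precision * recall / (precision + recall)}
```

It plugs into client.evaluate(..., evaluators=[f1_token_overlap]) exactly like exact_match above.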
LLM-as-judge with openevals
```bash
pip install -U openevals
```

```python
from openevals.llm import create_llm_as_judge
from openevals.prompts import CORRECTNESS_PROMPT

judge = create_llm_as_judge(
    prompt=CORRECTNESS_PROMPT,
    model="openai:o3-mini",
    feedback_key="correctness",
)
results = client.evaluate(target, data="my-dataset", evaluators=[judge])
```

Evaluation types
| Type | When to use |
|---|---|
| Code/Heuristic | Exact match, format checks, rule-based |
| LLM-as-judge | Subjective quality, safety, reference-free |
| Human | Annotation queues, pairwise comparison |
| Pairwise | Compare two app versions |
| Online | Production traces, real traffic |
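For the code/heuristic row, an evaluator can be as simple as a regex check on output format. A hypothetical example (the function name and the single-capitalized-word criterion are invented for illustration):

```python
import re

# Hypothetical format-check evaluator: passes only if the answer is a
# single capitalized word, e.g. "Paris".
def format_check(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    ok = bool(re.fullmatch(r"[A-Z][a-z]+", outputs["a"].strip()))
    return {"key": "format_ok", "score": int(ok)}
```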
Prompt Hub
```python
from langsmith import Client
from langchain_core.prompts import ChatPromptTemplate

client = Client()

# Push a prompt
prompt = ChatPromptTemplate([
    ("system", "You are a helpful assistant."),
    ("user", "{question}"),
])
client.push_prompt("my-assistant-prompt", object=prompt)

# Pull and use
prompt = client.pull_prompt("my-assistant-prompt")

# Pull a specific version
prompt = client.pull_prompt("my-assistant-prompt:abc123")
```
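At a high level, a pulled prompt is a message template that gets formatted with variables at call time. A stdlib stand-in for that behavior (this is not the ChatPromptTemplate API; it only illustrates the {question} substitution):

```python
# Stdlib stand-in for prompt templating; NOT the langchain_core API.
template = [
    ("system", "You are a helpful assistant."),
    ("user", "{question}"),
]

def format_messages(template, **variables):
    return [
        {"role": role, "content": text.format(**variables)}
        for role, text in template
    ]

messages = format_messages(template, question="What is LangSmith?")
```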
Feedback
```python
from langsmith import Client
import uuid

client = Client()

# Custom run ID for later feedback linking
my_run_id = str(uuid.uuid4())
result = chain.invoke({"input": "..."}, {"run_id": my_run_id})

# Attach feedback
client.create_feedback(
    key="correctness",
    score=1,  # 0-1 numeric or categorical
    run_id=my_run_id,
    comment="Accurate and concise"
)
```
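Once feedback accumulates, a common follow-up is aggregating scores per key across runs. A hypothetical stdlib helper (not part of the SDK) that averages scores from feedback dicts shaped like the create_feedback arguments above:

```python
from collections import defaultdict

# Hypothetical helper: average feedback scores grouped by feedback key.
def summarize_feedback(feedback_items):
    by_key = defaultdict(list)
    for item in feedback_items:
        if item.get("score") is not None:
            by_key[item["key"]].append(item["score"])
    return {key: sum(scores) / len(scores) for key, scores in by_key.items()}

items = [
    {"key": "correctness", "score": 1},
    {"key": "correctness", "score": 0},
    {"key": "helpfulness", "score": 1},
]
summary = summarize_feedback(items)
```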
References
- Python SDK Reference — full Client API, @traceable signature, evaluate()
- TypeScript SDK Reference — Client, traceable, wrappers, evaluate
- CLI Reference — langsmith CLI commands
- Official Docs — langchain.com/langsmith
- SDK GitHub — MIT License, v0.7.17
- openevals — Prebuilt LLM evaluators