langsmith — LLM Observability, Evaluation & Prompt Management
Keywords: langsmith · llm tracing · llm evaluation · @traceable · langsmith evaluate

LangSmith is a framework-agnostic platform for developing, debugging, and deploying LLM applications. It provides end-to-end tracing, quality evaluation, prompt versioning, and production monitoring.
When to use this skill
- Add tracing to any LLM pipeline (OpenAI, Anthropic, LangChain, custom models)
- Run offline evaluations with `evaluate()` against a curated dataset
- Set up production monitoring and online evaluation
- Manage and version prompts in the Prompt Hub
- Create datasets for regression testing and benchmarking
- Attach human or automated feedback to traces
- Use LLM-as-judge scoring with `openevals`
- Debug agent failures with end-to-end trace inspection
Instructions
- Install SDK: `pip install -U langsmith` (Python) or `npm install langsmith` (TypeScript)
- Set environment variables: `LANGSMITH_TRACING=true`, `LANGSMITH_API_KEY=lsv2_...`
- Instrument with the `@traceable` decorator or the `wrap_openai()` wrapper
- View traces at smith.langchain.com
- For evaluation setup, see references/python-sdk.md
- For CLI commands, see references/cli.md
- Run `bash scripts/setup.sh` to auto-configure the environment

API Key: Get from smith.langchain.com → Settings → API Keys
Docs: https://docs.langchain.com/langsmith
Quick Start
Python
```bash
pip install -U langsmith openai
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="lsv2_..."
export OPENAI_API_KEY="sk-..."
```

```python
from langsmith import traceable
from langsmith.wrappers import wrap_openai
from openai import OpenAI

client = wrap_openai(OpenAI())

@traceable
def rag_pipeline(question: str) -> str:
    """Automatically traced in LangSmith"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}]
    )
    return response.choices[0].message.content

result = rag_pipeline("What is LangSmith?")
```
TypeScript
```bash
npm install langsmith openai
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="lsv2_..."
```

```typescript
import { traceable } from "langsmith/traceable";
import { wrapOpenAI } from "langsmith/wrappers";
import { OpenAI } from "openai";

const client = wrapOpenAI(new OpenAI());

const pipeline = traceable(async (question: string): Promise<string> => {
  const res = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: question }],
  });
  return res.choices[0].message.content ?? "";
}, { name: "RAG Pipeline" });

await pipeline("What is LangSmith?");
```

Core Concepts
| Concept | Description |
|---|---|
| Run | Individual operation (LLM call, tool call, retrieval). The fundamental unit. |
| Trace | All runs from a single user request, linked by a shared `trace_id`. |
| Thread | Multiple traces in a conversation, linked by a shared thread ID in metadata. |
| Project | Container grouping related traces (set via `LANGSMITH_PROJECT`). |
| Dataset | Collection of examples (inputs plus optional reference outputs) for evaluation. |
| Experiment | Result set from running `evaluate()` against a dataset. |
| Feedback | Score/label attached to a run — numeric, categorical, or freeform. |
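For example, the only thing that groups traces into a Thread is a shared conversation ID in run metadata (LangSmith recognizes keys such as `session_id`, `thread_id`, or `conversation_id`). A minimal sketch, with `generate_answer` standing in for your own model call:

```python
import uuid

from langsmith import traceable

@traceable
def chat_turn(question: str) -> str:
    return generate_answer(question)  # hypothetical model call

# Runs that share this metadata value are grouped into one Thread in the UI.
thread_id = str(uuid.uuid4())
chat_turn("Hi!", langsmith_extra={"metadata": {"session_id": thread_id}})
chat_turn("Tell me more.", langsmith_extra={"metadata": {"session_id": thread_id}})
```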
Tracing
@traceable decorator (Python)
```python
from langsmith import traceable

@traceable(
    run_type="chain",  # llm | chain | tool | retriever | embedding
    name="My Pipeline",
    tags=["production", "v2"],
    metadata={"version": "2.1", "env": "prod"},
    project_name="my-project"
)
def pipeline(question: str) -> str:
    return generate_answer(question)
```

Selective tracing context
```python
import langsmith as ls

# Enable tracing for this block only
with ls.tracing_context(enabled=True, project_name="debug"):
    result = chain.invoke({"input": "..."})

# Disable tracing despite LANGSMITH_TRACING=true
with ls.tracing_context(enabled=False):
    result = chain.invoke({"input": "..."})
```

Wrap provider clients
```python
from langsmith.wrappers import wrap_openai, wrap_anthropic
from openai import OpenAI
import anthropic

openai_client = wrap_openai(OpenAI())  # All calls auto-traced
anthropic_client = wrap_anthropic(anthropic.Anthropic())
```

Distributed tracing (microservices)
```python
import langsmith
from langsmith.run_helpers import get_current_run_tree

@langsmith.traceable
def service_a(inputs):
    rt = get_current_run_tree()
    headers = rt.to_headers()  # Pass trace context to the child service
    return call_service_b(headers=headers)

@langsmith.traceable
def service_b(x, headers):
    with langsmith.tracing_context(parent=headers):
        return process(x)
```
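When a decorator is impractical (for example, inside a framework callback), runs can also be created by hand. A sketch using the SDK's `RunTree`; the names and inputs are illustrative:

```python
from langsmith import RunTree

# Post a parent run, nest a child under it, then close both out.
parent = RunTree(name="Manual Pipeline", run_type="chain", inputs={"q": "hello"})
parent.post()

child = parent.create_child(name="Lookup", run_type="tool", inputs={"term": "hello"})
child.post()
child.end(outputs={"hits": 3})
child.patch()

parent.end(outputs={"answer": "..."})
parent.patch()
```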
Evaluation
Basic evaluation with evaluate()
```python
from langsmith import Client
from langsmith.wrappers import wrap_openai
from openai import OpenAI

client = Client()
oai = wrap_openai(OpenAI())

# 1. Create dataset
dataset = client.create_dataset("Geography QA")
client.create_examples(
    dataset_id=dataset.id,
    examples=[
        {"inputs": {"q": "Capital of France?"}, "outputs": {"a": "Paris"}},
        {"inputs": {"q": "Capital of Germany?"}, "outputs": {"a": "Berlin"}},
    ]
)

# 2. Target function
def target(inputs: dict) -> dict:
    res = oai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": inputs["q"]}]
    )
    return {"a": res.choices[0].message.content}

# 3. Evaluator
def exact_match(inputs, outputs, reference_outputs):
    return outputs["a"].strip().lower() == reference_outputs["a"].strip().lower()

# 4. Run experiment
results = client.evaluate(
    target,
    data="Geography QA",
    evaluators=[exact_match],
    experiment_prefix="gpt-4o-mini-v1",
    max_concurrency=4
)
```
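Per-example evaluators can be paired with experiment-level metrics via the `summary_evaluators` parameter of `evaluate()`, which receives all runs and examples at once. A sketch under that assumption; the pass-rate metric is illustrative:

```python
def pass_rate(runs: list, examples: list) -> dict:
    # Aggregate metric over the whole experiment: fraction of exact matches.
    hits = [
        run.outputs["a"].strip().lower() == ex.outputs["a"].strip().lower()
        for run, ex in zip(runs, examples)
    ]
    return {"key": "pass_rate", "score": sum(hits) / len(hits)}

results = client.evaluate(
    target,
    data="Geography QA",
    evaluators=[exact_match],
    summary_evaluators=[pass_rate],
)
```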
LLM-as-judge with openevals
```bash
pip install -U openevals
```

```python
from openevals.llm import create_llm_as_judge
from openevals.prompts import CORRECTNESS_PROMPT

judge = create_llm_as_judge(
    prompt=CORRECTNESS_PROMPT,
    model="openai:o3-mini",
    feedback_key="correctness",
)
results = client.evaluate(target, data="my-dataset", evaluators=[judge])
```

Evaluation types
| Type | When to use |
|---|---|
| Code/Heuristic | Exact match, format checks, rule-based |
| LLM-as-judge | Subjective quality, safety, reference-free |
| Human | Annotation queues, pairwise comparison |
| Pairwise | Compare two app versions |
| Online | Production traces, real traffic |
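As a concrete instance of the Code/Heuristic row above, an evaluator is just a function. This reference-free sketch checks that the target's answer parses as JSON; the `valid_json` key and the `"a"` output field are illustrative:

```python
import json

def is_valid_json(inputs: dict, outputs: dict) -> dict:
    # Reference-free check: no reference_outputs parameter required.
    try:
        json.loads(outputs["a"])
        return {"key": "valid_json", "score": 1}
    except (KeyError, TypeError, ValueError):
        return {"key": "valid_json", "score": 0}

results = client.evaluate(target, data="my-dataset", evaluators=[is_valid_json])
```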
Prompt Hub
```python
from langsmith import Client
from langchain_core.prompts import ChatPromptTemplate

client = Client()

# Push a prompt
prompt = ChatPromptTemplate([
    ("system", "You are a helpful assistant."),
    ("user", "{question}"),
])
client.push_prompt("my-assistant-prompt", object=prompt)

# Pull and use
prompt = client.pull_prompt("my-assistant-prompt")

# Pull a specific version
prompt = client.pull_prompt("my-assistant-prompt:abc123")
```
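A pulled prompt is a regular `ChatPromptTemplate`, so it composes with a model like any LCEL runnable. A sketch, assuming `langchain-openai` is installed:

```python
from langchain_openai import ChatOpenAI

# Pipe the hub prompt into a model and invoke the resulting chain.
chain = client.pull_prompt("my-assistant-prompt") | ChatOpenAI(model="gpt-4o-mini")
print(chain.invoke({"question": "What is LangSmith?"}).content)
```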
---

Feedback
```python
import uuid
from langsmith import Client

client = Client()

# Custom run ID for later feedback linking
my_run_id = str(uuid.uuid4())
result = chain.invoke({"input": "..."}, {"run_id": my_run_id})

# Attach feedback
client.create_feedback(
    key="correctness",
    score=1,  # 0-1 numeric or categorical
    run_id=my_run_id,
    comment="Accurate and concise"
)
```
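When the caller does not control the invocation config, the run ID can instead be captured from inside a traced function. A sketch, with `generate_answer` standing in for your own logic:

```python
from langsmith import traceable
from langsmith.run_helpers import get_current_run_tree

run_ids = {}

@traceable
def answer(question: str) -> str:
    # Record this run's ID so feedback can be attached after the fact.
    run_ids["last"] = get_current_run_tree().id
    return generate_answer(question)  # hypothetical model call

answer("What is LangSmith?")
client.create_feedback(run_id=run_ids["last"], key="helpfulness", score=1)
```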
---

References
- Python SDK Reference — full Client API, @traceable signature, evaluate()
- TypeScript SDK Reference — Client, traceable, wrappers, evaluate
- CLI Reference — langsmith CLI commands
- Official Docs — langchain.com/langsmith
- SDK GitHub — MIT License, v0.7.17
- openevals — Prebuilt LLM evaluators