langsmith — LLM Observability, Evaluation & Prompt Management


Keywords: langsmith · llm tracing · llm evaluation · @traceable · langsmith evaluate

LangSmith is a framework-agnostic platform for developing, debugging, and deploying LLM applications. It provides end-to-end tracing, quality evaluation, prompt versioning, and production monitoring.

When to use this skill


  • Add tracing to any LLM pipeline (OpenAI, Anthropic, LangChain, custom models)
  • Run offline evaluations with `evaluate()` against a curated dataset
  • Set up production monitoring and online evaluation
  • Manage and version prompts in the Prompt Hub
  • Create datasets for regression testing and benchmarking
  • Attach human or automated feedback to traces
  • Use LLM-as-judge scoring with `openevals`
  • Debug agent failures with end-to-end trace inspection

Instructions


  1. Install the SDK: `pip install -U langsmith` (Python) or `npm install langsmith` (TypeScript)
  2. Set environment variables: `LANGSMITH_TRACING=true`, `LANGSMITH_API_KEY=lsv2_...`
  3. Instrument with the `@traceable` decorator or the `wrap_openai()` wrapper
  4. View traces at smith.langchain.com
  5. For evaluation setup, see references/python-sdk.md
  6. For CLI commands, see references/cli.md
  7. Run `bash scripts/setup.sh` to auto-configure the environment


Quick Start


Python


```bash
pip install -U langsmith openai
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="lsv2_..."
export OPENAI_API_KEY="sk-..."
```

```python
from langsmith import traceable
from langsmith.wrappers import wrap_openai
from openai import OpenAI

client = wrap_openai(OpenAI())

@traceable
def rag_pipeline(question: str) -> str:
    """Automatically traced in LangSmith."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}]
    )
    return response.choices[0].message.content

result = rag_pipeline("What is LangSmith?")
```

TypeScript


```bash
npm install langsmith openai
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="lsv2_..."
```

```typescript
import { traceable } from "langsmith/traceable";
import { wrapOpenAI } from "langsmith/wrappers";
import { OpenAI } from "openai";

const client = wrapOpenAI(new OpenAI());

const pipeline = traceable(async (question: string): Promise<string> => {
  const res = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: question }],
  });
  return res.choices[0].message.content ?? "";
}, { name: "RAG Pipeline" });

await pipeline("What is LangSmith?");
```


Core Concepts


| Concept | Description |
|---|---|
| Run | Individual operation (LLM call, tool call, retrieval). The fundamental unit. |
| Trace | All runs from a single user request, linked by `trace_id`. |
| Thread | Multiple traces in a conversation, linked by `session_id` or `thread_id`. |
| Project | Container grouping related traces (set via `LANGSMITH_PROJECT`). |
| Dataset | Collection of `{inputs, outputs}` examples for offline evaluation. |
| Experiment | Result set from running `evaluate()` against a dataset. |
| Feedback | Score/label attached to a run: numeric, categorical, or freeform. |


Tracing


@traceable decorator (Python)


```python
from langsmith import traceable

@traceable(
    run_type="chain",          # llm | chain | tool | retriever | embedding
    name="My Pipeline",
    tags=["production", "v2"],
    metadata={"version": "2.1", "env": "prod"},
    project_name="my-project"
)
def pipeline(question: str) -> str:
    return generate_answer(question)
```

Selective tracing context


```python
import langsmith as ls

# Enable tracing for this block only
with ls.tracing_context(enabled=True, project_name="debug"):
    result = chain.invoke({"input": "..."})

# Disable tracing despite LANGSMITH_TRACING=true
with ls.tracing_context(enabled=False):
    result = chain.invoke({"input": "..."})
```

Wrap provider clients


```python
from langsmith.wrappers import wrap_openai, wrap_anthropic
from openai import OpenAI
import anthropic

openai_client = wrap_openai(OpenAI())           # All calls auto-traced
anthropic_client = wrap_anthropic(anthropic.Anthropic())
```

Distributed tracing (microservices)


```python
from langsmith.run_helpers import get_current_run_tree
import langsmith

@langsmith.traceable
def service_a(inputs):
    rt = get_current_run_tree()
    headers = rt.to_headers()     # Pass to child service
    return call_service_b(headers=headers)

@langsmith.traceable
def service_b(x, headers):
    with langsmith.tracing_context(parent=headers):
        return process(x)
```


Evaluation


Basic evaluation with evaluate()


```python
from langsmith import Client
from langsmith.wrappers import wrap_openai
from openai import OpenAI

client = Client()
oai = wrap_openai(OpenAI())

# 1. Create dataset
dataset = client.create_dataset("Geography QA")
client.create_examples(
    dataset_id=dataset.id,
    examples=[
        {"inputs": {"q": "Capital of France?"}, "outputs": {"a": "Paris"}},
        {"inputs": {"q": "Capital of Germany?"}, "outputs": {"a": "Berlin"}},
    ]
)

# 2. Target function
def target(inputs: dict) -> dict:
    res = oai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": inputs["q"]}]
    )
    return {"a": res.choices[0].message.content}

# 3. Evaluator
def exact_match(inputs, outputs, reference_outputs):
    return outputs["a"].strip().lower() == reference_outputs["a"].strip().lower()

# 4. Run experiment
results = client.evaluate(
    target,
    data="Geography QA",
    evaluators=[exact_match],
    experiment_prefix="gpt-4o-mini-v1",
    max_concurrency=4
)
```

LLM-as-judge with openevals


```bash
pip install -U openevals
```

```python
from openevals.llm import create_llm_as_judge
from openevals.prompts import CORRECTNESS_PROMPT

judge = create_llm_as_judge(
    prompt=CORRECTNESS_PROMPT,
    model="openai:o3-mini",
    feedback_key="correctness",
)

results = client.evaluate(target, data="my-dataset", evaluators=[judge])
```

Evaluation types


| Type | When to use |
|---|---|
| Code/heuristic | Exact match, format checks, rule-based |
| LLM-as-judge | Subjective quality, safety, reference-free |
| Human | Annotation queues, pairwise comparison |
| Pairwise | Compare two app versions |
| Online | Production traces, real traffic |

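Code/heuristic evaluators are plain functions: any callable taking a subset of `(inputs, outputs, reference_outputs)` and returning a score or a `{key, score}` dict works. A minimal reference-free format check as a sketch (the evaluator name and the `"a"` output key are illustrative, matching the dataset shape used above):

```python
import json

def valid_json(inputs: dict, outputs: dict) -> dict:
    """Heuristic evaluator: passes if the model's answer parses as JSON."""
    try:
        json.loads(outputs["a"])
        return {"key": "valid_json", "score": 1}
    except (KeyError, TypeError, json.JSONDecodeError):
        return {"key": "valid_json", "score": 0}
```

Pass it alongside other evaluators, e.g. `evaluators=[exact_match, valid_json]`.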

Prompt Hub


```python
from langsmith import Client
from langchain_core.prompts import ChatPromptTemplate

client = Client()

# Push a prompt
prompt = ChatPromptTemplate([
    ("system", "You are a helpful assistant."),
    ("user", "{question}"),
])
client.push_prompt("my-assistant-prompt", object=prompt)

# Pull and use
prompt = client.pull_prompt("my-assistant-prompt")

# Pull a specific version
prompt = client.pull_prompt("my-assistant-prompt:abc123")
```

---

Feedback


```python
from langsmith import Client
import uuid

client = Client()

# Custom run ID for later feedback linking
my_run_id = str(uuid.uuid4())
result = chain.invoke({"input": "..."}, {"run_id": my_run_id})

# Attach feedback
client.create_feedback(
    key="correctness",
    score=1,            # 0-1 numeric or categorical
    run_id=my_run_id,
    comment="Accurate and concise"
)
```
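Numeric feedback scores are conventionally kept in the 0-1 range (as the comment above notes), so raw user signals usually need normalizing before they are sent. A sketch for a 1-5 star rating (the helper name is illustrative):

```python
def stars_to_score(stars: int, max_stars: int = 5) -> float:
    """Map a 1..max_stars user rating onto the 0-1 feedback score range."""
    if not 1 <= stars <= max_stars:
        raise ValueError(f"rating must be between 1 and {max_stars}")
    return (stars - 1) / (max_stars - 1)

# e.g. client.create_feedback(key="user_rating", score=stars_to_score(4), run_id=my_run_id)
```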

---

References


  • Python SDK Reference — full Client API, @traceable signature, evaluate()
  • TypeScript SDK Reference — Client, traceable, wrappers, evaluate
  • CLI Reference — langsmith CLI commands
  • Official Docs — langchain.com/langsmith
  • SDK GitHub — MIT License, v0.7.17
  • openevals — Prebuilt LLM evaluators