langsmith — LLM Observability, Evaluation & Prompt Management

Keywords: langsmith · llm tracing · llm evaluation · @traceable · evaluate

LangSmith is a framework-agnostic platform for developing, debugging, and deploying LLM applications. It provides end-to-end tracing, quality evaluation, prompt versioning, and production monitoring.

When to use this skill
- Add tracing to any LLM pipeline (OpenAI, Anthropic, LangChain, custom models)
- Run offline evaluations with evaluate() against a curated dataset
- Set up production monitoring and online evaluation
- Manage and version prompts in the Prompt Hub
- Create datasets for regression testing and benchmarking
- Attach human or automated feedback to traces
- Use LLM-as-judge scoring with openevals
- Debug agent failures with end-to-end trace inspection
Instructions

- Install the SDK: pip install -U langsmith (Python) or npm install langsmith (TypeScript)
- Set environment variables: LANGSMITH_TRACING=true, LANGSMITH_API_KEY=lsv2_...
- Instrument with the @traceable decorator or the wrap_openai() wrapper
- View traces at smith.langchain.com
- For evaluation setup, see references/python-sdk.md
- For CLI commands, see references/cli.md
- Run bash scripts/setup.sh to auto-configure the environment

API Key: Get from smith.langchain.com → Settings → API Keys
Docs: https://docs.langchain.com/langsmith
Quick Start

Python

```bash
pip install -U langsmith openai
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="lsv2_..."
export OPENAI_API_KEY="sk-..."
```

```python
from langsmith import traceable
from langsmith.wrappers import wrap_openai
from openai import OpenAI

client = wrap_openai(OpenAI())

@traceable
def rag_pipeline(question: str) -> str:
    """Automatically traced in LangSmith"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}]
    )
    return response.choices[0].message.content

result = rag_pipeline("What is LangSmith?")
```

TypeScript
```bash
npm install langsmith openai
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="lsv2_..."
```

```typescript
import { traceable } from "langsmith/traceable";
import { wrapOpenAI } from "langsmith/wrappers";
import { OpenAI } from "openai";

const client = wrapOpenAI(new OpenAI());

const pipeline = traceable(async (question: string): Promise<string> => {
  const res = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: question }],
  });
  return res.choices[0].message.content ?? "";
}, { name: "RAG Pipeline" });

await pipeline("What is LangSmith?");
```

Core Concepts
| Concept | Description |
|---|---|
| Run | Individual operation (LLM call, tool call, retrieval). The fundamental unit. |
| Trace | All runs from a single user request, linked by a shared trace ID. |
| Thread | Multiple traces in a conversation, linked by a thread/session ID in metadata. |
| Project | Container grouping related traces (set via LANGSMITH_PROJECT). |
| Dataset | Collection of examples (inputs plus reference outputs) for offline evaluation. |
| Experiment | Result set from running evaluate() against a dataset. |
| Feedback | Score/label attached to a run — numeric, categorical, or freeform. |
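To make the hierarchy concrete, here is a deliberately simplified, hypothetical data model (not the SDK's actual classes, which carry far more state) showing how runs nest under a root run and share one trace ID:

```python
from dataclasses import dataclass, field
import uuid

# HYPOTHETICAL sketch of the Run/Trace hierarchy, for illustration only.
@dataclass
class Run:
    name: str
    run_type: str        # "llm", "chain", "tool", "retriever", ...
    trace_id: str        # shared by every run in one trace
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    children: list = field(default_factory=list)

# One user request -> one trace: a root run plus nested child runs
trace_id = str(uuid.uuid4())
root = Run("rag_pipeline", "chain", trace_id)
root.children.append(Run("retrieve_docs", "retriever", trace_id))
root.children.append(Run("generate_answer", "llm", trace_id))
```

A thread would then be a set of such traces tied together by a shared conversation identifier in metadata.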
Tracing

@traceable decorator (Python)
```python
from langsmith import traceable

@traceable(
    run_type="chain",  # llm | chain | tool | retriever | embedding
    name="My Pipeline",
    tags=["production", "v2"],
    metadata={"version": "2.1", "env": "prod"},
    project_name="my-project"
)
def pipeline(question: str) -> str:
    return generate_answer(question)
```
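Conceptually, a tracing decorator captures inputs, outputs, timing, and a run ID around each call. A stdlib-only sketch of that mechanism (this is not the real @traceable implementation; the real SDK uploads the run to LangSmith instead of stashing it on the wrapper):

```python
import functools
import time
import uuid

# Illustrative sketch of what a tracing decorator does under the hood.
def traceable_sketch(run_type="chain", name=None):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            run = {
                "id": str(uuid.uuid4()),
                "name": name or fn.__name__,
                "run_type": run_type,
                "inputs": {"args": args, "kwargs": kwargs},
                "start": time.time(),
            }
            run["outputs"] = fn(*args, **kwargs)
            run["end"] = time.time()
            wrapper.last_run = run  # the real SDK would send this to LangSmith
            return run["outputs"]
        return wrapper
    return decorator

@traceable_sketch(run_type="chain", name="My Pipeline")
def pipeline(question: str) -> str:
    return question.upper()
```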
Selective tracing context
```python
import langsmith as ls

# Enable tracing for this block only
with ls.tracing_context(enabled=True, project_name="debug"):
    result = chain.invoke({"input": "..."})

# Disable tracing despite LANGSMITH_TRACING=true
with ls.tracing_context(enabled=False):
    result = chain.invoke({"input": "..."})
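A toggle like this can be understood as a context variable that instrumented code consults before recording anything. A stdlib-only sketch of the idea (not the SDK's actual implementation):

```python
import contextvars
from contextlib import contextmanager

# Conceptual sketch: a context-scoped tracing switch.
_tracing_enabled = contextvars.ContextVar("tracing_enabled", default=False)

@contextmanager
def tracing_context(enabled: bool):
    token = _tracing_enabled.set(enabled)
    try:
        yield
    finally:
        _tracing_enabled.reset(token)

recorded = []

def instrumented_call(x):
    if _tracing_enabled.get():
        recorded.append(x)  # stand-in for "send run to LangSmith"
    return x * 2

with tracing_context(enabled=True):
    instrumented_call(1)   # recorded
with tracing_context(enabled=False):
    instrumented_call(2)   # not recorded
```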
Wrap provider clients
```python
from langsmith.wrappers import wrap_openai, wrap_anthropic
from openai import OpenAI
import anthropic

openai_client = wrap_openai(OpenAI())  # All calls auto-traced
anthropic_client = wrap_anthropic(anthropic.Anthropic())
```

Distributed tracing (microservices)
```python
from langsmith.run_helpers import get_current_run_tree
import langsmith

@langsmith.traceable
def service_a(inputs):
    rt = get_current_run_tree()
    headers = rt.to_headers()  # Pass to child service
    return call_service_b(headers=headers)

@langsmith.traceable
def service_b(x, headers):
    with langsmith.tracing_context(parent=headers):
        return process(x)
```
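The essential move is that the parent serializes its run context into headers and the child re-attaches to it. A toy stdlib sketch of that hand-off (the header name and run representation here are invented for illustration; to_headers()/tracing_context(parent=...) handle the real wire format):

```python
# Toy sketch of cross-service trace propagation; NOT LangSmith's wire format.
def to_headers(run: dict) -> dict:
    return {"x-parent-run-id": run["id"]}

def service_b(payload: dict, headers: dict) -> dict:
    # Child links its run to the parent run ID carried in the headers
    return {"id": "run-b", "parent_id": headers["x-parent-run-id"], "payload": payload}

parent_run = {"id": "run-a"}
child_run = service_b({"x": 1}, to_headers(parent_run))
```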
Evaluation
Basic evaluation with evaluate()
```python
from langsmith import Client
from langsmith.wrappers import wrap_openai
from openai import OpenAI

client = Client()
oai = wrap_openai(OpenAI())

# 1. Create dataset
dataset = client.create_dataset("Geography QA")
client.create_examples(
    dataset_id=dataset.id,
    examples=[
        {"inputs": {"q": "Capital of France?"}, "outputs": {"a": "Paris"}},
        {"inputs": {"q": "Capital of Germany?"}, "outputs": {"a": "Berlin"}},
    ]
)

# 2. Target function
def target(inputs: dict) -> dict:
    res = oai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": inputs["q"]}]
    )
    return {"a": res.choices[0].message.content}

# 3. Evaluator
def exact_match(inputs, outputs, reference_outputs):
    return outputs["a"].strip().lower() == reference_outputs["a"].strip().lower()

# 4. Run experiment
results = client.evaluate(
    target,
    data="Geography QA",
    evaluators=[exact_match],
    experiment_prefix="gpt-4o-mini-v1",
    max_concurrency=4
)
```
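Evaluators can also return a dict with a feedback key and score rather than a bare boolean. As an example of a softer metric than exact match, here is a token-overlap F1 evaluator; the metric is written from scratch for illustration, not imported from the SDK:

```python
# Heuristic token-overlap F1 evaluator returning {"key": ..., "score": ...}.
def f1_token_overlap(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    pred = set(outputs["a"].lower().split())
    ref = set(reference_outputs["a"].lower().split())
    overlap = len(pred & ref)
    if overlap == 0:
        return {"key": "f1", "score": 0.0}
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return {"key": "f1", "score": 2 * precision * recall / (precision + recall)}
```

It plugs into client.evaluate(..., evaluators=[f1_token_overlap]) exactly like exact_match above.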
LLM-as-judge with openevals
```bash
pip install -U openevals
```

```python
from openevals.llm import create_llm_as_judge
from openevals.prompts import CORRECTNESS_PROMPT

judge = create_llm_as_judge(
    prompt=CORRECTNESS_PROMPT,
    model="openai:o3-mini",
    feedback_key="correctness",
)
results = client.evaluate(target, data="my-dataset", evaluators=[judge])
```

Evaluation types
| Type | When to use |
|---|---|
| Code/Heuristic | Exact match, format checks, rule-based |
| LLM-as-judge | Subjective quality, safety, reference-free |
| Human | Annotation queues, pairwise comparison |
| Pairwise | Compare two app versions |
| Online | Production traces, real traffic |
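For the code/heuristic row, an evaluator can be as simple as a regex check on output format. A hypothetical example (the function name and the single-capitalized-word criterion are invented for illustration):

```python
import re

# Hypothetical format-check evaluator: passes only if the answer is a
# single capitalized word, e.g. "Paris".
def format_check(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    ok = bool(re.fullmatch(r"[A-Z][a-z]+", outputs["a"].strip()))
    return {"key": "format_ok", "score": int(ok)}
```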
Prompt Hub
```python
from langsmith import Client
from langchain_core.prompts import ChatPromptTemplate

client = Client()

# Push a prompt
prompt = ChatPromptTemplate([
    ("system", "You are a helpful assistant."),
    ("user", "{question}"),
])
client.push_prompt("my-assistant-prompt", object=prompt)

# Pull and use
prompt = client.pull_prompt("my-assistant-prompt")

# Pull a specific version
prompt = client.pull_prompt("my-assistant-prompt:abc123")
```
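At a high level, a pulled prompt is a message template that gets formatted with variables at call time. A stdlib stand-in for that behavior (this is not the ChatPromptTemplate API; it only illustrates the {question} substitution):

```python
# Stdlib stand-in for prompt templating; NOT the langchain_core API.
template = [
    ("system", "You are a helpful assistant."),
    ("user", "{question}"),
]

def format_messages(template, **variables):
    return [
        {"role": role, "content": text.format(**variables)}
        for role, text in template
    ]

messages = format_messages(template, question="What is LangSmith?")
```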
Feedback
```python
from langsmith import Client
import uuid

client = Client()

# Custom run ID for later feedback linking
my_run_id = str(uuid.uuid4())
result = chain.invoke({"input": "..."}, {"run_id": my_run_id})

# Attach feedback
client.create_feedback(
    key="correctness",
    score=1,  # 0-1 numeric or categorical
    run_id=my_run_id,
    comment="Accurate and concise"
)
```
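Once feedback accumulates, a common follow-up is aggregating scores per key across runs. A hypothetical stdlib helper (not part of the SDK) that averages scores from feedback dicts shaped like the create_feedback arguments above:

```python
from collections import defaultdict

# Hypothetical helper: average feedback scores grouped by feedback key.
def summarize_feedback(feedback_items):
    by_key = defaultdict(list)
    for item in feedback_items:
        if item.get("score") is not None:
            by_key[item["key"]].append(item["score"])
    return {key: sum(scores) / len(scores) for key, scores in by_key.items()}

items = [
    {"key": "correctness", "score": 1},
    {"key": "correctness", "score": 0},
    {"key": "helpfulness", "score": 1},
]
summary = summarize_feedback(items)
```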
References
- Python SDK Reference — full Client API, @traceable signature, evaluate()
- TypeScript SDK Reference — Client, traceable, wrappers, evaluate
- CLI Reference — langsmith CLI commands
- Official Docs — langchain.com/langsmith
- SDK GitHub — MIT License, v0.7.17
- openevals — Prebuilt LLM evaluators