langsmith — LLM Observability, Evaluation & Prompt Management


Keywords: langsmith · llm tracing · llm evaluation · @traceable · langsmith evaluate

LangSmith is a framework-agnostic platform for developing, debugging, and deploying LLM applications. It provides end-to-end tracing, quality evaluation, prompt versioning, and production monitoring.

When to use this skill


  • Add tracing to any LLM pipeline (OpenAI, Anthropic, LangChain, custom models)
  • Run offline evaluations with `evaluate()` against a curated dataset
  • Set up production monitoring and online evaluation
  • Manage and version prompts in the Prompt Hub
  • Create datasets for regression testing and benchmarking
  • Attach human or automated feedback to traces
  • Use LLM-as-judge scoring with `openevals`
  • Debug agent failures with end-to-end trace inspection

Instructions


  1. Install the SDK: `pip install -U langsmith` (Python) or `npm install langsmith` (TypeScript)
  2. Set environment variables: `LANGSMITH_TRACING=true`, `LANGSMITH_API_KEY=lsv2_...`
  3. Instrument with the `@traceable` decorator or the `wrap_openai()` wrapper
  4. View traces at smith.langchain.com
  5. For evaluation setup, see references/python-sdk.md
  6. For CLI commands, see references/cli.md
  7. Run `bash scripts/setup.sh` to auto-configure the environment


Quick Start


Python


```bash
pip install -U langsmith openai
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="lsv2_..."
export OPENAI_API_KEY="sk-..."
```

```python
from langsmith import traceable
from langsmith.wrappers import wrap_openai
from openai import OpenAI

client = wrap_openai(OpenAI())

@traceable
def rag_pipeline(question: str) -> str:
    """Automatically traced in LangSmith."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}]
    )
    return response.choices[0].message.content

result = rag_pipeline("What is LangSmith?")
```

TypeScript


```bash
npm install langsmith openai
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="lsv2_..."
```

```typescript
import { traceable } from "langsmith/traceable";
import { wrapOpenAI } from "langsmith/wrappers";
import { OpenAI } from "openai";

const client = wrapOpenAI(new OpenAI());

const pipeline = traceable(async (question: string): Promise<string> => {
  const res = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: question }],
  });
  return res.choices[0].message.content ?? "";
}, { name: "RAG Pipeline" });

await pipeline("What is LangSmith?");
```


Core Concepts


| Concept | Description |
|---|---|
| Run | Individual operation (LLM call, tool call, retrieval). The fundamental unit. |
| Trace | All runs from a single user request, linked by `trace_id`. |
| Thread | Multiple traces in a conversation, linked by `session_id` or `thread_id`. |
| Project | Container grouping related traces (set via `LANGSMITH_PROJECT`). |
| Dataset | Collection of `{inputs, outputs}` examples for offline evaluation. |
| Experiment | Result set from running `evaluate()` against a dataset. |
| Feedback | Score/label attached to a run: numeric, categorical, or freeform. |


Tracing


@traceable decorator (Python)


```python
from langsmith import traceable

@traceable(
    run_type="chain",          # llm | chain | tool | retriever | embedding
    name="My Pipeline",
    tags=["production", "v2"],
    metadata={"version": "2.1", "env": "prod"},
    project_name="my-project"
)
def pipeline(question: str) -> str:
    return generate_answer(question)
```

Selective tracing context


```python
import langsmith as ls

# Enable tracing for this block only
with ls.tracing_context(enabled=True, project_name="debug"):
    result = chain.invoke({"input": "..."})

# Disable tracing despite LANGSMITH_TRACING=true
with ls.tracing_context(enabled=False):
    result = chain.invoke({"input": "..."})
```

Wrap provider clients


```python
from langsmith.wrappers import wrap_openai, wrap_anthropic
from openai import OpenAI
import anthropic

openai_client = wrap_openai(OpenAI())           # All calls auto-traced
anthropic_client = wrap_anthropic(anthropic.Anthropic())
```

Distributed tracing (microservices)


```python
from langsmith.run_helpers import get_current_run_tree
import langsmith

@langsmith.traceable
def service_a(inputs):
    rt = get_current_run_tree()
    headers = rt.to_headers()     # Pass to child service
    return call_service_b(headers=headers)

@langsmith.traceable
def service_b(x, headers):
    with langsmith.tracing_context(parent=headers):
        return process(x)
```


Evaluation


Basic evaluation with evaluate()


```python
from langsmith import Client
from langsmith.wrappers import wrap_openai
from openai import OpenAI

client = Client()
oai = wrap_openai(OpenAI())

# 1. Create dataset
dataset = client.create_dataset("Geography QA")
client.create_examples(
    dataset_id=dataset.id,
    examples=[
        {"inputs": {"q": "Capital of France?"}, "outputs": {"a": "Paris"}},
        {"inputs": {"q": "Capital of Germany?"}, "outputs": {"a": "Berlin"}},
    ]
)

# 2. Target function
def target(inputs: dict) -> dict:
    res = oai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": inputs["q"]}]
    )
    return {"a": res.choices[0].message.content}

# 3. Evaluator
def exact_match(inputs, outputs, reference_outputs):
    return outputs["a"].strip().lower() == reference_outputs["a"].strip().lower()

# 4. Run experiment
results = client.evaluate(
    target,
    data="Geography QA",
    evaluators=[exact_match],
    experiment_prefix="gpt-4o-mini-v1",
    max_concurrency=4
)
```

LLM-as-judge with openevals


```bash
pip install -U openevals
```

```python
from openevals.llm import create_llm_as_judge
from openevals.prompts import CORRECTNESS_PROMPT

judge = create_llm_as_judge(
    prompt=CORRECTNESS_PROMPT,
    model="openai:o3-mini",
    feedback_key="correctness",
)

results = client.evaluate(target, data="my-dataset", evaluators=[judge])
```

Evaluation types


| Type | When to use |
|---|---|
| Code/heuristic | Exact match, format checks, rule-based |
| LLM-as-judge | Subjective quality, safety, reference-free |
| Human | Annotation queues, pairwise comparison |
| Pairwise | Compare two app versions |
| Online | Production traces, real traffic |

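Code/heuristic evaluators are plain functions: any callable taking a subset of `(inputs, outputs, reference_outputs)` and returning a score or a `{key, score}` dict works. A minimal reference-free format check as a sketch (the evaluator name and the `"a"` output key are illustrative, matching the dataset shape used above):

```python
import json

def valid_json(inputs: dict, outputs: dict) -> dict:
    """Heuristic evaluator: passes if the model's answer parses as JSON."""
    try:
        json.loads(outputs["a"])
        return {"key": "valid_json", "score": 1}
    except (KeyError, TypeError, json.JSONDecodeError):
        return {"key": "valid_json", "score": 0}
```

Pass it alongside other evaluators, e.g. `evaluators=[exact_match, valid_json]`.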

Prompt Hub


```python
from langsmith import Client
from langchain_core.prompts import ChatPromptTemplate

client = Client()

# Push a prompt
prompt = ChatPromptTemplate([
    ("system", "You are a helpful assistant."),
    ("user", "{question}"),
])
client.push_prompt("my-assistant-prompt", object=prompt)

# Pull and use
prompt = client.pull_prompt("my-assistant-prompt")

# Pull a specific version
prompt = client.pull_prompt("my-assistant-prompt:abc123")
```

---

Feedback


```python
from langsmith import Client
import uuid

client = Client()

# Custom run ID for later feedback linking
my_run_id = str(uuid.uuid4())
result = chain.invoke({"input": "..."}, {"run_id": my_run_id})

# Attach feedback
client.create_feedback(
    key="correctness",
    score=1,            # 0-1 numeric or categorical
    run_id=my_run_id,
    comment="Accurate and concise"
)
```
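Numeric feedback scores are conventionally kept in the 0-1 range (as the comment above notes), so raw user signals usually need normalizing before they are sent. A sketch for a 1-5 star rating (the helper name is illustrative):

```python
def stars_to_score(stars: int, max_stars: int = 5) -> float:
    """Map a 1..max_stars user rating onto the 0-1 feedback score range."""
    if not 1 <= stars <= max_stars:
        raise ValueError(f"rating must be between 1 and {max_stars}")
    return (stars - 1) / (max_stars - 1)

# e.g. client.create_feedback(key="user_rating", score=stars_to_score(4), run_id=my_run_id)
```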

---

References


  • Python SDK Reference — full Client API, @traceable signature, evaluate()
  • TypeScript SDK Reference — Client, traceable, wrappers, evaluate
  • CLI Reference — langsmith CLI commands
  • Official Docs — langchain.com/langsmith
  • SDK GitHub — MIT License, v0.7.17
  • openevals — Prebuilt LLM evaluators