langsmith-dataset

<oneliner>
Create, manage, and upload evaluation datasets to LangSmith for testing and validation.
</oneliner>

<setup>
Environment Variables

```bash
LANGSMITH_API_KEY=lsv2_pt_your_api_key_here          # Required
LANGSMITH_PROJECT=your-project-name                   # Check this to know which project has traces
LANGSMITH_WORKSPACE_ID=your-workspace-id              # Optional: for org-scoped keys
```

IMPORTANT: Always check the environment variables or the `.env` file for `LANGSMITH_PROJECT` before querying or interacting with LangSmith. This tells you which project contains the relevant traces and data. If the LangSmith project is not available, use your best judgment to identify the right one.
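
The check described above can be scripted. The following is a minimal sketch, assuming a plain `KEY=value` style `.env` file; `resolve_langsmith_project` is a hypothetical helper written for illustration and is not part of the LangSmith SDK or CLI.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_langsmith_project(env_file: str = ".env") -> Optional[str]:
    """Return LANGSMITH_PROJECT from the environment, else from a .env file.

    Hypothetical helper; assumes simple KEY=value lines in the .env file.
    """
    value = os.environ.get("LANGSMITH_PROJECT")
    if value:
        return value
    path = Path(env_file)
    if not path.is_file():
        return None
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == "LANGSMITH_PROJECT":
            # Drop a trailing inline comment and any surrounding quotes.
            val = val.split(" #", 1)[0].strip().strip("'\"")
            return val or None
    return None
```

If the function returns `None`, neither source names a project, which is the case where the project has to be identified by other means.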

Python Dependencies

```bash
pip install langsmith
```

JavaScript Dependencies

```bash
npm install langsmith
```

CLI Tool

```bash
curl -sSL https://raw.githubusercontent.com/langchain-ai/langsmith-cli/main/scripts/install.sh | sh
```
</setup>

<usage>
Use the `langsmith` CLI to manage datasets and examples.

Dataset Commands

  • `langsmith dataset list` - List datasets in LangSmith
  • `langsmith dataset get <name-or-id>` - View dataset details
  • `langsmith dataset create --name <name>` - Create a new empty dataset
  • `langsmith dataset delete <name-or-id>` - Delete a dataset
  • `langsmith dataset export <name-or-id> <output-file>` - Export dataset to local JSON file
  • `langsmith dataset upload <file> --name <name>` - Upload a local JSON file as a dataset

Example Commands

  • `langsmith example list --dataset <name>` - List examples in a dataset
  • `langsmith example create --dataset <name> --inputs <json>` - Add an example to a dataset
  • `langsmith example delete <example-id>` - Delete an example

Experiment Commands

  • `langsmith experiment list --dataset <name>` - List experiments for a dataset
  • `langsmith experiment get <name>` - View experiment results

Common Flags

  • `--limit N` - Limit number of results
  • `--yes` - Skip confirmation prompts (use with caution)

IMPORTANT - Safety Prompts:
  • The CLI prompts for confirmation before destructive operations (delete, overwrite)
  • If you are running with user input: ALWAYS wait for user input; NEVER use `--yes` unless the user explicitly requests it
  • If you are running non-interactively: Use `--yes` to skip confirmation prompts
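
The safety rule above could be encoded in a wrapper script. The sketch below is illustrative only: `build_delete_command` is a hypothetical helper, and treating "stdin is a TTY" as "running with user input" is a heuristic of this example rather than documented CLI behavior. It only assembles an argument list from the `langsmith dataset delete` command and the `--yes` flag listed above.

```python
import sys
from typing import List, Optional


def build_delete_command(
    dataset: str,
    user_requested_yes: bool = False,
    interactive: Optional[bool] = None,
) -> List[str]:
    """Build argv for `langsmith dataset delete`, adding --yes only when allowed."""
    if interactive is None:
        # Heuristic (assumption): a TTY on stdin means a user can answer the prompt.
        interactive = sys.stdin.isatty()
    cmd = ["langsmith", "dataset", "delete", dataset]
    # Skip the prompt only when nobody can answer it or the user asked for it.
    if not interactive or user_requested_yes:
        cmd.append("--yes")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; in an interactive session without an explicit request, the CLI's own confirmation prompt stays in place.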
</usage>

<dataset_types_overview>
Common evaluation dataset types:
  • final_response - Full conversation with expected output. Tests complete agent behavior.
  • single_step - Single node inputs/outputs. Tests specific node behavior (e.g., one LLM call or tool).
  • trajectory - Tool call sequence. Tests execution path (ordered list of tool names).
  • rag - Question/chunks/answer/citations. Tests retrieval quality.
</dataset_types_overview>

<creating_datasets>

Creating Datasets

Datasets are JSON files with an array of examples. Each example has `inputs` and `outputs`.

From Exported Traces (Programmatic)

Export traces first, then process them into dataset format using code:

```bash
# 1. Export traces to JSONL files
langsmith trace export ./traces --project my-project --limit 20 --full
```

<python>
```python
import json
from pathlib import Path
from langsmith import Client

client = Client()

# 2. Process traces into dataset examples
examples = []
for jsonl_file in Path("./traces").glob("*.jsonl"):
    # One run per line; skip blank lines so an empty file does not raise
    runs = [json.loads(line) for line in jsonl_file.read_text().splitlines() if line.strip()]
    # The root run is the one without a parent
    root = next((r for r in runs if r.get("parent_run_id") is None), None)
    if root and root.get("inputs") and root.get("outputs"):
        examples.append({
            "trace_id": root.get("trace_id"),
            "inputs": root["inputs"],
            "outputs": root["outputs"]
        })

# 3. Save locally
with open("/tmp/dataset.json", "w") as f:
    json.dump(examples, f, indent=2)
```
</python>

<typescript>
```typescript
import { Client } from "langsmith";
import { readFileSync, writeFileSync, readdirSync } from "fs";
import { join } from "path";

const client = new Client();

// 2. Process traces into dataset examples
const examples: Array<{trace_id?: string, inputs: Record<string, any>, outputs: Record<string, any>}> = [];
const files = readdirSync("./traces").filter(f => f.endsWith(".jsonl"));

for (const file of files) {
  const lines = readFileSync(join("./traces", file), "utf-8").trim().split("\n");
  const runs = lines.map(line => JSON.parse(line));
  const root = runs.find(r => r.parent_run_id == null);
  if (root?.inputs && root?.outputs) {
    examples.push({ trace_id: root.trace_id, inputs: root.inputs, outputs: root.outputs });
  }
}

// 3. Save locally
writeFileSync("/tmp/dataset.json", JSON.stringify(examples, null, 2));
```
</typescript>

Upload to LangSmith

```bash
# Upload local JSON file as a dataset
langsmith dataset upload /tmp/dataset.json --name "My Evaluation Dataset"
```

Using the SDK Directly

<python>
```python
from langsmith import Client

client = Client()

# Create the dataset, then add examples to it
dataset = client.create_dataset("My Dataset", description="Evaluation dataset")
client.create_examples(
    inputs=[{"query": "What is AI?"}, {"query": "Explain RAG"}],
    outputs=[{"answer": "AI is..."}, {"answer": "RAG is..."}],
    dataset_name="My Dataset",
)
```
</python>

<typescript>
```typescript
import { Client } from "langsmith";

const client = new Client();

// Create dataset and add examples
const dataset = await client.createDataset("My Dataset", {
  description: "Evaluation dataset",
});

await client.createExamples({
  inputs: [{ query: "What is AI?" }, { query: "Explain RAG" }],
  outputs: [{ answer: "AI is..." }, { answer: "RAG is..." }],
  datasetName: "My Dataset",
});
```
</typescript>
</creating_datasets>

<dataset_structures>

Dataset Structures by Type

Final Response

```json
{"trace_id": "...", "inputs": {"query": "What are the top genres?"}, "outputs": {"response": "The top genres are..."}}
```

Single Step

```json
{"trace_id": "...", "inputs": {"messages": [...]}, "outputs": {"content": "..."}, "metadata": {"node_name": "model"}}
```

Trajectory

```json
{"trace_id": "...", "inputs": {"query": "..."}, "outputs": {"expected_trajectory": ["tool_a", "tool_b", "tool_c"]}}
```

RAG

```json
{"trace_id": "...", "inputs": {"question": "How do I..."}, "outputs": {"answer": "...", "retrieved_chunks": ["..."], "cited_chunks": ["..."]}}
```
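
A file in any of the shapes above can be sanity-checked locally before upload. The following is a minimal sketch, assuming only the rules stated in this document: a JSON array whose elements each have an `inputs` object and, optionally, an `outputs` object. `validate_dataset_file` is a hypothetical helper, and its trajectory check reflects only the `expected_trajectory` layout shown above.

```python
import json
from pathlib import Path
from typing import List


def validate_dataset_file(path: str) -> List[str]:
    """Return a list of problems found in a local dataset JSON file (empty if none)."""
    try:
        data = json.loads(Path(path).read_text())
    except (OSError, ValueError) as exc:
        return [f"cannot read {path}: {exc}"]
    if not isinstance(data, list):
        return ["top-level value must be a JSON array of examples"]
    if not data:
        return ["file contains no examples"]
    problems = []
    for i, example in enumerate(data):
        if not isinstance(example, dict):
            problems.append(f"example {i}: not a JSON object")
            continue
        if not isinstance(example.get("inputs"), dict):
            problems.append(f"example {i}: missing or non-object 'inputs'")
        outputs = example.get("outputs")
        if outputs is not None and not isinstance(outputs, dict):
            problems.append(f"example {i}: 'outputs' must be an object")
            continue
        # Trajectory datasets (layout shown above): an ordered list of tool names.
        trajectory = (outputs or {}).get("expected_trajectory")
        if trajectory is not None and not (
            isinstance(trajectory, list) and all(isinstance(t, str) for t in trajectory)
        ):
            problems.append(f"example {i}: 'expected_trajectory' must be a list of strings")
    return problems
```

An empty return value means the file matches the basic shape; `len(json.load(...))` on the same file gives the local example count to compare against the remote dataset.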
</dataset_structures>

<script_usage>

CLI Usage

```bash
# List all datasets
langsmith dataset list

# Get dataset details
langsmith dataset get "My Dataset"

# Create an empty dataset
langsmith dataset create --name "New Dataset" --description "For evaluation"

# Upload a local JSON file
langsmith dataset upload /tmp/dataset.json --name "My Dataset"

# Export a dataset to local file
langsmith dataset export "My Dataset" /tmp/exported.json --limit 100

# Delete a dataset
langsmith dataset delete "My Dataset"

# List examples in a dataset
langsmith example list --dataset "My Dataset" --limit 10

# Add an example
langsmith example create --dataset "My Dataset" \
  --inputs '{"query": "test"}' \
  --outputs '{"answer": "result"}'

# List experiments
langsmith experiment list --dataset "My Dataset"
langsmith experiment get "eval-v1"
```

</script_usage>

<example_workflow>
Complete workflow from traces to uploaded LangSmith dataset:

```bash
# 1. Export traces from LangSmith
langsmith trace export ./traces --project my-project --limit 20 --full

# 2. Process traces into dataset format (using Python/JS code)
# See "Creating Datasets" section above

# 3. Upload to LangSmith
langsmith dataset upload /tmp/final_response.json --name "Skills: Final Response"
langsmith dataset upload /tmp/trajectory.json --name "Skills: Trajectory"

# 4. Verify upload
langsmith dataset list
langsmith dataset get "Skills: Final Response"
langsmith example list --dataset "Skills: Final Response" --limit 3

# 5. List experiments run against the dataset
langsmith experiment list --dataset "Skills: Final Response"
```
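
Step 2 of the workflow is left to custom code. As one illustration, the sketch below derives a trajectory example (the `expected_trajectory` layout described under dataset structures) from a single exported trace file. It assumes each JSONL line is a run object with `run_type`, `name`, `start_time`, and `parent_run_id` fields, and that `start_time` values are ISO 8601 strings that sort chronologically. These field names are assumptions about the export format and should be checked against a real export; `trajectory_example` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def trajectory_example(jsonl_path: str) -> Optional[Dict[str, Any]]:
    """Build one trajectory example from a trace exported as JSONL (one run per line)."""
    lines = Path(jsonl_path).read_text().splitlines()
    runs = [json.loads(line) for line in lines if line.strip()]
    # The root run is the one without a parent; it supplies the example inputs.
    root = next((r for r in runs if r.get("parent_run_id") is None), None)
    if root is None or not root.get("inputs"):
        return None
    # Assumption: tool calls are runs with run_type == "tool"; order them by start_time.
    tool_runs = sorted(
        (r for r in runs if r.get("run_type") == "tool"),
        key=lambda r: r.get("start_time") or "",
    )
    return {
        "trace_id": root.get("trace_id"),
        "inputs": root["inputs"],
        "outputs": {"expected_trajectory": [r.get("name") for r in tool_runs]},
    }
```

Collecting the returned dictionaries into a list and writing it with `json.dump` yields a file in the shape uploaded as `/tmp/trajectory.json` in step 3.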

</example_workflow>

<troubleshooting>
**Dataset upload fails:**
- Verify `LANGSMITH_API_KEY` is set
- Check that the JSON file is valid: each element needs `inputs` (and optionally `outputs`)
- The dataset name must be unique; otherwise delete the existing dataset first with `langsmith dataset delete`

**Empty dataset after upload:**
- Verify the JSON file contains an array of objects, each with an `inputs` key
- Confirm whether any examples were uploaded: `langsmith example list --dataset "Name"`

**Export has no data:**
- Ensure traces were exported with the `--full` flag to include inputs/outputs
- Verify traces have both `inputs` and `outputs` populated

**Example count mismatch:**
- Use `langsmith dataset get "Name"` to check the remote count
- Compare with the local file to verify upload completeness
</troubleshooting>
</output>
