langsmith-dataset

<oneliner>
Create, manage, and upload evaluation datasets to LangSmith for testing and validation.
</oneliner>

<setup>
Environment Variables

```bash
LANGSMITH_API_KEY=lsv2_pt_your_api_key_here          # Required
LANGSMITH_PROJECT=your-project-name                   # Check this to know which project has traces
LANGSMITH_WORKSPACE_ID=your-workspace-id              # Optional: for org-scoped keys
```

IMPORTANT: Always check the environment variables or `.env` file for `LANGSMITH_PROJECT` before querying or interacting with LangSmith. This tells you which project contains the relevant traces and data. If the LangSmith project is not available, use your best judgement to identify the right one.
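
One way to script this check is sketched below: look for `LANGSMITH_PROJECT` in the process environment first, then fall back to a `.env` file. The helper name `resolve_langsmith_project` and the minimal `KEY=value` parsing are assumptions made for illustration; they are not part of the LangSmith SDK or CLI.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_langsmith_project(env_file: str = ".env") -> Optional[str]:
    """Return LANGSMITH_PROJECT from the environment, else from a .env file.

    Hypothetical helper: handles only simple KEY=value lines, with optional
    quotes and a trailing " # comment".
    """
    value = os.environ.get("LANGSMITH_PROJECT")
    if value:
        return value

    path = Path(env_file)
    if not path.is_file():
        return None

    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, rest = line.partition("=")
        if key.strip() != "LANGSMITH_PROJECT":
            continue
        # Drop a trailing inline comment, then surrounding whitespace and quotes.
        rest = rest.split(" #", 1)[0].strip().strip("'\"")
        return rest or None
    return None
```

If the function returns `None`, neither source names a project, and the fallback described above (use your best judgement) applies.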

Python Dependencies

```bash
pip install langsmith
```

JavaScript Dependencies

```bash
npm install langsmith
```

CLI Tool

```bash
curl -sSL https://raw.githubusercontent.com/langchain-ai/langsmith-cli/main/scripts/install.sh | sh
```

</setup>

<usage>
Use the `langsmith` CLI to manage datasets and examples.

Dataset Commands

- `langsmith dataset list` - List datasets in LangSmith
- `langsmith dataset get <name-or-id>` - View dataset details
- `langsmith dataset create --name <name>` - Create a new empty dataset
- `langsmith dataset delete <name-or-id>` - Delete a dataset
- `langsmith dataset export <name-or-id> <output-file>` - Export dataset to local JSON file
- `langsmith dataset upload <file> --name <name>` - Upload a local JSON file as a dataset

Example Commands

- `langsmith example list --dataset <name>` - List examples in a dataset
- `langsmith example create --dataset <name> --inputs <json>` - Add an example to a dataset
- `langsmith example delete <example-id>` - Delete an example

Experiment Commands

- `langsmith experiment list --dataset <name>` - List experiments for a dataset
- `langsmith experiment get <name>` - View experiment results

Common Flags

- `--limit N` - Limit number of results
- `--yes` - Skip confirmation prompts (use with caution)

IMPORTANT - Safety Prompts:

- The CLI prompts for confirmation before destructive operations (delete, overwrite)
- If you are running with user input: ALWAYS wait for user input; NEVER use `--yes` unless the user explicitly requests it
- If you are running non-interactively: Use `--yes` to skip confirmation prompts
</usage>

<dataset_types_overview>
Common evaluation dataset types:

- final_response - Full conversation with expected output. Tests complete agent behavior.
- single_step - Single node inputs/outputs. Tests specific node behavior (e.g., one LLM call or tool).
- trajectory - Tool call sequence. Tests execution path (ordered list of tool names).
- rag - Question/chunks/answer/citations. Tests retrieval quality.
</dataset_types_overview>

<creating_datasets>

Creating Datasets

Datasets are JSON files with an array of examples. Each example has `inputs` and `outputs`.
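
The file shape described here can be checked locally before an upload. The following is a minimal sketch; `validate_dataset_file` is a hypothetical helper, and treating `outputs` as optional follows the troubleshooting notes in this document rather than a confirmed CLI contract.

```python
import json
from pathlib import Path
from typing import List


def validate_dataset_file(path: str) -> List[str]:
    """Return a list of problems found in a dataset JSON file (empty if none)."""
    try:
        data = json.loads(Path(path).read_text())
    except (OSError, json.JSONDecodeError) as exc:
        return [f"cannot read {path}: {exc}"]

    if not isinstance(data, list):
        return ["top-level value must be a JSON array of examples"]
    if not data:
        return ["dataset file contains no examples"]

    problems = []
    for i, example in enumerate(data):
        if not isinstance(example, dict):
            problems.append(f"example {i}: expected an object")
            continue
        if not isinstance(example.get("inputs"), dict):
            problems.append(f"example {i}: missing or non-object 'inputs'")
        # 'outputs' is treated as optional, but must be an object when present.
        if "outputs" in example and not isinstance(example["outputs"], dict):
            problems.append(f"example {i}: 'outputs' must be an object")
    return problems
```

An empty return value means the file has the expected array-of-examples shape; any other result lists what to fix first.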

From Exported Traces (Programmatic)

Export traces first, then process them into dataset format using code:

```bash
# 1. Export traces to JSONL files
langsmith trace export ./traces --project my-project --limit 20 --full
```

<python>
```python
import json
from pathlib import Path
from langsmith import Client

client = Client()

# 2. Process traces into dataset examples
examples = []
for jsonl_file in Path("./traces").glob("*.jsonl"):
    # Skip blank lines so an empty file does not break json.loads
    runs = [json.loads(line) for line in jsonl_file.read_text().splitlines() if line.strip()]
    # The root run is the one without a parent
    root = next((r for r in runs if r.get("parent_run_id") is None), None)
    if root and root.get("inputs") and root.get("outputs"):
        examples.append({
            "trace_id": root.get("trace_id"),
            "inputs": root["inputs"],
            "outputs": root["outputs"]
        })

# 3. Save locally
with open("/tmp/dataset.json", "w") as f:
    json.dump(examples, f, indent=2)
```
</python>

<typescript>
```typescript
import { Client } from "langsmith";
import { readFileSync, writeFileSync, readdirSync } from "fs";
import { join } from "path";

const client = new Client();

// 2. Process traces into dataset examples
const examples: Array<{trace_id?: string, inputs: Record<string, any>, outputs: Record<string, any>}> = [];
const files = readdirSync("./traces").filter(f => f.endsWith(".jsonl"));

for (const file of files) {
  // Skip blank lines so an empty file does not break JSON.parse
  const lines = readFileSync(join("./traces", file), "utf-8")
    .split("\n")
    .filter(line => line.trim().length > 0);
  const runs = lines.map(line => JSON.parse(line));
  // The root run is the one without a parent
  const root = runs.find(r => r.parent_run_id == null);
  if (root?.inputs && root?.outputs) {
    examples.push({ trace_id: root.trace_id, inputs: root.inputs, outputs: root.outputs });
  }
}

// 3. Save locally
writeFileSync("/tmp/dataset.json", JSON.stringify(examples, null, 2));
```
</typescript>

Upload to LangSmith

```bash
# Upload local JSON file as a dataset
langsmith dataset upload /tmp/dataset.json --name "My Evaluation Dataset"
```

Using the SDK Directly

<python>
```python
from langsmith import Client

client = Client()

# Create dataset and add examples in one step
dataset = client.create_dataset("My Dataset", description="Evaluation dataset")

client.create_examples(
    inputs=[{"query": "What is AI?"}, {"query": "Explain RAG"}],
    outputs=[{"answer": "AI is..."}, {"answer": "RAG is..."}],
    dataset_name="My Dataset",
)
```
</python>

<typescript>
```typescript
import { Client } from "langsmith";

const client = new Client();

// Create dataset and add examples
const dataset = await client.createDataset("My Dataset", {
  description: "Evaluation dataset",
});

await client.createExamples({
  inputs: [{ query: "What is AI?" }, { query: "Explain RAG" }],
  outputs: [{ answer: "AI is..." }, { answer: "RAG is..." }],
  datasetName: "My Dataset",
});
```
</typescript>
</creating_datasets>

<dataset_structures>

Dataset Structures by Type

Final Response

```json
{"trace_id": "...", "inputs": {"query": "What are the top genres?"}, "outputs": {"response": "The top genres are..."}}
```

Single Step

```json
{"trace_id": "...", "inputs": {"messages": [...]}, "outputs": {"content": "..."}, "metadata": {"node_name": "model"}}
```
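
A single-step example can be derived from exported runs by selecting one node rather than the root run. This sketch assumes each run is a dict carrying `name`, `trace_id`, `inputs`, and `outputs` keys; the helper name `single_step_example` and the choice to take the first matching run are illustrative, not an established convention.

```python
from typing import Any, Dict, List, Optional


def single_step_example(
    runs: List[Dict[str, Any]], node_name: str
) -> Optional[Dict[str, Any]]:
    """Build a single_step example from the first run named `node_name`.

    Assumes run dicts expose 'name', 'trace_id', 'inputs', and 'outputs'.
    Returns None when no matching run has both inputs and outputs.
    """
    for run in runs:
        if run.get("name") == node_name and run.get("inputs") and run.get("outputs"):
            return {
                "trace_id": run.get("trace_id"),
                "inputs": run["inputs"],
                "outputs": run["outputs"],
                "metadata": {"node_name": node_name},
            }
    return None
```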

Trajectory

```json
{"trace_id": "...", "inputs": {"query": "..."}, "outputs": {"expected_trajectory": ["tool_a", "tool_b", "tool_c"]}}
```
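
The ordered tool list could be built from the runs of one exported trace roughly as follows. This is a sketch under assumptions: it presumes each run dict has `run_type`, `name`, and an ISO 8601 `start_time`, and that tool calls are the runs with `run_type == "tool"`. Verify these field names against your own export before relying on it; both helper names are hypothetical.

```python
from typing import Any, Dict, List, Optional


def expected_trajectory(runs: List[Dict[str, Any]]) -> List[str]:
    """Return the tool names from one trace, ordered by start time."""
    tool_runs = [r for r in runs if r.get("run_type") == "tool"]
    # ISO 8601 timestamps in a single format sort correctly as strings.
    tool_runs.sort(key=lambda r: r.get("start_time") or "")
    return [r["name"] for r in tool_runs if r.get("name")]


def trajectory_example(runs: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Build a trajectory example from the root run's inputs and the tool order."""
    root = next((r for r in runs if r.get("parent_run_id") is None), None)
    if root is None or not root.get("inputs"):
        return None
    return {
        "trace_id": root.get("trace_id"),
        "inputs": root["inputs"],
        "outputs": {"expected_trajectory": expected_trajectory(runs)},
    }
```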

RAG

```json
{"trace_id": "...", "inputs": {"question": "How do I..."}, "outputs": {"answer": "...", "retrieved_chunks": ["..."], "cited_chunks": ["..."]}}
```

</dataset_structures>

<script_usage>

CLI Usage

```bash
# List all datasets
langsmith dataset list

# Get dataset details
langsmith dataset get "My Dataset"

# Create an empty dataset
langsmith dataset create --name "New Dataset" --description "For evaluation"

# Upload a local JSON file
langsmith dataset upload /tmp/dataset.json --name "My Dataset"

# Export a dataset to local file
langsmith dataset export "My Dataset" /tmp/exported.json --limit 100

# Delete a dataset
langsmith dataset delete "My Dataset"

# List examples in a dataset
langsmith example list --dataset "My Dataset" --limit 10

# Add an example
langsmith example create --dataset "My Dataset" \
  --inputs '{"query": "test"}' \
  --outputs '{"answer": "result"}'

# List experiments
langsmith experiment list --dataset "My Dataset"
langsmith experiment get "eval-v1"
```

</script_usage>

<example_workflow>
Complete workflow from traces to uploaded LangSmith dataset:

```bash
# 1. Export traces from LangSmith
langsmith trace export ./traces --project my-project --limit 20 --full

# 2. Process traces into dataset format (using Python/JS code)
# See "Creating Datasets" section above

# 3. Upload to LangSmith
langsmith dataset upload /tmp/final_response.json --name "Skills: Final Response"
langsmith dataset upload /tmp/trajectory.json --name "Skills: Trajectory"

# 4. Verify upload
langsmith dataset list
langsmith dataset get "Skills: Final Response"
langsmith example list --dataset "Skills: Final Response" --limit 3

# 5. Run experiments
langsmith experiment list --dataset "Skills: Final Response"
```

</example_workflow>

<troubleshooting>
**Dataset upload fails:**
- Verify LANGSMITH_API_KEY is set
- Check JSON file is valid: each element needs `inputs` (and optionally `outputs`)
- Dataset name must be unique, or delete existing first with `langsmith dataset delete`

**Empty dataset after upload:**
- Verify JSON file contains an array of objects with `inputs` key
- Check whether any examples were created: `langsmith example list --dataset "Name"`

**Export has no data:**
- Ensure traces were exported with `--full` flag to include inputs/outputs
- Verify traces have both `inputs` and `outputs` populated

**Example count mismatch:**
- Use `langsmith dataset get "Name"` to check remote count
- Compare with local file to verify upload completeness
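
The count comparison can also be scripted. This is a sketch under assumptions: it counts remote examples by iterating `Client.list_examples` from the Python SDK, which needs `LANGSMITH_API_KEY` and network access, and both helper names are hypothetical.

```python
import json
from pathlib import Path


def local_example_count(path: str) -> int:
    """Count the examples in a local dataset JSON file (a JSON array)."""
    data = json.loads(Path(path).read_text())
    if not isinstance(data, list):
        raise ValueError("dataset file must contain a JSON array")
    return len(data)


def remote_example_count(dataset_name: str) -> int:
    """Count examples in a LangSmith dataset by iterating over them.

    The SDK is imported lazily so the local check works without it installed.
    """
    from langsmith import Client

    client = Client()
    return sum(1 for _ in client.list_examples(dataset_name=dataset_name))
```

For example, `local_example_count("/tmp/dataset.json") == remote_example_count("My Dataset")` should hold after a complete upload.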
</troubleshooting>
</output>
