llamaparse

LlamaParse Skill

Parse unstructured documents (such as PDF, DOCX, PPTX, XLSX) with LlamaParse and extract their contents (text, markdown, images...).

Initial Setup

When this skill is invoked, respond with:

I'm ready to use LlamaParse to parse files. Before we begin, please confirm that:

- `LLAMA_CLOUD_API_KEY` is set as an environment variable in the current environment
- `@llamaindex/llama-cloud@latest` is installed and available in the current Node environment

If both conditions are met, please provide:

1. One or more files to be parsed
2. Specific parsing options, such as tier, API version, custom prompt, processing options...
3. Any requests you might have regarding the parsed content of the file.

I will produce a TypeScript script to run the parsing job and, once you approve its execution, I will report the results back to you based on your request.

Then wait for the user's input.
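
The API key precondition listed above can also be verified inside the generated script before any request is made. The following is a minimal sketch: `missingPrerequisites` is a hypothetical helper, not part of the LlamaCloud SDK, and it only checks the `LLAMA_CLOUD_API_KEY` variable named above.

```typescript
// Hypothetical pre-flight helper (not part of the LlamaCloud SDK).
// Returns the names of required environment variables that are missing or blank.
function missingPrerequisites(
  env: Record<string, string | undefined>,
): string[] {
  const required = ["LLAMA_CLOUD_API_KEY"];
  return required.filter((name) => (env[name] ?? "").trim() === "");
}
```

In a script this could be called as `missingPrerequisites(process.env)`, exiting early with a clear message when the returned list is not empty.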

Step 0: Install `llama-cloud` (optional)

If the user does not have the `@llamaindex/llama-cloud` package installed, add it to the current environment by running:

bash

npm install @llamaindex/llama-cloud@latest

Step 1: Produce a TypeScript Script

Once the user confirms that the environment is ready and provides the details needed for the parsing job, produce a TypeScript script.

As a source of truth for the script, you can:

- Refer to the `example.ts` script, which covers most of the necessary configurations for LlamaParse
- Refer to the complete LlamaParse documentation by fetching the https://developers.llamaindex.ai/python/cloud/llamaparse/api-v2-guide/ page

Scripting Best Practices

Follow these guidelines when generating scripts:

1. Always Use the Top-Level `LlamaCloud` Client

Use `LlamaCloud` (the API client) for all parsing operations:

typescript

import LlamaCloud from "@llamaindex/llama-cloud";

// Define a client
const client = new LlamaCloud({
  apiKey: process.env["LLAMA_CLOUD_API_KEY"], // This is the default and can be omitted
});

2. Two-Step Upload → Parse Pattern

Always upload first to get a file ID, then parse using the file ID. Never pass raw file bytes directly to `parse()`.

typescript

import { readFile, writeFile } from "fs/promises";
import { basename } from "path";

// 1. Convert the file path into a File object
const buffer = await readFile(filePath);
const fileName = basename(filePath);
const file = new File([buffer], fileName);
// 2. Upload the file to the cloud
const fileObj = await client.files.create({
  file: file,
  purpose: "parse",
});
// 3. Get the file ID
const fileId = fileObj.id;
// 4. Use the file ID to parse the file
const result = await client.parsing.parse({
  tier: "agentic",
  version: "latest",
  file_id: fileId,
  // ... (further options elided)
});

If the user already has a file ID (e.g. from a prior upload), skip the upload step and use it directly.
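
The choice between uploading first and reusing an existing file ID can be made explicit in the generated script. The sketch below is illustrative only: `ParseInput`, `Step`, and `planSteps` are hypothetical names introduced here, not SDK types.

```typescript
// Hypothetical input shape: either a local path that still needs uploading,
// or a file ID returned by an earlier upload.
type ParseInput = { filePath: string } | { fileId: string };

type Step = "upload" | "parse";

// Decide which steps the generated script needs to perform.
function planSteps(input: ParseInput): Step[] {
  // A known file ID means the upload step can be skipped.
  return "fileId" in input ? ["parse"] : ["upload", "parse"];
}
```

A script would then run `client.files.create` only when the plan contains `"upload"`, and call `client.parsing.parse` with the resulting or provided file ID in either case.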

3. Choose the Right Tier

Tier	When to Use
`fast`	Speed is the priority; simple documents
`cost_effective`	Budget-conscious; straightforward text extraction
`agentic`	Complex layouts, tables, mixed content (default recommendation)
`agentic_plus`	Advanced analysis, highest accuracy

Default to `agentic` unless the user specifies otherwise or the document is simple.
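
One possible way to encode the tier table and the default rule is a small selection function. This is a sketch under assumptions: `TierHints` and `chooseTier` are hypothetical, and the precedence among hints is an illustrative choice rather than documented behavior.

```typescript
type Tier = "fast" | "cost_effective" | "agentic" | "agentic_plus";

// Hypothetical hints gathered from the user's request.
interface TierHints {
  requested?: Tier; // tier the user asked for explicitly
  simpleDocument?: boolean; // plain, text-only document
  budgetConscious?: boolean; // cost matters more than speed
  needsHighestAccuracy?: boolean; // advanced analysis is required
}

function chooseTier(hints: TierHints = {}): Tier {
  // An explicit user choice always wins.
  if (hints.requested) return hints.requested;
  if (hints.needsHighestAccuracy) return "agentic_plus";
  if (hints.simpleDocument) {
    return hints.budgetConscious ? "cost_effective" : "fast";
  }
  // Default recommendation from the table above.
  return "agentic";
}
```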

4. Always Include the `expand` Parameter

The `expand` parameter controls what content is returned. Omitting it returns minimal data. Always specify exactly what you need:

Value	Returns
`text_full`	Plain text via `result.text_full`
`markdown_full`	Markdown via `result.markdown_full`
`items`	Page-level JSON via `result.items.pages`
`text_content_metadata`	Per-page text metadata
`markdown_content_metadata`	Per-page markdown metadata
`items_content_metadata`	Per-page items metadata
`images_content_metadata`	Image list with presigned URLs
`output_pdf_content_metadata`	Output PDF metadata
`xlsx_content_metadata`	Excel-specific metadata

Only request the `*_content_metadata` variants when you need presigned URLs or per-page detail, because they increase payload size.
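
To keep the `expand` list minimal, it can be derived from what the user actually asked for. The following sketch covers only a subset of the values in the table above; `OutputNeeds` and `buildExpand` are hypothetical names, not part of the SDK.

```typescript
// Subset of the expand values listed in the table above.
type ExpandValue =
  | "text_full"
  | "markdown_full"
  | "items"
  | "images_content_metadata";

// Hypothetical description of what the user wants back.
interface OutputNeeds {
  text?: boolean;
  markdown?: boolean;
  pageItems?: boolean;
  imageUrls?: boolean;
}

function buildExpand(needs: OutputNeeds): ExpandValue[] {
  const expand: ExpandValue[] = [];
  if (needs.text) expand.push("text_full");
  if (needs.markdown) expand.push("markdown_full");
  if (needs.pageItems) expand.push("items");
  // Metadata variants are added only when presigned URLs are needed.
  if (needs.imageUrls) expand.push("images_content_metadata");
  return expand;
}
```

An empty result from `buildExpand` is a signal that the request is underspecified and the user should be asked which output they need.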

5. Handle Undefined Results Defensively

`result.text_full`, `result.markdown_full`, and `result.items` may be `undefined` on failure. Always guard against this:

typescript

const text = result.text_full ?? "";
const markdown = result.markdown_full ?? "";
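
The same guard can be extended to `result.items`. The sketch below assumes a simplified result shape: `ParseResultLike` is a local stand-in defined for illustration, and the real SDK type may differ.

```typescript
// Minimal local stand-in for the parse result; the real SDK type may differ.
interface ParseResultLike {
  text_full?: string | null;
  markdown_full?: string | null;
  items?: { pages?: unknown[] } | null;
}

// Hypothetical helper that normalizes every optional field to a safe default.
function readResult(result: ParseResultLike) {
  return {
    text: result.text_full ?? "",
    markdown: result.markdown_full ?? "",
    // Optional chaining covers both a missing `items` and a missing `pages`.
    pages: result.items?.pages ?? [],
  };
}
```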

6. Use Structured Options for Advanced Configuration

Group options using the correct nested keys:

typescript

const result = await client.parsing.parse({
  tier: "agentic",
  version: "latest",
  file_id: fileId,
  input_options: {
    presentation: {
      skip_embedded_data: false,
    },
  },
  output_options: {
    images_to_save: ["screenshot"],
    markdown: {
      tables: { output_tables_as_markdown: true },
      annotate_links: true,
    },
  },
  processing_options: {
    specialized_chart_parsing: "agentic",
    ocr_parameters: { languages: ["de", "en"] },
  },
  agentic_options: {
    custom_prompt:
      "Extract text from the provided file and translate it from German to English.",
  },
  expand: [
    "markdown_full",
    "images_content_metadata",
    "markdown_content_metadata",
  ],
});

Use `agentic_options.custom_prompt` whenever the user wants to guide extraction (translation, summarization, structured extraction, etc.).
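
When the prompt comes from user input, it may be absent or blank. A minimal sketch of attaching it only when present is shown below; `ParseOptionsLike` is a simplified local type and `withCustomPrompt` is a hypothetical helper, neither taken from the SDK.

```typescript
// Simplified local view of the parse options used in this document.
interface ParseOptionsLike {
  tier: string;
  version: string;
  file_id: string;
  agentic_options?: { custom_prompt?: string };
  expand?: string[];
}

// Returns a copy of the options with the prompt attached,
// or the original options when no usable prompt was given.
function withCustomPrompt(
  base: ParseOptionsLike,
  prompt?: string,
): ParseOptionsLike {
  const trimmed = prompt?.trim() ?? "";
  if (trimmed === "") return base;
  return {
    ...base,
    agentic_options: { ...base.agentic_options, custom_prompt: trimmed },
  };
}
```

The input object is never mutated, so the same base options can be reused across several files with different prompts.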

7. Downloading Images Requires `fetch` and Auth

When `images_content_metadata` is in `expand`, download images via presigned URLs with Bearer auth:

typescript

if (result.images_content_metadata) {
  for (const image of result.images_content_metadata.images) {
    if (image.presigned_url) {
      const response = await fetch(image.presigned_url, {
        headers: {
          Authorization: `Bearer ${process.env["LLAMA_CLOUD_API_KEY"]}`,
        },
      });
      if (response.ok) {
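        // arrayBuffer() is used because Response.bytes() is missing on older Node versions
        const content = Buffer.from(await response.arrayBuffer());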
        await writeFile(image.filename, content);
      }
    }
  }
}
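
The loop above writes each image under the filename reported by the service. If that name could ever contain directory components, it may be worth reducing it to a bare file name first. This is a sketch of one such precaution; `safeImageName` and its fallback value are hypothetical and not part of the SDK.

```typescript
// Hypothetical helper: keep only the last path segment of a reported filename
// so that the image is always written into the current directory.
function safeImageName(filename: string, fallback = "image.bin"): string {
  // Split on both POSIX and Windows separators and take the final segment.
  const base = filename.split(/[\\/]/).pop() ?? "";
  return base === "" || base === "." || base === ".." ? fallback : base;
}
```

In the download loop this would replace the first argument of `writeFile`, for example `writeFile(safeImageName(image.filename), content)`.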

8. Use the Node shebang

Every generated script should include the node shebang:

typescript

#!/usr/bin/env node

Step 2: Execute the TypeScript Script

Once the TypeScript script has been produced, you should:

1. Present the script to the user and ask for permission to run it (depending on the current permission settings)
2. Once you have obtained permission, execute the script
3. Explore the results based on the user's requests

To run TypeScript scripts, it is highly recommended to use `npx tsx script.ts`.
