I'm ready to use LlamaParse to parse files. Before we begin, please confirm that:

- `LLAMA_CLOUD_API_KEY` is set as an environment variable in the current environment
- `@llamaindex/llama-cloud@latest` is installed and available in the current Node environment

If both conditions are met, please provide:

1. One or more files to be parsed
2. Specific parsing options, such as tier, API version, custom prompt, or processing options
3. Any requests you might have regarding the parsed content of the file

I will produce a TypeScript script to run the parsing job and, once you approve its execution, I will report the results back to you based on your request.

Install the `@llamaindex/llama-cloud` package with `npm install @llamaindex/llama-cloud@latest`. The API v2 guide is available at https://developers.llamaindex.ai/python/cloud/llamaparse/api-v2-guide/.

Create a `LlamaCloud` client:

```typescript
import LlamaCloud from "@llamaindex/llama-cloud";

// Define a client
const client = new LlamaCloud({
  apiKey: process.env["LLAMA_CLOUD_API_KEY"], // This is the default and can be omitted
});
```
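
Because the client reads `LLAMA_CLOUD_API_KEY` from the environment, it can help to fail early when the variable is absent. The following is a minimal sketch, not part of `@llamaindex/llama-cloud`: `missingPrerequisites` and `REQUIRED_VARS` are hypothetical names, and the environment is passed in as a plain record so the check stays easy to test.

```typescript
// Hypothetical helper (not part of @llamaindex/llama-cloud): reports which
// required environment variables are missing or blank.
type Env = Record<string, string | undefined>;

const REQUIRED_VARS = ["LLAMA_CLOUD_API_KEY"] as const;

function missingPrerequisites(env: Env): string[] {
  return REQUIRED_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Usage: in a real script, pass process.env instead of a literal record.
const missing = missingPrerequisites({ LLAMA_CLOUD_API_KEY: "" });
if (missing.length > 0) {
  console.error(`Missing environment variables: ${missing.join(", ")}`);
}
```

Running this check before creating the client turns a missing key into a clear message instead of a failed request later on.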

Upload the file, then pass its ID to `parse()`:

```typescript
import { readFile, writeFile } from "fs/promises";
import { basename } from "path";

// `client` is the LlamaCloud client defined earlier;
// `filePath` is the path of the local file to parse.

// 1. Convert the file path into a File object
const buffer = await readFile(filePath);
const fileName = basename(filePath);
const file = new File([buffer], fileName);

// 2. Upload the file to the cloud
const fileObj = await client.files.create({
  file: file,
  purpose: "parse",
});

// 3. Get the file ID
const fileId = fileObj.id;

// 4. Use the file ID to parse the file
const result = await client.parsing.parse({
  tier: "agentic",
  version: "latest",
  file_id: fileId,
  // ...
});
```

| Tier | When to Use |
|---|---|
| `fast` | Speed is the priority; simple documents |
| `cost_effective` | Budget-conscious; straightforward text extraction |
| `agentic` | Complex layouts, tables, mixed content (default recommendation) |
| `agentic_plus` | Advanced analysis, highest accuracy |
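
The tier table can be read as a small decision rule. The sketch below encodes it under stated assumptions: the tier identifiers `fast`, `cost_effective`, `agentic`, and `agentic_plus` should be confirmed against the API v2 guide, and `DocumentNeeds` and `chooseTier` are hypothetical names introduced only for illustration.

```typescript
// Tier identifiers are assumed here; confirm them against the API v2 guide.
type Tier = "fast" | "cost_effective" | "agentic" | "agentic_plus";

// Hypothetical description of what a parsing job needs.
interface DocumentNeeds {
  speedFirst?: boolean; // latency matters more than fidelity
  budgetFirst?: boolean; // cost matters more than fidelity
  highestAccuracy?: boolean; // advanced analysis is required
}

function chooseTier(needs: DocumentNeeds = {}): Tier {
  if (needs.highestAccuracy) return "agentic_plus";
  if (needs.speedFirst) return "fast";
  if (needs.budgetFirst) return "cost_effective";
  return "agentic"; // default recommendation from the table
}

console.log(chooseTier()); // agentic
console.log(chooseTier({ speedFirst: true })); // fast
```

The order of the checks is a design choice: accuracy wins over speed and cost when several flags are set, which may not suit every workload.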
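
When no tier is specified, use `agentic`.

The `expand` parameter selects what the response returns:

| Value | Returns |
|---|---|
| `text_full` | Plain text via `result.text_full` |
| `markdown_full` | Markdown via `result.markdown_full` |
| `items` | Page-level JSON via `result.items` |
| `text_content_metadata` | Per-page text metadata |
| `markdown_content_metadata` | Per-page markdown metadata |
| `items_content_metadata` | Per-page items metadata |
| `images_content_metadata` | Image list with presigned URLs |
| `output_pdf_content_metadata` | Output PDF metadata |
| `xlsx_content_metadata` | Excel-specific metadata |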
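
A script usually derives the `expand` list from what the user asked for. The sketch below shows one way to do that; `OutputRequest` and `buildExpand` are hypothetical names, the values `markdown_full` and `images_content_metadata` appear in this document's examples, and `text_full` and `items` are assumed to match the result fields of the same name.

```typescript
// Hypothetical description of the outputs a user requested.
interface OutputRequest {
  text?: boolean;
  markdown?: boolean;
  items?: boolean;
  images?: boolean;
}

// Maps the requested outputs to `expand` values (value names partly assumed).
function buildExpand(request: OutputRequest): string[] {
  const expand: string[] = [];
  if (request.text) expand.push("text_full");
  if (request.markdown) expand.push("markdown_full");
  if (request.items) expand.push("items");
  if (request.images) expand.push("images_content_metadata");
  // Design choice of this sketch: fall back to markdown so that a call
  // never asks for no content at all.
  return expand.length > 0 ? expand : ["markdown_full"];
}

console.log(buildExpand({ markdown: true, images: true }));
```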

`result.text_full`, `result.markdown_full`, `result.items`, and the `*_content_metadata` fields are populated only when requested through `expand` and can otherwise be `undefined`, so guard each read:

```typescript
const text = result.text_full ?? "";
const markdown = result.markdown_full ?? "";
```

A fuller call that sets input, output, processing, and agentic options:

```typescript
const result = await client.parsing.parse({
  tier: "agentic",
  version: "latest",
  file_id: fileId,
  input_options: {
    presentation: {
      skip_embedded_data: false,
    },
  },
  output_options: {
    images_to_save: ["screenshot"],
    markdown: {
      tables: { output_tables_as_markdown: true },
      annotate_links: true,
    },
  },
  processing_options: {
    specialized_chart_parsing: "agentic",
    ocr_parameters: { languages: ["de", "en"] },
  },
  agentic_options: {
    custom_prompt:
      "Extract text from the provided file and translate it from German to English.",
  },
  expand: [
    "markdown_full",
    "images_content_metadata",
    "markdown_content_metadata",
  ],
});
```

In this example, `agentic_options.custom_prompt` carries the instruction to translate the extracted text from German to English.
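
Once a job returns, the script has to decide which content to report. The sketch below prefers markdown and falls back to plain text; `ParseResultLike` is a hypothetical structural stand-in for the SDK's result type (which may differ), and `pickContent` is an illustrative helper.

```typescript
// Structural stand-in for the parts of the parse result used below; the real
// type comes from @llamaindex/llama-cloud and may differ.
interface ParseResultLike {
  text_full?: string | null;
  markdown_full?: string | null;
}

interface PickedContent {
  format: "markdown" | "text" | "none";
  content: string;
}

// Prefers markdown, falls back to plain text, and reports "none" when
// neither field holds any non-blank content.
function pickContent(result: ParseResultLike): PickedContent {
  const markdown = result.markdown_full ?? "";
  if (markdown.trim() !== "") return { format: "markdown", content: markdown };
  const text = result.text_full ?? "";
  if (text.trim() !== "") return { format: "text", content: text };
  return { format: "none", content: "" };
}

console.log(pickContent({ markdown_full: "# Title" }).format); // markdown
```

Returning an explicit `"none"` lets the caller tell an empty document apart from a field that was never requested through `expand`.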

To download images, include `images_content_metadata` in `expand` and fetch each presigned URL:

```typescript
if (result.images_content_metadata) {
  for (const image of result.images_content_metadata.images) {
    if (image.presigned_url) {
      const response = await fetch(image.presigned_url, {
        headers: {
          Authorization: `Bearer ${process.env["LLAMA_CLOUD_API_KEY"]}`,
        },
      });
      if (response.ok) {
        // response.bytes() needs a recent runtime; on older Node versions,
        // Buffer.from(await response.arrayBuffer()) yields the same bytes.
        const content = await response.bytes();
        await writeFile(image.filename, content);
      }
    }
  }
}
```
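
The download loop writes each image under the `filename` reported by the service. A cautious script may want to strip directory components from that name first, so that a file cannot land outside the chosen output directory. The helper below is a hypothetical sketch (`safeOutputPath` and the `image.bin` fallback name are illustrative, not part of the SDK) and uses plain string handling so that both `/` and `\` separators are covered.

```typescript
// Hypothetical helper: keeps only the last path segment of a server-supplied
// filename so that a download cannot be written outside the output directory.
function safeOutputPath(outDir: string, filename: string): string {
  const lastSegment = filename.split(/[\\/]/).pop() ?? "";
  const unusable = lastSegment === "" || lastSegment === "." || lastSegment === "..";
  const name = unusable ? "image.bin" : lastSegment;
  // Drop trailing separators from the directory; fall back to the current directory.
  const dir = outDir.replace(/[\\/]+$/, "") || ".";
  return `${dir}/${name}`;
}

console.log(safeOutputPath("out", "page_1.png")); // out/page_1.png
console.log(safeOutputPath("out/", "../../etc/passwd")); // out/passwd
```

With this in place, the write call would become `writeFile(safeOutputPath(outDir, image.filename), content)`, where `outDir` is a directory chosen by the script.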

Begin the generated script with the `#!/usr/bin/env node` shebang. To run TypeScript scripts, it is highly recommended to use `npx tsx script.ts`.