python scripts/vl_caller.py --file-url "URL provided by user" --pretty
python scripts/vl_caller.py --file-path "file path" --pretty
python scripts/vl_caller.py --file-url "URL provided by user" --file-type 0 --pretty

--file-type 0, --file-type 1
--output: result file path (default: <system-temp>/paddleocr/doc-parsing/results/result_<timestamp>_<id>.json)
--stdout: print the result to stdout instead of saving it to a file
Result saved to: /absolute/path/... (message printed when the result is saved to a file)

Output fields: text, result[n].markdown, result[n].prunedResult

User: "Extract all the text from this document"
Agent: I've parsed the complete document. Here's all the extracted text:
[Display entire text field or concatenated regions in reading order]
Document Statistics:
- Total regions: 25
- Text blocks: 15
- Tables: 3
- Formulas: 2
Quality: Excellent (confidence: 0.92)

User: "Extract all the text"
Agent: "I found a document with multiple sections. Here's the beginning:
'Introduction...' (content truncated for brevity)"

{
"ok": true,
"text": "Full markdown/HTML text extracted from all pages",
"result": { ... }, // raw provider response
"error": null
}
Fields: text, result, result[n].prunedResult, result[n].markdown
Raw result location (default): the temp-file path printed by the script on stderr

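As a minimal sketch of consuming the envelope shown above, the snippet below parses the JSON, checks ok, and returns text. The helper name extract_text and the sample payload are illustrative assumptions. The shape of error on failure is not described here, so the sketch only echoes it into the exception message.

```python
import json


def extract_text(raw: str) -> str:
    """Return the `text` field of an envelope, or raise if the call failed.

    Hypothetical helper: only the top-level keys shown in this document
    (ok, text, result, error) are assumed to exist.
    """
    envelope = json.loads(raw)
    if not envelope.get("ok"):
        # `error` is null on success; its structure on failure is an assumption.
        raise RuntimeError(f"document parsing failed: {envelope.get('error')}")
    return envelope.get("text") or ""


# Made-up sample payload with the same top-level shape as the envelope.
sample = json.dumps(
    {"ok": True, "text": "# Title\n\nBody", "result": {}, "error": None}
)
print(extract_text(sample))
```

When presenting results to a user, the whole returned string would be shown rather than an excerpt, in line with the first example conversation above.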
python scripts/vl_caller.py \
--file-url "https://example.com/paper.pdf" \
--pretty
Output fields: text, result[n].markdown

python scripts/vl_caller.py \
--file-path "./financial_report.pdf" \
--pretty
Output fields: result[n].prunedResult, result[n].markdown

python scripts/vl_caller.py \
--file-url "URL" \
--stdout \
--pretty
Output fields: text, result[n].prunedResult, result[n].markdown

CONFIG_ERROR: PADDLEOCR_DOC_PARSING_API_URL not configured. Get your API at: https://paddleocr.com
- PADDLEOCR_DOC_PARSING_API_URL
- PADDLEOCR_ACCESS_TOKEN
- Optional: PADDLEOCR_DOC_PARSING_TIMEOUT

PADDLEOCR_DOC_PARSING_API_URL=https://xxx.paddleocr.com/layout-parsing, PADDLEOCR_ACCESS_TOKEN=abc123...
Here's my API: https://xxx and token: abc123

PADDLEOCR_DOC_PARSING_API_URL: paddleocr.com, /layout-parsing
PADDLEOCR_ACCESS_TOKEN

--file-url, --file-path
python scripts/vl_caller.py --file-url "https://your-server.com/large_file.pdf"

error: Authentication failed
error: API quota exceeded
error: Unsupported file format

references/output_schema.md
Note: Model version and capabilities are determined by your API endpoint (PADDLEOCR_DOC_PARSING_API_URL).
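Since the endpoint and token come from the environment, a quick pre-flight check can catch a CONFIG_ERROR before the script is called. The sketch below is only an illustration: the function name missing_config is hypothetical, and treating the two variables not marked "Optional" as required is an assumption drawn from the list earlier in this document.

```python
import os

# Assumption: the two variables not marked "Optional" in this document are required.
REQUIRED_VARS = ("PADDLEOCR_DOC_PARSING_API_URL", "PADDLEOCR_ACCESS_TOKEN")
OPTIONAL_VARS = ("PADDLEOCR_DOC_PARSING_TIMEOUT",)


def missing_config(env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


missing = missing_config()
if missing:
    print("Missing configuration:", ", ".join(missing))
else:
    print("Required configuration is present.")
```

Passing a plain dict instead of os.environ makes the check easy to exercise without touching the real environment.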
python scripts/smoke_test.py
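The vl_caller.py invocations shown in this document can also be assembled programmatically. The following is a rough sketch, not part of the scripts themselves: build_command is a hypothetical helper, it uses only flags that appear in this document, and the rule that exactly one of --file-url or --file-path is supplied is an assumption.

```python
import sys


def build_command(file_url=None, file_path=None, file_type=None,
                  pretty=True, to_stdout=False):
    """Assemble an argv list for scripts/vl_caller.py (hypothetical helper)."""
    # Assumption: the script expects exactly one input source.
    if (file_url is None) == (file_path is None):
        raise ValueError("pass exactly one of file_url or file_path")
    cmd = [sys.executable, "scripts/vl_caller.py"]
    if file_url is not None:
        cmd += ["--file-url", file_url]
    else:
        cmd += ["--file-path", file_path]
    if file_type is not None:
        cmd += ["--file-type", str(file_type)]
    if to_stdout:
        cmd.append("--stdout")
    if pretty:
        cmd.append("--pretty")
    return cmd


print(build_command(file_url="https://example.com/paper.pdf", to_stdout=True))
```

The resulting list can be handed to subprocess.run(cmd, capture_output=True, text=True). With --stdout the result would then be read from the captured standard output.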