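Download PDFs (when available) and extract plain text to support full-text evidence, writing `papers/fulltext_index.jsonl` and `papers/fulltext/*.txt`. **Trigger**: PDF download, fulltext, extract text, papers/pdfs, 全文抽取 (full-text extraction), 下载PDF (download PDF). **Use when**: `queries.md` sets `evidence_mode: fulltext` (or you explicitly need full-text evidence) and you want stronger evidence for paper notes and claims. **Skip if**: `evidence_mode: abstract` (the default), or you do not want to download or extract (cost, permissions, time). **Network**: full-text download usually requires network access, unless you manually place PDFs in the `papers/pdfs/` cache. **Guardrail**: downloads are cached in `papers/pdfs/`; existing extracted text is not overwritten by default unless re-extraction is explicitly requested.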
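The guardrail described above (reuse cached PDFs, keep existing extracted text unless re-extraction is requested) could be sketched roughly as follows. This is an illustration only, not the skill's actual code: the function names `needs_extraction` and `cached_pdf` and the `force` parameter are hypothetical, and only the `papers/pdfs/` and `papers/fulltext/` paths come from the description.

```python
from pathlib import Path
from typing import Optional


def needs_extraction(workspace: Path, paper_id: str, force: bool = False) -> bool:
    """Return True when text for paper_id should be (re-)extracted.

    Hypothetical helper: existing text is kept unless force is set.
    """
    txt_path = workspace / "papers" / "fulltext" / f"{paper_id}.txt"
    return force or not txt_path.exists()


def cached_pdf(workspace: Path, paper_id: str) -> Optional[Path]:
    """Return the cached PDF path if one is already in papers/pdfs/, else None."""
    pdf_path = workspace / "papers" / "pdfs" / f"{paper_id}.pdf"
    return pdf_path if pdf_path.exists() else None
```

A caller would check `cached_pdf` before attempting any download, and check `needs_extraction` before writing a `.txt` file, so that repeated runs do not redo work or clobber earlier output.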
**Install**: `npx skill4agent add willoscar/research-units-pipeline-skills pdf-text-extractor`

**Inputs**:
- `papers/core_set.csv` (columns: `paper_id`, `title`, `pdf_url`, `arxiv_id`, `url`)
- `outline/mapping.tsv`

**Outputs**:
- `papers/fulltext_index.jsonl`
- `papers/pdfs/<paper_id>.pdf`
- `papers/fulltext/<paper_id>.txt`

**Config** (`queries.md`):
- `evidence_mode: "abstract" | "fulltext"`
- `fulltext_max_papers`, `fulltext_max_pages`, `fulltext_min_chars`

**Local PDFs only**: place files at `papers/pdfs/<paper_id>.pdf`, where `<paper_id>` matches `papers/core_set.csv`, set `- evidence_mode: "fulltext"` in `queries.md`, and run `python .codex/skills/pdf-text-extractor/scripts/run.py --workspace <ws> --local-pdfs-only`. Related files: `output/MISSING_PDFS.md`, `papers/missing_pdfs.csv`.

**Script**:
- `python .codex/skills/pdf-text-extractor/scripts/run.py --help`
- `python .codex/skills/pdf-text-extractor/scripts/run.py --workspace <workspace_dir>`

**Flags**: `--max-papers <n>`, `--max-pages <n>`, `--min-chars <n>`, `--sleep <sec>`, `--local-pdfs-only`

**Examples**:
- `python .codex/skills/pdf-text-extractor/scripts/run.py --workspace <ws> --local-pdfs-only`
- `python .codex/skills/pdf-text-extractor/scripts/run.py --workspace <ws> --max-papers 20 --max-pages 4 --min-chars 1200`
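Since the skill is gated on the `evidence_mode` entry in `queries.md`, a small reader for that setting may help when scripting around it. The sketch below assumes the entry appears on its own line in the form `- evidence_mode: "fulltext"` (quotes and leading dash optional) and falls back to `abstract`, the stated default. The helper name `read_evidence_mode` is hypothetical, and the real script may parse `queries.md` differently.

```python
import re

# Assumed line shape: optional "- " prefix, optional double quotes around the value.
_MODE_RE = re.compile(
    r'^\s*-?\s*evidence_mode:\s*"?(abstract|fulltext)"?\s*$',
    re.MULTILINE,
)


def read_evidence_mode(queries_text: str) -> str:
    """Return 'abstract' or 'fulltext'; default to 'abstract' when unset."""
    match = _MODE_RE.search(queries_text)
    return match.group(1) if match else "abstract"
```

A wrapper script could call this on the contents of `queries.md` and skip the extractor entirely when the result is `abstract`.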