# Paper Navigator
Find and read academic papers in four stages:
```
┌──────────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
│ Disambiguate │ → │ Discover │ → │ Evaluate │ → │   Read   │
└──────────────┘   └──────────┘   └──────────┘   └──────────┘
                                                      ↓
                                    ┌───────────────────┐
                                    │ research-survey   │ (for survey reports)
                                    │ research-ideation │ (for idea generation)
                                    └───────────────────┘
```
Setup: Scripts are in `skills/paper-navigator/scripts/`. Run via `python skills/paper-navigator/scripts/<name>.py`. Optional env vars raise rate limits for Semantic Scholar, Jina Reader, HuggingFace, and GitHub (see Rate Limits in Appendix C).
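If you need the higher limits, export the keys before running. A minimal sketch with hypothetical variable names (check references/api-reference.md for the actual names):
```bash
# Variable names below are illustrative assumptions, not confirmed names.
export S2_API_KEY="..."     # Semantic Scholar: 100 req/min instead of 100 req/5min
export JINA_API_KEY="..."   # Jina Reader: higher tier
export HF_TOKEN="..."       # HuggingFace: raised request budget
export GITHUB_TOKEN="..."   # GitHub: 5,000 req/hr instead of 10 req/min
```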
## Step 0: Search Strategy Principles (MANDATORY)
Every discovery task MUST follow these principles before executing any workflow.
### Query Reformulation
Before searching, decompose the user's topic and generate 4-6 variant queries covering distinct research angles. This is critical because different papers use different terminology for the same concept, and a single research topic often spans multiple sub-communities.
**Step 1: Sub-topic decomposition.** Identify 3-5 distinct research angles within the user's query. Most research topics span multiple perspectives:
- Empirical vs. theoretical — papers that observe/measure the phenomenon vs. papers that prove/explain it formally
- Mechanism vs. condition — papers about how something works vs. when/why it emerges
- Method keywords — different communities use different terms for the same concept (e.g., "gradient descent" vs. "meta-optimization" vs. "implicit learning")
- Adjacent formulations — the same idea framed differently (e.g., "in-context learning" vs. "few-shot learning" vs. "learning from demonstrations")
**Step 2: Generate queries.** Create at least one query per identified angle, using synonym substitution, specificity adjustment, and structural variants:
- Synonym substitution: "data pruning" → "data selection", "data filtering", "data curation"
- Specificity adjustment: broaden ("pretraining data quality") or narrow ("perplexity-based data pruning LLM")
- Structural variants: swap word order, add/remove qualifiers, use abbreviations
Example: User asks "how LLMs gain in-context learning during pretraining"
- Angles: (a) mechanistic/circuit, (b) training dynamics, (c) ICL-as-optimization theory, (d) data/task conditions, (e) formal theory
- Query 1: `"in-context learning emergence pretraining language model"` (general)
- Query 2: `"induction heads formation training transformer"` (mechanistic)
- Query 3: `"transformers learn in-context gradient descent meta-learning"` (optimization view)
- Query 4: `"pretraining task diversity data structure in-context learning"` (data conditions)
- Query 5: `"in-context learning theory linear attention generalization"` (formal theory)
Example: User asks "papers about data pruning for LLM pretraining"
- Angles: (a) selection methods, (b) quality metrics, (c) scaling effects
- Query 1: `"data pruning pretraining language model"`
- Query 2: `"data selection pretraining LLM"`
- Query 3: `"training data curation large language model quality"`
- Query 4: `"data quality scoring pretraining scaling"`
### Multi-Source Parallel Search
Never rely on a single search source. For every discovery task, run at least 2 sources:
- Primary: `scholar_search` (S2 with automatic arXiv fallback on rate limit)
- Secondary: `arxiv_monitor --keywords "<variants>" --match-mode flexible` for broader keyword coverage
- Tertiary (when S2 is rate limited): web search for recent blog posts/surveys that reference papers
CRITICAL — S2 parallelization rule:
- With a Semantic Scholar API key set (100 req/min): You MAY run multiple calls in parallel.
- Without a key (100 req/5min, ~1 req/3s): You MUST run calls sequentially, one at a time. Parallel S2 calls without a key will exhaust the rate limit immediately, causing all calls to fail with 429 and fall back to the lower-quality arXiv search. This applies to ALL S2-dependent scripts: `scholar_search`, `citation_traverse`, `recommend`, `author_search`, `trending`.
- How to check: Before starting discovery, check whether the Semantic Scholar API key env var is set. If empty, switch to sequential mode.
- arXiv-only scripts (`arxiv_monitor`) are NOT affected by this rule and can always run in parallel with other calls.
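A minimal shell sketch of this check; the variable name `S2_API_KEY` is an assumption, so substitute whatever name the API reference specifies:
```bash
# S2_API_KEY is a hypothetical name -- see references/api-reference.md for the real one.
if [ -n "$S2_API_KEY" ]; then
  # Keyed budget (100 req/min): parallel S2 calls are safe.
  python skills/paper-navigator/scripts/scholar_search.py --query "query A" --limit 20 &
  python skills/paper-navigator/scripts/scholar_search.py --query "query B" --limit 20 &
  wait
else
  # Keyless budget (100 req/5min): strictly sequential, ~3s apart.
  python skills/paper-navigator/scripts/scholar_search.py --query "query A" --limit 20
  sleep 3
  python skills/paper-navigator/scripts/scholar_search.py --query "query B" --limit 20
fi
```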
### Rate-Limit-Aware Fallback Chain
When Semantic Scholar returns 429 or empty results:
- `scholar_search` automatically falls back to arXiv (built-in since v1.2)
- Use `arxiv_monitor` with `--match-mode flexible` for broader coverage
- Switch to web search for blog posts, surveys, GitHub repos that reference papers
- Space S2-dependent calls (e.g., `citation_traverse`, `recommend`) at least 5s apart and reduce `--limit`
Prevention is better than fallback: The arXiv fallback produces lower-quality results (no citation counts, less precise relevance ranking). To avoid triggering it, always follow the S2 parallelization rule above — run S2 calls sequentially when no API key is set.
### Mandatory Citation Expansion (for multi-paper discovery tasks)
After finding ≥3 relevant seed papers, you MUST expand coverage using the citation graph. The goal is to discover papers that keyword search cannot reach.
Seed selection: Rank all found relevant papers by citation count. Pick the top 3 as primary seeds.
Expansion steps (all mandatory; a consolidated sketch follows this section):
- Co-citation on the single highest-cited seed: `citation_traverse --direction co-citation --limit 15` — this is the strongest signal for finding closely related work that uses different terminology
- Forward citations on the top 2 seeds: `citation_traverse --direction forward --limit 20` — finds follow-up work
- Backward citations on 1-2 seeds whose topic coverage differs: `citation_traverse --direction backward --limit 20` — finds foundational and adjacent work that seeds build on. Pick seeds from different sub-topics to maximize coverage breadth
- Recommendations with diverse seeds: `recommend --positive <seed1>,<seed2>,<seed3>` — serendipitous discovery of semantically related work not connected by citations
Seed diversity principle: When selecting seeds for backward traversal or recommendations, prefer seeds from different sub-topics identified in query reformulation. This prevents the citation graph from staying within a single research community.
Applies to: WF1 (Survey), WF3 (Quick Search with >10 results), WF5 (Track Developments), WF9 (Ideation), WF10 (User-specified count).
Does NOT apply to: WF2 (Find specific paper), WF7 (Read paper by URL).
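A minimal sketch of the full expansion pass (seed IDs are placeholders; run the calls sequentially if no S2 key is set):
```bash
SEED1="ArXiv:1706.03762"   # highest-cited seed
SEED2="ArXiv:2005.14165"   # second seed, different sub-topic
SEED3="ArXiv:2301.00001"   # third seed

# Co-citation on the top seed, forward on the top two, backward on one
# topically distinct seed, then recommendations over all three.
python skills/paper-navigator/scripts/citation_traverse.py --paper-id "$SEED1" --direction co-citation --limit 15
python skills/paper-navigator/scripts/citation_traverse.py --paper-id "$SEED1" --direction forward --limit 20
python skills/paper-navigator/scripts/citation_traverse.py --paper-id "$SEED2" --direction forward --limit 20
python skills/paper-navigator/scripts/citation_traverse.py --paper-id "$SEED3" --direction backward --limit 20
python skills/paper-navigator/scripts/recommend.py --positive "$SEED1,$SEED2,$SEED3" --limit 15
```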
### Coverage Gap Check (for multi-paper discovery tasks)
After initial search + citation expansion, review the collected papers against the sub-topics identified during query reformulation.
For each sub-topic angle:
- Count how many collected papers address it
- If a sub-topic has 0-1 papers, run a targeted `scholar_search` with a query specific to that angle (as sketched below)
- If targeted search finds new relevant papers, optionally run one more `citation_traverse` or `recommend` round on the new finds
This step catches systematic blind spots where an entire research perspective was missed by all prior queries. It is lightweight — typically 1-2 additional searches for gaps, not a full re-search.
Applies to: Same workflows as Mandatory Citation Expansion.
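For example, if the formal-theory angle from the earlier reformulation example came back with 0-1 papers, the gap fill is a single targeted call:
```bash
# Query reuses the uncovered angle's variant from the reformulation step.
python skills/paper-navigator/scripts/scholar_search.py \
  --query "in-context learning theory linear attention generalization" \
  --limit 20 --sort-by relevance
```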
## Step 1: Classify Intent and Select Workflow
Start here. Determine what the user wants and route to the right workflow. Match complexity to intent — simple queries get simple answers.
| Intent | Signal | Workflow | Complexity |
|---|---|---|---|
| Find a specific paper | Title, author name, or URL | WF 2 | Single search call |
| Quick paper search | "give me papers about X", "find papers on X" | WF 3 | Single search call |
| Metadata search | Author + year, venue filter | WF 4 | Single search + filter |
| Track recent advances | "latest", "recent", "what's new" | WF 5 | 1-2 calls |
| Find a baseline | Code, SOTA, implementation | WF 6 | Search + code check |
| Read a paper | URL or "read this paper" | WF 7 | Fetch + read |
| Ambiguous term | Project name, module name, nickname | WF 8 | Web search + resolve |
| Literature survey | "survey X", comprehensive coverage | WF 1 → then hand off to research-survey | Iterative collection |
| Related work map | Connections between papers | WF 1 | Citation traversal |
| Ideation support | Called from research-ideation | WF 9 | Iterative + strict filter |
| User-specified count | "find me exactly N papers about X" | WF 10 | Adaptive |
Key principle: Simple "find me papers about X" queries should return results from a single search call, not trigger the full iterative collection workflow. Only use iterative expansion for comprehensive surveys or ideation support.
## Step 2: Resolve Ambiguous Terms (if needed)
When the user's query might be a colloquial name, project name, or module name (rather than a paper title):
- Quick academic search — Try `scholar_search` with the exact query
- If zero results — Broaden the search:
  - Web search: Find GitHub repos, blog posts, or social media that reveal the actual paper title or arXiv ID
  - GitHub search: `github_search.py --query "USER_QUERY"` — repos often link to papers
- Extract identifiers — Actual paper title, arXiv ID, GitHub repo URL, author names
- Re-enter the appropriate workflow with resolved identifiers
Example disambiguation report:
```
🔍 Disambiguation Report for "deepseek engram"
├── Intent: Track recent advances (ambiguous term)
├── Resolution: "Engram" is a module name from DeepSeek AI
│   ├── Actual paper: "Conditional Memory via Scalable Lookup" (ArXiv:2601.07372)
│   └── GitHub: https://github.com/deepseek-ai/Engram
└── Search Plan:
    ├── scholar_search --query "Conditional Memory Scalable Lookup" --sort-by year
    ├── citation_traverse --paper-id ArXiv:2601.07372 --direction forward
    └── github_search --query "deepseek engram"
```
## Standard Output Formats
Use these formats when presenting results to the user. Match the format to the intent.
### Format A: Single Paper Card (for navigational search, WF 2)
```
📄 **Highly accurate protein structure prediction with AlphaFold**
Authors: Jumper et al.
Year: 2021 | Venue: Nature
Citations: 25,000+
DOI: 10.1038/s41586-021-03819-2 | S2 ID: 235959867
Link: https://doi.org/10.1038/s41586-021-03819-2
TLDR: End-to-end neural network for protein structure prediction achieving atomic accuracy...
```
### Format B: Paper List Table (for quick search, metadata search, trending — WF 3/4/5)
| # | Title | Authors | Year | Venue | Citations | ID |
|---|-------|---------|------|-------|-----------|-----|
| 1 | Paper Title | First Author et al. | 2024 | NeurIPS | 150 | arXiv:2401.xxxxx |
| 2 | ... | ... | ... | ... | ... | ... |
After the table, briefly note how many results were found and whether the list was filtered.
### Format C: Baseline Recommendation (for baseline hunt, WF 6)
```
📦 **Recommended Baseline: [Model Name]**
Paper: [Title] ([Year], [Venue]) — [arXiv ID]
Code: [GitHub URL] ⭐ [stars] | Framework: [PyTorch/TF]
Performance: [key metric = value] on [dataset]
HuggingFace: [model page URL] | Downloads: [N]
```
### Format D: Reading Notes (for read a paper, WF 7)
Use the template at `assets/paper-summary-template.md`. Save to `/artifacts/paper-notes/{paper-id}.md`.
### Format E: Disambiguation Report (for ambiguous queries, WF 8)
```
🔍 Disambiguation Report for "[query]"
├── Intent: [classified intent]
├── Resolution: [what the term actually refers to]
│   ├── Paper: [resolved title] ([arXiv ID])
│   └── Code: [GitHub URL]
└── Search Plan:
    ├── [script call 1]
    └── [script call 2]
```
## Common Workflows
### Workflow 1: Collect Papers for Survey
"Help me survey CRISPR-based gene therapy for sickle cell disease"
Use iterative collection (target 30-80 papers). See Appendix A for the full iterative methodology.
- Discover: Initial `scholar_search --query "CRISPR gene therapy sickle cell" --limit 20 --sort-by citations` → iterative expansion with EXPLORE/EXPLOIT strategy → `citation_traverse --direction forward` on seminal papers
- Evaluate: Review each paper's title + abstract for relevance → filter by abstract quality → prefer top-tier venues → shortlist
- Read: `fetch_paper` for key papers → L2 reading → notes using `assets/paper-summary-template.md`
- Hand off to research-survey to synthesize the collected papers into a structured survey report
### Workflow 2: Navigational Search
"Find me the attention is all you need paper"
"Find me the original GPT-3 paper"
- Discover: `scholar_search --query "Attention Is All You Need"` — single call, return top result
- Output: Use Format A (Single Paper Card)
Do NOT proceed to Read unless the user explicitly asks.
### Workflow 3: Quick Paper Search
"Give me papers about perovskite solar cell stability under humidity"
"Find papers on gut microbiome modulation for autoimmune diseases"
- Sub-topic decomposition + query reformulation: Identify 3-5 research angles within the topic, generate 4-6 variant queries covering distinct angles (see Step 0)
- Discover: Run `scholar_search --query "<variant>" --limit 20 --sort-by relevance` on each variant. If the Semantic Scholar API key is set, parallelize these calls; if not, run them sequentially one at a time to avoid rate-limit exhaustion (see "S2 parallelization rule" in Step 0). Also run `arxiv_monitor --keywords "<variants>" --match-mode flexible` for additional coverage (arXiv calls can always run in parallel with other non-S2 calls; see the sketch after this list)
- Citation expansion (if initial results ≥ 3 relevant papers): Follow Mandatory Citation Expansion (Step 0) — co-citation on highest-cited seed, forward on top 2, backward on 1-2 diverse seeds, recommend with 3 seeds
- Coverage gap check: Review collected papers against identified sub-topics. Run targeted searches for any uncovered angles (see Step 0)
- Filter: Review all results, deduplicate, keep relevant papers based on title + abstract
- Output: Use Format B (Paper List Table)
Only escalate to full iterative workflow (WF1) if results are clearly insufficient or the user explicitly asks for more.
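A minimal sketch of the secondary arXiv pass from the Discover step (keywords illustrative; it does not touch the S2 request budget, so it can run alongside the sequential S2 loop):
```bash
python skills/paper-navigator/scripts/arxiv_monitor.py \
  --keywords "perovskite solar cell stability,perovskite humidity degradation" \
  --match-mode flexible --days 365
```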
### Workflow 4: Metadata Search
"2012 papers by David Harel"
"Papers by David Harel from 2020 to 2022"
"Journal articles by David Harel from 2020 to 2022"
- Parse query: Extract author name, year range, venue type (journal/conference)
- Discover: `author_search --name "David Harel" --papers --limit 50 --sort-by year`
- Filter: Year range, venue type (check the venue field), other attributes
- Output: Use Format B (Paper List Table)
For keyword + year filter (no author): `scholar_search --query "<keywords>" --year-min YYYY --year-max YYYY`
### Workflow 5: Track Field Developments
"What's new in condensed matter physics this week?"
- Discover: `arxiv_monitor --categories cond-mat --days 7` (see references/arxiv-categories.md for codes) + `trending --query "topological insulator" --period 30`
- Output: Use Format B (Paper List Table), highlight high-potential papers with TLDRs
### Workflow 6: Find a Baseline with Code
"I need a baseline for protein structure prediction with code"
- Discover: `scholar_search --query "protein structure prediction" --sort-by citations`
- Evaluate: `find_code` on top results + `sota --task "protein-structure-prediction"` → pick one with official code + high downloads
- Output: Use Format C (Baseline Recommendation)
### Workflow 7: Read a Paper by URL
"Read this paper: arxiv.org/abs/2301.12345"
Output: Use Format D (Reading Notes)
- Fetch: `fetch_paper --url "https://arxiv.org/abs/2301.12345"`
- Choose reading depth (see references/reading-strategy.md):
| Level | Goal | When to use | Effort |
|---|---|---|---|
| L1 Technical | Can reimplement | Building directly on this paper | High |
| L2 Analytical | Understand motivation + design choices | Most papers in a survey | Medium |
| L3 Contextual | Know what it is and where it fits | Quick scanning | Low |
- Take notes using `assets/paper-summary-template.md`. Save to `/artifacts/paper-notes/{paper-id}.md`.
### Workflow 8: Ambiguous Query Resolution
"Find the latest about deepseek engram"
- Disambiguate: Follow Step 2 above
- Discover: `scholar_search` with the resolved title + `github_search` with the original term + `citation_traverse` on the arXiv ID
- Evaluate: Review results, check code via `find_code` or GitHub
- Read: `fetch_paper` for top papers
- If user wants a survey: hand off to research-survey
### Workflow 9: Ideation Support (called from research-ideation)
research-ideation Step 2 needs papers to build a literature tree
Iterative collection with strict filter (target 30-50 papers, recent 2020+). See Appendix A and Appendix B.
- Disambiguate: Parse the research goal → extract domain + method type
- Discover: Initial broad search (60 candidates) → iterative expansion up to 15 rounds:
- EXPLORE: new keyword queries for diverse sub-areas
- EXPLOIT: `citation_traverse` or `recommend` on strongly relevant papers
- Evaluate: Only keep strongly relevant papers. Prefer top-tier venues + 2020+ papers.
- Deduplicate: Track seen titles and abstracts.
- Output: 30-50 high-quality papers → feed into novelty tree + challenge-insight tree.
### Workflow 10: User-Specified Paper Count
"Find me exactly 15 papers about reinforcement learning from human feedback"
- Use the user's number as the target
- Apply the quality settings of the closest collection profile (see Appendix B)
- Run iterative collection until target met or max iterations exhausted
- If not enough, progressively relax relevance standard and inform the user
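A minimal sketch of the relaxation step, with illustrative queries for the RLHF example:
```bash
# Strict pass first; if it yields fewer than 15 relevant papers, broaden.
python skills/paper-navigator/scripts/scholar_search.py \
  --query "reinforcement learning from human feedback" --limit 30 --sort-by relevance
# Broadened synonym variant -- tell the user the relevance standard was relaxed.
python skills/paper-navigator/scripts/scholar_search.py \
  --query "preference optimization language model alignment" --limit 30 --sort-by relevance
```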
## Discovery Paths (Stage 1 Detail)
Seven paths, used by workflows above.
### Path A: Keyword Search (most common)
```bash
python scripts/scholar_search.py --query "transformer attention mechanism" --limit 20 --sort-by citations
```
Options: `--limit`, `--year-min`/`--year-max`, `--sort-by relevance|citations|year`.
### Path B: Citation Traversal
```bash
# Forward — who cited this paper
python scripts/citation_traverse.py --paper-id ArXiv:1706.03762 --direction forward --limit 20
# Backward — what this paper cites
python scripts/citation_traverse.py --paper-id ArXiv:1706.03762 --direction backward --limit 20
# Co-citation — papers frequently cited alongside this one (most powerful for finding related work)
python scripts/citation_traverse.py --paper-id ArXiv:1706.03762 --direction co-citation --limit 15
```
### Path C: Recommendations
```bash
python scripts/recommend.py --positive ArXiv:1706.03762,ArXiv:2005.14165 --limit 15
python scripts/recommend.py --positive ArXiv:1706.03762 --negative ArXiv:2301.00001 --limit 10
```
### Path D: Author Tracking
```bash
python scripts/author_search.py --name "Geoffrey Hinton" --papers --limit 20 --sort-by citations
```
### Path E: arXiv Monitoring
```bash
python scripts/arxiv_monitor.py --categories cs.CL,cs.AI --days 3 --limit 30
python scripts/arxiv_monitor.py --keywords "chain of thought,reasoning" --days 7
python scripts/arxiv_monitor.py --keywords "data pruning pretraining" --match-mode flexible --days 365
```
Options: `--match-mode flexible` (default, AND-of-words for better recall) or the exact-phrase mode (phrase matching for precision). See references/arxiv-categories.md for category codes.
### Path F: Trending Detection
```bash
python scripts/trending.py --query "large language models" --period 90 --limit 15
```
Ranks by citation velocity (citations/month): for example, a paper that gained 90 citations over a 90-day window scores ~30 citations/month.
### Path G: GitHub Search
```bash
python scripts/github_search.py --query "deepseek engram" --limit 10
python scripts/github_search.py --query "mamba state space model" --sort stars
```
Useful when papers haven't been published on arXiv yet or industry labs release code before papers.
### Citation Graph Visualization
After traversal, visualize with Mermaid (keep ≤30 nodes):
```mermaid
graph TD
  SEED["Attention Is All You Need<br/>2017 · 100k+"]
  A["BERT · 2018"] --> SEED
  B["GPT-2 · 2019"] --> SEED
  C["Vision Transformer · 2020"] --> SEED
```
## Evaluation Tools (Stage 2 Detail)
### Quick Assessment (from scholar_search output)
| Signal | What it tells you |
|---|---|
| TLDR | One-sentence understanding |
| Citation count | Overall impact |
| Influential citations | Quality of impact |
| Year + venue | Recency and authority |
| Open Access PDF | Whether you can read full text |
### Code Availability
```bash
python scripts/find_code.py --arxiv-id 1706.03762
```
### Top Models by Task
```bash
python scripts/sota.py --task "text-generation" --limit 10
python scripts/sota.py --task "translation" --list-tasks
```
### Dataset Discovery
```bash
python scripts/dataset_search.py --query "sentiment analysis" --limit 10
```
### Reproducibility Assessment
| Dimension | Check |
|---|---|
| Code | Open-source? Official? Stars? Last update? |
| Results | Reproduced on SOTA leaderboard? |
| Data | Dataset publicly available? |
| Overall | High / Medium / Low / None |
## After Collecting Papers: Next Steps
| Goal | Hand off to |
|---|---|
| Generate a literature survey report | research-survey — synthesizes papers into a structured 8-section report |
| Generate research ideas | research-ideation — builds novelty tree + challenge-insight tree from papers |
| Write a Related Work section | paper-writing — uses paper notes as input |
### Quick Report (optional, stays in paper-navigator)
For a brief summary table without a full survey report, use `literature_report.py`:
```bash
python scripts/literature_report.py --paper-ids ArXiv:2601.07372,ArXiv:2501.12948 --intent quick_scan
```
| Intent | Output |
|---|---|
| `quick_scan` | Brief table: title, authors, year, citations, TLDR |
|  | Code availability, SOTA position, dataset access, reproducibility |
For full survey reports, hand off to research-survey instead.
## Appendix A: Iterative Collection Workflow
For workflows requiring many papers (survey, ideation support), use iterative expand-and-filter:
1. Parse query → extract goal, search terms, key term definitions
2. Define task attributes → identify domain + method type
3. Initial search → scholar_search with broad query
4. Review each paper's title + abstract → judge relevance (keep/reject)
5. LOOP until target met or max iterations reached:
a. From kept papers, pick the most relevant as "grounding set"
b. Generate next search query:
- EXPLORE: new keyword query to broaden coverage
   - EXPLOIT: `citation_traverse` or `recommend` on a high-relevance paper (see the sketch at the end of this appendix)
c. Fetch new papers → review → deduplicate → add to collection
6. Final filter: apply quality checks, take top N
Relevance judging: You (Claude) evaluate each paper directly from title + abstract against the user's goal. No separate API call needed.
Deduplication: Track seen titles (normalized) and abstract prefixes. Skip already-evaluated papers.
Quality filtering:
- Skip papers with very short abstracts (< 20 words)
- For ideation/survey: prefer top-tier venues and journals in the user's field (e.g., Nature, Science, Cell, Lancet, PNAS for broad science; field-specific top venues like NeurIPS/ICML for ML, Physical Review Letters for physics, JACS for chemistry, etc.)
- For ideation: prefer 2020+ papers; include older only if foundational
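A minimal sketch of one loop iteration (step 5), with an illustrative query and a placeholder paper ID:
```bash
# EXPLORE: a fresh keyword angle not yet covered by earlier queries.
python scripts/scholar_search.py --query "<new angle keywords>" --limit 20 --sort-by relevance
# EXPLOIT: dig around the strongest paper kept so far.
python scripts/citation_traverse.py --paper-id ArXiv:2301.12345 --direction forward --limit 20
```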
## Appendix B: Ideation vs Survey Collection
| Aspect | Ideation Support | Literature Survey |
|---|---|---|
| Goal | Find gaps and transferable techniques | Comprehensive field coverage |
| Relevance standard | Strict — only strongly relevant | Moderate — include tangentially relevant |
| Recency | Strong bias toward 2020+ | Include foundational older work |
| Initial search size | 60 candidates | 20 candidates |
| Coverage strategy | Deep on core topic + cross-domain | Balanced across sub-topics |
| Output use | Novelty tree + challenge-insight tree | Comprehensive report |
## Appendix C: Script & API Reference
All scripts output Markdown to stdout and errors to stderr. `--limit` is common to most scripts; see the API reference for each script's full flag set.
### Paper ID Formats
Scripts accept and normalize automatically: S2 ID, arXiv ID (`ArXiv:2301.12345`, bare `2301.12345`, or a full arXiv URL), and DOI (e.g., `10.1038/s41586-021-03819-2`).
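Assuming the normalization above, these calls should all resolve to the same paper:
```bash
python scripts/citation_traverse.py --paper-id ArXiv:1706.03762 --direction forward --limit 5
python scripts/citation_traverse.py --paper-id 1706.03762 --direction forward --limit 5
python scripts/citation_traverse.py --paper-id "https://arxiv.org/abs/1706.03762" --direction forward --limit 5
```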
### Rate Limits
| API | Without key | With key | When rate limited |
|---|---|---|---|
| Semantic Scholar | 100 req/5min (~1 req/3s); NO parallel calls | 100 req/min; parallel OK | Auto-fallback to arXiv in `scholar_search`; global pacer enforces 3s interval |
| arXiv | 1 req/3s (courtesy) | N/A | Primary fallback when S2 is limited; no auth needed |
| Jina Reader | Free tier | Higher with key | — |
| HuggingFace | 500 req / 300s | Higher with a token | — |
| GitHub | 10 req/min | 5,000 req/hr (with a token) | — |
All scripts retry on 429 and 5xx errors with exponential backoff (3s, 6s, 12s, 24s, 48s — 5 retries). A global S2 request pacer enforces minimum interval between Semantic Scholar API calls to prevent budget exhaustion.
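The scripts do this internally; for manually paced ad-hoc calls, an equivalent shell sketch (assumes a script exits nonzero on failure):
```bash
# Mirrors the built-in schedule: 3s, 6s, 12s, 24s, 48s.
query="transformer attention mechanism"
delay=3
for attempt in 1 2 3 4 5; do
  python scripts/scholar_search.py --query "$query" --limit 20 && break
  echo "attempt $attempt failed; retrying in ${delay}s" >&2
  sleep "$delay"
  delay=$((delay * 2))
done
```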
For detailed API endpoints, query parameters, and field specifications, see references/api-reference.md.
## Integration
- research-survey: After collecting papers, hand off to research-survey for structured survey report generation (8-section goal-centric synthesis).
- research-ideation: After collecting papers, hand off to research-ideation for idea generation (novelty tree + challenge-insight tree + problem selection + solution design).
- experiment-pipeline: After finding a baseline via Workflow 6, hand off to experiment-pipeline.
- paper-writing: Paper notes serve as input for paper-writing's Related Work section.