Research Collector
This skill does only one thing:
- Batch collect YouTube videos + web articles on a specific topic, feed them into NotebookLM, run analysis queries, and save the results to a configurable local directory
It does NOT handle:
- Writing the final article (leave this to your own writing tools / skills)
- Choosing main titles
- Downloading videos (use the skill in this repository)
- Publishing to multiple platforms (use the skill in this repository)
One-sentence principle: When users say "help me collect materials on topic X" or "pull a batch of YouTube videos + articles into NotebookLM", follow this fixed workflow instead of redesigning it every time.
When To Use
Applicable scenarios:
- Users need to conduct background research before writing a recommendation/review/opinion article on a topic
- Users say "help me find popular YouTube videos and articles about X"
- Users say "collect them into NotebookLM for analysis"
- Users say "organize a material research report on topic X for me"
Inapplicable scenarios:
- Users already have a clear list of materials and only want summaries → directly run
- Users want to conduct real-time conversational research without persisting to a notebook → use WebSearch + WebFetch
- Users only need to download a single video → use
Preconditions
Must confirm the following before starting:
- The nlm CLI is installed and logged in (check with nlm login --check)
- yt-dlp is in PATH
- Users have clearly specified the topic and angle
- The output directory is writable (it can be configured via an environment variable or specified directly in the conversation)
If preconditions are not met:
- If nlm login --check fails → ask the user to run nlm login; a session stays valid for about 20 minutes
- If yt-dlp is not installed → stop and inform the user
Working Rules
- Align the topic, angle, and volume with the user before taking action
- Default to 15 results per ytsearch; adjust as needed
- NotebookLM deep research can only run one task at a time, no concurrency allowed
- Sleep for 2 seconds between adding each source to avoid rate limiting
- All outputs (raw JSON + summary markdown) are saved to the output directory (the default or the one the user specified)
- This skill only handles collection and analysis; do not automatically proceed to write the final article
- Do not delete the notebook, as users may need to run queries later
Core Workflow
Phase 0: Align Objectives
Before starting, you must confirm with the user:
- Topic (a keyword phrase that can be used directly for ytsearch)
- Angle (e.g., "most commonly used + personal creation" vs "latest release + technical details")
- Notebook name (default: "<Topic> Materials")
- Volume (default: 15 YouTube videos + ~40 web articles from NotebookLM deep research)
Phase 1: Create Notebook + Set Alias
bash
nlm notebook create "<Topic> Materials"
# Extract ID from output, then:
nlm alias set <short-name> <notebook-id>
Use a short alias, and use it for all subsequent commands.
Phase 2: Search for Popular YouTube Videos with yt-dlp ytsearch
Run 2-3 searches with different angles in parallel, 15 results each:
bash
yt-dlp --simulate --print "%(title)s|%(webpage_url)s|%(view_count)s|%(uploader)s" \
"ytsearch15:<Keyword A>"
yt-dlp --simulate --print "%(title)s|%(webpage_url)s|%(view_count)s|%(uploader)s" \
"ytsearch15:<Keyword B>"
Ignore JS runtime warnings in the output.
Filter the top 15 results using the following rules:
- Remove duplicates (same video appearing in multiple searches)
- Prioritize official accounts (e.g., Anthropic, OpenAI, etc.)
- Sort by view count from highest to lowest, but reserve 2-3 slots for mid-tier videos with niche angles so the list is not all blockbuster press releases
- Keep at least 5 results for each angle
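The deduplication and ranking rules above could be sketched in Python roughly as follows. This is a minimal illustration, not part of the skill itself: the names `Video`, `parse_line`, `shortlist`, and the `official` set are hypothetical, and the sketch only covers duplicate removal, official-account priority, and view-count sorting. Reserving mid-tier videos and balancing angles still needs a manual pass.

```python
from typing import NamedTuple, Optional


class Video(NamedTuple):
    title: str
    url: str
    views: int
    uploader: str


def parse_line(line: str) -> Optional[Video]:
    """Parse one 'title|url|views|uploader' line printed by the yt-dlp command."""
    # Titles may contain "|", so split from the right: the last three fields are fixed.
    parts = line.rstrip("\n").rsplit("|", 3)
    if len(parts) != 4:
        return None
    title, url, views, uploader = parts
    # A missing view count is not numeric; treat it as 0 views.
    return Video(title, url, int(views) if views.isdigit() else 0, uploader)


def shortlist(searches: list[list[str]], official: set[str], limit: int = 15) -> list[Video]:
    """Merge several search outputs, drop duplicates, rank, and keep `limit` videos."""
    seen: dict[str, Video] = {}
    for lines in searches:
        for line in lines:
            video = parse_line(line)
            if video is not None and video.url not in seen:
                seen[video.url] = video  # same URL in a later search is a duplicate
    # Official uploaders first (False sorts before True), then highest view count.
    ranked = sorted(seen.values(), key=lambda v: (v.uploader not in official, -v.views))
    return ranked[:limit]
```

Splitting from the right assumes the uploader name itself contains no `|`; if that turns out to be wrong for a given channel, a different separator in the `--print` template would be safer.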
Phase 3: Add YouTube Videos as Sources
Use a bash loop to add them one by one, sleeping for 2 seconds each time:
bash
cat > /tmp/yt_urls.txt <<'EOF'
https://www.youtube.com/watch?v=XXX1
https://www.youtube.com/watch?v=XXX2
...
EOF
while IFS= read -r url; do
echo "=== Adding: $url ==="
nlm source add <alias> --url "$url" 2>&1 | tail -5
sleep 2
done < /tmp/yt_urls.txt
Individual additions occasionally fail (for example, the video is not public or is region-restricted). Ignore these, continue, and report the success rate at the end.
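If you prefer to track failures programmatically rather than reading the loop output, the same step could be sketched in Python as below. The function name `add_sources` and its `run`/`sleep` parameters are hypothetical, and the sketch assumes that `nlm source add` exits with a non-zero status when an addition fails, which is not confirmed here.

```python
import subprocess
import time


def add_sources(alias, urls, delay=2.0, run=subprocess.run, sleep=time.sleep):
    """Add each URL to the notebook one by one and return the URLs that failed."""
    failed = []
    for url in urls:
        result = run(
            ["nlm", "source", "add", alias, "--url", url],
            capture_output=True,
            text=True,
        )
        # Assumption: a non-zero exit code signals a failed addition.
        if result.returncode != 0:
            failed.append(url)
        sleep(delay)  # pause between additions to avoid rate limiting
    return failed
```

The success rate for the final report is then `(len(urls) - len(failed)) / len(urls)`, and `failed` doubles as the list of skipped sources.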
Phase 4: Run NotebookLM Deep Research to Discover Web Articles
bash
nlm research start "<English query suitable for web research>" \
--notebook-id <alias> --mode deep
Deep mode takes ~5 minutes and returns ~40 web sources.
Key: Only one research task can run in a notebook at a time. To run a second round, wait for the first round to finish importing or use --force.
Wait for completion:
bash
nlm research status <alias> --max-wait 360
The Bash tool has a default timeout of 120 seconds, so you must raise the timeout on this Bash tool call to 400 seconds.
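The relationship between the two timeouts (the outer call must outlive `--max-wait`) can be sketched in Python. The helper name `wait_for_research` and the `margin` parameter are hypothetical; only the `nlm research status <alias> --max-wait` command comes from this document.

```python
import subprocess


def wait_for_research(alias, max_wait=360, margin=40, run=subprocess.run):
    """Block on the research status check, giving the outer call extra headroom."""
    return run(
        ["nlm", "research", "status", alias, "--max-wait", str(max_wait)],
        capture_output=True,
        text=True,
        # Outer timeout = 360 + 40 = 400 seconds, so the CLI gives up before we do.
        timeout=max_wait + margin,
    )
```

With `subprocess.run`, exceeding the outer timeout raises `subprocess.TimeoutExpired`, so the margin keeps the CLI's own wait limit as the one that normally triggers.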
Phase 5: Import Research Results
After the research is completed, get the task-id from the output, then:
bash
nlm research import <alias> <task-id> --timeout 600
Set a correspondingly longer timeout on the Bash tool call so that it does not cut the import short.
Note: If the user says "enough materials, no need to import more", stop and proceed directly to Phase 6.
Phase 6: Run 3 Analysis Queries
By default, run queries from 3 angles, redirecting each command's output to a file to avoid flooding the terminal:
bash
mkdir -p "./research/<topic>"
nlm notebook query <alias> "<Chinese prompt for question 1>" \
> "./research/<topic>/query1-<slug>-raw.json" 2>&1
nlm notebook query <alias> "<Chinese prompt for question 2>" \
> "./research/<topic>/query2-<slug>-raw.json" 2>&1
nlm notebook query <alias> "<Chinese prompt for question 3>" \
> "./research/<topic>/query3-<slug>-raw.json" 2>&1
Add to each Bash query call.
Default 3 query templates (modify keywords as needed):
- Top List: "Based on all sources, please list the Top 10 X recommended by the most sources. For each X, explain: (1) Name (2) What it does specifically (3) Main usage scenarios (4) Number of sources recommending it (5) Type classification. Sort by recommendation frequency from highest to lowest, output in Chinese."
- Target Audience-Oriented: "I want to write an article for <audience portrait>. Please filter the Top 8 X that are most helpful to <audience>, explain each with: (1) Name (2) Specific pain points (3) One-sentence typical usage (4) Type (5) Most specific source number. Remove irrelevant content, focus on <scenario>, output in Chinese."
- Getting Started + Pitfalls: "For <audience> using X, please summarize: (1) Fastest way to get started (2) Where to obtain it (3) 5 easiest pitfalls to fall into (4) When it's actually not needed (5) Latest important updates. Attach source numbers to each point, output in Chinese."
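For reference, running the queries and writing each result to its `queryN-<slug>-raw.json` file could be sketched in Python like this. The function name `run_queries`, the `(slug, prompt)` pair format, and the `base` parameter are assumptions made for illustration; the command and file naming follow the bash example in this phase.

```python
import subprocess
from pathlib import Path


def run_queries(alias, topic, queries, base="./research", run=subprocess.run):
    """Run each (slug, prompt) query and save the raw output to its own file."""
    out_dir = Path(base) / topic
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, (slug, prompt) in enumerate(queries, start=1):
        path = out_dir / f"query{index}-{slug}-raw.json"
        with path.open("w", encoding="utf-8") as handle:
            # stdout goes to the file; stderr is merged in, mirroring "> file 2>&1".
            run(
                ["nlm", "notebook", "query", alias, prompt],
                stdout=handle,
                stderr=subprocess.STDOUT,
                text=True,
            )
        paths.append(path)
    return paths
```

The returned paths can be listed directly in the final report to the user.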
Phase 7: Extract Answer Field and Generate Summary Markdown
The raw output is JSON containing answer + citations; use Python to extract the answer field:
bash
python3 <<'PY'
import json
import pathlib

base = pathlib.Path("./research/<topic>")
summary = base / "Material Research Summary.md"
files = [
    ("query1-<slug>-raw.json", "## Query 1: <Title>"),
    ("query2-<slug>-raw.json", "## Query 2: <Title>"),
    ("query3-<slug>-raw.json", "## Query 3: <Title>"),
]
out = ["# <Topic> Material Research", "",
       "> Analysis results based on NotebookLM notebook `<notebook-name>`", "",
       "---", ""]
for fname, heading in files:
    out.append(heading)
    out.append("")
    try:
        # Each raw file is JSON; the text to keep lives under value.answer
        data = json.loads((base / fname).read_text(encoding="utf-8"))
        out.append(data.get("value", {}).get("answer", ""))
    except Exception as e:
        # A missing or non-JSON file is recorded in the summary instead of aborting
        out.append(f"(Parsing failed: {e})")
    out.append("")
    out.append("---")
    out.append("")
text = "\n".join(out)
if summary.exists():
    # Append instead of overwriting an existing summary
    text = summary.read_text(encoding="utf-8") + "\n" + text
summary.write_text(text, encoding="utf-8")
print("Written:", summary.stat().st_size, "bytes")
PY
Output Contract
After execution, provide the user with a report including:
- Notebook name + alias + actual number of sources
- Storage paths of the 3 raw JSON files and 1 summary markdown file
- Failed/skipped sources (if any)
- Preview of the summary file's header (first 20 lines or so)
- Suggested next steps (leave downstream usage to the user; this skill ends here)
Safety and Boundaries
- Do not run audio/video/slides generation by default, as these consume quotas; only do so if the user requests it
- Do not automatically run a second round of research; one round is sufficient for most scenarios
- Do not overwrite an existing Material Research Summary.md; append if it exists
- Do not include users' private information in research queries (notebooks are searchable)
Troubleshooting
nlm Login Expired
bash
nlm login --check # Tells you if the session is valid
nlm login # Re-login
Session validity is approximately 20 minutes.
yt-dlp Search Returns No Output
First check the yt-dlp version. If it's too old, prompt the user to update. JS runtime / ffmpeg warnings can be ignored and do not affect --simulate mode.
Research Times Out or Gets Stuck
Check status separately (non-blocking):
bash
nlm research status <alias> --max-wait 0
If the task is still not finished after more than 10 minutes, restart with --force:
bash
nlm research start "..." --notebook-id <alias> --mode deep --force
Query Output Is Too Large to View Directly
Redirect all queries to files, then use Python to extract the answer; do not attempt to print large JSON directly in the terminal.
Continuous Failures When Adding Sources
- Check for rate limiting → increase sleep time to 3-5 seconds
- Check URL format (YouTube URLs must use the standard watch format, not shorts/live)
- Check login status → nlm login --check
References
- Complete NotebookLM CLI Guide: (pip package by jacob-bd) comes with nlm-skill, or refer to the upstream README https://github.com/jacob-bd/notebooklm-mcp-cli
- yt-dlp Command Library: ../yt-dlp-direct/SKILL.md in the same repository
- Project Own Conventions: If your working directory has / , this skill does not depend on them; optional reading