notebooklm

NotebookLM — Browser Automation

Requires: A browser automation environment (Claude Code CLI with computer-use, Claude Chrome Extension, or equivalent). In non-automation contexts the skill fails gracefully with a clear "not supported" message.
Critical: This skill is the only browser-automation skill in the v2 collection. It does NOT follow the research-pack Agent Integrity Rules convention. Different constraints apply (UI dynamics, async generation, login walls).

Step 0: Browser Context Setup (Mandatory)

Before any other action, verify browser automation is available:
  1. Check whether browser-control tools are loaded in the harness (screenshot, click, find-element, navigate)
  2. If unavailable → halt with clear message: "This skill requires browser automation. Currently in {context}. Cannot proceed. Use Claude Code CLI with computer-use, Claude Chrome Extension, or equivalent."
  3. If available → take initial screenshot, navigate to https://notebooklm.google.com
  4. Detect login wall via screenshot. If login screen detected: halt with "Please log in to NotebookLM in the browser, then re-invoke this skill." Never attempt to handle login automatically.
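The availability check in steps 1 and 2 can be sketched as a small preflight function. This is a minimal illustration, not one of the skill's shipped scripts: the tool names in `REQUIRED_TOOLS` and the function name `preflight` are assumptions, and a real harness may expose its tools under different names.

```python
# Hypothetical tool names; substitute whatever the harness actually exposes.
REQUIRED_TOOLS = {"screenshot", "click", "find", "navigate"}


def preflight(available_tools, context="unknown context"):
    """Return (ok, message); ok is False when any required tool is missing."""
    missing = REQUIRED_TOOLS - set(available_tools)
    if missing:
        # Halt message taken from step 2 of the setup procedure.
        return False, (
            "This skill requires browser automation. "
            f"Currently in {context}. Cannot proceed. "
            "Use Claude Code CLI with computer-use, Claude Chrome Extension, "
            "or equivalent."
        )
    return True, "Browser automation available."


ok, message = preflight({"screenshot"}, context="a plain chat session")
print(ok)  # False
```

Returning a message instead of raising keeps the halt text in one place, so the caller can show it to the user unchanged.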

Phase 0: Grill-Me Intake (Action-Routing)

Up to 4 forcing questions, one at a time, dependency-ordered. Most invocations stop at Q3.

Q1 (root) — Action

What do you want me to do? Pick one:
  1. Read / extract — ask a question of an existing notebook
  2. Add a source — push content (URL, text, file, Google Doc, or synthesized content) into a notebook
  3. Generate a Studio output — Audio Overview, Study Guide, Briefing Doc, Timeline, FAQ, Table of Contents, Infographic, Slides, or Mind Map
  4. Create a new notebook — initialize with title + initial sources
Why I'm asking: Each action takes a different path through the UI and requires different parameters. Naming the action upfront prevents wasted screenshots and lets me ask only the follow-up questions that apply.
Forcing choice. If the user says "open NotebookLM" without specifying an action, refuse to start and re-ask Q1.

Q2 (depends on Q1) — Notebook identity

Which notebook? (asked for actions 1, 2, 3 — not for "create new")
Why I'm asking: If you give me a name, I'll search the homepage; if you give me a URL, I'll navigate directly. Names that are ambiguous will get a disambiguation prompt with screenshots.
For action 4 (create new): replace with "What's the title for the new notebook?"

Q3 (depends on Q1) — Action-specific parameter

Action 1 (read/extract):
"What's the question to ask the notebook? Use natural phrasing — the notebook's chat handles it best."
Action 2 (add source):
"What source type? Pick one:
  1. URL / website / YouTube link
  2. Copied text (paste here or point at content)
  3. File upload (provide absolute path)
  4. Google Doc (link)
  5. Synthesized content (I'll pre-process and add as 'Copied text')
Why I'm asking: Each source type goes through a different sub-flow in the Add Source dialog. Picking upfront saves a step."
Action 3 (Studio output):
"Which Studio output? Audio Overview / Study Guide / Briefing Doc / Timeline / FAQ / Table of Contents / Infographic / Slides / Mind Map. And: any custom-prompt direction? Default prompts produce mediocre output — I always open the customization menu and write a detailed prompt. Tell me the angle or audience.
Why I'm asking: The output type sets the UI button to find. The custom prompt is mandatory for quality."
Action 4 (create new):
"Initial sources? Provide URLs, file paths, or 'I'll add later'."

Q4 (depends on Q1 = action 3) — Studio custom prompt detail

Tell me the angle, audience, and length for the Studio output. Examples:
  • Audio Overview: "Two-host conversation for a non-technical executive, 8–10 min, focus on business implications not technical depth"
  • Infographic: "Decision-tree style, action-oriented, 6 panels max, monochrome navy"
  • Study Guide: "Undergrad-level, definitions + 3 practice questions per concept"
Why I'm asking: This becomes the custom prompt. Default Studio prompts produce mediocre output — specific direction produces sharp output.
Asked only for Studio output generation (Q1=3). Skip otherwise.
Stop condition: After Q4 (or earlier with dependency skips), commit and start the action sequence.
See references/studio_output_custom_prompts.md for the canon.
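The dependency ordering of Q1 through Q4 can be sketched as below. This is an illustrative assumption about the routing logic, not the contents of scripts/action_router.py; the function name `questions_for` is hypothetical.

```python
def questions_for(action):
    """Return the ordered intake questions for a Q1 answer (1-4).

    Q1 is a forcing choice, so anything else is rejected and Q1 is re-asked.
    """
    if action not in (1, 2, 3, 4):
        raise ValueError("Q1 must be answered with 1, 2, 3, or 4")
    plan = ["Q1", "Q2", "Q3"]
    if action == 3:
        # Only Studio output generation needs the custom-prompt detail.
        plan.append("Q4")
    return plan


print(questions_for(1))  # ['Q1', 'Q2', 'Q3']
print(questions_for(3))  # ['Q1', 'Q2', 'Q3', 'Q4']
```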

Notebook Discovery

For actions 1-3 (require existing notebook):
  1. Navigate to homepage → screenshot
  2. If user provided URL → navigate directly
  3. If user provided name:
    • Use semantic find() to locate notebook card by visible title text
    • If multiple matches → screenshot homepage, list options, ask user to specify
    • If no match → ask user to provide URL or confirm spelling
For action 4 (create new):
  1. Locate "New notebook" button on homepage
  2. Click → set title from Q2
  3. Add initial sources per Q3
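The name-matching branch for actions 1-3 can be sketched as a three-way classification. This is a minimal sketch: the case-insensitive substring match and the function name `resolve_notebook` are assumptions, and `visible_titles` stands in for whatever titles the semantic finder returns from the homepage.

```python
def resolve_notebook(query, visible_titles):
    """Classify a name lookup as 'open', 'ambiguous', or 'not_found'."""
    needle = query.strip().casefold()
    matches = [title for title in visible_titles if needle in title.casefold()]
    if len(matches) == 1:
        return "open", matches        # exactly one card: open it
    if matches:
        return "ambiguous", matches   # list the options and ask the user
    return "not_found", []            # ask for a URL or a spelling check


titles = ["Q3 Market Research", "Q3 Hiring Plan", "Thesis Notes"]
print(resolve_notebook("thesis", titles))  # ('open', ['Thesis Notes'])
print(resolve_notebook("q3", titles)[0])   # ambiguous
```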

Action 1: Read / Extract

  1. Open the notebook (notebook discovery above)
  2. Locate chat input (semantic find or screenshot coordinates)
  3. Type the question (use the user's natural phrasing from Q3)
  4. Submit (Enter or send button)
  5. Wait 3–5 seconds
  6. Screenshot the response area
  7. Extract and present in clean format (not raw chat dump)

Action 2: Add Sources

Sub-flows per source type:
| Type | UI flow |
|---|---|
| URL / Website / YouTube | Add Source → Link → paste URL |
| Copied Text | Add Source → Copied text → paste content |
| File Upload | Use file-upload tool with absolute path + input ref (never click native file picker) |
| Google Doc | Add Source → Google Docs → Drive picker |
| Synthesized content | Pre-process content elsewhere, then add as Copied text |
After every add: wait for ingestion spinner, screenshot to confirm success.
Synthesized content pattern (powerful): instead of asking NotebookLM to ingest a raw URL with potentially noisy content, pre-process the content (extract main article, strip nav/ads/comments), then add as "Copied text". Produces dramatically better summarization.
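One possible way to do the pre-processing step is shown below, using only Python's standard-library `html.parser`. It is a rough sketch under stated assumptions: the set of tags treated as noise (`SKIP_TAGS`) and the helper name `clean_for_copied_text` are illustrative choices, and real pages often need a dedicated article-extraction tool.

```python
from html.parser import HTMLParser

# Assumed "noise" containers; adjust for the pages being processed.
SKIP_TAGS = {"nav", "aside", "footer", "header", "script", "style", "form"}


class MainTextExtractor(HTMLParser):
    """Collect text that is not nested inside any tag listed in SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def clean_for_copied_text(html):
    """Return the page text with navigation, ads, and scripts removed."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


page = "<nav>Home</nav><article><h1>Title</h1><p>Body text.</p></article><footer>Ads</footer>"
print(clean_for_copied_text(page))  # prints "Title" then "Body text."
```

The cleaned string is what would be pasted through the Copied text sub-flow.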

Action 3: Studio Outputs

All 9 output types supported: Audio Overview, Study Guide, Briefing Doc, Timeline, FAQ, Table of Contents, Infographic, Slides, Mind Map.
Mandatory workflow:
  1. Locate Studio panel (right side; may need toggle)
  2. Find the specific output button for the requested type
  3. Open customization menu (chevron/arrow next to button) — NOT the main button
  4. Write detailed custom prompt (from Q4)
  5. Confirm and submit
  6. Do NOT wait for completion — confirm generation started, notify user, return

Custom prompt examples (4 output types)

Audio Overview:
"Two-host conversation between a researcher and an experienced practitioner. Audience: non-technical executive making a budget decision. Length: 8-10 minutes. Focus on business implications, not technical depth. Include one concrete example per major point. Acknowledge counter-arguments briefly."
Infographic:
"Decision-tree style. Action-oriented (each panel ends with a decision or action). 6 panels max. Monochrome navy + amber highlight. Each panel has: title (4-6 words), 1-2 sentence body, decision/action line. No filler panels."
Study Guide:
"Undergraduate-level (define every technical term). Structure: 6 concepts × 4 elements each (definition / why it matters / one worked example / 3 practice questions). Practice questions Bloom-higher-order (apply/analyze), not recall."
Slides (slide deck):
"12 slides max. 1-2 sentences per slide body. Presenter notes per slide with: one concrete example + one likely audience objection + how to address it. No bullet points in slide bodies — prose only. End with one-slide call-to-action."
See references/studio_output_custom_prompts.md for more.
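The examples above share a shape: output type, audience, length, and focus, followed by extra constraints. A starter prompt could be assembled along those lines. This is a hypothetical sketch, not the actual scripts/custom_prompt_template_generator.py; the function name, field labels, and sentence layout are all assumptions.

```python
def build_custom_prompt(output_type, audience, length, angle, extras=()):
    """Join the Q4 answers into a single starter prompt string."""
    fields = {
        "Output type": output_type,
        "Audience": audience,
        "Length": length,
        "Focus": angle,
    }
    for label, value in fields.items():
        if not value or not value.strip():
            # A blank field would reproduce the vagueness of a default prompt.
            raise ValueError(f"{label} is required for a custom prompt")
    sentences = [f"{label}: {value.strip().rstrip('.')}." for label, value in fields.items()]
    sentences += [f"{extra.strip().rstrip('.')}." for extra in extras if extra.strip()]
    return " ".join(sentences)


print(build_custom_prompt(
    "Audio Overview",
    "non-technical executive making a budget decision",
    "8-10 minutes",
    "business implications, not technical depth",
    extras=["Include one concrete example per major point"],
))
```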

Action 4: Create New Notebook

  1. Navigate to homepage
  2. Click "New notebook"
  3. Set title from Q2
  4. Add initial sources from Q3 (use Action 2 sub-flows per source type)
  5. Wait for auto-summary generation (this one IS synchronous — usually completes in <30 sec)
  6. Screenshot final state

Critical Async Behavior

Async output rule: For Studio generations (especially Audio Overview — 5-10 min), DO NOT wait for completion. The user's session will time out.
Workflow: Click Generate → confirm generation has started via screenshot → tell the user "Generation in progress — NotebookLM will notify you when ready" → end the task.
This is the fire-and-notify pattern. It differs from add-source and auto-summary, which are fast enough to wait for.
Use scripts/async_action_classifier.py to determine wait-or-notify per action:
| Action | Wait? |
|---|---|
| Add Source (URL/text/file) | Yes: wait for ingestion spinner (~5-30s) |
| Read/Extract (chat) | Yes: wait 3-5s for response |
| Studio: Audio Overview | No: fire and notify (5-10 min) |
| Studio: Infographic / Slides / Mind Map | No: fire and notify (2-5 min) |
| Studio: Study Guide / Briefing Doc / FAQ | Yes: wait ~30-60s |
| Create New Notebook | Yes: wait for auto-summary (<30s) |
See references/async_action_discipline.md for the canon.
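The wait-or-notify table can be expressed as a lookup. The sketch below is an assumption about how such a classifier might look, not the contents of scripts/async_action_classifier.py; the action keys are hypothetical. Output types the table does not list (Timeline, Table of Contents) are deliberately left out, so they raise an error instead of getting a guessed policy.

```python
WAIT = "wait"
NOTIFY = "fire-and-notify"

# (policy, expected duration) per action, copied from the table above.
POLICY = {
    "add_source": (WAIT, "~5-30s"),
    "read_extract": (WAIT, "3-5s"),
    "studio:audio_overview": (NOTIFY, "5-10 min"),
    "studio:infographic": (NOTIFY, "2-5 min"),
    "studio:slides": (NOTIFY, "2-5 min"),
    "studio:mind_map": (NOTIFY, "2-5 min"),
    "studio:study_guide": (WAIT, "~30-60s"),
    "studio:briefing_doc": (WAIT, "~30-60s"),
    "studio:faq": (WAIT, "~30-60s"),
    "create_notebook": (WAIT, "<30s"),
}


def classify(action):
    """Return (policy, expected_duration) for a known action key."""
    try:
        return POLICY[action]
    except KeyError:
        raise ValueError(f"No wait/notify policy recorded for {action!r}") from None


print(classify("studio:audio_overview"))  # ('fire-and-notify', '5-10 min')
```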

Screenshot-First Discipline

NotebookLM is a dynamic SPA where UI varies by:
  • Account tier (free vs Plus vs Enterprise)
  • Feature rollout (some Studio types not yet available to all users)
  • Recent UI changes (Google iterates the product frequently)
Every UI action must be preceded by a screenshot. Reasons:
  1. Verify the UI matches expectations before acting
  2. Catch login walls early
  3. Detect unexpected layout changes
  4. Audit trail for debugging
Use screenshot() (or equivalent in your browser-automation tool) before every meaningful UI interaction.
See references/browser_automation_canon.md for the discipline.

find()-Before-Click

Use semantic element finders before pixel coordinates wherever possible:
  • find(text="Audio Overview") → returns the element regardless of position
  • click(x=420, y=380) → breaks when the UI rearranges
Semantic finders survive minor UI changes. Pixel coordinates do not.
Only fall back to coordinates when:
  • Semantic find() returns nothing
  • Element has no stable text/aria-label/data-attribute
  • Visual position is the only reliable signal
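The fallback order can be sketched as a wrapper that tries the semantic finder first and uses coordinates only as a last resort. The `find`, `click_element`, and `click_at` callables are placeholders for whatever the browser-automation tool provides; their names and signatures are assumptions made for illustration.

```python
def click_target(find, click_element, click_at, text, fallback_xy=None):
    """Click by visible text when possible; otherwise by coordinates.

    Returns which strategy was used, which is useful for the audit trail.
    """
    element = find(text=text)
    if element is not None:
        click_element(element)
        return "semantic"
    if fallback_xy is not None:
        # Last resort: coordinates break when the UI rearranges.
        click_at(*fallback_xy)
        return "coordinates"
    raise LookupError(f"No element with text {text!r} and no fallback coordinates")


# Usage with stand-in callables that only record what happened.
log = []
used = click_target(
    find=lambda text: None,            # pretend the semantic finder found nothing
    click_element=log.append,
    click_at=lambda x, y: log.append((x, y)),
    text="Audio Overview",
    fallback_xy=(420, 380),
)
print(used, log)  # coordinates [(420, 380)]
```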

Saving Outputs to Workspace

For Read/Extract actions producing useful information:
  1. Extract chat response cleanly (strip UI chrome)
  2. Format readably (paragraphs, lists, code blocks as appropriate)
  3. If user requested → save to file (${WORKSPACE}/notebooklm/<notebook-slug>-<action>-<date>.md)
  4. Otherwise → return in chat as final summary
For Studio outputs:
  1. NotebookLM hosts the output (Audio Overview is in-app, Infographic downloadable, etc.)
  2. Report the location (URL or in-app navigation path) to user
  3. Don't try to download/save Studio outputs to local workspace — that's NotebookLM's job
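The file-naming pattern for saved Read/Extract results can be sketched as below. The slug rule (lowercase, hyphen-separated) and the ISO date format are assumptions, since the pattern only names the `<notebook-slug>`, `<action>`, and `<date>` placeholders; `slugify` and `output_path` are hypothetical helper names.

```python
import re
from datetime import date
from pathlib import Path


def slugify(name):
    """Lowercase the name and collapse non-alphanumeric runs into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return slug or "notebook"


def output_path(workspace, notebook_name, action, on=None):
    """Build <workspace>/notebooklm/<notebook-slug>-<action>-<date>.md."""
    on = on or date.today()
    filename = f"{slugify(notebook_name)}-{action}-{on.isoformat()}.md"
    return Path(workspace) / "notebooklm" / filename


path = output_path("/tmp/ws", "Q3 Market Research!", "extract", date(2025, 1, 15))
print(path.name)  # q3-market-research-extract-2025-01-15.md
```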

Reporting Back Format

After completing any action:
  1. Take final screenshot if visually relevant
  2. Give clean summary (not raw chat dump):
    • Notebook used (name)
    • Action taken (specific)
    • Result (1-2 sentences)
    • For generated outputs: what was created + where it is + when ready
  3. For fire-and-notify actions: explicit "NotebookLM will notify you when ready"
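The summary fields above map onto a small formatter. This is only a sketch of one possible layout; the function name `format_report` and the line labels are assumptions.

```python
def format_report(notebook, action, result, output_location=None, fire_and_notify=False):
    """Render the post-action summary as plain text, one field per line."""
    lines = [
        f"Notebook: {notebook}",
        f"Action: {action}",
        f"Result: {result}",
    ]
    if output_location:
        lines.append(f"Output location: {output_location}")
    if fire_and_notify:
        # Explicit notice required for slow Studio generations.
        lines.append("NotebookLM will notify you when ready.")
    return "\n".join(lines)


print(format_report(
    "Thesis Notes",
    "Generate Audio Overview with custom prompt",
    "Generation started.",
    output_location="Studio panel of the notebook",
    fire_and_notify=True,
))
```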

Error Handling

| Failure | Behavior |
|---|---|
| Browser automation unavailable | Fail fast with "this skill requires browser automation" message (Step 0 halt) |
| Login wall detected | Stop. Tell user to log in. Don't attempt auto-login. |
| Multiple notebooks match name | Screenshot homepage, list options, ask user to specify |
| Source ingestion spinner stuck > 60s | Note timeout, ask user if they want to retry |
| Studio button not found in panel | Scroll down or look for "Discover more"; if still missing, note feature may not be enabled for this account |
| Chat response doesn't appear in 10s | Screenshot, check for error state, retry once |
| Page layout changed unexpectedly | Screenshot, describe what's visible, ask user for guidance |
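The chat-response row (a 10-second timeout and one retry) can be sketched as a polling loop. Treating "retry once" as resubmitting the question is an assumption, and `submit` and `response_visible` are placeholder callables standing in for the real browser actions; the screenshot and error-state check are left to the caller.

```python
import time


def wait_for(check, timeout_s, poll_s=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or timeout_s seconds have elapsed."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


def ask_with_single_retry(submit, response_visible, timeout_s=10.0, **wait_options):
    """Submit the question, wait for a response, and retry exactly once.

    Returns the attempt number (1 or 2) that produced a visible response.
    """
    for attempt in (1, 2):
        submit()
        if wait_for(response_visible, timeout_s, **wait_options):
            return attempt
    raise TimeoutError("No chat response after one retry; ask the user for guidance")
```

The `clock` and `sleep` parameters are injectable so the timeout logic can be exercised without real waiting.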

Tooling

| Script | Role |
|---|---|
| scripts/action_router.py | Q1-Q4 answers → action plan + UI flow + required parameters |
| scripts/custom_prompt_template_generator.py | Studio output type + audience + length → starter custom prompt |
| scripts/async_action_classifier.py | Action name → wait-or-notify pattern (fire-and-notify for slow generations) |

References

  • references/browser_automation_canon.md: screenshot-first + find-before-click + tool-agnostic patterns (7+ sources)
  • references/studio_output_custom_prompts.md: why defaults are mediocre + per-output-type templates (7+ sources)
  • references/async_action_discipline.md: fire-and-notify pattern for slow UI ops (7+ sources)

Anti-Patterns To Reject

  • Hardcoding tool-specific names without abstraction (e.g., "Claude Chrome Extension")
  • Synchronous waiting on Studio generations (especially Audio Overview)
  • Skipping screenshots between actions
  • Using pixel coordinates when semantic find() is available
  • Attempting to handle login flows automatically
  • Generating Studio outputs without opening customization menu
  • Using default Studio prompts (always write custom)

Version: 1.0.0
Source spec: megaprompts/03-notebooklm-megaprompt.md
Build pattern: Path B (direct conversion). Browser-automation shape, distinct from the research-pack convention.