notebooklm
NotebookLM: Browser Automation
Requires: A browser automation environment (Claude Code CLI with computer-use, Claude Chrome Extension, or equivalent). In non-automation contexts the skill fails gracefully with a clear "not supported" message.
Critical: This skill is the only browser-automation skill in the v2 collection. It does NOT follow the research-pack Agent Integrity Rules convention. Different constraints apply (UI dynamics, async generation, login walls).
Step 0: Browser Context Setup (Mandatory)
Before any other action, verify browser automation is available:
- Check whether browser-control tools are loaded in the harness (screenshot, click, find-element, navigate)
- If unavailable → halt with clear message: "This skill requires browser automation. Currently in {context}. Cannot proceed. Use Claude Code CLI with computer-use, Claude Chrome Extension, or equivalent."
- If available → take initial screenshot, navigate to https://notebooklm.google.com
- Detect login wall via screenshot. If login screen detected: halt with "Please log in to NotebookLM in the browser, then re-invoke this skill." Never attempt to handle login automatically.
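The Step 0 checks above can be sketched as a single preflight function. This is a minimal illustration, not the skill's actual implementation: the tool names in `REQUIRED_TOOLS`, the `preflight` function, and the `login_wall_visible` flag (assumed to come from inspecting the initial screenshot) are all hypothetical.

```python
# Hypothetical tool names; a real harness may expose different ones.
REQUIRED_TOOLS = {"screenshot", "click", "find", "navigate"}
NOTEBOOKLM_URL = "https://notebooklm.google.com"


def preflight(available_tools, context, login_wall_visible):
    """Return (ok, message) for the Step 0 checks."""
    missing = REQUIRED_TOOLS - set(available_tools)
    if missing:
        # Halt message taken from the Step 0 text above.
        return False, (
            f"This skill requires browser automation. Currently in {context}. "
            "Cannot proceed. Use Claude Code CLI with computer-use, "
            "Claude Chrome Extension, or equivalent."
        )
    if login_wall_visible:
        # Never attempt the login itself; hand control back to the user.
        return False, (
            "Please log in to NotebookLM in the browser, then re-invoke this skill."
        )
    return True, f"Ready: navigate to {NOTEBOOKLM_URL}"
```

Returning a message instead of raising keeps the halt text visible to the user in both failure cases.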
Phase 0: Grill-Me Intake (Action-Routing)
Up to 4 forcing questions, one at a time, dependency-ordered. Most invocations stop at Q3.
Q1 (root) — Action
What do you want me to do? Pick one:
- Read / extract — ask a question of an existing notebook
- Add a source — push content (URL, text, file, Google Doc, or synthesized content) into a notebook
- Generate a Studio output (Audio Overview, Study Guide, Briefing Doc, Timeline, FAQ, Table of Contents, Infographic, Slides, or Mind Map)
- Create a new notebook — initialize with title + initial sources
Why I'm asking: Each action takes a different path through the UI and requires different parameters. Naming the action upfront prevents wasted screenshots and lets me ask only the follow-up questions that apply.
Forcing choice. If the user says "open NotebookLM" without specifying an action, refuse to start and re-ask Q1.
Q2 (depends on Q1) — Notebook identity
Which notebook? (asked for actions 1, 2, 3; not for "create new")
Why I'm asking: If you give me a name, I'll search the homepage; if you give me a URL, I'll navigate directly. Ambiguous names get a disambiguation prompt with screenshots.
For action 4 (create new): replace with "What's the title for the new notebook?"
Q3 (depends on Q1) — Action-specific parameter
Action 1 (read/extract):
"What's the question to ask the notebook? Use natural phrasing — the notebook's chat handles it best."
Action 2 (add source):
"What source type? Pick one:
- URL / website / YouTube link
- Copied text (paste here or point at content)
- File upload (provide absolute path)
- Google Doc (link)
- Synthesized content (I'll pre-process and add as 'Copied text')
Why I'm asking: Each source type goes through a different sub-flow in the Add Source dialog. Picking upfront saves a step."
Action 3 (Studio output):
"Which Studio output? Audio Overview / Study Guide / Briefing Doc / Timeline / FAQ / Table of Contents / Infographic / Slides / Mind Map. And: any custom-prompt direction? Default prompts produce mediocre output, so I always open the customization menu and write a detailed prompt. Tell me the angle or audience.
Why I'm asking: The output type sets the UI button to find. The custom prompt is mandatory for quality."
Action 4 (create new):
"Initial sources? Provide URLs, file paths, or 'I'll add later'."
Q4 (depends on Q1 = action 3) — Studio custom prompt detail
Tell me the angle, audience, and length for the Studio output. Examples:
- Audio Overview: "Two-host conversation for a non-technical executive, 8–10 min, focus on business implications not technical depth"
- Infographic: "Decision-tree style, action-oriented, 6 panels max, monochrome navy"
- Study Guide: "Undergrad-level, definitions + 3 practice questions per concept"
Why I'm asking: This becomes the custom prompt. Default Studio prompts produce mediocre output — specific direction produces sharp output.
Asked only for Studio output generation (Q1=3). Skip otherwise.
Stop condition: After Q4 (or earlier with dependency skips), commit and start the action sequence.
See references/studio_output_custom_prompts.md for the canon.
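The dependency ordering of the intake questions can be sketched as a small routing function. The `ACTIONS` identifiers and the `questions_for` helper are hypothetical names chosen for illustration; only the Q1 to Q4 ordering and the "Q4 only for Studio output" rule come from the text above.

```python
# Hypothetical identifiers for the four Q1 choices.
ACTIONS = {
    1: "read_extract",
    2: "add_source",
    3: "studio_output",
    4: "create_notebook",
}


def questions_for(action):
    """List the intake questions to ask, in order, for a Q1 answer (1-4)."""
    if action not in ACTIONS:
        # Forcing choice: without a valid action, Q1 is asked again.
        return ["Q1"]
    asked = ["Q1", "Q2", "Q3"]
    if action == 3:
        # Only Studio output generation needs the custom-prompt detail.
        asked.append("Q4")
    return asked
```

For example, `questions_for(2)` stops at Q3, which matches the note that most invocations stop there.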
Notebook Discovery
For actions 1-3 (require existing notebook):
- Navigate to homepage → screenshot
- If user provided URL → navigate directly
- If user provided name:
  - Use semantic find() to locate notebook card by visible title text
  - If multiple matches → screenshot homepage, list options, ask user to specify
  - If no match → ask user to provide URL or confirm spelling
For action 4 (create new):
- Locate "New notebook" button on homepage
- Click → set title from Q2
- Add initial sources per Q3
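The lookup branches for an existing notebook can be sketched as a pure function over the titles visible on the homepage. The `resolve_notebook` name and the case-insensitive substring matching are assumptions made for illustration; the skill itself only says to find the card by visible title text.

```python
def resolve_notebook(query, visible_titles):
    """Classify a lookup as 'url', 'open', 'disambiguate', or 'not_found'."""
    if query.startswith(("http://", "https://")):
        # A URL skips the homepage search entirely.
        return "url", [query]
    needle = query.strip().casefold()
    # Assumption: a case-insensitive substring match stands in for find().
    matches = [t for t in visible_titles if needle in t.casefold()]
    if len(matches) == 1:
        return "open", matches
    if matches:
        # Several cards match: list them and ask the user to pick one.
        return "disambiguate", matches
    # Nothing matched: ask for a URL or a corrected spelling.
    return "not_found", []
```

Keeping the decision separate from the browser calls makes the three outcomes easy to check without a live page.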
Action 1: Read / Extract
- Open the notebook (notebook discovery above)
- Locate chat input (semantic find or screenshot coordinates)
- Type the question (use the user's natural phrasing from Q3)
- Submit (Enter or send button)
- Wait 3–5 seconds
- Screenshot the response area
- Extract and present in clean format (not raw chat dump)
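The submit-and-wait steps above can be sketched with injected callables so that no specific automation tool is assumed. Note that this sketch polls for the response rather than sleeping a fixed 3 to 5 seconds, which is a substitution on my part; `submit`, `read_response`, and both function names are hypothetical. The 10-second ceiling and single retry follow the Error Handling table.

```python
import time


def wait_for_response(read_response, timeout_s=10.0, poll_s=0.5, sleep=time.sleep):
    """Poll read_response() until it returns text or timeout_s elapses."""
    waited = 0.0
    while waited < timeout_s:
        text = read_response()
        if text:
            return text
        sleep(poll_s)
        waited += poll_s
    return None


def ask_with_retry(submit, read_response, **wait_kwargs):
    """Submit the question, wait for a reply, and retry once if none appears."""
    for _attempt in range(2):
        submit()
        text = wait_for_response(read_response, **wait_kwargs)
        if text:
            return text
    # Still nothing after the retry: the caller should screenshot and report.
    return None
```

Passing `sleep` as a parameter lets the loop be exercised without real delays.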
Action 2: Add Sources
Sub-flows per source type:
| Type | UI flow |
|---|---|
| URL / Website / YouTube | Add Source → Link → paste URL |
| Copied Text | Add Source → Copied text → paste content |
| File Upload | Use file-upload tool with absolute path + input ref (never click native file picker) |
| Google Doc | Add Source → Google Docs → Drive picker |
| Synthesized content | Pre-process content elsewhere, then add as Copied text |
After every add: wait for ingestion spinner, screenshot to confirm success.
Synthesized content pattern (powerful): instead of asking NotebookLM to ingest a raw URL with potentially noisy content, pre-process the content (extract main article, strip nav/ads/comments), then add as "Copied text". Produces dramatically better summarization.
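The pre-processing step of the synthesized content pattern might look like the following, using only the Python standard library. The `SKIP_TAGS` set, the class name, and `extract_main_text` are illustrative assumptions; real pages often need a more careful extractor, so treat this as a rough sketch of the idea rather than a complete cleaner.

```python
from html.parser import HTMLParser

# Assumption: these tags usually hold navigation, ads, or scripts.
SKIP_TAGS = {"nav", "aside", "footer", "script", "style", "form"}


class MainTextExtractor(HTMLParser):
    """Collect visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_main_text(html):
    """Return the page text with skipped regions removed, one chunk per line."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

The returned string is what would then be pasted through the "Copied text" sub-flow.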
Action 3: Studio Outputs
All 9 output types supported: Audio Overview, Study Guide, Briefing Doc, Timeline, FAQ, Table of Contents, Infographic, Slides, Mind Map.
Mandatory workflow:
- Locate Studio panel (right side; may need toggle)
- Find the specific output button for the requested type
- Open customization menu (chevron/arrow next to button) — NOT the main button
- Write detailed custom prompt (from Q4)
- Confirm and submit
- For slow generations (Audio Overview, Infographic, Slides, Mind Map), do NOT wait for completion: confirm generation started, notify user, return
Custom prompt examples (4 output types)
Audio Overview:
"Two-host conversation between a researcher and an experienced practitioner. Audience: non-technical executive making a budget decision. Length: 8-10 minutes. Focus on business implications, not technical depth. Include one concrete example per major point. Acknowledge counter-arguments briefly."
Infographic:
"Decision-tree style. Action-oriented (each panel ends with a decision or action). 6 panels max. Monochrome navy + amber highlight. Each panel has: title (4-6 words), 1-2 sentence body, decision/action line. No filler panels."
Study Guide:
"Undergraduate-level (define every technical term). Structure: 6 concepts × 4 elements each (definition / why it matters / one worked example / 3 practice questions). Practice questions Bloom-higher-order (apply/analyze), not recall."
Slides (slide deck):
"12 slides max. 1-2 sentences per slide body. Presenter notes per slide with: one concrete example + one likely audience objection + how to address it. No bullet points in slide bodies — prose only. End with one-slide call-to-action."
See references/studio_output_custom_prompts.md for more.
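A starter prompt in the spirit of the examples above could be assembled from the Q4 answers as follows. The `build_custom_prompt` function and its sentence layout are hypothetical; the result is only a starting point, and the detailed examples above remain the better model for a finished prompt.

```python
def build_custom_prompt(output_type, angle, audience, length):
    """Join the Q4 answers into one starter prompt for the customization menu."""
    fields = {"angle": angle, "audience": audience, "length": length}
    missing = [name for name, value in fields.items() if not value.strip()]
    if missing:
        # A custom prompt is mandatory, so refuse to fall back to defaults.
        raise ValueError(f"Q4 answer missing: {', '.join(missing)}")
    return (
        f"{output_type}. {angle.strip()}. "
        f"Audience: {audience.strip()}. Length: {length.strip()}."
    )
```

Raising on a blank answer mirrors the rule that Studio outputs are never generated with the default prompt.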
Action 4: Create New Notebook
- Navigate to homepage
- Click "New notebook"
- Set title from Q2
- Add initial sources from Q3 (use Action 2 sub-flows per source type)
- Wait for auto-summary generation (this one IS synchronous — usually completes in <30 sec)
- Screenshot final state
Critical Async Behavior
Async output rule: For Studio generations (especially Audio Overview, which takes 5-10 min), DO NOT wait for completion. The user's session will time out.
Workflow: Click Generate → confirm generation has started via screenshot → tell the user "Generation in progress. NotebookLM will notify you when ready" → end the task.
This is the fire-and-notify pattern. It differs from add-source and auto-summary, which are fast enough to wait for.
Use scripts/async_action_classifier.py to determine wait-or-notify per action:
| Action | Wait? |
|---|---|
| Add Source (URL/text/file) | Yes — wait for ingestion spinner (~5-30s) |
| Read/Extract (chat) | Yes — wait 3-5s for response |
| Studio: Audio Overview | No — fire and notify (5-10 min) |
| Studio: Infographic / Slides / Mind Map | No — fire and notify (2-5 min) |
| Studio: Study Guide / Briefing Doc / FAQ | Yes — wait ~30-60s |
| Create New Notebook | Yes — wait for auto-summary (<30s) |
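The table above maps directly onto a lookup. The sketch below is not the contents of scripts/async_action_classifier.py, which is not shown here; the key names and the `classify` function are hypothetical and simply mirror the table rows.

```python
WAIT = "wait"
FIRE_AND_NOTIFY = "fire_and_notify"

# Mirrors the table above; the keys are hypothetical identifiers.
ASYNC_POLICY = {
    "add_source": WAIT,
    "read_extract": WAIT,
    "studio:audio_overview": FIRE_AND_NOTIFY,
    "studio:infographic": FIRE_AND_NOTIFY,
    "studio:slides": FIRE_AND_NOTIFY,
    "studio:mind_map": FIRE_AND_NOTIFY,
    "studio:study_guide": WAIT,
    "studio:briefing_doc": WAIT,
    "studio:faq": WAIT,
    "create_notebook": WAIT,
}


def classify(action):
    """Return the wait-or-notify policy; unknown actions raise KeyError."""
    return ASYNC_POLICY[action]
```

Timeline and Table of Contents are not listed in the table, so the sketch leaves them out and lets the lookup fail loudly instead of guessing a policy.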
See references/async_action_discipline.md for the canon.
Screenshot-First Discipline
NotebookLM is a dynamic SPA where UI varies by:
- Account tier (free vs Plus vs Enterprise)
- Feature rollout (some Studio types not yet available to all users)
- Recent UI changes (Google iterates the product frequently)
Every UI action must be preceded by a screenshot. Reasons:
- Verify the UI matches expectations before acting
- Catch login walls early
- Detect unexpected layout changes
- Audit trail for debugging
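One way to make the rule hard to skip is to wrap every UI action so the screenshot always happens first. This is a sketch under the assumption that the harness exposes some callable for taking a screenshot; `screenshot_first` and the names in the usage note are hypothetical.

```python
import functools


def screenshot_first(take_screenshot):
    """Decorator factory: call take_screenshot() before the wrapped UI action."""
    def decorate(action):
        @functools.wraps(action)
        def wrapper(*args, **kwargs):
            # Capture the current UI state before acting on it.
            take_screenshot()
            return action(*args, **kwargs)
        return wrapper
    return decorate
```

A click helper decorated with `@screenshot_first(my_screenshot_tool)` would then record the page state on every call, which also builds the audit trail mentioned above.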
Use screenshot() (or equivalent in your browser-automation tool) before every meaningful UI interaction.
See references/browser_automation_canon.md for the discipline.
find()-Before-Click
Use semantic element finders before pixel coordinates wherever possible:
- ✅ find(text="Audio Overview") → returns element regardless of position
- ❌ click(x=420, y=380) → breaks when UI rearranges
Semantic finders survive minor UI changes. Pixel coordinates do not.
Only fall back to coordinates when:
- Semantic find() returns nothing
- Element has no stable text/aria-label/data-attribute
- Visual position is the only reliable signal
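The fallback order can be sketched as a small helper that only reaches for coordinates when the semantic lookup comes back empty. The `locate` function and the injected `find` callable are hypothetical stand-ins for whatever finder the automation tool provides.

```python
def locate(find, text, fallback_xy=None):
    """Prefer a semantic match; use coordinates only if find() returns nothing."""
    element = find(text)
    if element is not None:
        return ("element", element)
    if fallback_xy is not None:
        # Last resort: position taken from a fresh screenshot.
        return ("coordinates", fallback_xy)
    return ("missing", None)
```

Returning a tagged tuple lets the caller log which strategy was used, which helps when a layout change starts forcing coordinate fallbacks.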
Saving Outputs to Workspace
For Read/Extract actions producing useful information:
- Extract chat response cleanly (strip UI chrome)
- Format readably (paragraphs, lists, code blocks as appropriate)
- If user requested → save to file (${WORKSPACE}/notebooklm/<notebook-slug>-<action>-<date>.md)
- Otherwise → return in chat as final summary
For Studio outputs:
- NotebookLM hosts the output (Audio Overview is in-app, Infographic downloadable, etc.)
- Report the location (URL or in-app navigation path) to user
- Don't try to download/save Studio outputs to local workspace — that's NotebookLM's job
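The file naming pattern for saved Read/Extract output can be sketched as below. The slug rule and the ISO date format are assumptions, since the pattern only names the placeholders; `slugify` and `output_path` are hypothetical helpers.

```python
import re
from datetime import date
from pathlib import Path


def slugify(name):
    """Lowercase the name and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def output_path(workspace, notebook, action, on=None):
    """Build <workspace>/notebooklm/<notebook-slug>-<action>-<date>.md."""
    on = on or date.today()  # assumption: the date is written as YYYY-MM-DD
    filename = f"{slugify(notebook)}-{slugify(action)}-{on.isoformat()}.md"
    return Path(workspace) / "notebooklm" / filename
```

Here `workspace` stands for the resolved value of `${WORKSPACE}`; the function builds the path and leaves the actual write to the caller.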
Reporting Back Format
After completing any action:
- Take final screenshot if visually relevant
- Give clean summary (not raw chat dump):
- Notebook used (name)
- Action taken (specific)
- Result (1-2 sentences)
- For generated outputs: what was created + where it is + when ready
- For fire-and-notify actions: explicit "NotebookLM will notify you when ready"
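The summary fields listed above fit naturally into a small record type. The `ActionReport` class and its field names are hypothetical; the sketch only shows how the fire-and-notify line could be appended consistently.

```python
from dataclasses import dataclass

NOTIFY_LINE = "NotebookLM will notify you when ready"


@dataclass
class ActionReport:
    notebook: str
    action: str
    result: str
    fire_and_notify: bool = False

    def render(self):
        """Format the report as a short, readable summary."""
        lines = [
            f"Notebook: {self.notebook}",
            f"Action: {self.action}",
            f"Result: {self.result}",
        ]
        if self.fire_and_notify:
            # Slow generations end with the explicit notification line.
            lines.append(NOTIFY_LINE)
        return "\n".join(lines)
```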
Error Handling
| Failure | Behavior |
|---|---|
| Browser automation unavailable | Fail fast with "this skill requires browser automation" message (Step 0 halt) |
| Login wall detected | Stop. Tell user to log in. Don't attempt auto-login. |
| Multiple notebooks match name | Screenshot homepage, list options, ask user to specify |
| Source ingestion spinner stuck > 60s | Note timeout, ask user if they want to retry |
| Studio button not found in panel | Scroll down or look for "Discover more"; if still missing, note feature may not be enabled for this account |
| Chat response doesn't appear in 10s | Screenshot, check for error state, retry once |
| Page layout changed unexpectedly | Screenshot, describe what's visible, ask user for guidance |
Tooling
| Script | Role |
|---|---|
| | Q1-Q4 answers → action plan + UI flow + required parameters |
| | Studio output type + audience + length → starter custom prompt |
| scripts/async_action_classifier.py | Action name → wait-or-notify pattern (fire-and-notify for slow generations) |
References
- references/browser_automation_canon.md: screenshot-first + find-before-click + tool-agnostic patterns (7+ sources)
- references/studio_output_custom_prompts.md: why defaults are mediocre + per-output-type templates (7+ sources)
- references/async_action_discipline.md: fire-and-notify pattern for slow UI ops (7+ sources)
Anti-Patterns To Reject
- Hardcoded tool-specific names without abstraction (e.g., "Claude Chrome Extension")
- Synchronous waiting on Studio generations (especially Audio Overview)
- Skipping screenshots between actions
- Using pixel coordinates when semantic find() is available
- Attempting to handle login flows automatically
- Generating Studio outputs without opening customization menu
- Using default Studio prompts (always write custom)
Version: 1.0.0
Source spec: megaprompts/03-notebooklm-megaprompt.md
Build pattern: Path B (direct conversion). Browser-automation shape, distinct from research-pack convention.