bilibili-render-pdf

Bilibili Render PDF

Use this skill to turn a Bilibili video into a complete, compilable .tex note and a rendered PDF.
This skill extends the youtube-render-pdf workflow with Bilibili-specific adaptations for subtitle scarcity, login-gated high resolution, multi-part (分P) videos, and platform-specific non-teaching content.

Bilibili vs YouTube: Key Differences

| Aspect | Handling |
| --- | --- |
| Subtitle scarcity | Try CC subtitles first → fall back to Whisper speech-to-text → visual-only mode |
| Login-gated HD | 1080P+ requires cookies; prompt the user to use yt-dlp --cookies-from-browser chrome |
| Multi-part videos | Detect 分P videos and ask the user which parts to process |
| URL formats | Support bilibili.com/video/BVxxxxxxx and b23.tv short links |
| Danmaku | Do not use danmaku as a teaching content source (too noisy); use only CC subtitles or Whisper output |
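The two URL formats in the table can be told apart before any download is attempted. The following is a minimal sketch, not part of the skill itself: the function name classify_bilibili_url and the returned dictionary shape are assumptions made for illustration, and the BV id in the usage note is a made-up example.

```python
import re
from urllib.parse import urlparse

# BV ids are matched loosely (the literal "BV" followed by alphanumerics);
# this sketch deliberately does not assume a fixed id length.
_BV_PATTERN = re.compile(r"/video/(BV[0-9A-Za-z]+)")


def classify_bilibili_url(url):
    """Classify a URL as a full video link, a b23.tv short link, or unsupported."""
    if "://" not in url:
        url = "https://" + url  # allow bare "bilibili.com/video/..." input
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    if host == "b23.tv":
        # A short link only reveals its BV id after the redirect is followed.
        return {"kind": "short", "bvid": None}

    if host == "bilibili.com" or host.endswith(".bilibili.com"):
        match = _BV_PATTERN.search(parsed.path)
        if match:
            return {"kind": "video", "bvid": match.group(1)}

    return {"kind": "unsupported", "bvid": None}
```

For example, classify_bilibili_url("https://www.bilibili.com/video/BV1xx411c7mD?p=2") reports a "video" link with the id BV1xx411c7mD, while any b23.tv link is reported as "short" and still needs to be resolved.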

Goal

Produce a professional Chinese lecture note from a Bilibili URL.
The output must:
  • use the video's actual teaching content rather than subtitle transcription alone
  • place the video's original cover image on the front page of the .tex and rendered PDF whenever available
  • include all necessary high-value key frames as figures, without adding redundant screenshots
  • end with a final synthesis section that includes the speaker's substantive closing discussion and your own distilled takeaways
  • be structurally organized with \section{...} and \subsection{...}
  • be a complete .tex document from \documentclass to \end{document}
  • be compiled successfully to PDF as part of the final delivery

Pedagogical Standard

The notes must read like a strong human teacher is guiding the reader through the material.
  • organize each major section so the reader first understands the motivation, then the main idea, then the mechanism, then the example or evidence, and finally the takeaway
  • be patient and explicit about logical transitions; make it clear why the speaker introduces a concept, what problem it solves, and how the next idea follows
  • aim for deep-but-accessible explanations: keep the technical depth, but introduce formalism only after giving intuition in plain language
  • when a section is dense, break it into smaller subsections that progressively build understanding rather than compressing everything into one long derivation
  • do not dump subtitle content in chronological order; rewrite it into a teaching sequence with clear intent, contrast, and buildup

Source Acquisition

Metadata Inspection

  1. Inspect the video metadata first. Check the title, chapters, duration, thumbnail availability, and subtitle availability before writing.
  2. Detect multi-part (分P) videos. List all parts and ask the user which parts to process before downloading.
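The part-detection step can be sketched as a small helper over the metadata dictionary. This is an illustration under stated assumptions: it presumes the metadata came from something like yt-dlp -J "<URL>" and that a multi-part video shows up as a playlist-like object with an entries list; the field names (entries, title, duration) may differ by extractor version, and the function names are hypothetical.

```python
def list_parts(metadata):
    """Return one summary per part; a single-part video yields a one-item list."""
    entries = metadata.get("entries")
    if not entries:
        entries = [metadata]  # not playlist-like: treat the video itself as part 1
    parts = []
    for index, entry in enumerate(entries, start=1):
        parts.append({
            "index": index,
            "title": entry.get("title", ""),
            "duration": entry.get("duration"),  # seconds, when the extractor reports it
        })
    return parts


def is_multi_part(metadata):
    """True when more than one part was found, i.e. the user should be asked."""
    return len(list_parts(metadata)) > 1
```

The list returned by list_parts is what would be shown to the user when asking which parts to process.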

Subtitle Acquisition (Three-Level Fallback)

Priority 1: CC subtitles (platform-embedded)
Use manual subtitles over auto-generated subtitles when both are available. Prefer zh-Hans, zh-CN, zh, or ai-zh subtitle tracks. Preserve the subtitle timestamps; do not flatten subtitles into plain text too early if figures still need to be located.
yt-dlp --write-subs --sub-langs "zh-Hans,zh-CN,zh,ai-zh" --convert-subs srt \
  --skip-download -o "%(title)s.%(ext)s" "<URL>"
Priority 2: Whisper speech-to-text (when no CC subtitles are available)
Extract audio first, then transcribe with Whisper to produce a timestamped SRT file.
yt-dlp -x --audio-format wav -o "audio.%(ext)s" "<URL>"
whisper audio.wav --model medium --language zh --output_format srt --output_dir .
Priority 3: Visual-only mode (when audio quality is too poor)
Skip subtitles entirely and rely on dense frame sampling to extract teaching content from the video frames alone.
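The three-level fallback can be written down as a single decision function. This is only a sketch: the function name, the mode labels, and the audio_usable flag (a judgment that has to be made elsewhere, for example by listening to a sample) are assumptions for illustration. It also assumes that any manual track in the preferred list beats any auto-generated one.

```python
PREFERRED_LANGS = ("zh-Hans", "zh-CN", "zh", "ai-zh")


def choose_subtitle_source(manual_langs, auto_langs, audio_usable):
    """Return (mode, lang) following the three-level fallback."""
    # Priority 1: CC subtitles, manual tracks before auto-generated ones.
    for source, langs in (("cc-manual", manual_langs), ("cc-auto", auto_langs)):
        for lang in PREFERRED_LANGS:
            if lang in langs:
                return source, lang
    # Priority 2: Whisper transcription when the audio is good enough.
    if audio_usable:
        return "whisper", "zh"
    # Priority 3: no usable subtitles or audio, rely on frames alone.
    return "visual-only", None
```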

Video and Cover Download

  1. Acquire the video's original cover image before writing the .tex. Prefer the highest-resolution thumbnail exposed by the platform metadata. Save the selected cover locally and reference that local asset from the front page.
  2. Prefer the best usable video source for figure extraction. Probe formats and choose the highest resolution that is actually downloadable in the current environment. Note that 1080P+ on Bilibili typically requires login cookies.
  3. Keep all source artifacts local when practical. Typical working artifacts are metadata, the downloaded cover image, a timestamped subtitle file (CC or Whisper-generated), optional cleaned transcript text, a local video file, and extracted frames.
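Choosing the cover and the video source from probed metadata might look like the sketch below. It assumes metadata shaped like yt-dlp's JSON output, where thumbnails and formats are lists of dictionaries with optional width and height and where audio-only formats carry vcodec "none"; those field names should be checked against the actual metadata. The max_height cap is a hypothetical parameter for environments without login cookies.

```python
def pick_cover(thumbnails):
    """Pick the thumbnail with the largest pixel area; missing sizes count as 0."""
    if not thumbnails:
        return None
    return max(thumbnails, key=lambda t: (t.get("width") or 0) * (t.get("height") or 0))


def pick_video_format(formats, max_height=None):
    """Pick the tallest video format, optionally capped at max_height."""
    candidates = [
        f for f in formats
        if f.get("vcodec") not in (None, "none") and f.get("height")
        and (max_height is None or f["height"] <= max_height)
    ]
    return max(candidates, key=lambda f: f["height"]) if candidates else None
```

A format that is listed is not necessarily downloadable, so the selected format still has to be confirmed by an actual download attempt.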

Long Video Strategy

For longer videos, do not rely on a single monolithic pass.
  • If the video is longer than 20 minutes, or the subtitle file contains more than 300 subtitle entries, split the work into smaller segments.
  • Prefer chapter boundaries or 分P boundaries for splitting. If those are unavailable or too uneven, split by coherent time windows or subtitle ranges.
  • When subagents are available, spawn multiple subagents in parallel for different segments so coverage stays high and detail is not lost.
  • Give each subagent a concrete segment boundary and require it to return: the segment's teaching goal, the core claims, important formulas or code, required figures with time provenance, and any ambiguities that need integration-time resolution.
  • Keep a small overlap between neighboring segments when the explanation crosses boundaries, then deduplicate during integration.
  • The main agent must integrate the segment outputs into one unified outline and one coherent final narrative. The final PDF must read like a single lecture note, not a concatenation of chunk summaries.
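The split decision and the overlapping windows described above can be sketched with the standard library alone. The 20-minute and 300-entry thresholds come from the strategy itself; the 10-minute window, the 15-second overlap, and all function names are assumptions for illustration. Chapter or 分P boundaries should still take precedence over fixed windows when they exist.

```python
import re

_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _seconds(stamp):
    h, m, s, ms = (int(x) for x in _TIME.match(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(text):
    """Parse SRT text into a list of (start_seconds, end_seconds, caption)."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            continue  # skip blocks without a timing line
        start, end = lines[timing].split("-->")
        entries.append((_seconds(start), _seconds(end), " ".join(lines[timing + 1:])))
    return entries


def needs_split(duration_s, entry_count):
    # Longer than 20 minutes, or more than 300 subtitle entries.
    return duration_s > 20 * 60 or entry_count > 300


def time_windows(duration_s, window_s=600, overlap_s=15):
    """Cover [0, duration_s] with fixed windows that reach back into the previous one."""
    windows, start = [], 0.0
    while start < duration_s:
        end = min(start + window_s, duration_s)
        windows.append((max(0.0, start - overlap_s), end))
        start = end
    return windows
```

Each window can then be handed to one subagent, and the overlap at the start of every window after the first is the region to deduplicate during integration.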

Teaching Content Rules

Build the notes from all of the following when available:
  • video title and chapter structure
  • the video's original cover image and key metadata
  • on-screen diagrams, formulas, tables, plots, and architecture slides
  • subtitle explanations, examples, and verbal emphasis
  • code snippets shown or described in the talk
Skip content that does not contribute to the actual lesson:
  • greetings
  • small talk
  • sponsorship
  • channel logistics (一键三连, 关注投币, etc.)
  • closing pleasantries
Keep the speaker's closing discussion when it carries actual teaching value, such as synthesis, limitations, future work, tradeoffs, advice, or open questions.

Writing Rules

  1. Write the notes in Chinese unless the user explicitly requests another language.
  2. Organize the document with \section{...} and \subsection{...}. Reconstruct the teaching flow when needed; do not blindly mirror subtitle order. Each section should answer, in order when applicable: what problem is being solved, why simpler views are insufficient, what the core idea is, how it works, and what the reader should retain.
  3. Start from assets/notes-template.tex. Fill in the metadata block, including the local cover image path, and replace the body content block with the generated notes.
  4. The front page must include the video's original cover image when available. Place it on the first page rather than burying it later in the document. Keep it visually distinct from in-body teaching figures.
  5. Use figures whenever they materially improve explanation. Include as many figures as are necessary for teaching clarity, even if that means many figures across the document. Do not optimize for a small figure count; optimize for explanatory coverage and readability. Good figures are key formulas, diagrams, tables, plots, visual comparisons, pipeline schedules, architecture views, and stage-by-stage visual progressions.
  6. Do not place images inside custom message boxes.
  7. When a mathematical formula appears:
    • first explain in plain Chinese what the formula is trying to express and why it appears
    • show it in display math using $$...$$
    • then immediately follow with a flat list that explains every symbol
  8. When code examples appear:
    • explain the role of the code before the listing and summarize the expected behavior after it when useful
    • wrap them in lstlisting
    • include a descriptive caption
  9. Highlight teaching signals deliberately and repeatedly when the content justifies it:
    • use importantbox for core concepts the reader must walk away with, including formal definitions, central claims, key mechanism summaries, theorem-like statements, critical algorithm steps, and compact restatements of the main idea after a dense explanation
    • use knowledgebox for background and side knowledge that improves understanding without being the main thread, including prerequisite reminders, historical lineage, engineering context, design tradeoffs, terminology comparisons, and intuition-building analogies
    • use warningbox for common misunderstandings and failure points, including notation overload, hidden assumptions, misleading heuristics, easy-to-make implementation mistakes, causal confusions, off-by-one style reasoning errors, and places where the speaker contrasts a wrong intuition with the correct one
    • there is no quota of one box per section; add multiple boxes in a section when the material contains multiple distinct teaching signals
    • each box should carry a specific pedagogical payload rather than generic emphasis
    • prefer placing a box immediately after the paragraph, derivation, or example that motivates it
    • routine exposition should stay in normal prose; boxes are for high-signal takeaways, not decoration
    • figures must stay outside importantbox, knowledgebox, and warningbox
  10. End every major section with \subsection{本章小结}. Add \subsection{拓展阅读} when there are one or two worthwhile external links.
  11. End the document with a final top-level section such as \section{总结与延伸}. That final section must include:
    • the speaker's substantive closing discussion, excluding routine sign-off language
    • your own structured distillation of the core claims, mechanisms, and practical implications
    • your expanded synthesis, including conceptual compression, cross-links between sections, and any careful generalization that stays faithful to the video
    • concrete takeaways, open questions, or next steps when the material supports them
  12. Do not emit [cite]-style placeholders anywhere in the LaTeX.
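The formula rule (intuition first, then display math, then a flat symbol list) can be illustrated with a short LaTeX fragment. The sample-mean formula is a hypothetical stand-in rather than content from any video, and the prose is written in English here only for readability; in real notes it would be in Chinese.

```latex
% Plain-language motivation comes first.
We want a single number that summarizes the typical value of the measurements,
so we add them up and divide by how many there are:
$$
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
$$
% A flat list explaining every symbol follows immediately.
\begin{itemize}
  \item $\bar{x}$: the sample mean
  \item $n$: the number of measurements
  \item $x_i$: the $i$-th measurement
  \item $i$: the summation index, running from $1$ to $n$
\end{itemize}
```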

Figure Handling

Select figures by necessity and teaching value, not by an arbitrary quota or a bias toward keeping the document visually sparse.
When locating candidate frames, bias strongly toward recall before precision. It is better to inspect too many nearby candidates first than to miss the one frame where the slide, formula, table, or diagram is finally fully revealed and readable.
Frame understanding must come from direct visual inspection.
  • Use the view image tool to inspect candidate frames and crops before deciding what they show, how they should be described, and whether they are complete enough to include.
  • Do not use OCR tools such as tesseract as a substitute for visual understanding of a frame.
  • Do not infer a frame's semantic content only from nearby subtitles, filenames, or timestamps without checking the image itself.
  • Contact sheets, montages, and tiled strips are good for recall, but final keep-or-reject decisions and semantic naming must be based on actual image inspection with view image.

Frame Selection Checklist

Before inserting any video frame, first inspect several nearby candidates from the same subtitle-aligned interval and apply this checklist. If any item fails, reject the frame and keep searching nearby rather than forcing an approximate match.
  • Relevance: the frame must directly support the exact concept discussed in the surrounding paragraph or subsection, not just the same broad topic.
  • Required content visible: every visual element referenced in the text must already be visible in the frame.
  • Fully revealed state: when slides, whiteboards, animations, or dashboards build progressively, use the final fully populated readable state rather than an intermediate state.
  • Best nearby candidate: compare multiple nearby frames and prefer the one that is both most complete and most readable.
  • Readability: text, formulas, labels, and diagram structure must be legible enough to justify inclusion.

Frame Naming

  • Use neutral timestamp-based names for raw candidate frames. Do not assign semantic names before inspecting the actual frame content.
  • Rename a frame semantically only after visually confirming what is fully visible in the image.
  • The semantic filename must describe the frame's actual visible content, not a guess based on subtitles, nearby narration, or the intended paragraph topic.
  • If the frame is partially revealed, transitional, or ambiguous, keep searching and do not lock in a semantic name yet.
  • Use the timestamped subtitle file (CC or Whisper-generated SRT) as the primary locator for key-frame search.
  • First identify the subtitle span that corresponds to the concept, example, formula, or visual explanation being discussed.
  • Then search within that subtitle-aligned time interval, and slightly around its boundaries when needed, to find the best readable frame.
  • Do not jump directly from one guessed timestamp to one extracted frame. First generate a dense candidate set across the relevant interval, then inspect and down-select.
  • Prefer tools that help you inspect many nearby candidates at once, such as magick montage, contact sheets, tiled frame strips, or equivalent workflows. Use them to maximize recall and avoid missing the frame where the visual content is fully present.
  • When the visual is a progressive PPT reveal, animation build, whiteboard accumulation, or dashboard state change, explicitly search for the final fully populated state. Do not stop at the first frame that seems approximately correct.
  • If several nearby candidates differ only by progressive reveal state, keep checking until you find the frame with the most complete readable information.
  • When in doubt between a sparse early frame and a denser later frame from the same explanation window, prefer the later frame if it is materially more complete and still readable.
  • Include every figure that is necessary to explain the content well.
  • It is acceptable, and often desirable, to include several figures within one section or subsection when the video builds an idea in stages.
  • Omit repetitive or low-information frames.
  • Extract frames near chapter boundaries and explanation peaks when chapters exist, but still validate them against subtitle timing.
  • Search nearby timestamps when the first extracted frame catches an animation transition.
  • Crop, enlarge, or isolate the relevant region when the full frame is too loose.
  • When a slide reveals content progressively, capture the final readable state and add intermediate frames only when they teach a genuinely different step.
  • For dense visual sections, it is acceptable to over-sample first and discard later. Do not optimize candidate count so early that key visual states are never inspected.
  • Prefer a sequence of necessary figures over one overloaded figure with unreadable labels.
  • Preserve readability of formulas and labels.
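Dense candidate generation with neutral, timestamp-based names can be sketched as follows. The one-second step, the two-second padding, the frames output directory, and the function names are assumptions for illustration. The ffmpeg command is only built as an argument list, not executed, so it can be reviewed before running.

```python
def candidate_times(start_s, end_s, step_s=1.0, pad_s=2.0):
    """Sample timestamps densely across a subtitle interval, padded on both sides."""
    t = max(0.0, start_s - pad_s)
    stop = end_s + pad_s
    times = []
    while t <= stop + 1e-9:
        times.append(round(t, 3))
        t += step_s
    return times


def neutral_name(t_s):
    """Timestamp-only filename such as frame_00-12-31.500.jpg (no semantic label yet)."""
    total_ms = int(round(t_s * 1000))
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"frame_{h:02d}-{m:02d}-{s:02d}.{ms:03d}.jpg"


def ffmpeg_command(video_path, t_s, out_dir="frames"):
    """Build (not run) an ffmpeg command that grabs one frame at t_s seconds."""
    return ["ffmpeg", "-ss", f"{t_s:.3f}", "-i", video_path,
            "-frames:v", "1", "-q:v", "2", f"{out_dir}/{neutral_name(t_s)}"]
```

The extracted candidates can then be tiled into a contact sheet for a first recall pass. Semantic renaming happens only after each kept frame has been inspected directly.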

Figure Time Provenance

Whenever the .tex or PDF references a specific video frame, or a crop derived from a video frame, record its source time interval on the same page as a bottom footnote.
  • The footnote must show the concrete time interval, for example 00:12:31--00:12:46.
  • The interval should come from the subtitle-aligned segment used to locate the figure, not from a vague chapter-level estimate.
  • If the figure is a crop, the footnote still refers to the original video time interval of the source frame or subtitle span.
  • If several nearby frames in one figure all come from the same subtitle interval, one clear footnote is enough.
  • Keep the figure and its time footnote anchored to the same page; prefer layouts such as [H], a non-floating block, or another stable placement when ordinary floats would separate them.
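One way to keep a figure and its time footnote together is to generate both from the subtitle interval in a single step. The sketch below is an assumption-laden illustration: it presumes the template loads a package that provides the [H] placement (commonly the float package), it uses the \footnotemark and \footnotetext pairing as one possible footnote technique, and it does not escape LaTeX special characters in the caption. The function names and the "Video time" label are hypothetical.

```python
def hms(seconds):
    """Format seconds as HH:MM:SS, truncating fractional seconds."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def figure_with_provenance(image_path, caption, start_s, end_s):
    """Return a LaTeX figure block pinned with [H] plus a time-interval footnote."""
    interval = f"{hms(start_s)}--{hms(end_s)}"
    return "\n".join([
        r"\begin{figure}[H]",
        r"  \centering",
        rf"  \includegraphics[width=0.9\linewidth]{{{image_path}}}",
        # The optional short caption keeps the footnote mark out of any list of figures.
        rf"  \caption[{caption}]{{{caption}\protect\footnotemark}}",
        r"\end{figure}",
        rf"\footnotetext{{Video time: {interval}}}",
    ])
```

The start and end values should be the subtitle-aligned interval used to locate the frame, not a chapter-level estimate.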

Visualization

For concepts that remain hard to explain with only screenshots and prose, add accurate visualizations.
Two acceptable routes:
  • generate LaTeX-native visualizations with TikZ or PGFPlots
  • generate figures ahead of time with scripts and include them as images
For script-generated illustrations, prefer Python tools such as matplotlib and seaborn when they are the clearest way to produce an accurate teaching figure.
When a visualization is generated externally rather than drawn natively in LaTeX:
  • export the figure as pdf so it can be inserted into the .tex without rasterization loss
  • prefer vector output for plots, charts, and schematic illustrations
  • avoid png or jpg for script-generated teaching figures unless the content is inherently raster
When the source material contains relationships, results, or equations that would be clearer when redrawn than when shown as a screenshot, prefer rebuilding them with LaTeX-native tools or with matplotlib/seaborn.
Use visualizations for:
  • process flows, pipelines, and architecture overviews
  • curves and charts such as scaling laws, training curves, benchmark results, and ablation comparisons
  • distributions, correlations, heatmaps, and other plots that explain data relationships
  • complex functions, surfaces, contour plots, and geometric intuition figures
  • tables or comparisons that become clearer when redrawn as charts
  • summary diagrams that compress a section's core mechanism or takeaway into one figure
Do not add decorative graphics that do not teach anything.

Final Checklist

Before delivery, verify all of the following:
  • no important teaching content has been dropped, and no concrete but critical detail has been lost during condensation, restructuring, or summarization
  • the text and figures are aligned: each inserted frame supports the surrounding explanation, necessary crops have been applied, and the chosen frame shows the fullest relevant information rather than a transitional or incomplete state
  • the document is visually rich enough for teaching: check whether more high-information key frames should be added, and whether additional LaTeX-native or Python-script-generated illustrations would improve clarity

Delivery

Deliver all of the following:
  • the final .tex file
  • the downloaded cover image referenced on the front page
  • any extracted or generated figure assets referenced by the document
  • the compiled PDF
  • the Whisper-generated SRT subtitle file, if speech-to-text was used
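Part of the delivery list can be checked mechanically before handing off. The sketch below assumes that the PDF sits next to the .tex under the same base name, that \includegraphics paths carry their file extension and are relative to the .tex directory, and that no \graphicspath is in use; the function name is hypothetical. It does not cover the Whisper SRT file, which has to be checked separately when speech-to-text was used.

```python
import os
import re

_GRAPHICS = re.compile(r"\\includegraphics(?:\[[^\]]*\])?\{([^}]+)\}")


def missing_deliverables(tex_path):
    """List files that should ship with the notes but are absent on disk."""
    if not os.path.isfile(tex_path):
        return [tex_path]
    base = os.path.dirname(os.path.abspath(tex_path))
    missing = []
    pdf_path = os.path.splitext(tex_path)[0] + ".pdf"
    if not os.path.isfile(pdf_path):
        missing.append(pdf_path)
    with open(tex_path, encoding="utf-8") as handle:
        source = handle.read()
    # Every \includegraphics target (cover image and figures) must exist locally.
    for rel in _GRAPHICS.findall(source):
        asset = os.path.join(base, rel)
        if not os.path.isfile(asset):
            missing.append(asset)
    return missing
```

An empty list means the .tex file, the compiled PDF, and every referenced image were found.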

Asset

  • assets/notes-template.tex: default LaTeX template to copy and fill