# BUAA Classroom Summarizer
Use this skill for:
- one `classroom.msa.buaa.edu.cn/livingroom` replay URL
- one `classroom.msa.buaa.edu.cn/coursedetail` course page

Assume commands run from this skill root; otherwise, use absolute paths to the scripts.
## Core Boundary
- Let scripts handle authentication, extraction, replay diagnosis, caching, and artifact writes.
- Let the agent handle course alignment, concept confirmation, terminology correction, and final prose reconstruction.
- Treat deterministic note output as a seed unless semantic rebuild is explicitly completed.
## Main Commands

Single replay extraction:

```powershell
python scripts\extract_buaa_classroom.py "<livingroom-url>" --output-dir "<output-dir>"
```
Whole-course replay enumeration or extraction:

```powershell
python scripts\collect_buaa_course_replays.py "<coursedetail-url>" --output-dir "<output-dir>"
python scripts\collect_buaa_course_replays.py "<coursedetail-url>" --output-dir "<output-dir>" --extract-existing --skip-existing
```
Whole-course commands are extraction and inventory commands, not permission to generate final notes for every available lesson. For a `coursedetail` URL, enumerate or extract artifacts first, then ask the user which lesson to semantically rebuild next, unless the user explicitly requests a batch finalization workflow.
## Course Identity Rule

For a `coursedetail` URL, resolve the course identity before choosing a course folder or reusing vault context:
- First extract the strongest available course title from classroom/SPoC metadata, saved course detail files, replay metadata, or stable replay titles.
- Treat the normalized course title as the canonical course identity. If the title matches an existing course, it is the same course even when the lecturer, class time, classroom, or `sub_id` ranges differ.
- If no reliable title is available, use only a provisional identity; do not merge into an existing titled course based on teacher, schedule, lesson dates, old extraction directories, or nearby vault content.
- Preserve replay identifiers and source URLs as source metadata, but do not use them to split courses that share the same confirmed title.
- If title extraction is ambiguous and the target vault already has plausible course folders, pause for user confirmation before writing formal notes or updating trackers.
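The title-matching rule above can be sketched as follows. This is a minimal illustration, not part of the scripts; the normalization choices (NFKC folding, whitespace removal, lowercasing) are assumptions:

```python
import re
import unicodedata

def normalize_title(title: str) -> str:
    """Normalize a course title for identity comparison: fold full-width
    characters to half-width, drop whitespace, lowercase Latin letters."""
    folded = unicodedata.normalize("NFKC", title)
    return re.sub(r"\s+", "", folded).lower()

def resolve_course_identity(extracted_title, existing_course_folders):
    """Return the matching existing course folder name, or None when the
    title does not match any known course (provisional identity case)."""
    if not extracted_title:
        return None  # no reliable title: fall back to a provisional identity
    key = normalize_title(extracted_title)
    for folder in existing_course_folders:
        if normalize_title(folder) == key:
            return folder  # same normalized title => same course
    return None
```

For example, `resolve_course_identity("贝叶斯统计", ["贝叶斯统计", "数理逻辑"])` matches the existing `贝叶斯统计` folder even if the lecturer or schedule differs.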
Runtime browser auth, for when local cookie reuse is unreliable:

```powershell
python scripts\extract_buaa_classroom.py "<livingroom-url>" --output-dir "<output-dir>" --browser-runtime-auth --browser-channel "auto"
```
## Required Replay Diagnosis

Before building any note, the script must produce exactly one replay diagnosis and route the replay into exactly one diagnosis category. Downstream note logic must consume this diagnosis instead of recomputing route decisions elsewhere.
## Standalone Markdown Note Workflow

Prepare a semantic rebuild packet:

```powershell
python scripts\extract_buaa_classroom.py "<livingroom-url>" --output-dir "<output-dir>" --export-markdown-note
```
Preferred user-facing modes:

```powershell
python scripts\extract_buaa_classroom.py "<livingroom-url>" --output-dir "<output-dir>" --export-markdown-note --markdown-note-mode "final-lite"
python scripts\extract_buaa_classroom.py "<livingroom-url>" --output-dir "<output-dir>" --export-markdown-note --markdown-note-mode "final-explained"
```
These modes must write only:

- `semantic_rebuild/semantic_rebuild_input.json`
- `semantic_rebuild/semantic_rebuild_prompt.md`
All final-oriented modes, including legacy ones, must write only the semantic packet. Do not emit deterministic seed notes by default. Treat the packet as the only intermediate artifact, then let the agent produce the final note.
Before accepting an agent-written note as final, run:

```powershell
python scripts\validate_final_note.py "<final-note.md>"
```
If the validator fails, keep the extraction artifacts and semantic packet only. Do not rename or present the failed note as a final note.
Then create a reviewer packet:

```powershell
python scripts\review_final_note.py --note "<final-note.md>" --semantic-input "<semantic_rebuild_input.json>" --output-dir "<review-dir>"
```
Use `final_note_review/final_note_review_prompt.md` with an independent reviewer agent only when the active system/developer instructions allow spawning one. If subagents are unavailable or not allowed, run a separate reviewer pass yourself with the same prompt, write the result as `final_note_review/final_note_review_result.json`, and do not edit the note during review.
When the agent writes the final standalone Markdown note:
- use a readable lesson filename, preferably the lesson title, such as `2026-04-13 贝叶斯统计 第7周星期1第3,4,5节.md`
- do not give the final note a generic machine-generated name
- place final lesson Markdown notes for the same course in one course folder named by course title, for example `贝叶斯统计/2026-04-13 贝叶斯统计 第7周星期1第3,4,5节.md`
- if the chosen output directory is already named exactly as the course title, write final lesson Markdown notes directly in that directory; do not create a nested course folder
- keep extraction artifacts, such as transcripts, diagnosis output, and semantic rebuild packets, in their original replay output directories; only user-facing final Markdown notes need the course-folder layout
- start directly with the lesson title and content
- do not show production metadata, such as transcript coverage, replay diagnosis, or PPT extraction status, in the user-facing note
- pass `scripts\validate_final_note.py` before the note is called final
- pass the reviewer gate for the current file hash before the note is called final
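The naming rules above can be sketched as a small helper. This is illustrative only; the sanitization of Windows-forbidden filename characters is an assumption, not behavior of the scripts:

```python
import re

# Characters not allowed in Windows filenames
WINDOWS_FORBIDDEN = r'[<>:"/\\|?*]'

def lesson_note_path(course_title: str, lesson_title: str) -> str:
    """Place the final note in a course folder named by course title,
    with a readable filename derived from the lesson title."""
    safe_course = re.sub(WINDOWS_FORBIDDEN, "_", course_title).strip()
    safe_lesson = re.sub(WINDOWS_FORBIDDEN, "_", lesson_title).strip()
    return f"{safe_course}/{safe_lesson}.md"
```

For example, `lesson_note_path("贝叶斯统计", "2026-04-13 贝叶斯统计 第7周星期1第3,4,5节")` yields the layout shown above.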
## Batch Finalization Rule
Whole-course extraction is still not permission to blindly finalize every replay. A whole-course run may produce:
- replay inventory
- extraction artifacts
- semantic rebuild packets
- a course-level todo list
If the user explicitly asks to process all currently pending lessons, batch finalization is allowed, but it must remain lesson-by-lesson inside the batch:
- Skip lessons already finalized unless the user asks for a revision.
- For each candidate lesson, verify the transcript exists and is non-empty before authoring.
- Read the full transcript plus the semantic packet before writing that lesson.
- Run `scripts\validate_final_note.py`, create the review packet, and record a pass result for the current note hash before calling that lesson final.
- If a lesson fails any gate, leave only artifacts/packet or a review-gated draft and continue with other eligible lessons.
- Prefer running tracker/overview maintenance once after the batch, not after every lesson, unless an intermediate checkpoint is needed.
- Reuse existing extraction artifacts and semantic packets when their inputs have not changed; do not rerun browser extraction just to rebuild prose.
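The per-lesson gating inside a batch can be sketched as follows. The lesson dict shape here is hypothetical; the real state lives in the on-disk artifacts and trackers:

```python
def select_batch_candidates(lessons):
    """Filter lessons eligible for authoring in a batch run.
    Each lesson is a hypothetical dict:
    {"title": str, "finalized": bool, "transcript": str}."""
    eligible, skipped = [], []
    for lesson in lessons:
        if lesson["finalized"]:
            # already finalized: skip unless the user asks for a revision
            skipped.append((lesson["title"], "already finalized"))
        elif not lesson["transcript"].strip():
            # no usable transcript: keep in waiting/backlog, do not author
            skipped.append((lesson["title"], "missing or empty transcript"))
        else:
            eligible.append(lesson["title"])
    return eligible, skipped
```

Each eligible lesson then still goes through the full read-transcript, validate, and review gates individually.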
## Final Note Quality Gate
A user-facing final note must be a semantic reconstruction, not a decorated transcript segment list. Before writing or accepting a final note, reject it if it contains any of these patterns:
- raw ASR/OCR snippets presented as "representative expressions" or "代表性表达"
- template headings copied from internal prompts rather than from the lesson itself
- generic boilerplate filler sentences
- repeated generic advice across sections instead of course-specific mathematical content
- transcript noise such as misrecognized symbols copied into the note without correction
- a course overview that marks low-quality diagnostics as "正式笔记" (formal notes)
If a note fails this gate, keep only the extraction artifacts and semantic packet, then mark the lesson as needing semantic rebuild. Do not call it final.
The semantic packet must not contain user-facing seed prose, raw ASR lines, or raw OCR fragments. It may contain time windows and paths to the transcript; the agent must read the transcript itself and reconstruct the note semantically.
## Reviewer Gate

Finalization requires both gates to pass on the current Markdown bytes:
- `scripts\validate_final_note.py` passes.
- The independent reviewer returns a passing decision with `finalization_allowed=true` and a note hash equal to the one recorded in `final_note_review_input.json`.

If the note changes after either gate, both gate results are invalid and must be rerun.
Reviewer implementation detail:
- If subagents are permitted, use an independent reviewer agent.
- If subagents are not permitted by the active instructions, run a separate reviewer pass in the main agent, write `final_note_review_result.json`, and ensure its note hash matches `final_note_review_input.json`.
- Do not rerun review for an unchanged note when an existing `final_note_review_result.json` already passes for the same hash.
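The hash check behind "do not rerun review for an unchanged note" can be sketched as follows. This is an illustration, not the scripts' implementation; the `note_sha256` field name is an assumption:

```python
import hashlib
import json
from pathlib import Path

def note_hash(note_path: str) -> str:
    """Hash the note's current Markdown bytes; any byte change
    invalidates both gates."""
    return hashlib.sha256(Path(note_path).read_bytes()).hexdigest()

def review_still_valid(note_path: str, result_path: str) -> bool:
    """True only when an existing passing review result matches the
    note's current hash, so review does not need to be rerun."""
    try:
        result = json.loads(Path(result_path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return False
    return (result.get("finalization_allowed") is True
            and result.get("note_sha256") == note_hash(note_path))
```

Editing even one byte of the note makes `review_still_valid` return `False`, forcing both gates to rerun.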
Reviewer decisions:
- Pass: the note faithfully covers the transcript, handles course-domain substance, preserves supported logistics and emphasis, and is safe to present as final.
- Revise: the transcript can support a final note, but the current note misses supported content, is too generic, or needs correction. Revise, rerun the hard gate, then rerun the reviewer.
- Reject: the current source material or note is not fit for finalization. Keep the extraction artifacts and semantic packet; do not present a final note.
Absence is not failure. Missing homework, exam, grading, or deadline information is only a problem when the transcript contains evidence for it and the note omits, distorts, or invents it. If the transcript shows early dismissal, in-class exercise, student presentation, discussion, or a logistics-only class, the note may be short but must faithfully describe what happened.
## Semantic Rebuild Rules
- Perform a course-alignment check before accepting the rewrite as final.
- Correct obvious ASR/OCR term errors when the course context makes the intended term clear.
- Keep the lesson time axis visible. Each final section should keep a packet time range or a coarse marker.
- Keep math as inline `$...$` or display `$$...$$` LaTeX only. Do not wrap formulas in backticks.
- Treat the course transcript as the only primary source for section boundaries, lesson mainline, and completion checks.
- Only mark a lesson final when course-transcript coverage and summary coverage both pass.
- Reconstruct course-specific substance. Do not substitute generic learning advice for missing semantic understanding.
- If the transcript is missing, empty, or near-empty, treat the replay as waiting for transcript material, even if a tracker lists it under backlog. Do not create a formal note from metadata, schedule, title, or PPT alone.
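The missing/empty/near-empty check can be sketched as follows. The character threshold here is an assumption for illustration, not a value from the scripts:

```python
from pathlib import Path

NEAR_EMPTY_CHARS = 200  # assumed threshold; tune to the real pipeline

def transcript_status(transcript_path: str) -> str:
    """Classify a transcript file as 'missing', 'near_empty', or 'ok'.
    Anything other than 'ok' means: keep the lesson in waiting/backlog
    and do not author a formal note."""
    path = Path(transcript_path)
    if not path.exists():
        return "missing"
    text = path.read_text(encoding="utf-8", errors="replace").strip()
    if len(text) < NEAR_EMPTY_CHARS:
        return "near_empty"
    return "ok"
```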
## Authoring Contract
When writing the final student-facing Markdown note from a semantic packet:
- You are writing the finished note, not a seed note, diagnostic note, or instruction to a future organizer.
- Read the full transcript before writing. Use `semantic_rebuild_input.json` only for metadata, time anchors, and the artifact index.
- Do not expose evidence snippets, candidate phrases, OCR fragments, raw ASR lines, or internal workflow notes.
- Every major time block should explain what teaching move happened: definition, model, argument, proof, example, comparison, case discussion, policy explanation, teacher comment, assignment, exam arrangement, or class logistics.
- Capture high-value classroom signals: exams, homework, deadlines, submission format, grading weight, reading requirements, teacher-emphasized key points, repeatedly stressed phrases, formulas, theorems, definitions, examples, and common mistakes.
- If the teacher explicitly says something is important, likely to be tested, easy to confuse, often wrong, or needs review after class, preserve it in the note.
- If transcript evidence is weak, list the item under an explicitly uncertain section instead of turning it into a confident conclusion.
- The final note must face the student reader directly. Avoid process commentary such as "整理时应..." ("when organizing, you should..."), "后续重写..." ("rewrite later..."), "这一段主要在..." ("this segment is mainly about..."), or similar phrasing.
Course-domain reconstruction guidance:
- Math and statistics: reconstruct objects, definitions, assumptions, equations, theorems, proof ideas, examples, counterexamples, symbol meanings, and links between results.
- Engineering and computer science: reconstruct system components, algorithms, design constraints, implementation steps, experiment setup, failure cases, trade-offs, and how formulas or code relate to the design.
- Humanities and social sciences: reconstruct concepts, arguments, historical or institutional background, author positions, evidence, comparisons, cases, and the teacher's evaluative emphasis.
- Ideological and political courses: reconstruct policy concepts, theoretical claims, historical context, named documents or events, value judgments, exam-oriented formulations, and examples used to explain abstract claims.
- Language, writing, and communication courses: reconstruct vocabulary, rhetorical patterns, text structure, examples, correction points, practice requirements, and teacher feedback.
- Lab, design, or project courses: reconstruct task goals, deliverables, tools, operation steps, data requirements, safety or format constraints, grading criteria, and troubleshooting advice.
## Transcript-Only Rule

When `replay_diagnosis=transcript_only`:
- do not emit fake content templates such as "课程定位 / 基础概念 / 方法流程" ("course positioning / basic concepts / method workflow")
- let scripts provide only time segments from the course transcript and representative transcript lines
- let the agent infer the real lesson structure from the course transcript plus course context
- do not ask scripts to pre-confirm concepts from transcript-only material
## PPT Rule
- Prefer teacher stream by default.
- Treat PPT as auxiliary only, even when a PPT stream exists.
- PPT may help with term spelling, page or book titles, formula symbols, and logistics screenshots.
- PPT must not decide section boundaries, lesson mainline, concept generation, or completion state.
## Logistics-Only Teacher Review

If the user only wants follow-up on assignments, exams, notices, or arrangements:

```powershell
python scripts\extract_buaa_classroom.py "<livingroom-url>" --output-dir "<output-dir>" --export-markdown-note --lightweight-teacher-review
```

This mode prepares short teacher-stream review clips and supporting review artifacts. It must not silently rewrite conclusions into the note until a later confirmation step marks them as confirmed.
## Failure Rules
- If the course transcript is missing, keep extraction artifacts but do not invent a formal lesson note.
- If the course transcript is empty or near-empty, handle it the same as missing transcript: keep it in waiting/backlog and do not write a final note.
- If course-transcript coverage is clearly partial, keep only a diagnostic draft rather than a final note.
- If the course transcript exists but the current summary only covers an early slice of the lesson or leaves large uncovered gaps, mark the note as a partial draft instead of final.
- If session reuse fails, rerun with `--browser-runtime-auth`.
On Windows, prefer a UTF-8 shell when validating generated files. If needed, set `$OutputEncoding` and `[Console]::OutputEncoding` to UTF-8 before `Get-Content` or other manual console inspection. For inline Python in PowerShell, use:
```powershell
@'
print("hello")
'@ | python -
```
Do not use Bash heredoc syntax such as `python - <<'EOF'` in PowerShell.
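The UTF-8 console setup mentioned above can be done like this (a standard PowerShell sketch; `Get-Content` is just one example of console inspection):

```powershell
# Make PowerShell pipe to and read from native programs as UTF-8
$OutputEncoding = [System.Text.Encoding]::UTF8
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
Get-Content "<final-note.md>"
```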