Loading...
Loading...
Use when user asks YouTube video extraction, get, fetch, transcripts, subtitles, or captions. Writes video details and transcription into structured markdown file.
npx skill4agent add nicepkg/ai-workflow youtube-to-markdownpython3 ./check_existing.py "<YOUTUBE_URL>" "<output_directory>"summary_valid: falsetranscript_valid: falsecomments_valid: falseexists: truepython3 extract_data.py "<YOUTUBE_URL>" "<output_directory>"enpython3 extract_transcript.py "<YOUTUBE_URL>" "<output_directory>" "<LANG_CODE>"python3 extract_transcript_whisper.py "<YOUTUBE_URL>" "<output_directory>"python3 ./deduplicate_vtt.py "<output_directory>/${BASE_NAME}_transcript.vtt" "<output_directory>/${BASE_NAME}_transcript_dedup.md" "<output_directory>/${BASE_NAME}_transcript_no_timestamps.txt"INPUT: <output_directory>/${BASE_NAME}_transcript_no_timestamps.txt
CHAPTERS: <output_directory>/${BASE_NAME}_chapters.json
OUTPUT: <output_directory>/${BASE_NAME}_transcript_paragraphs.txt
Analyze INPUT and identify natural paragraph break line numbers.
Read CHAPTERS. If it contains chapters, use chapter timestamps as primary break points.
Target ~500 chars per paragraph. Find natural break points at topic shifts or sentence endings.
Write to OUTPUT in format:
15,42,78,103,...python3 ./apply_paragraph_breaks.py "<output_directory>/${BASE_NAME}_transcript_dedup.md" "<output_directory>/${BASE_NAME}_transcript_paragraphs.txt" "<output_directory>/${BASE_NAME}_transcript_paragraphs.md"INPUT: <output_directory>/${BASE_NAME}_transcript_no_timestamps.txt
OUTPUT: <output_directory>/${BASE_NAME}_summary.md
FORMATS: ./summary_formats.md
1. Classify content type:
- TIPS: gear reviews, rankings, "X ways to...", practical advice lists
- INTERVIEW: podcasts, conversations, Q&A, multiple perspectives
- EDUCATIONAL: concept explanations, analysis, "how X works"
- TUTORIAL: step-by-step instructions, coding, recipes
2. Analyze content structure:
- Identify meaningful content units (topic shifts, argument structure, narrative breaks)
- If single continuous topic, omit content unit headers
- Skip ads, sponsors, self-promotion ("like and subscribe", merch, etc.)
- Merge content spanning ad breaks if thematically connected
3. Read FORMATS file and use format for detected content type. Target <10% of transcript bytes.
ACTION REQUIRED: Use the Write tool NOW to save output to OUTPUT file. Do not ask for confirmation.INPUT: <output_directory>/${BASE_NAME}_summary.md
OUTPUT: <output_directory>/${BASE_NAME}_summary_tight.md
FORMATS: ./summary_formats.md
You are an adversarial copy editor. Cut fluff, enforce quality.
Rules:
- Read FORMATS - the format has been selected based on the content type - preserve format and do not count a reason to squeeze more from budget.
- Byte budget: <10% of transcript bytes
- Hidden Gems: Remove if duplicates main content
- Tightness: Cut filler words, compress verbose explanations, prefer lists over prose
Preserve original language - do not translate.
ACTION REQUIRED: Use the Write tool NOW to save output to OUTPUT file. Do not ask for confirmation.Read <output_directory>/${BASE_NAME}_transcript_paragraphs.md and clean speech artifacts.
Tasks:
- Remove fillers (um, uh, like, you know)
- Fix transcription errors
- Add proper punctuation
- Reduce or add implicit words to improve flow
- Preserve natural voice and tone
- Keep timestamps at end of paragraphs
ACTION REQUIRED: Use the Write tool NOW to save output to <output_directory>/${BASE_NAME}_transcript_cleaned.md. Do not ask for confirmation.INPUT: <output_directory>/${BASE_NAME}_transcript_cleaned.md
OUTPUT: <output_directory>/${BASE_NAME}_transcript.md
Read the INPUT file. Add markdown headings.
Read <output_directory>/${BASE_NAME}_chapters.json:
- If contains chapters: Use chapter names as ### headings at chapter timestamps, add #### headings for subtopics
- If empty: Add ### headings where major topics change
ACTION REQUIRED: Use the Write tool NOW to save output to OUTPUT file. Do not ask for confirmation.python3 finalize.py "${BASE_NAME}" "<output_directory>"youtube - {title} ({video_id}).mdyoutube - {title} - transcript ({video_id}).md--debug