udemy-crawler
Udemy Course Crawler
Extract Udemy course data using the Chrome DevTools skill.
Prerequisites
- Chrome DevTools skill installed and working:

  ```bash
  cd .claude/skills/chrome-devtools/scripts && npm install
  ```
Strategy
- Visible browser required: Cloudflare blocks headless browsers.
- `networkidle2`: the page loads content dynamically, so wait until network activity settles.
- Scroll and expand: the curriculum is lazy-loaded.
- Text parsing: the curriculum's CSS class names are obfuscated, so selectors are unreliable.
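For comparison, the first two strategy points map onto Puppeteer options as in the sketch below. It assumes the `puppeteer` package, whose `launch`, `newPage`, and `goto` calls are real API; the helper names `crawlerOptions` and `openCourse` are hypothetical and not part of the Chrome DevTools skill.

```javascript
// Hypothetical helpers; not part of the Chrome DevTools skill.
// Options mirroring the strategy: visible browser plus networkidle2.
function crawlerOptions(timeoutMs = 60000) {
  return {
    // Cloudflare blocks headless Chrome, so launch a visible window
    launch: { headless: false },
    // networkidle2: navigation counts as finished once there are
    // no more than 2 network connections for at least 500 ms
    goto: { waitUntil: "networkidle2", timeout: timeoutMs },
  };
}

// Open a course page; the caller passes in the puppeteer module,
// e.g. openCourse(require("puppeteer"), url).
async function openCourse(puppeteer, url, timeoutMs) {
  const { launch, goto } = crawlerOptions(timeoutMs);
  const browser = await puppeteer.launch(launch);
  const page = await browser.newPage();
  await page.goto(url, goto);
  return { browser, page }; // caller is responsible for browser.close()
}
```

Passing the module in keeps the options testable without launching a browser.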
Usage
```bash
cd .claude/skills/chrome-devtools/scripts
```

Step 1: Basic Info
```bash
node evaluate.js \
--url "https://www.udemy.com/course/COURSE-SLUG/" \
--headless false \
--timeout 60000 \
--wait-until networkidle2 \
--script '(function() {
return {
title: document.querySelector("h1")?.textContent?.trim() || "",
headline: document.querySelector("[data-purpose=\"lead-headline\"]")?.textContent?.trim() || "",
rating: document.querySelector("[data-purpose=\"rating-number\"]")?.textContent?.trim() || "",
students: document.querySelector("[data-purpose=\"enrollment\"]")?.textContent?.trim() || "",
instructor: document.querySelector("[data-purpose=\"instructor-name-top\"] a")?.textContent?.trim() || "",
price: document.querySelector("[data-purpose=\"course-price-text\"] span span")?.textContent?.trim() || "",
lastUpdated: document.querySelector("[data-purpose=\"last-update-date\"]")?.textContent?.trim() || "",
language: document.querySelector("[data-purpose=\"lead-course-locale\"]")?.textContent?.trim() || "",
whatYouWillLearn: Array.from(document.querySelectorAll("[data-purpose=\"objective\"] span")).map(el => el.textContent?.trim()).filter(Boolean),
targetAudience: Array.from(document.querySelectorAll("[data-purpose=\"target-audience\"] li")).map(el => el.textContent?.trim()).filter(Boolean)
};
})()'
```
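Step 1 returns every field as raw display text. If numeric values are needed, a small post-processing step can convert them. This sketch assumes the rating renders as a decimal such as `4.7` and enrollment as text like `12,345 students`; the function name `normalizeBasicInfo` is hypothetical, and any value without a number becomes `null`.

```javascript
// Hypothetical post-processing for the object returned by Step 1.
// Assumes rating text like "4.7" and enrollment text like "12,345 students".
function parseNumber(text) {
  // Drop thousands separators, then take the first number in the string
  const match = String(text || "").replace(/,/g, "").match(/\d+(?:\.\d+)?/);
  return match ? Number(match[0]) : null;
}

function normalizeBasicInfo(raw) {
  return {
    ...raw, // keep all other fields unchanged
    rating: parseNumber(raw.rating),
    students: parseNumber(raw.students),
  };
}
```

Keep the raw strings instead if you only fill the Output Template, which expects display text.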
Step 2: Curriculum + Details
```bash
node evaluate.js \
--url "https://www.udemy.com/course/COURSE-SLUG/" \
--headless false \
--timeout 120000 \
--wait-until networkidle2 \
--script '(async function() {
window.scrollTo(0, 1200);
await new Promise(r => setTimeout(r, 1000));
const expandAll = Array.from(document.querySelectorAll("button")).find(b => b.textContent.includes("Expand all"));
if (expandAll) { expandAll.click(); await new Promise(r => setTimeout(r, 3000)); }
window.scrollTo(0, 5000);
await new Promise(r => setTimeout(r, 1000));
const mainContent = document.getElementById("main-content-anchor");
const parent = mainContent ? mainContent.closest("div") : document.body;
const fullText = parent.textContent;
// Search each marker from the previous one so sections stay in page order (substring() silently swaps reversed bounds)
const currStart = fullText.indexOf("Course content");
const reqStart = fullText.indexOf("Requirements", currStart);
const descStart = fullText.indexOf("Description", reqStart);
const targetStart = fullText.indexOf("Who this course is for", descStart);
// Slice start..end and collapse whitespace; a missing end (-1) reads to the end of the text
const grab = (start, end) => start === -1 ? "" : fullText.substring(start, end === -1 ? undefined : end).replace(/\s+/g, " ").trim();
return {
curriculum: grab(currStart, reqStart !== -1 ? reqStart : targetStart),
requirements: grab(reqStart, descStart),
description: grab(descStart, targetStart).substring(0, 3000),
targetAudience: grab(targetStart, targetStart + 1500)
};
})()'
```
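Both steps embed the script in a single-quoted shell argument, so the JavaScript itself cannot contain a single quote. When driving the crawler from Node instead of a shell, passing an argument array to `child_process.spawnSync` avoids shell quoting entirely. This is a sketch: `buildEvaluateArgs` and `runEvaluate` are hypothetical names, only the `evaluate.js` flags shown above are used, and the function returns stdout as-is without assuming its format.

```javascript
const { spawnSync } = require("child_process");

// Hypothetical helper: argv for evaluate.js, using only the flags shown above.
function buildEvaluateArgs(slug, script, timeoutMs = 60000) {
  return [
    "evaluate.js",
    "--url", `https://www.udemy.com/course/${encodeURIComponent(slug)}/`,
    "--headless", "false",          // visible browser (Cloudflare)
    "--timeout", String(timeoutMs),
    "--wait-until", "networkidle2",
    "--script", script,             // passed verbatim; no shell quoting involved
  ];
}

// Hypothetical helper: run evaluate.js from the skill's scripts directory
// and return its raw stdout (its output format is not assumed here).
function runEvaluate(scriptsDir, slug, script, timeoutMs) {
  const result = spawnSync("node", buildEvaluateArgs(slug, script, timeoutMs), {
    cwd: scriptsDir,
    encoding: "utf8",
  });
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error(`evaluate.js exited with ${result.status}: ${result.stderr}`);
  }
  return result.stdout;
}
```

For example, `runEvaluate(".claude/skills/chrome-devtools/scripts", "COURSE-SLUG", stepTwoScript, 120000)`, where `stepTwoScript` is a hypothetical variable holding the Step 2 script text.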
Output Template
```markdown
# {title}

URL: {url}

## Course Info

- Instructor: {instructor}
- Rating: {rating} ({students})
- Language: {language}
- Price: {price}
- Last Updated: {lastUpdated}

## What You'll Learn

{whatYouWillLearn as bullets}

## Course Content

{curriculum parsed into sections}

## Requirements

{requirements as bullets}

## Description

{description}

## Target Audience

{targetAudience as bullets}
```
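The template can be filled mechanically once both steps have run. A minimal sketch, assuming `info` is the object returned by Step 1 and `details` the object returned by Step 2; `renderCourse` is a hypothetical name. It does not split the curriculum or requirements into sections or bullets: Step 2 returns them as flattened text (including the leading marker word), and they are inserted as-is.

```javascript
// Hypothetical renderer for the Output Template.
// info: object returned by Step 1; details: object returned by Step 2.
function bullets(items) {
  return items.map((item) => `- ${item}`).join("\n");
}

function renderCourse(url, info, details) {
  // Prefer Step 1's list; fall back to Step 2's flattened text
  const audience = info.targetAudience && info.targetAudience.length
    ? bullets(info.targetAudience)
    : details.targetAudience;
  return [
    `# ${info.title}`, "",
    `URL: ${url}`, "",
    "## Course Info", "",
    `- Instructor: ${info.instructor}`,
    `- Rating: ${info.rating} (${info.students})`,
    `- Language: ${info.language}`,
    `- Price: ${info.price}`,
    `- Last Updated: ${info.lastUpdated}`, "",
    "## What You'll Learn", "",
    bullets(info.whatYouWillLearn || []), "",
    "## Course Content", "",
    details.curriculum, "",
    "## Requirements", "",
    details.requirements, "",
    "## Description", "",
    details.description, "",
    "## Target Audience", "",
    audience,
  ].join("\n");
}
```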
Key Notes
- `data-purpose` attributes are stable for rating, enrollment, and objectives.
- The curriculum uses dynamic class names, so extract it by text parsing only.
- Wait 3 s after expanding sections.
- Scroll the page to trigger lazy-loaded content.
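Because the curriculum can only be read as text, even its summary line has to be pulled out with a regular expression. This sketch assumes the header reads like `12 sections • 85 lectures • 10h 5m total length`; that format is an assumption, as is the helper name `parseCurriculumSummary`. The function returns `null` when the pattern is absent.

```javascript
// Hypothetical parser for the curriculum summary in Step 2's flattened text.
// Assumed format: "12 sections • 85 lectures • 10h 5m total length".
function parseCurriculumSummary(curriculumText) {
  const match = curriculumText.match(
    /(\d+)\s+sections?\s*•\s*([\d,]+)\s+lectures?\s*•\s*(.+?)\s+total length/
  );
  if (!match) return null;
  return {
    sections: Number(match[1]),
    lectures: Number(match[2].replace(/,/g, "")), // strip thousands separators
    totalLength: match[3].trim(), // e.g. "10h 5m", kept as display text
  };
}
```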