linkedin-scraper

LinkedIn Scraper — Chrome Profile Web Scraping

Scrape LinkedIn profiles and search results using the user's authenticated Chrome browser session. No API keys are needed; everything runs through the browser tool with the Chrome profile relay.

Prerequisites

Chrome browser with active LinkedIn login
Browser relay connected (Chrome extension or openclaw browser profile)
DuckDB workspace for storing results (optional)

Core Workflow

1. Single Profile Scrape

browser → open LinkedIn profile URL
browser → snapshot (extract structured data)
→ Parse: name, headline, title, company, location, education, experience, connections, about
→ Return structured JSON or insert into DuckDB

2. Search + Bulk Scrape

browser → open LinkedIn search URL with filters
browser → snapshot (extract result cards)
→ Parse each result: name, title, company, profile URL
→ For each profile URL: open → snapshot → parse full profile
→ Batch insert into DuckDB
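
The bulk workflow above can be sketched as a small driver loop. This is only a sketch under stated assumptions, not the skill's actual implementation: `open_and_snapshot` and `parse_profile` are hypothetical callables standing in for the browser tool and the parsing step, and each result card is assumed to be a dictionary with a `profile_url` key. Delays between page loads are left out to keep the loop short; the rate limits in this document still apply.

```python
from typing import Callable, Iterable


def bulk_scrape(
    cards: Iterable[dict],
    open_and_snapshot: Callable[[str], str],
    parse_profile: Callable[[str], dict],
) -> list[dict]:
    """Visit each result card's profile URL once and collect parsed profiles."""
    seen: set[str] = set()
    records: list[dict] = []
    for card in cards:
        url = card.get("profile_url")
        if not url or url in seen:
            continue  # skip cards without a link and duplicate results
        seen.add(url)
        snapshot = open_and_snapshot(url)  # hypothetical browser step
        record = parse_profile(snapshot)   # hypothetical parsing step
        record["profile_url"] = url
        records.append(record)
    return records  # ready for a single batch insert
```

Passing the browser and parser in as arguments keeps the loop testable without a live session.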

3. Company Page Scrape

browser → open LinkedIn company page
→ Parse: company name, industry, size, description, specialties, employee count
→ Navigate to /people tab for employee list

Implementation Rules

Rate Limiting (CRITICAL)

Wait at least 3 seconds (preferably 5) between page loads
Maximum 80 profiles per session (LinkedIn rate limits)
Randomize delays between 3-8 seconds (avoid detection)
After every 20 profiles, take a 60-second break
If CAPTCHA or "unusual activity" detected, stop immediately and alert user
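
The pacing rules above can be expressed as a delay plan computed before a session starts. The following is a minimal sketch, assuming the long break replaces the short delay after every 20th profile; `plan_delays` and the constant names are hypothetical, and the caller is expected to sleep for each returned value.

```python
import random
from typing import Optional

MAX_PROFILES_PER_SESSION = 80  # session cap from the rules above
BREAK_EVERY = 20               # profiles between long breaks
BREAK_SECONDS = 60.0
MIN_DELAY, MAX_DELAY = 3.0, 8.0


def plan_delays(n_profiles: int, rng: Optional[random.Random] = None) -> list[float]:
    """Return the pause (in seconds) to take after each profile visit."""
    if n_profiles > MAX_PROFILES_PER_SESSION:
        raise ValueError("session limit is 80 profiles")
    rng = rng or random.Random()
    delays = []
    for i in range(1, n_profiles + 1):
        delay = rng.uniform(MIN_DELAY, MAX_DELAY)
        if i % BREAK_EVERY == 0:
            delay = BREAK_SECONDS  # assumption: long break replaces the short delay
        delays.append(delay)
    return delays
```

Computing the plan up front makes the total session time predictable, which also helps with progress estimates.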

Stealth Patterns

Use natural scrolling (scroll down slowly, pause, scroll more)
Don't scrape the same search results page more than twice
Vary the order of profile visits (don't go sequentially)
Close and reopen tabs periodically

Data Extraction — Profile Page

From a LinkedIn profile snapshot, extract these fields:

Field	Location	Notes
name	Main heading h1	Full name
headline	Below name	Title + Company usually
location	Location section	City, State/Country
current_title	Experience section, first entry	Most recent role
current_company	Experience section, first entry	Company name
education	Education section	School, degree, dates
connections	Connections count	Number or "500+"
about	About section	Bio text (may need "see more" click)
experience	Experience section	All roles with dates
profile_url	Browser URL bar	Canonical LinkedIn URL

Data Extraction — Search Results

From LinkedIn search results page:

Field	Location
name	Result card heading
headline	Below name in card
location	Card metadata
profile_url	Link href on name
mutual_connections	Card footer

Search URL Patterns

People search

https://www.linkedin.com/search/results/people/?keywords={query}

With filters

&geoUrn=%5B%22103644278%22%5D # United States
&network=%5B%22F%22%2C%22S%22%5D # 1st + 2nd connections
&currentCompany=%5B%22{company_id}%22%5D # Current company
&schoolFilter=%5B%22{school_id}%22%5D # School filter

YC founders (common query)

https://www.linkedin.com/search/results/people/?keywords=Y%20Combinator%20founder

Company employees

https://www.linkedin.com/company/{slug}/people/
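
The filter values in the patterns above are percent-encoded compact JSON arrays (for example, `%5B%22F%22%2C%22S%22%5D` decodes to `["F","S"]`). A short sketch of building such URLs with the standard library follows; the helper names `encode_list` and `people_search_url` are hypothetical, and the filter names are passed through exactly as written in the patterns.

```python
import json
from urllib.parse import quote

BASE = "https://www.linkedin.com/search/results/people/"


def encode_list(values: list[str]) -> str:
    """Percent-encode a compact JSON array of strings for use as a filter value."""
    return quote(json.dumps(values, separators=(",", ":")), safe="")


def people_search_url(keywords: str, **filters: list[str]) -> str:
    """Build a people-search URL; filter names are passed through unchanged."""
    url = f"{BASE}?keywords={quote(keywords, safe='')}"
    for name, values in filters.items():
        url += f"&{name}={encode_list(values)}"
    return url


# Example: the YC founders query restricted to 1st and 2nd connections
print(people_search_url("Y Combinator founder", network=["F", "S"]))
```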

DuckDB Integration

When storing to DuckDB, use the Ironclaw workspace database:

sql

-- Check if leads/contacts object exists
SELECT * FROM objects WHERE name = 'leads' OR name = 'contacts';

-- Insert via the EAV pattern or direct pivot view
INSERT INTO v_leads ("Name", "Title", "Company", "LinkedIn URL", "Location", "Source")
VALUES (?, ?, ?, ?, ?, 'LinkedIn Scrape');

If no suitable object exists, create one:

sql

-- Use Ironclaw's object creation pattern from the dench skill
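
To connect the scraped JSON record to the `INSERT` statement above, the record's fields have to be placed in placeholder order. The sketch below assumes a mapping from the JSON keys used in this document to the `v_leads` columns; that mapping and the `lead_params` helper are illustrative assumptions, not part of Ironclaw.

```python
INSERT_SQL = (
    "INSERT INTO v_leads "
    '("Name", "Title", "Company", "LinkedIn URL", "Location", "Source") '
    "VALUES (?, ?, ?, ?, ?, 'LinkedIn Scrape')"
)


def lead_params(record: dict) -> tuple:
    """Map a scraped profile record to the five placeholders in INSERT_SQL."""
    return (
        record.get("name"),             # "Name"
        record.get("current_title"),    # "Title"
        record.get("current_company"),  # "Company"
        record.get("linkedin_url"),     # "LinkedIn URL"
        record.get("location"),         # "Location"
    )


# Assumed usage with the duckdb Python package (not executed here):
#   con = duckdb.connect("workspace.duckdb")  # hypothetical database path
#   con.executemany(INSERT_SQL, [lead_params(r) for r in records])
```

Missing fields become `None`, which DuckDB stores as `NULL`, so a partially parsed profile can still be inserted.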

Error Handling

Error	Action
"Sign in" page	LinkedIn session expired — alert user to re-login in Chrome
CAPTCHA / Security check	Stop immediately, wait 30+ min, alert user
"Profile not found"	Skip, log URL as invalid
Rate limit (429)	Stop, wait 15 min, retry with longer delays
Empty snapshot	Page still loading — wait 3s and re-snapshot
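
The table above can be turned into a simple dispatch function. This is a heuristic sketch: the marker strings are assumptions about what a snapshot might contain (real pages may word things differently), and `classify` and `Action` are hypothetical names.

```python
from enum import Enum


class Action(Enum):
    RELOGIN = "alert user to re-login in Chrome"
    STOP_AND_ALERT = "stop, wait 30+ min, alert user"
    SKIP = "skip, log URL as invalid"
    BACKOFF = "stop, wait 15 min, retry with longer delays"
    RESNAPSHOT = "wait 3s and re-snapshot"
    OK = "continue"


def classify(snapshot_text: str, http_status: int = 200) -> Action:
    """Pick the action from the error table for one page snapshot."""
    text = snapshot_text.strip().lower()
    if http_status == 429:
        return Action.BACKOFF
    if not text:
        return Action.RESNAPSHOT  # page probably still loading
    if any(m in text for m in ("captcha", "security check", "unusual activity")):
        return Action.STOP_AND_ALERT
    if "profile not found" in text:
        return Action.SKIP
    if "sign in" in text:
        return Action.RELOGIN  # assumed marker for an expired session
    return Action.OK
```

The security checks come before the sign-in check so that a challenge page that also shows a sign-in link still stops the session.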

Output Formats

JSON (default)

json

{
  "name": "Jane Doe",
  "headline": "CEO at Acme Corp",
  "current_title": "CEO",
  "current_company": "Acme Corp",
  "location": "San Francisco, CA",
  "linkedin_url": "https://www.linkedin.com/in/janedoe",
  "connections": "500+",
  "education": [{"school": "Stanford", "degree": "BS CS", "years": "2010-2014"}],
  "experience": [{"title": "CEO", "company": "Acme Corp", "duration": "2020-Present"}],
  "scraped_at": "2026-02-17T14:30:00Z"
}
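
One way to produce records in this shape is a dataclass whose fields mirror the JSON keys above. This is a sketch, not the tool's actual data model: the `Profile` class is hypothetical, and `scraped_at` is filled with the current UTC time when it is left empty.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class Profile:
    name: str
    headline: str = ""
    current_title: str = ""
    current_company: str = ""
    location: str = ""
    linkedin_url: str = ""
    connections: str = ""  # kept as text because of values like "500+"
    education: list = field(default_factory=list)
    experience: list = field(default_factory=list)
    scraped_at: str = ""

    def to_json(self) -> str:
        """Serialize to JSON, stamping scraped_at in UTC if it is empty."""
        data = asdict(self)
        if not data["scraped_at"]:
            now = datetime.now(timezone.utc)
            data["scraped_at"] = now.strftime("%Y-%m-%dT%H:%M:%SZ")
        return json.dumps(data, indent=2)
```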

Progress Reporting

For bulk scrapes, report progress:

Scraping: 15/50 profiles (30%) — Last: Jane Doe (Acme Corp)
Rate: ~4 profiles/min — ETA: 9 min remaining
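
The two progress lines above can be generated from four values. A minimal sketch follows, assuming `total` and `per_min` are positive and that the ETA is rounded up to whole minutes; `progress_lines` is a hypothetical helper name.

```python
import math

DASH = "\u2014"  # separator character used in the progress lines above


def progress_lines(done: int, total: int, last: str, per_min: float) -> str:
    """Format the two-line progress report for a bulk scrape."""
    pct = round(100 * done / total)
    eta = math.ceil((total - done) / per_min)  # remaining profiles / rate
    return (
        f"Scraping: {done}/{total} profiles ({pct}%) {DASH} Last: {last}\n"
        f"Rate: ~{per_min:g} profiles/min {DASH} ETA: {eta} min remaining"
    )


print(progress_lines(15, 50, "Jane Doe (Acme Corp)", 4))
```

With 35 profiles left at about 4 per minute, the estimate is 8.75 minutes, which rounds up to the 9 minutes shown in the example.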

Safety

Never scrape private/restricted profiles
Respect LinkedIn's robots.txt for public pages
Store data locally only (DuckDB) — never exfiltrate
User must have legitimate LinkedIn access
This tool automates, at scale, browsing the user could otherwise do manually
