# Bright Data Web Scraper API

Use the Bright Data API via direct calls for social media scraping, web data extraction, and account management.

Official docs: https://docs.brightdata.com/
## When to Use

Use this skill when you need to:

- Scrape social media - Twitter/X, Reddit, YouTube, Instagram, TikTok, LinkedIn
- Extract web data - Posts, profiles, comments, engagement metrics
- Monitor usage - Track bandwidth and request usage
- Manage account - Check status and zones
## Prerequisites

- Sign up at Bright Data
- Get your API key from Settings > Users
- Create a Web Scraper dataset in the Control Panel to get your `dataset_id`

```bash
export BRIGHTDATA_API_KEY="your-api-key"
```
## Base URL

`https://api.brightdata.com`

**Important:** When using `$VAR` in a command that pipes to another command, wrap the command containing `$VAR` in `bash -c '...'`. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.

```bash
bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"'
```
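A network-free way to see the wrapping pattern in action (the `DEMO_KEY` variable is a throwaway stand-in for your real key, used here only for illustration):

```shell
# Export a throwaway variable standing in for BRIGHTDATA_API_KEY.
export DEMO_KEY="secret"

# Wrapped form: the variable expands inside bash -c, and the result
# survives being piped to another command.
bash -c 'printf "%s\n" "$DEMO_KEY"' | tr 'a-z' 'A-Z'
# Prints: SECRET
```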
## Social Media Scraping

Bright Data supports scraping these social media platforms:

| Platform | Profiles | Posts | Comments | Reels/Videos |
|---|---|---|---|---|
| Twitter/X | ✅ | ✅ | - | - |
| Reddit | - | ✅ | ✅ | - |
| YouTube | ✅ | ✅ | ✅ | - |
| Instagram | ✅ | ✅ | ✅ | ✅ |
| TikTok | ✅ | ✅ | ✅ | - |
| LinkedIn | ✅ | ✅ | - | - |
## How to Use

### 1. Trigger Scraping (Asynchronous)

Trigger a data collection job and get a `snapshot_id` for later retrieval.

Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://twitter.com/username"},
  {"url": "https://twitter.com/username2"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Response:

```json
{
  "snapshot_id": "s_m4x7enmven8djfqak"
}
```

### 2. Trigger Scraping (Synchronous)
Get results immediately in the response (for small requests).

Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://www.reddit.com/r/technology/comments/xxxxx"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

### 3. Monitor Progress
Check the status of a scraping job (replace `<snapshot-id>` with your actual snapshot ID):

```bash
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/progress/<snapshot-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```

Response:

```json
{
  "snapshot_id": "s_m4x7enmven8djfqak",
  "dataset_id": "gd_xxxxx",
  "status": "running"
}
```

Status values: `running`, `ready`, `failed`

### 4. Download Results
Once status is `ready`, download the collected data (replace `<snapshot-id>` with your actual snapshot ID):

```bash
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/snapshot/<snapshot-id>?format=json" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```

### 5. List Snapshots
Get all your snapshots:

```bash
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/snapshots" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"' | jq '.[] | {snapshot_id, dataset_id, status}'
```

### 6. Cancel Snapshot
Cancel a running job (replace `<snapshot-id>` with your actual snapshot ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/cancel?snapshot_id=<snapshot-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```

## Platform-Specific Examples
### Twitter/X - Scrape Profile

Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://twitter.com/elonmusk"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `x_id`, `profile_name`, `biography`, `is_verified`, `followers`, `following`, `profile_image_link`

### Twitter/X - Scrape Posts
Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://twitter.com/username/status/123456789"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `post_id`, `text`, `replies`, `likes`, `retweets`, `views`, `hashtags`, `media`

### Reddit - Scrape Subreddit Posts
Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://www.reddit.com/r/technology", "sort_by": "hot"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Parameters: `url`, `sort_by` (new/top/hot)

Returns: `post_id`, `title`, `description`, `num_comments`, `upvotes`, `date_posted`, `community`

### Reddit - Scrape Comments
Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://www.reddit.com/r/technology/comments/xxxxx/post_title"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `comment_id`, `user_posted`, `comment_text`, `upvotes`, `replies`

### YouTube - Scrape Video Info
Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `title`, `views`, `likes`, `num_comments`, `video_length`, `transcript`, `channel_name`

### YouTube - Search by Keyword
Write to `/tmp/brightdata_request.json`:

```json
[
  {"keyword": "artificial intelligence", "num_of_posts": 50}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

### YouTube - Scrape Comments
Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://www.youtube.com/watch?v=xxxxx", "load_replies": 3}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `comment_text`, `likes`, `replies`, `username`, `date`

### Instagram - Scrape Profile
Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://www.instagram.com/username"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `followers`, `post_count`, `profile_name`, `is_verified`, `biography`

### Instagram - Scrape Posts
Write to `/tmp/brightdata_request.json`:

```json
[
  {
    "url": "https://www.instagram.com/username",
    "num_of_posts": 20,
    "start_date": "01-01-2024",
    "end_date": "12-31-2024"
  }
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

## Account Management
### Check Account Status

```bash
bash -c 'curl -s "https://api.brightdata.com/status" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```

Response:

```json
{
  "status": "active",
  "customer": "hl_xxxxxxxx",
  "can_make_requests": true,
  "ip": "x.x.x.x"
}
```

### Get Active Zones
```bash
bash -c 'curl -s "https://api.brightdata.com/zone/get_active_zones" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"' | jq '.[] | {name, type}'
```

### Get Bandwidth Usage
```bash
bash -c 'curl -s "https://api.brightdata.com/customer/bw" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```

## Getting Dataset IDs
To use the scraping features, you need a `dataset_id`:

- Go to the Bright Data Control Panel
- Create a new Web Scraper dataset or select an existing one
- Choose the platform (Twitter, Reddit, YouTube, etc.)
- Copy the `dataset_id` from the dataset settings

Dataset IDs can also be found in the bandwidth usage API response under the field keys (e.g., `datav__ds_api_gd_xxxxx`, where `gd_xxxxx` is your dataset ID).

## Common Parameters
| Parameter | Description | Example |
|---|---|---|
| `url` | Target URL to scrape | `https://twitter.com/username` |
| `keyword` | Search keyword | `artificial intelligence` |
| `num_of_posts` | Limit number of results | `50` |
| `start_date` | Filter by date (MM-DD-YYYY) | `01-01-2024` |
| `end_date` | Filter by date (MM-DD-YYYY) | `12-31-2024` |
| `sort_by` | Sort order (Reddit) | `hot` |
| `format` | Response format | `json` |
## Rate Limits

- Batch mode: up to 100 concurrent requests
- Maximum input size: 1 GB per batch
- Exceeding limits returns a `429` error
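When an input list is longer than the limits allow, it can be pre-split into batches before triggering. The sketch below is illustrative, not an official tool: `/tmp/urls.txt` is a hypothetical one-URL-per-line input, and only standard utilities (`split`, `awk`) are used:

```shell
# Sample input: one URL per line (replace with your real list).
printf 'https://twitter.com/user%s\n' 1 2 3 > /tmp/urls.txt

# Split into chunks of at most 100 URLs each.
rm -f /tmp/brightdata_batch_*
split -l 100 /tmp/urls.txt /tmp/brightdata_batch_

# Convert each chunk into the JSON array format the trigger endpoint expects.
for chunk in /tmp/brightdata_batch_*; do
  {
    echo '['
    awk '{printf "%s  {\"url\": \"%s\"}", (NR > 1 ? ",\n" : ""), $0}' "$chunk"
    printf '\n]\n'
  } > "${chunk}.json"
done
```

Each resulting `.json` file can then be posted to `/datasets/v3/trigger` as in the examples above.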
## Guidelines

- Create datasets first: Use the Control Panel to create scraper datasets
- Use async for large jobs: Use `/trigger` for discovery and batch operations
- Use sync for small jobs: Use `/scrape` for single-URL quick lookups
- Check status before download: Poll `/progress` until status is `ready`
- Respect rate limits: Don't exceed 100 concurrent requests
- Date format: Use MM-DD-YYYY for date parameters
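The full asynchronous workflow described above (trigger, poll `/progress` until `ready`, download the snapshot) can be wrapped in one helper. This is a minimal sketch, not an official client: the `brightdata_fetch` name is my own, and the `sed`-based field extraction assumes the response shapes shown above (prefer `jq` for anything more robust):

```shell
# Trigger a job, poll until it finishes, then download the results.
# Assumes BRIGHTDATA_API_KEY is exported; takes the dataset ID and the
# path to the JSON request file as arguments.
brightdata_fetch() {
  local dataset_id="$1" request_file="$2"
  local snapshot_id status

  # 1. Trigger the job and capture the snapshot_id from the response.
  snapshot_id=$(curl -s -X POST \
    "https://api.brightdata.com/datasets/v3/trigger?dataset_id=${dataset_id}" \
    -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
    -H "Content-Type: application/json" \
    -d @"${request_file}" |
    sed -n 's/.*"snapshot_id": *"\([^"]*\)".*/\1/p')
  [ -n "$snapshot_id" ] || { echo "trigger failed" >&2; return 1; }

  # 2. Poll /progress until the job leaves the "running" state.
  while :; do
    status=$(curl -s "https://api.brightdata.com/datasets/v3/progress/${snapshot_id}" \
      -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" |
      sed -n 's/.*"status": *"\([^"]*\)".*/\1/p')
    [ "$status" = "running" ] || break
    sleep 10
  done
  [ "$status" = "ready" ] || { echo "job ended with status: $status" >&2; return 1; }

  # 3. Download the collected data to stdout.
  curl -s "https://api.brightdata.com/datasets/v3/snapshot/${snapshot_id}?format=json" \
    -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"
}

# Usage (replace <dataset-id> with your actual dataset ID):
# brightdata_fetch <dataset-id> /tmp/brightdata_request.json > results.json
```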