bright-data

Bright Data Web Scraper API

Use the Bright Data API via direct `curl` calls for social media scraping, web data extraction, and account management.

Official docs: https://docs.brightdata.com/


When to Use


Use this skill when you need to:
  • Scrape social media - Twitter/X, Reddit, YouTube, Instagram, TikTok, LinkedIn
  • Extract web data - Posts, profiles, comments, engagement metrics
  • Monitor usage - Track bandwidth and request usage
  • Manage account - Check status and zones


Prerequisites


  1. Sign up at Bright Data
  2. Get your API key from Settings > Users
  3. Create a Web Scraper dataset in the Control Panel to get your `dataset_id`

```bash
export BRIGHTDATA_API_KEY="your-api-key"
```
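Before issuing any of the calls below, it can help to fail fast when the key is not exported. A minimal sketch (the `check_key` helper is hypothetical, not part of the API):

```shell
# Hypothetical helper: fail fast if BRIGHTDATA_API_KEY is not exported.
check_key() {
  if [ -z "${BRIGHTDATA_API_KEY:-}" ]; then
    echo "BRIGHTDATA_API_KEY is not set" >&2
    return 1
  fi
}
```

Call `check_key` at the top of any script before making requests.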

Base URL


https://api.brightdata.com

Important: When using `$VAR` in a command that pipes to another command, wrap the command containing `$VAR` in `bash -c '...'`. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.

```bash
bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"'
```


Social Media Scraping


Bright Data supports scraping these social media platforms (✓ = supported):

| Platform | Profiles | Posts | Comments | Reels/Videos |
|----------|----------|-------|----------|--------------|
| Twitter/X | ✓ | ✓ | - | - |
| Reddit | - | ✓ | ✓ | - |
| YouTube | ✓ | - | ✓ | ✓ |
| Instagram | ✓ | ✓ | ✓ | ✓ |
| TikTok | ✓ | ✓ | - | ✓ |
| LinkedIn | ✓ | ✓ | - | - |


How to Use


1. Trigger Scraping (Asynchronous)


Trigger a data collection job and get a `snapshot_id` for later retrieval.

Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://twitter.com/username"},
  {"url": "https://twitter.com/username2"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Response:

```json
{
  "snapshot_id": "s_m4x7enmven8djfqak"
}
```

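The `snapshot_id` from the trigger response can be captured for the later steps with `jq` (assumes `jq` is installed; shown here against the sample response rather than a live call):

```shell
# Pull snapshot_id out of a trigger response with jq.
# RESPONSE would normally come from the curl trigger call above.
RESPONSE='{"snapshot_id": "s_m4x7enmven8djfqak"}'
SNAPSHOT_ID=$(printf '%s' "$RESPONSE" | jq -r '.snapshot_id')
echo "$SNAPSHOT_ID"
```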

2. Trigger Scraping (Synchronous)


Get results immediately in the response (for small requests).

Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://www.reddit.com/r/technology/comments/xxxxx"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```


3. Monitor Progress


Check the status of a scraping job (replace `<snapshot-id>` with your actual snapshot ID):

```bash
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/progress/<snapshot-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```

Response:

```json
{
  "snapshot_id": "s_m4x7enmven8djfqak",
  "dataset_id": "gd_xxxxx",
  "status": "running"
}
```

Status values: `running`, `ready`, `failed`


4. Download Results


Once status is `ready`, download the collected data (replace `<snapshot-id>` with your actual snapshot ID):

```bash
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/snapshot/<snapshot-id>?format=json" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```


5. List Snapshots


Get all your snapshots:

```bash
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/snapshots" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"' | jq '.[] | {snapshot_id, dataset_id, status}'
```

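The same `jq` pattern can narrow the listing to finished jobs by filtering on status (a sketch; assumes the response is a JSON array of snapshot objects as shown):

```shell
# Keep only snapshots whose status is "ready" and count them.
# SNAPSHOTS stands in for the /snapshots response body.
SNAPSHOTS='[{"snapshot_id":"s_1","status":"ready"},{"snapshot_id":"s_2","status":"running"}]'
printf '%s' "$SNAPSHOTS" | jq '[.[] | select(.status == "ready")] | length'
```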

6. Cancel Snapshot


Cancel a running job (replace `<snapshot-id>` with your actual snapshot ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/cancel?snapshot_id=<snapshot-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```


Platform-Specific Examples


Twitter/X - Scrape Profile


Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://twitter.com/elonmusk"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `x_id`, `profile_name`, `biography`, `is_verified`, `followers`, `following`, `profile_image_link`

Twitter/X - Scrape Posts


Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://twitter.com/username/status/123456789"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `post_id`, `text`, `replies`, `likes`, `retweets`, `views`, `hashtags`, `media`


Reddit - Scrape Subreddit Posts


Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://www.reddit.com/r/technology", "sort_by": "hot"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Parameters: `url`, `sort_by` (`new`/`top`/`hot`)

Returns: `post_id`, `title`, `description`, `num_comments`, `upvotes`, `date_posted`, `community`

Reddit - Scrape Comments


Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://www.reddit.com/r/technology/comments/xxxxx/post_title"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `comment_id`, `user_posted`, `comment_text`, `upvotes`, `replies`


YouTube - Scrape Video Info


Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `title`, `views`, `likes`, `num_comments`, `video_length`, `transcript`, `channel_name`

YouTube - Search by Keyword


Write to `/tmp/brightdata_request.json`:

```json
[
  {"keyword": "artificial intelligence", "num_of_posts": 50}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

YouTube - Scrape Comments


Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://www.youtube.com/watch?v=xxxxx", "load_replies": 3}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `comment_text`, `likes`, `replies`, `username`, `date`


Instagram - Scrape Profile


Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://www.instagram.com/username"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `followers`, `post_count`, `profile_name`, `is_verified`, `biography`

Instagram - Scrape Posts


Write to `/tmp/brightdata_request.json`:

```json
[
  {
    "url": "https://www.instagram.com/username",
    "num_of_posts": 20,
    "start_date": "01-01-2024",
    "end_date": "12-31-2024"
  }
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```


Account Management


Check Account Status


```bash
bash -c 'curl -s "https://api.brightdata.com/status" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```

Response:

```json
{
  "status": "active",
  "customer": "hl_xxxxxxxx",
  "can_make_requests": true,
  "ip": "x.x.x.x"
}
```

Get Active Zones


```bash
bash -c 'curl -s "https://api.brightdata.com/zone/get_active_zones" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"' | jq '.[] | {name, type}'
```

Get Bandwidth Usage


```bash
bash -c 'curl -s "https://api.brightdata.com/customer/bw" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```


Getting Dataset IDs


To use the scraping features, you need a `dataset_id`:

  1. Go to Bright Data Control Panel
  2. Create a new Web Scraper dataset or select an existing one
  3. Choose the platform (Twitter, Reddit, YouTube, etc.)
  4. Copy the `dataset_id` from the dataset settings

Dataset IDs can also be found in the bandwidth usage API response under the `data` field keys (e.g., `v__ds_api_gd_xxxxx` where `gd_xxxxx` is your dataset ID).

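That key-name convention can be turned into a small filter. A sketch (the `extract_dataset_ids` helper is hypothetical and assumes the `data` object holds keys like `v__ds_api_gd_xxxxx` as described above; requires `jq`):

```shell
# Hypothetical helper: list dataset IDs embedded in bandwidth-usage keys.
extract_dataset_ids() {
  jq -r '.data | keys[] | select(test("gd_")) | sub(".*gd_"; "gd_")'
}

# Shown against a stand-in payload rather than a live /customer/bw response.
printf '%s' '{"data":{"v__ds_api_gd_abc123":10}}' | extract_dataset_ids
```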

Common Parameters


| Parameter | Description | Example |
|-----------|-------------|---------|
| `url` | Target URL to scrape | `https://twitter.com/user` |
| `keyword` | Search keyword | `"artificial intelligence"` |
| `num_of_posts` | Limit number of results | `50` |
| `start_date` | Filter by date (MM-DD-YYYY) | `"01-01-2024"` |
| `end_date` | Filter by date (MM-DD-YYYY) | `"12-31-2024"` |
| `sort_by` | Sort order (Reddit) | `new`, `top`, `hot` |
| `format` | Response format | `json`, `csv` |

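Rather than editing `/tmp/brightdata_request.json` by hand, the parameters in the table above can be assembled with `jq` (a sketch; requires `jq`, and the parameter values are illustrative):

```shell
# Build a one-entry request array from named parameters.
jq -n \
  --arg url "https://www.reddit.com/r/technology" \
  --arg sort_by "hot" \
  --argjson num_of_posts 50 \
  '[{url: $url, sort_by: $sort_by, num_of_posts: $num_of_posts}]' \
  > /tmp/brightdata_request.json

cat /tmp/brightdata_request.json
```

This guarantees valid JSON and proper quoting of values that contain spaces.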

Rate Limits


  • Batch mode: up to 100 concurrent requests
  • Maximum input size: 1GB per batch
  • Exceeding limits returns a `429` error

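One way to stay under these limits is to split a large input array into smaller request files before triggering. A sketch (the `split_requests` helper is hypothetical; requires `jq`, and the chunk size of 100 is illustrative):

```shell
# Hypothetical helper: split an input JSON array into chunks of $size,
# writing one request file per chunk; prints the number of chunks.
split_requests() {
  local input=$1 size=${2:-100} i=0 total chunks
  total=$(jq 'length' "$input")
  chunks=$(( (total + size - 1) / size ))
  while [ "$i" -lt "$chunks" ]; do
    jq --argjson i "$i" --argjson n "$size" '.[($i * $n):(($i + 1) * $n)]' \
      "$input" > "/tmp/brightdata_batch_$i.json"
    i=$((i + 1))
  done
  echo "$chunks"
}
```

Each `/tmp/brightdata_batch_N.json` can then be submitted as its own trigger request.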

Guidelines


  1. Create datasets first: Use the Control Panel to create scraper datasets
  2. Use async for large jobs: Use `/trigger` for discovery and batch operations
  3. Use sync for small jobs: Use `/scrape` for single-URL quick lookups
  4. Check status before download: Poll `/progress` until status is `ready`
  5. Respect rate limits: Don't exceed 100 concurrent requests
  6. Date format: Use MM-DD-YYYY for date parameters
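The async guidelines above can be combined into one helper: trigger, poll `/progress` until `ready`, then download. A sketch (`run_job` is hypothetical; requires `jq`; it is meant to be saved as a script file, where the `bash -c` wrapper used in the interactive examples is assumed unnecessary):

```shell
# Hypothetical end-to-end helper for the async flow:
# trigger a job, poll until ready, then download results.
run_job() {
  local dataset_id=$1
  local api="https://api.brightdata.com/datasets/v3"
  local snapshot_id status

  # 1. Trigger and capture the snapshot_id
  snapshot_id=$(curl -s -X POST "$api/trigger?dataset_id=$dataset_id" \
    -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
    -H "Content-Type: application/json" \
    -d @/tmp/brightdata_request.json | jq -r '.snapshot_id')

  # 2. Poll /progress until the job is ready (bail out on failure)
  while :; do
    status=$(curl -s "$api/progress/$snapshot_id" \
      -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" | jq -r '.status')
    [ "$status" = "ready" ] && break
    [ "$status" = "failed" ] && return 1
    sleep 10
  done

  # 3. Download the collected data
  curl -s "$api/snapshot/$snapshot_id?format=json" \
    -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
    > /tmp/brightdata_results.json
}
```

Usage: `run_job gd_xxxxx` after writing `/tmp/brightdata_request.json`.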