bright-data

Bright Data Web Scraper API

Use the Bright Data API via direct `curl` calls for social media scraping, web data extraction, and account management.

Official docs: https://docs.brightdata.com/


When to Use


Use this skill when you need to:
  • Scrape social media - Twitter/X, Reddit, YouTube, Instagram, TikTok, LinkedIn
  • Extract web data - Posts, profiles, comments, engagement metrics
  • Monitor usage - Track bandwidth and request usage
  • Manage account - Check status and zones


Prerequisites


  1. Sign up at Bright Data
  2. Get your API key from Settings > Users
  3. Create a Web Scraper dataset in the Control Panel to get your `dataset_id`

```bash
export BRIGHTDATA_API_KEY="your-api-key"
```
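Before issuing any of the calls below, it can help to fail fast when the key is not exported. A minimal sketch (the `check_key` helper is hypothetical, not part of the API):

```shell
# Hypothetical helper: fail fast if BRIGHTDATA_API_KEY is not exported.
check_key() {
  if [ -z "${BRIGHTDATA_API_KEY:-}" ]; then
    echo "BRIGHTDATA_API_KEY is not set" >&2
    return 1
  fi
}
```

Call `check_key` at the top of any script before making requests.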

Base URL


https://api.brightdata.com

Important: When using `$VAR` in a command that pipes to another command, wrap the command containing `$VAR` in `bash -c '...'`. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.

```bash
bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"'
```


Social Media Scraping


Bright Data supports scraping these social media platforms (✓ = supported):

| Platform | Profiles | Posts | Comments | Reels/Videos |
|----------|----------|-------|----------|--------------|
| Twitter/X | ✓ | ✓ | - | - |
| Reddit | - | ✓ | ✓ | - |
| YouTube | ✓ | - | ✓ | ✓ |
| Instagram | ✓ | ✓ | ✓ | ✓ |
| TikTok | ✓ | ✓ | - | ✓ |
| LinkedIn | ✓ | ✓ | - | - |


How to Use


1. Trigger Scraping (Asynchronous)


Trigger a data collection job and get a `snapshot_id` for later retrieval.

Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://twitter.com/username"},
  {"url": "https://twitter.com/username2"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Response:

```json
{
  "snapshot_id": "s_m4x7enmven8djfqak"
}
```

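The `snapshot_id` from the trigger response can be captured for the later steps with `jq` (assumes `jq` is installed; shown here against the sample response rather than a live call):

```shell
# Pull snapshot_id out of a trigger response with jq.
# RESPONSE would normally come from the curl trigger call above.
RESPONSE='{"snapshot_id": "s_m4x7enmven8djfqak"}'
SNAPSHOT_ID=$(printf '%s' "$RESPONSE" | jq -r '.snapshot_id')
echo "$SNAPSHOT_ID"
```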

2. Trigger Scraping (Synchronous)


Get results immediately in the response (for small requests).

Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://www.reddit.com/r/technology/comments/xxxxx"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```


3. Monitor Progress


Check the status of a scraping job (replace `<snapshot-id>` with your actual snapshot ID):

```bash
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/progress/<snapshot-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```

Response:

```json
{
  "snapshot_id": "s_m4x7enmven8djfqak",
  "dataset_id": "gd_xxxxx",
  "status": "running"
}
```

Status values: `running`, `ready`, `failed`


4. Download Results


Once status is `ready`, download the collected data (replace `<snapshot-id>` with your actual snapshot ID):

```bash
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/snapshot/<snapshot-id>?format=json" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```


5. List Snapshots


Get all your snapshots:

```bash
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/snapshots" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"' | jq '.[] | {snapshot_id, dataset_id, status}'
```

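The same `jq` pattern can narrow the listing to finished jobs by filtering on status (a sketch; assumes the response is a JSON array of snapshot objects as shown):

```shell
# Keep only snapshots whose status is "ready" and count them.
# SNAPSHOTS stands in for the /snapshots response body.
SNAPSHOTS='[{"snapshot_id":"s_1","status":"ready"},{"snapshot_id":"s_2","status":"running"}]'
printf '%s' "$SNAPSHOTS" | jq '[.[] | select(.status == "ready")] | length'
```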

6. Cancel Snapshot


Cancel a running job (replace `<snapshot-id>` with your actual snapshot ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/cancel?snapshot_id=<snapshot-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```


Platform-Specific Examples


Twitter/X - Scrape Profile


Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://twitter.com/elonmusk"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `x_id`, `profile_name`, `biography`, `is_verified`, `followers`, `following`, `profile_image_link`

Twitter/X - Scrape Posts


Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://twitter.com/username/status/123456789"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `post_id`, `text`, `replies`, `likes`, `retweets`, `views`, `hashtags`, `media`


Reddit - Scrape Subreddit Posts


Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://www.reddit.com/r/technology", "sort_by": "hot"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Parameters: `url`, `sort_by` (`new`/`top`/`hot`)

Returns: `post_id`, `title`, `description`, `num_comments`, `upvotes`, `date_posted`, `community`

Reddit - Scrape Comments


Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://www.reddit.com/r/technology/comments/xxxxx/post_title"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `comment_id`, `user_posted`, `comment_text`, `upvotes`, `replies`


YouTube - Scrape Video Info


Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `title`, `views`, `likes`, `num_comments`, `video_length`, `transcript`, `channel_name`

YouTube - Search by Keyword


Write to `/tmp/brightdata_request.json`:

```json
[
  {"keyword": "artificial intelligence", "num_of_posts": 50}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

YouTube - Scrape Comments


Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://www.youtube.com/watch?v=xxxxx", "load_replies": 3}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `comment_text`, `likes`, `replies`, `username`, `date`


Instagram - Scrape Profile


Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://www.instagram.com/username"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `followers`, `post_count`, `profile_name`, `is_verified`, `biography`

Instagram - Scrape Posts


Write to `/tmp/brightdata_request.json`:

```json
[
  {
    "url": "https://www.instagram.com/username",
    "num_of_posts": 20,
    "start_date": "01-01-2024",
    "end_date": "12-31-2024"
  }
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```


Account Management


Check Account Status


```bash
bash -c 'curl -s "https://api.brightdata.com/status" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```

Response:

```json
{
  "status": "active",
  "customer": "hl_xxxxxxxx",
  "can_make_requests": true,
  "ip": "x.x.x.x"
}
```

Get Active Zones


```bash
bash -c 'curl -s "https://api.brightdata.com/zone/get_active_zones" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"' | jq '.[] | {name, type}'
```

Get Bandwidth Usage


```bash
bash -c 'curl -s "https://api.brightdata.com/customer/bw" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```


Getting Dataset IDs


To use the scraping features, you need a `dataset_id`:

  1. Go to Bright Data Control Panel
  2. Create a new Web Scraper dataset or select an existing one
  3. Choose the platform (Twitter, Reddit, YouTube, etc.)
  4. Copy the `dataset_id` from the dataset settings

Dataset IDs can also be found in the bandwidth usage API response under the `data` field keys (e.g., `v__ds_api_gd_xxxxx` where `gd_xxxxx` is your dataset ID).

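That key-name convention can be turned into a small filter. A sketch (the `extract_dataset_ids` helper is hypothetical and assumes the `data` object holds keys like `v__ds_api_gd_xxxxx` as described above; requires `jq`):

```shell
# Hypothetical helper: list dataset IDs embedded in bandwidth-usage keys.
extract_dataset_ids() {
  jq -r '.data | keys[] | select(test("gd_")) | sub(".*gd_"; "gd_")'
}

# Shown against a stand-in payload rather than a live /customer/bw response.
printf '%s' '{"data":{"v__ds_api_gd_abc123":10}}' | extract_dataset_ids
```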

Common Parameters


| Parameter | Description | Example |
|-----------|-------------|---------|
| `url` | Target URL to scrape | `https://twitter.com/user` |
| `keyword` | Search keyword | `"artificial intelligence"` |
| `num_of_posts` | Limit number of results | `50` |
| `start_date` | Filter by date (MM-DD-YYYY) | `"01-01-2024"` |
| `end_date` | Filter by date (MM-DD-YYYY) | `"12-31-2024"` |
| `sort_by` | Sort order (Reddit) | `new`, `top`, `hot` |
| `format` | Response format | `json`, `csv` |

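Rather than editing `/tmp/brightdata_request.json` by hand, the parameters in the table above can be assembled with `jq` (a sketch; requires `jq`, and the parameter values are illustrative):

```shell
# Build a one-entry request array from named parameters.
jq -n \
  --arg url "https://www.reddit.com/r/technology" \
  --arg sort_by "hot" \
  --argjson num_of_posts 50 \
  '[{url: $url, sort_by: $sort_by, num_of_posts: $num_of_posts}]' \
  > /tmp/brightdata_request.json

cat /tmp/brightdata_request.json
```

This guarantees valid JSON and proper quoting of values that contain spaces.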

Rate Limits


  • Batch mode: up to 100 concurrent requests
  • Maximum input size: 1GB per batch
  • Exceeding limits returns a `429` error

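One way to stay under these limits is to split a large input array into smaller request files before triggering. A sketch (the `split_requests` helper is hypothetical; requires `jq`, and the chunk size of 100 is illustrative):

```shell
# Hypothetical helper: split an input JSON array into chunks of $size,
# writing one request file per chunk; prints the number of chunks.
split_requests() {
  local input=$1 size=${2:-100} i=0 total chunks
  total=$(jq 'length' "$input")
  chunks=$(( (total + size - 1) / size ))
  while [ "$i" -lt "$chunks" ]; do
    jq --argjson i "$i" --argjson n "$size" '.[($i * $n):(($i + 1) * $n)]' \
      "$input" > "/tmp/brightdata_batch_$i.json"
    i=$((i + 1))
  done
  echo "$chunks"
}
```

Each `/tmp/brightdata_batch_N.json` can then be submitted as its own trigger request.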

Guidelines


  1. Create datasets first: Use the Control Panel to create scraper datasets
  2. Use async for large jobs: Use `/trigger` for discovery and batch operations
  3. Use sync for small jobs: Use `/scrape` for single-URL quick lookups
  4. Check status before download: Poll `/progress` until status is `ready`
  5. Respect rate limits: Don't exceed 100 concurrent requests
  6. Date format: Use MM-DD-YYYY for date parameters
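The async guidelines above can be combined into one helper: trigger, poll `/progress` until `ready`, then download. A sketch (`run_job` is hypothetical; requires `jq`; it is meant to be saved as a script file, where the `bash -c` wrapper used in the interactive examples is assumed unnecessary):

```shell
# Hypothetical end-to-end helper for the async flow:
# trigger a job, poll until ready, then download results.
run_job() {
  local dataset_id=$1
  local api="https://api.brightdata.com/datasets/v3"
  local snapshot_id status

  # 1. Trigger and capture the snapshot_id
  snapshot_id=$(curl -s -X POST "$api/trigger?dataset_id=$dataset_id" \
    -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
    -H "Content-Type: application/json" \
    -d @/tmp/brightdata_request.json | jq -r '.snapshot_id')

  # 2. Poll /progress until the job is ready (bail out on failure)
  while :; do
    status=$(curl -s "$api/progress/$snapshot_id" \
      -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" | jq -r '.status')
    [ "$status" = "ready" ] && break
    [ "$status" = "failed" ] && return 1
    sleep 10
  done

  # 3. Download the collected data
  curl -s "$api/snapshot/$snapshot_id?format=json" \
    -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
    > /tmp/brightdata_results.json
}
```

Usage: `run_job gd_xxxxx` after writing `/tmp/brightdata_request.json`.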