# Bright Data Web Scraper API
Use the Bright Data API via direct curl calls for social media scraping, web data extraction, and account management.

Official docs: https://docs.brightdata.com/
## When to Use
Use this skill when you need to:
- Scrape social media - Twitter/X, Reddit, YouTube, Instagram, TikTok, LinkedIn
- Extract web data - Posts, profiles, comments, engagement metrics
- Monitor usage - Track bandwidth and request usage
- Manage account - Check status and zones
## Prerequisites

- Sign up at Bright Data
- Get your API key from Settings > Users
- Create a Web Scraper dataset in the Control Panel to get your `dataset_id`

```bash
export BRIGHTDATA_API_KEY="your-api-key"
```

## Base URL

```
https://api.brightdata.com
```

Important: When using `$VAR` in a command that pipes to another command, wrap the command containing `$VAR` in `bash -c '...'`. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.

```bash
bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"'
```
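Every request below follows the same two-step pattern: write a JSON array of inputs to /tmp/brightdata_request.json, then POST it with `-d @`. A minimal sketch of the first step (the Twitter URLs are placeholders), with a validation check so a malformed payload never reaches the API:

```bash
# Write the request payload: a JSON array with one object per target.
cat > /tmp/brightdata_request.json <<'EOF'
[
  {"url": "https://twitter.com/username"},
  {"url": "https://twitter.com/username2"}
]
EOF

# Validate before sending -- a malformed payload wastes an API request.
python3 -m json.tool /tmp/brightdata_request.json > /dev/null && echo "payload OK"
```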
## Social Media Scraping

Bright Data supports scraping these social media platforms:

| Platform | Profiles | Posts | Comments | Reels/Videos |
|---|---|---|---|---|
| Twitter/X | ✅ | ✅ | - | - |
| Reddit | - | ✅ | ✅ | - |
| YouTube | ✅ | ✅ | ✅ | - |
| Instagram | ✅ | ✅ | ✅ | ✅ |
| TikTok | ✅ | ✅ | ✅ | - |
| LinkedIn | ✅ | ✅ | - | - |
## How to Use
### 1. Trigger Scraping (Asynchronous)

Trigger a data collection job and get a `snapshot_id` for later retrieval.

Write to /tmp/brightdata_request.json:

```json
[
  {"url": "https://twitter.com/username"},
  {"url": "https://twitter.com/username2"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Response:

```json
{
  "snapshot_id": "s_m4x7enmven8djfqak"
}
```

### 2. Trigger Scraping (Synchronous)
Get results immediately in the response (for small requests).

Write to /tmp/brightdata_request.json:

```json
[
  {"url": "https://www.reddit.com/r/technology/comments/xxxxx"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

### 3. Monitor Progress
Check the status of a scraping job (replace `<snapshot-id>` with your actual snapshot ID):

```bash
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/progress/<snapshot-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```

Response:

```json
{
  "snapshot_id": "s_m4x7enmven8djfqak",
  "dataset_id": "gd_xxxxx",
  "status": "running"
}
```

Status values: `running`, `ready`, `failed`

### 4. Download Results
Once the status is `ready`, download the collected data (replace `<snapshot-id>` with your actual snapshot ID):

```bash
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/snapshot/<snapshot-id>?format=json" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```

### 5. List Snapshots
Get all your snapshots:

```bash
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/snapshots" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"' | jq '.[] | {snapshot_id, dataset_id, status}'
```

### 6. Cancel Snapshot
Cancel a running job (replace `<snapshot-id>` with your actual snapshot ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/cancel?snapshot_id=<snapshot-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```

## Platform-Specific Examples
### Twitter/X - Scrape Profile

Write to /tmp/brightdata_request.json:

```json
[
  {"url": "https://twitter.com/elonmusk"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `x_id`, `profile_name`, `biography`, `is_verified`, `followers`, `following`, `profile_image_link`

### Twitter/X - Scrape Posts
Write to /tmp/brightdata_request.json:

```json
[
  {"url": "https://twitter.com/username/status/123456789"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `post_id`, `text`, `replies`, `likes`, `retweets`, `views`, `hashtags`, `media`

### Reddit - Scrape Subreddit Posts
Write to /tmp/brightdata_request.json:

```json
[
  {"url": "https://www.reddit.com/r/technology", "sort_by": "hot"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Parameters: `url`, `sort_by` (new/top/hot)

Returns: `post_id`, `title`, `description`, `num_comments`, `upvotes`, `date_posted`, `community`

### Reddit - Scrape Comments
Write to /tmp/brightdata_request.json:

```json
[
  {"url": "https://www.reddit.com/r/technology/comments/xxxxx/post_title"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `comment_id`, `user_posted`, `comment_text`, `upvotes`, `replies`

### YouTube - Scrape Video Info
Write to /tmp/brightdata_request.json:

```json
[
  {"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `title`, `views`, `likes`, `num_comments`, `video_length`, `transcript`, `channel_name`

### YouTube - Search by Keyword
Write to /tmp/brightdata_request.json:

```json
[
  {"keyword": "artificial intelligence", "num_of_posts": 50}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

### YouTube - Scrape Comments
Write to /tmp/brightdata_request.json:

```json
[
  {"url": "https://www.youtube.com/watch?v=xxxxx", "load_replies": 3}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `comment_text`, `likes`, `replies`, `username`, `date`

### Instagram - Scrape Profile
Write to /tmp/brightdata_request.json:

```json
[
  {"url": "https://www.instagram.com/username"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `followers`, `post_count`, `profile_name`, `is_verified`, `biography`

### Instagram - Scrape Posts
Write to /tmp/brightdata_request.json:

```json
[
  {
    "url": "https://www.instagram.com/username",
    "num_of_posts": 20,
    "start_date": "01-01-2024",
    "end_date": "12-31-2024"
  }
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

## Account Management
### Check Account Status

```bash
bash -c 'curl -s "https://api.brightdata.com/status" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```

Response:

```json
{
  "status": "active",
  "customer": "hl_xxxxxxxx",
  "can_make_requests": true,
  "ip": "x.x.x.x"
}
```

### Get Active Zones

```bash
bash -c 'curl -s "https://api.brightdata.com/zone/get_active_zones" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"' | jq '.[] | {name, type}'
```

### Get Bandwidth Usage

```bash
bash -c 'curl -s "https://api.brightdata.com/customer/bw" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```

## Getting Dataset IDs
To use the scraping features, you need a `dataset_id`:

- Go to the Bright Data Control Panel
- Create a new Web Scraper dataset or select an existing one
- Choose the platform (Twitter, Reddit, YouTube, etc.)
- Copy the `dataset_id` from the dataset settings

Dataset IDs can also be found in the bandwidth usage API response under the `data` field keys (e.g., `v__ds_api_gd_xxxxx`, where `gd_xxxxx` is your dataset ID).

## Common Parameters
| Parameter | Description | Example |
|---|---|---|
| `url` | Target URL to scrape | `https://twitter.com/username` |
| `keyword` | Search keyword | `artificial intelligence` |
| `num_of_posts` | Limit number of results | `50` |
| `start_date` | Filter by date (MM-DD-YYYY) | `01-01-2024` |
| `end_date` | Filter by date (MM-DD-YYYY) | `12-31-2024` |
| `sort_by` | Sort order (Reddit) | `hot` |
| `format` | Response format | `json` |
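When a run targets many URLs, hand-writing the input array invites JSON mistakes. A small sketch that generates /tmp/brightdata_request.json from a plain list of URLs with `jq` (already used in this skill); the Twitter URLs are placeholders:

```bash
# Build the input array from a list of URLs instead of hand-writing JSON.
# -n: no initial input; -R: read raw lines; inputs: the remaining lines.
printf '%s\n' \
  "https://twitter.com/username" \
  "https://twitter.com/username2" \
| jq -nR '[inputs | {url: .}]' > /tmp/brightdata_request.json

cat /tmp/brightdata_request.json
```

The same pattern works for keyword inputs by swapping `{url: .}` for `{keyword: .}`.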
## Rate Limits

- Batch mode: up to 100 concurrent requests
- Maximum input size: 1 GB per batch
- Exceeding these limits returns a `429` error
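One way to cope with a `429` is to retry with exponential backoff. A sketch of a hypothetical `retry_429` helper (the retry count and delays are arbitrary choices, not Bright Data recommendations) that reads the status code via curl's `-w '%{http_code}'`:

```bash
# Retry a GET with exponential backoff while the API keeps returning 429.
# Extra curl arguments (e.g. -H headers) are passed through after the URL.
retry_429() {
  local url="$1"; shift
  local delay=2 code attempt
  for attempt in 1 2 3 4 5; do
    # -w '%{http_code}' prints the status; the body goes to a temp file.
    code=$(curl -s -o /tmp/brightdata_response.json -w '%{http_code}' "$url" "$@")
    if [ "$code" != "429" ]; then
      echo "$code"          # final HTTP status for the caller
      return 0
    fi
    sleep "$delay"
    delay=$((delay * 2))    # 2s, 4s, 8s, 16s between retries
  done
  echo 429
  return 1
}

# Example (placeholder snapshot ID):
# retry_429 "https://api.brightdata.com/datasets/v3/progress/<snapshot-id>" \
#   -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"
```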
## Guidelines

- Create datasets first: Use the Control Panel to create scraper datasets
- Use async for large jobs: Use `/trigger` for discovery and batch operations
- Use sync for small jobs: Use `/scrape` for single-URL quick lookups
- Check status before download: Poll `/progress` until status is `ready`
- Respect rate limits: Don't exceed 100 concurrent requests
- Date format: Use MM-DD-YYYY for date parameters
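The guidelines above chain into one async workflow: trigger, poll `/progress` until `ready`, then download the snapshot. A sketch under a few assumptions: `BRIGHTDATA_API_KEY` is exported, a `DATASET_ID` variable holds your dataset ID, /tmp/brightdata_request.json already exists, and the 10-second poll interval is an arbitrary choice. Inside a script, the `bash -c` pipe wrapper from the setup section isn't needed.

```bash
#!/usr/bin/env bash
# Sketch of the full async flow: trigger -> poll /progress -> download snapshot.
set -euo pipefail

API="https://api.brightdata.com/datasets/v3"

poll_until_ready() {
  # $1: a command that prints the job status; $2: seconds between polls.
  local get_status="$1" delay="${2:-10}" status
  while true; do
    status=$("$get_status")
    case "$status" in
      ready)  return 0 ;;
      failed) return 1 ;;
      *)      sleep "$delay" ;;    # still "running"
    esac
  done
}

job_status() {
  curl -s "$API/progress/$SNAPSHOT_ID" \
    -H "Authorization: Bearer $BRIGHTDATA_API_KEY" | jq -r '.status'
}

main() {
  # 1. Trigger the job and capture the snapshot_id.
  SNAPSHOT_ID=$(curl -s -X POST "$API/trigger?dataset_id=$DATASET_ID" \
    -H "Authorization: Bearer $BRIGHTDATA_API_KEY" \
    -H "Content-Type: application/json" \
    -d @/tmp/brightdata_request.json | jq -r '.snapshot_id')

  # 2. Poll until the snapshot is ready, then 3. download it.
  poll_until_ready job_status 10
  curl -s "$API/snapshot/$SNAPSHOT_ID?format=json" \
    -H "Authorization: Bearer $BRIGHTDATA_API_KEY" > /tmp/brightdata_results.json
  echo "results written to /tmp/brightdata_results.json"
}

# main   # uncomment to run against your account
```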