Bright Data - Structured Data Feeds

Extract structured data from major websites with automatic parsing. No scraping logic needed - just provide a URL and get clean JSON data.

Setup

Environment Variables (Required)

```bash
export BRIGHTDATA_API_KEY="your-api-key"
```

Optional

```bash
export BRIGHTDATA_POLLING_TIMEOUT=600  # Max seconds to wait (default: 600)
```

Get your API key from the Bright Data Dashboard.
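
Before running any of the commands below, it can help to confirm that the environment is set up. The following is a minimal sketch, not part of the shipped scripts: `check_brightdata_env` is a hypothetical helper name, and the validation rule (a whole number of seconds) is an assumption based on the documented default of 600.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of scripts/datasets.sh.
check_brightdata_env() {
  # The API key is required, so fail early when it is missing or empty.
  if [ -z "${BRIGHTDATA_API_KEY:-}" ]; then
    echo "BRIGHTDATA_API_KEY is not set" >&2
    return 1
  fi

  # Fall back to the documented default when the timeout is unset.
  local timeout="${BRIGHTDATA_POLLING_TIMEOUT:-600}"

  # Assumption: the timeout is expected to be a whole number of seconds.
  case "$timeout" in
    ''|*[!0-9]*)
      echo "BRIGHTDATA_POLLING_TIMEOUT must be a whole number of seconds" >&2
      return 1
      ;;
  esac

  echo "timeout=${timeout}s"
}
```

Calling `check_brightdata_env` at the top of a wrapper script stops a run with a clear message instead of a failed API request.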

Usage

```bash
bash scripts/datasets.sh <dataset_type> <url> [additional_params...]
```

Available Datasets

E-Commerce

| Dataset | Command | Description |
| --- | --- | --- |
| Amazon Product | `datasets.sh amazon_product <url>` | Product details, pricing, ratings |
| Amazon Reviews | `datasets.sh amazon_product_reviews <url>` | Customer reviews for a product |
| Amazon Search | `datasets.sh amazon_product_search <keyword> <domain_url>` | Search results |
| Walmart Product | `datasets.sh walmart_product <url>` | Product details from Walmart |
| Walmart Seller | `datasets.sh walmart_seller <url>` | Seller information |
| eBay Product | `datasets.sh ebay_product <url>` | eBay listing details |
| Home Depot | `datasets.sh homedepot_products <url>` | Home Depot product data |
| Zara | `datasets.sh zara_products <url>` | Zara product details |
| Etsy | `datasets.sh etsy_products <url>` | Etsy listing data |
| Best Buy | `datasets.sh bestbuy_products <url>` | Best Buy product info |

Professional Networks

| Dataset | Command | Description |
| --- | --- | --- |
| LinkedIn Person | `datasets.sh linkedin_person_profile <url>` | Profile data (experience, skills) |
| LinkedIn Company | `datasets.sh linkedin_company_profile <url>` | Company page data |
| LinkedIn Jobs | `datasets.sh linkedin_job_listings <url>` | Job posting details |
| LinkedIn Posts | `datasets.sh linkedin_posts <url>` | Post content and engagement |
| LinkedIn Search | `datasets.sh linkedin_people_search <url> <first> <last>` | Find people |
| Crunchbase | `datasets.sh crunchbase_company <url>` | Company funding, employees |
| ZoomInfo | `datasets.sh zoominfo_company_profile <url>` | Company profile data |

Instagram

| Dataset | Command | Description |
| --- | --- | --- |
| Profiles | `datasets.sh instagram_profiles <url>` | Bio, followers, following |
| Posts | `datasets.sh instagram_posts <url>` | Post details, likes, captions |
| Reels | `datasets.sh instagram_reels <url>` | Reel data and metrics |
| Comments | `datasets.sh instagram_comments <url>` | Post comments |

Facebook

| Dataset | Command | Description |
| --- | --- | --- |
| Posts | `datasets.sh facebook_posts <url>` | Post content and reactions |
| Marketplace | `datasets.sh facebook_marketplace_listings <url>` | Listing details |
| Reviews | `datasets.sh facebook_company_reviews <url> [num]` | Company reviews |
| Events | `datasets.sh facebook_events <url>` | Event details |

TikTok

| Dataset | Command | Description |
| --- | --- | --- |
| Profiles | `datasets.sh tiktok_profiles <url>` | Creator profile data |
| Posts | `datasets.sh tiktok_posts <url>` | Video details and metrics |
| Shop | `datasets.sh tiktok_shop <url>` | TikTok Shop product data |
| Comments | `datasets.sh tiktok_comments <url>` | Video comments |

YouTube

| Dataset | Command | Description |
| --- | --- | --- |
| Profiles | `datasets.sh youtube_profiles <url>` | Channel data |
| Videos | `datasets.sh youtube_videos <url>` | Video details and stats |
| Comments | `datasets.sh youtube_comments <url> [num]` | Video comments (default: 10) |

Other Social

| Dataset | Command | Description |
| --- | --- | --- |
| X (Twitter) | `datasets.sh x_posts <url>` | Tweet data |
| Reddit | `datasets.sh reddit_posts <url>` | Post and comment data |

Google Services

| Dataset | Command | Description |
| --- | --- | --- |
| Maps Reviews | `datasets.sh google_maps_reviews <url> [days]` | Business reviews (default: 3 days) |
| Shopping | `datasets.sh google_shopping <url>` | Product comparison data |
| Play Store | `datasets.sh google_play_store <url>` | App details and reviews |

Other

| Dataset | Command | Description |
| --- | --- | --- |
| Apple App Store | `datasets.sh apple_app_store <url>` | iOS app data |
| Reuters News | `datasets.sh reuter_news <url>` | News article content |
| GitHub | `datasets.sh github_repository_file <url>` | Repository file data |
| Yahoo Finance | `datasets.sh yahoo_finance_business <url>` | Stock and company data |
| Zillow | `datasets.sh zillow_properties_listing <url>` | Property listing details |
| Booking.com | `datasets.sh booking_hotel_listings <url>` | Hotel listing data |

Examples

Get LinkedIn Profile

```bash
bash scripts/datasets.sh linkedin_person_profile "https://www.linkedin.com/in/satyanadella/"
```

Get Amazon Product

```bash
bash scripts/datasets.sh amazon_product "https://www.amazon.com/dp/B09V3KXJPB"
```

Get Instagram Profile

```bash
bash scripts/datasets.sh instagram_profiles "https://www.instagram.com/natgeo/"
```

Get YouTube Comments

```bash
bash scripts/datasets.sh youtube_comments "https://www.youtube.com/watch?v=dQw4w9WgXcQ" 20
```

Search Amazon

```bash
bash scripts/datasets.sh amazon_product_search "wireless headphones" "https://www.amazon.com"
```
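
Each example above fetches a single URL. For several URLs of the same dataset type, a small wrapper loop is one option. This is a sketch under assumptions: `run_batch` and the `DATASETS_SH` override are hypothetical names that the project does not provide, and the sketch assumes the script prints its JSON to standard output and exits non-zero on failure.

```bash
#!/usr/bin/env bash
# Hypothetical batch helper; not part of the shipped scripts.
# Usage: run_batch <dataset_type> <url_file> <out_dir>
run_batch() {
  local dataset_type="$1" url_file="$2" out_dir="$3"
  # DATASETS_SH is an assumed override; the default is the documented path.
  local script="${DATASETS_SH:-scripts/datasets.sh}"
  local n=0 failed=0 url

  mkdir -p "$out_dir"

  # Read one URL per line; also handle a final line without a newline.
  while IFS= read -r url || [ -n "$url" ]; do
    # Skip blank lines and lines starting with '#'.
    case "$url" in ''|'#'*) continue ;; esac
    n=$((n + 1))
    # Redirect stdin so the child script cannot consume the URL list.
    if ! bash "$script" "$dataset_type" "$url" \
        > "$out_dir/result_$n.json" < /dev/null; then
      echo "failed: $url" >&2
      failed=$((failed + 1))
    fi
  done < "$url_file"

  echo "processed=$n failed=$failed"
  [ "$failed" -eq 0 ]
}
```

With a file `urls.txt` holding one product URL per line, `run_batch amazon_product urls.txt out` would write `out/result_1.json`, `out/result_2.json`, and so on, running the requests one after another.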

Output Format

Returns structured JSON with website-specific fields. Example for a LinkedIn profile:

```json
{
  "name": "Satya Nadella",
  "headline": "Chairman and CEO at Microsoft",
  "location": "Greater Seattle Area",
  "connections": "500+",
  "experience": [...],
  "education": [...],
  "skills": [...]
}
```

How It Works

1. Trigger: Sends URL to Bright Data's Web Data API
2. Poll: Waits for data collection to complete (checks every second)
3. Return: Outputs structured JSON when ready

The polling mechanism handles rate limits and ensures data quality by waiting for full extraction.
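
The poll step can be pictured as a loop that retries a readiness check until it succeeds or the timeout passes. The sketch below is not the project's implementation and makes no Bright Data API calls: `poll_until_ready` and `POLL_INTERVAL` are hypothetical names, and the readiness check is whatever command the caller passes in. Only `BRIGHTDATA_POLLING_TIMEOUT` and the one-second interval come from this document.

```bash
#!/usr/bin/env bash
# Generic poll loop, a sketch only.
# Usage: poll_until_ready <check_command> [args...]
# The check command must exit 0 once the data is ready.
poll_until_ready() {
  local timeout="${BRIGHTDATA_POLLING_TIMEOUT:-600}"  # documented default
  local interval="${POLL_INTERVAL:-1}"                # assumed knob, whole seconds
  local waited=0

  # Re-run the check until it succeeds.
  until "$@"; do
    # Give up once the accumulated wait reaches the timeout.
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

For example, `poll_until_ready test -s result.json` waits until `result.json` exists and is non-empty. The loop counts only the sleep time, so a slow check command makes the real wait longer than the configured timeout.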

Advanced: Direct Fetch

For custom dataset IDs or advanced use cases:

```bash
bash scripts/fetch.sh <dataset_id> '<json_input>'
```

Example:

```bash
bash scripts/fetch.sh gd_l1viktl72bvl7bjuj0 '{"url":"https://linkedin.com/in/someone"}'
```
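
When the URL comes from a variable, building the JSON argument by hand invites quoting mistakes. A minimal sketch of one way to do it follows. `json_url_input` is a hypothetical helper, not part of the project. It escapes only backslashes and double quotes, which covers ordinary URLs but not control characters.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wrap a URL in the {"url": ...} input shown above.
json_url_input() {
  local url="$1"
  url=${url//\\/\\\\}   # escape backslashes first
  url=${url//\"/\\\"}   # then escape double quotes
  printf '{"url":"%s"}\n' "$url"
}
```

The result can be passed straight to the script, for example `bash scripts/fetch.sh gd_l1viktl72bvl7bjuj0 "$(json_url_input "$profile_url")"`, where `profile_url` is a variable of your own.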