datacommons-client
Data Commons Client
Overview
Provides comprehensive access to the Data Commons Python API v2 for querying statistical observations, exploring the knowledge graph, and resolving entity identifiers. Data Commons aggregates data from census bureaus, health organizations, environmental agencies, and other authoritative sources into a unified knowledge graph.
Installation
Install the Data Commons Python client with Pandas support:
bash
uv pip install "datacommons-client[Pandas]"
For basic usage without Pandas:
bash
uv pip install datacommons-client
Core Capabilities
The Data Commons API consists of three main endpoints, each detailed in dedicated reference files:
1. Observation Endpoint - Statistical Data Queries
Query time-series statistical data for entities. See references/observation.md for comprehensive documentation.
Primary use cases:
- Retrieve population, economic, health, or environmental statistics
- Access historical time-series data for trend analysis
- Query data for hierarchies (all counties in a state, all countries in a region)
- Compare statistics across multiple entities
- Filter by data source for consistency
Common patterns:
python
from datacommons_client import DataCommonsClient
client = DataCommonsClient()

# Get latest population data
response = client.observation.fetch(
    variable_dcids=["Count_Person"],
    entity_dcids=["geoId/06"],  # California
    date="latest"
)

# Get time series
response = client.observation.fetch(
    variable_dcids=["UnemploymentRate_Person"],
    entity_dcids=["country/USA"],
    date="all"
)

# Query by hierarchy (all counties contained in California)
response = client.observation.fetch(
    variable_dcids=["Median_Income_Household"],
    entity_expression="geoId/06<-containedInPlace+{typeOf:County}",
    date="2020"
)
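The hierarchy query above passes a relation expression as a plain string, which is easy to mistype. As a minimal sketch, the helper below assembles that string from a parent DCID and a child type. The function name `contained_in_expression` is hypothetical and not part of datacommons_client. It only reproduces the string format shown in the example and makes no API call.

```python
def contained_in_expression(parent_dcid: str, child_type: str) -> str:
    """Build a relation expression for all places of `child_type` in `parent_dcid`.

    Hypothetical helper: it only formats the string used in the hierarchy
    example and does not validate the DCID or the type against the graph.
    """
    if not parent_dcid or not child_type:
        raise ValueError("parent_dcid and child_type must be non-empty")
    # Doubled braces are literal braces inside an f-string.
    return f"{parent_dcid}<-containedInPlace+{{typeOf:{child_type}}}"


expression = contained_in_expression("geoId/06", "County")
print(expression)  # geoId/06<-containedInPlace+{typeOf:County}
```

The resulting string can then be passed as the entity_expression argument in place of a hand-written literal.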
2. Node Endpoint - Knowledge Graph Exploration
Explore entity relationships and properties within the knowledge graph. See references/node.md for comprehensive documentation.
Primary use cases:
- Discover available properties for entities
- Navigate geographic hierarchies (parent/child relationships)
- Retrieve entity names and metadata
- Explore connections between entities
- List all entity types in the graph
Common patterns:
python
# Discover properties
labels = client.node.fetch_property_labels(
    node_dcids=["geoId/06"],
    out=True
)

# Navigate hierarchy
children = client.node.fetch_place_children(
    place_dcids=["country/USA"]
)

# Get entity names
names = client.node.fetch_entity_names(
    entity_dcids=["geoId/06", "geoId/48"]
)
3. Resolve Endpoint - Entity Identification
Translate entity names, coordinates, or external IDs into Data Commons IDs (DCIDs). See references/resolve.md for comprehensive documentation.
Primary use cases:
- Convert place names to DCIDs for queries
- Resolve coordinates to places
- Map Wikidata IDs to Data Commons entities
- Handle ambiguous entity names
Common patterns:
python
# Resolve by name
response = client.resolve.fetch_dcids_by_name(
    names=["California", "Texas"],
    entity_type="State"
)

# Resolve by coordinates
dcid = client.resolve.fetch_dcid_by_coordinates(
    latitude=37.7749,
    longitude=-122.4194
)

# Resolve Wikidata IDs
response = client.resolve.fetch_dcids_by_wikidata_id(
    wikidata_ids=["Q30", "Q99"]
)
Typical Workflow
Most Data Commons queries follow this pattern:
1. Resolve entities (if starting with names):
python
resolve_response = client.resolve.fetch_dcids_by_name(
    names=["California", "Texas"]
)
# Keep the first candidate for each name that resolved
dcids = [
    entity["candidates"][0]["dcid"]
    for entity in resolve_response.to_dict()["entities"]
    if entity.get("candidates")
]
2. Discover available variables (optional):
python
variables = client.observation.fetch_available_statistical_variables(
    entity_dcids=dcids
)
3. Query statistical data:
python
response = client.observation.fetch(
    variable_dcids=["Count_Person", "UnemploymentRate_Person"],
    entity_dcids=dcids,
    date="latest"
)
4. Process results:
python
import pandas as pd

# As dictionary
data = response.to_dict()
# As Pandas DataFrame, built from the flat observation records
df = pd.DataFrame(response.to_observations_as_records())
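Step 1 silently drops names that fail to resolve. The sketch below shows one way to keep track of them. It assumes the dictionary form of a resolve response is a mapping with an `entities` list, where each item has a `node` name and a list of `candidates` that each carry a `dcid`. That layout is an assumption, so check references/resolve.md before relying on it. The function name `pick_first_candidates` is hypothetical, and the sample input is illustrative rather than a captured API response.

```python
from typing import Any


def pick_first_candidates(
    resolved: dict[str, Any],
) -> tuple[dict[str, str], list[str]]:
    """Return ({name: first candidate DCID}, [names with no candidates]).

    Assumes `resolved` looks like
    {"entities": [{"node": ..., "candidates": [{"dcid": ...}, ...]}, ...]}.
    """
    picked: dict[str, str] = {}
    unresolved: list[str] = []
    for entity in resolved.get("entities", []):
        name = entity.get("node", "")
        candidates = entity.get("candidates") or []
        if candidates:
            picked[name] = candidates[0]["dcid"]
        else:
            unresolved.append(name)
    return picked, unresolved


# Illustrative input only, shaped according to the assumption above.
sample = {
    "entities": [
        {"node": "California", "candidates": [{"dcid": "geoId/06"}]},
        {"node": "Texas", "candidates": [{"dcid": "geoId/48"}]},
        {"node": "Atlantis", "candidates": []},
    ]
}
picked, unresolved = pick_first_candidates(sample)
dcids = list(picked.values())
```

Returning the unresolved names separately lets the caller decide whether to retry with an entity type, prompt the user, or skip them.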
Finding Statistical Variables
Statistical variables use specific naming patterns in Data Commons:
Common variable patterns:
- Count_Person - Total population
- Count_Person_Female - Female population
- UnemploymentRate_Person - Unemployment rate
- Median_Income_Household - Median household income
- Count_Death - Death count
- Median_Age_Person - Median age
Discovery methods:
python
# Check what variables are available for an entity
available = client.observation.fetch_available_statistical_variables(
    entity_dcids=["geoId/06"]
)

# Or explore via the web interface (see the Statistical Variable Explorer under Additional Resources)
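Because variable DCIDs follow the naming patterns listed above, a long list of available variables can be narrowed with simple string matching. The following is a minimal sketch. `filter_variables` is a hypothetical helper, and it assumes you have already extracted the variable DCIDs into a plain iterable of strings. How to do that depends on the response layout described in references/observation.md.

```python
from typing import Iterable


def filter_variables(
    dcids: Iterable[str], prefix: str = "", contains: str = ""
) -> list[str]:
    """Return the sorted, de-duplicated DCIDs matching a prefix and a substring."""
    return sorted(
        dcid
        for dcid in set(dcids)
        if dcid.startswith(prefix) and contains in dcid
    )


# The variable names listed in this section, used as sample input.
known = [
    "Count_Person",
    "Count_Person_Female",
    "UnemploymentRate_Person",
    "Median_Income_Household",
    "Count_Death",
    "Median_Age_Person",
]
counts = filter_variables(known, prefix="Count_")
medians = filter_variables(known, prefix="Median_")
```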
Working with Pandas
Observation responses convert to flat records that load directly into Pandas:
python
import pandas as pd

response = client.observation.fetch(
    variable_dcids=["Count_Person"],
    entity_dcids=["geoId/06", "geoId/48"],
    date="all"
)

# Convert to DataFrame
df = pd.DataFrame(response.to_observations_as_records())
# Columns include: date, entity, variable, value

# Reshape for analysis
pivot = df.pivot_table(
    values='value',
    index='date',
    columns='entity'
)
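If the client was installed without Pandas support, the same reshape can be approximated with plain dictionaries. This sketch assumes each record is a dict with the date, entity, variable, and value keys listed above. `pivot_records` is a hypothetical helper, and the values in the sample are placeholders, not real statistics.

```python
from collections import defaultdict
from typing import Any


def pivot_records(
    records: list[dict[str, Any]], variable: str
) -> dict[str, dict[str, Any]]:
    """Group one variable's records as {date: {entity: value}}, sorted by date."""
    table: dict[str, dict[str, Any]] = defaultdict(dict)
    for record in records:
        if record["variable"] == variable:
            table[record["date"]][record["entity"]] = record["value"]
    return {date: dict(table[date]) for date in sorted(table)}


# Placeholder values for illustration only.
sample_records = [
    {"date": "2020", "entity": "geoId/06", "variable": "Count_Person", "value": 110},
    {"date": "2019", "entity": "geoId/06", "variable": "Count_Person", "value": 100},
    {"date": "2019", "entity": "geoId/48", "variable": "Count_Person", "value": 70},
    {"date": "2020", "entity": "geoId/06", "variable": "Count_Death", "value": 5},
]
by_date = pivot_records(sample_records, "Count_Person")
```

Unlike pivot_table, this keeps the last value seen for a repeated date and entity pair instead of averaging, which matters if several sources report the same observation.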
API Authentication
For datacommons.org (default):
- An API key is required
- Set via environment variable: export DC_API_KEY="your_key"
- Or pass when initializing: client = DataCommonsClient(api_key="your_key")
- Request keys at: https://apikeys.datacommons.org/
For custom Data Commons instances:
- No API key required
- Specify custom endpoint: client = DataCommonsClient(url="https://custom.datacommons.org")
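The two setups above can be selected at startup from the environment. The sketch below returns the keyword arguments to pass to DataCommonsClient and fails early with a clear message when neither setting is present. `client_kwargs` and the `DC_CUSTOM_URL` variable are hypothetical names chosen for this example. Only DC_API_KEY, api_key, and url come from this document.

```python
import os
from typing import Mapping, Optional


def client_kwargs(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Choose DataCommonsClient keyword arguments from environment settings.

    DC_CUSTOM_URL is a hypothetical variable name used only in this sketch.
    """
    env = os.environ if env is None else env
    custom_url = env.get("DC_CUSTOM_URL")
    if custom_url:
        # Custom instances need no API key.
        return {"url": custom_url}
    api_key = env.get("DC_API_KEY")
    if not api_key:
        raise RuntimeError(
            "Set DC_API_KEY (datacommons.org) or DC_CUSTOM_URL (custom instance)"
        )
    return {"api_key": api_key}


# An explicit mapping keeps the example independent of the real environment.
kwargs = client_kwargs({"DC_API_KEY": "your_key"})
# client = DataCommonsClient(**kwargs)
```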
Reference Documentation
Comprehensive documentation for each endpoint is available in the references/ directory:
- references/observation.md: Complete Observation API documentation with all methods, parameters, response formats, and common use cases
- references/node.md: Complete Node API documentation for graph exploration, property queries, and hierarchy navigation
- references/resolve.md: Complete Resolve API documentation for entity identification and DCID resolution
- references/getting_started.md: Quickstart guide with end-to-end examples and common patterns
Additional Resources
- Official Documentation: https://docs.datacommons.org/api/python/v2/
- Statistical Variable Explorer: https://datacommons.org/tools/statvar
- Data Commons Browser: https://datacommons.org/browser/
- GitHub Repository: https://github.com/datacommonsorg/api-python
Tips for Effective Use
- Always start with resolution: Convert names to DCIDs before querying data
- Use relation expressions for hierarchies: Query all children at once instead of individual queries
- Check data availability first: Use fetch_available_statistical_variables() to see what's queryable
- Leverage Pandas integration: Convert responses to DataFrames for analysis
- Cache resolutions: If querying the same entities repeatedly, store name→DCID mappings
- Filter by facet for consistency: Use filter_facet_domains to ensure data from the same source
- Read reference docs: Each endpoint has extensive documentation in the references/ directory
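The caching tip can be sketched as a small in-memory wrapper around whatever function performs the resolution. `ResolutionCache` is a hypothetical class, not part of datacommons_client. It assumes the resolver takes a list of names and returns a name→DCID mapping for those it could resolve. The fake resolver here stands in for a real call to the resolve endpoint.

```python
from typing import Callable

Resolver = Callable[[list[str]], dict[str, str]]


class ResolutionCache:
    """Remember name -> DCID mappings so repeated lookups skip the resolver."""

    def __init__(self, resolver: Resolver) -> None:
        self._resolver = resolver
        self._cache: dict[str, str] = {}

    def get(self, names: list[str]) -> dict[str, str]:
        # Only ask the resolver about names not seen before, without duplicates.
        missing = list(dict.fromkeys(n for n in names if n not in self._cache))
        if missing:
            self._cache.update(self._resolver(missing))
        return {n: self._cache[n] for n in names if n in self._cache}


calls: list[list[str]] = []


def fake_resolver(names: list[str]) -> dict[str, str]:
    """Stand-in for a real resolve call; records each batch it receives."""
    calls.append(list(names))
    known = {"California": "geoId/06", "Texas": "geoId/48"}
    return {n: known[n] for n in names if n in known}


cache = ResolutionCache(fake_resolver)
first = cache.get(["California", "Texas"])
second = cache.get(["California"])  # served from the cache, no new call
```

Names that fail to resolve are not cached in this sketch, so they are retried on the next lookup.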