datacommons-client

Data Commons Client

Overview

Provides comprehensive access to the Data Commons Python API v2 for querying statistical observations, exploring the knowledge graph, and resolving entity identifiers. Data Commons aggregates data from census bureaus, health organizations, environmental agencies, and other authoritative sources into a unified knowledge graph.

Installation

Install the Data Commons Python client with Pandas support:

bash

uv pip install "datacommons-client[Pandas]"

For basic usage without Pandas:

bash

uv pip install datacommons-client

Core Capabilities

The Data Commons API consists of three main endpoints, each detailed in dedicated reference files:

1. Observation Endpoint - Statistical Data Queries

Query time-series statistical data for entities. See references/observation.md for comprehensive documentation.

Primary use cases:

Retrieve population, economic, health, or environmental statistics
Access historical time-series data for trend analysis
Query data for hierarchies (all counties in a state, all countries in a region)
Compare statistics across multiple entities
Filter by data source for consistency

Common patterns:

python

from datacommons_client import DataCommonsClient

client = DataCommonsClient()

# Get latest population data
response = client.observation.fetch(
    variable_dcids=["Count_Person"],
    entity_dcids=["geoId/06"],  # California
    date="latest"
)

# Get time series
response = client.observation.fetch(
    variable_dcids=["UnemploymentRate_Person"],
    entity_dcids=["country/USA"],
    date="all"
)

# Query by hierarchy (all counties contained in California)
response = client.observation.fetch(
    variable_dcids=["Median_Income_Household"],
    entity_expression="geoId/06<-containedInPlace+{typeOf:County}",
    date="2020"
)
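The hierarchy query above passes its relation expression as a plain string. As a minimal sketch, the helper below assembles that string from a parent DCID and a child type, following the pattern shown in the example. The name `children_expression` is hypothetical and not part of the client library; the function only formats text and makes no API calls.

```python
def children_expression(parent_dcid: str, child_type: str) -> str:
    """Build a relation expression for places of `child_type` inside `parent_dcid`.

    Hypothetical helper: it only reproduces the string pattern used in the
    "Query by hierarchy" example and is not part of datacommons_client.
    """
    if not parent_dcid or not child_type:
        raise ValueError("parent_dcid and child_type must be non-empty")
    # Doubled braces emit literal { and } around the typeOf filter.
    return f"{parent_dcid}<-containedInPlace+{{typeOf:{child_type}}}"


# Rebuilds the expression used in the example above.
expression = children_expression("geoId/06", "County")
print(expression)
```

The result can be passed as `entity_expression` in place of the hand-written string.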

2. Node Endpoint - Knowledge Graph Exploration

Explore entity relationships and properties within the knowledge graph. See references/node.md for comprehensive documentation.

Primary use cases:

Discover available properties for entities
Navigate geographic hierarchies (parent/child relationships)
Retrieve entity names and metadata
Explore connections between entities
List all entity types in the graph

Common patterns:

python

# Discover properties
labels = client.node.fetch_property_labels(
    node_dcids=["geoId/06"],
    out=True
)

# Navigate hierarchy
children = client.node.fetch_place_children(
    node_dcids=["country/USA"]
)

# Get entity names
names = client.node.fetch_entity_names(
    node_dcids=["geoId/06", "geoId/48"]
)

3. Resolve Endpoint - Entity Identification

Translate entity names, coordinates, or external IDs into Data Commons IDs (DCIDs). See references/resolve.md for comprehensive documentation.

Primary use cases:

Convert place names to DCIDs for queries
Resolve coordinates to places
Map Wikidata IDs to Data Commons entities
Handle ambiguous entity names

Common patterns:

python

# Resolve by name
response = client.resolve.fetch_dcids_by_name(
    names=["California", "Texas"],
    entity_type="State"
)

# Resolve by coordinates
dcid = client.resolve.fetch_dcid_by_coordinates(
    latitude=37.7749,
    longitude=-122.4194
)

# Resolve Wikidata IDs
response = client.resolve.fetch_dcids_by_wikidata_id(
    wikidata_ids=["Q30", "Q99"]
)

Typical Workflow

Most Data Commons queries follow this pattern:

Resolve entities (if starting with names):

python

resolve_response = client.resolve.fetch_dcids_by_name(
    names=["California", "Texas"]
)
dcids = [r["candidates"][0]["dcid"]
         for r in resolve_response.to_dict().values()
         if r["candidates"]]

Discover available variables (optional):

python

variables = client.observation.fetch_available_statistical_variables(
    entity_dcids=dcids
)

Query statistical data:

python

response = client.observation.fetch(
    variable_dcids=["Count_Person", "UnemploymentRate_Person"],
    entity_dcids=dcids,
    date="latest"
)

Process results:

python

# As dictionary
data = response.to_dict()

# As Pandas DataFrame
df = response.to_observations_as_records()
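The resolve step above keeps the first candidate for each name and silently drops names with no match. The sketch below makes that choice explicit by also reporting ambiguous and unresolved names. It assumes the dictionary shape implied by the snippet above (`{name: {"candidates": [{"dcid": ...}]}}`), which has not been checked against the library's actual response layout. The helper name `first_candidates` and the sample dictionary are hypothetical.

```python
def first_candidates(resolved):
    """Split a resolve result into chosen DCIDs, ambiguous names, and unresolved names.

    Assumed shape (mirrors the workflow snippet, not verified against the library):
        {name: {"candidates": [{"dcid": "..."}, ...]}}
    """
    chosen, ambiguous, unresolved = {}, [], []
    for name, entry in resolved.items():
        candidates = entry.get("candidates") or []
        if not candidates:
            unresolved.append(name)
            continue
        # Keep the first candidate, as the workflow snippet does.
        chosen[name] = candidates[0]["dcid"]
        if len(candidates) > 1:
            ambiguous.append(name)
    return chosen, ambiguous, unresolved


# Hand-written stand-in for resolve_response.to_dict(); "Nowhere" is a dummy name.
sample = {
    "California": {"candidates": [{"dcid": "geoId/06"}]},
    "Texas": {"candidates": [{"dcid": "geoId/48"}]},
    "Nowhere": {"candidates": []},
}

chosen, ambiguous, unresolved = first_candidates(sample)
dcids = list(chosen.values())
print(dcids, ambiguous, unresolved)
```

Reporting the ambiguous and unresolved names separately makes it easier to review them before running the observation query.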

Finding Statistical Variables

Statistical variables use specific naming patterns in Data Commons:

Common variable patterns:

`Count_Person` - Total population
`Count_Person_Female` - Female population
`UnemploymentRate_Person` - Unemployment rate
`Median_Income_Household` - Median household income
`Count_Death` - Death count
`Median_Age_Person` - Median age

Discovery methods:

python

# Check what variables are available for an entity
available = client.observation.fetch_available_statistical_variables(
    entity_dcids=["geoId/06"]
)

# Or explore via the web interface
# https://datacommons.org/tools/statvar

Working with Pandas

All observation responses integrate with Pandas:

python

response = client.observation.fetch(
    variable_dcids=["Count_Person"],
    entity_dcids=["geoId/06", "geoId/48"],
    date="all"
)

# Convert to DataFrame
df = response.to_observations_as_records()
# Columns: date, entity, variable, value

# Reshape for analysis
pivot = df.pivot_table(
    values='value',
    index='date',
    columns='entity'
)
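To show what the reshape step produces, here is a small standard-library sketch of the same date-by-entity pivot, for the basic install without Pandas. It assumes records in the date/entity/variable/value layout listed above. The numbers are made-up placeholders, not real statistics, and `pivot_by_date` is a hypothetical helper name.

```python
from collections import defaultdict

# Placeholder records in the date/entity/variable/value layout described above.
# The values are illustrative dummies, not real population counts.
records = [
    {"date": "2019", "entity": "geoId/06", "variable": "Count_Person", "value": 100},
    {"date": "2019", "entity": "geoId/48", "variable": "Count_Person", "value": 200},
    {"date": "2020", "entity": "geoId/06", "variable": "Count_Person", "value": 110},
    {"date": "2020", "entity": "geoId/48", "variable": "Count_Person", "value": 210},
]


def pivot_by_date(rows):
    """Return {date: {entity: value}} with dates in sorted order."""
    table = defaultdict(dict)
    for row in rows:
        # A later row for the same (date, entity) pair overwrites the earlier one.
        table[row["date"]][row["entity"]] = row["value"]
    return {date: table[date] for date in sorted(table)}


pivot = pivot_by_date(records)
for date, by_entity in pivot.items():
    print(date, by_entity)
```

Unlike `pivot_table`, which averages duplicate (date, entity) pairs by default, this sketch keeps the last value seen. It also assumes a single variable per call.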

API Authentication

For datacommons.org (default):

An API key is required
Set via environment variable:
```
export DC_API_KEY="your_key"
```

Or pass when initializing:

client = DataCommonsClient(api_key="your_key")

Request keys at: https://apikeys.datacommons.org/

For custom Data Commons instances:

No API key required

Specify custom endpoint:

client = DataCommonsClient(url="https://custom.datacommons.org")
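The two setups above can be selected at runtime from the environment. The sketch below only builds the keyword arguments, so it runs without the client installed. `DC_API_KEY` is the variable named above. `DC_URL` is a hypothetical variable introduced here for custom instances, and `client_kwargs` is likewise a hypothetical helper name.

```python
import os


def client_kwargs(env=None):
    """Choose DataCommonsClient keyword arguments from environment variables.

    DC_API_KEY is the variable named in this section; DC_URL is a hypothetical
    variable introduced for this sketch to point at a custom instance.
    """
    env = os.environ if env is None else env
    url = env.get("DC_URL")
    if url:
        # Custom instance: no API key needed, per the section above.
        return {"url": url}
    api_key = env.get("DC_API_KEY")
    if api_key:
        return {"api_key": api_key}
    raise RuntimeError("Set DC_API_KEY (datacommons.org) or DC_URL (custom instance)")


# Usage, assuming datacommons-client is installed:
#   from datacommons_client import DataCommonsClient
#   client = DataCommonsClient(**client_kwargs())
print(client_kwargs({"DC_API_KEY": "your_key"}))
```

Raising an error when neither variable is set reports a missing key at startup rather than on the first request.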

Reference Documentation

Comprehensive documentation for each endpoint is available in the references/ directory:

references/observation.md: Complete Observation API documentation with all methods, parameters, response formats, and common use cases
references/node.md: Complete Node API documentation for graph exploration, property queries, and hierarchy navigation
references/resolve.md: Complete Resolve API documentation for entity identification and DCID resolution
references/getting_started.md: Quickstart guide with end-to-end examples and common patterns

Additional Resources

Official Documentation: https://docs.datacommons.org/api/python/v2/
Statistical Variable Explorer: https://datacommons.org/tools/statvar
Data Commons Browser: https://datacommons.org/browser/
GitHub Repository: https://github.com/datacommonsorg/api-python

Tips for Effective Use

Always start with resolution: Convert names to DCIDs before querying data
Use relation expressions for hierarchies: Query all children at once instead of individual queries
Check data availability first: Use `fetch_available_statistical_variables()` to see what's queryable
Leverage Pandas integration: Convert responses to DataFrames for analysis
Cache resolutions: If querying the same entities repeatedly, store name→DCID mappings
Filter by facet for consistency: Use `filter_facet_domains` to ensure data from the same source
Read reference docs: Each endpoint has extensive documentation in the `references/` directory
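The "Cache resolutions" tip can be sketched as a small wrapper that remembers name→DCID mappings and only asks the resolver about names it has not seen. `DcidCache` is a hypothetical class, not part of the client library. The resolver is any callable that takes a list of names and returns a name→DCID dict, and a fake one stands in here so the sketch runs offline.

```python
class DcidCache:
    """Remember name -> DCID mappings so each name is resolved at most once.

    `resolver` is any callable taking a list of names and returning a dict of
    name -> DCID; connecting it to client.resolve is left to the caller.
    """

    def __init__(self, resolver):
        self._resolver = resolver
        self._known = {}

    def get(self, names):
        names = list(names)
        # De-duplicate while keeping order, then look up only unseen names.
        missing = [n for n in dict.fromkeys(names) if n not in self._known]
        if missing:
            self._known.update(self._resolver(missing))
        return {n: self._known[n] for n in names if n in self._known}


# Fake resolver that records each call so the caching is observable.
calls = []


def fake_resolver(names):
    calls.append(list(names))
    lookup = {"California": "geoId/06", "Texas": "geoId/48"}
    return {n: lookup[n] for n in names if n in lookup}


cache = DcidCache(fake_resolver)
cache.get(["California"])
result = cache.get(["California", "Texas"])
print(result)
print(calls)
```

Names the resolver cannot match are not stored, so they are retried on the next call. Whether that is desirable depends on how often the underlying data changes.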
