datacommons-client

Data Commons Client

Overview

Provides comprehensive access to the Data Commons Python API v2 for querying statistical observations, exploring the knowledge graph, and resolving entity identifiers. Data Commons aggregates data from census bureaus, health organizations, environmental agencies, and other authoritative sources into a unified knowledge graph.

Installation

Install the Data Commons Python client with Pandas support:
bash
uv pip install "datacommons-client[Pandas]"
For basic usage without Pandas:
bash
uv pip install datacommons-client

Core Capabilities

The Data Commons API consists of three main endpoints, each detailed in dedicated reference files:

1. Observation Endpoint - Statistical Data Queries

Query time-series statistical data for entities. See references/observation.md for comprehensive documentation.
Primary use cases:
  • Retrieve population, economic, health, or environmental statistics
  • Access historical time-series data for trend analysis
  • Query data for hierarchies (all counties in a state, all countries in a region)
  • Compare statistics across multiple entities
  • Filter by data source for consistency
Common patterns:
python
from datacommons_client import DataCommonsClient

client = DataCommonsClient()

# Get latest population data
response = client.observation.fetch(
    variable_dcids=["Count_Person"],
    entity_dcids=["geoId/06"],  # California
    date="latest"
)

# Get time series
response = client.observation.fetch(
    variable_dcids=["UnemploymentRate_Person"],
    entity_dcids=["country/USA"],
    date="all"
)

# Query by hierarchy (all counties in California)
response = client.observation.fetch(
    variable_dcids=["Median_Income_Household"],
    entity_expression="geoId/06<-containedInPlace+{typeOf:County}",
    date="2020"
)
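The hierarchy query relies on a relation expression string, which is easy to mistype. As a small illustration, the sketch below assembles that string from a parent DCID and a child entity type. The helper name `contained_in_expression` is hypothetical and not part of `datacommons_client`; it only reproduces the `<-containedInPlace+{typeOf:...}` form shown in the hierarchy example.

```python
def contained_in_expression(parent_dcid: str, child_type: str) -> str:
    """Build a relation expression selecting every place of `child_type`
    contained in `parent_dcid`. Hypothetical helper: plain string formatting,
    no API call is made."""
    if not parent_dcid or not child_type:
        raise ValueError("parent_dcid and child_type must be non-empty")
    # Doubled braces emit literal { and } inside an f-string.
    return f"{parent_dcid}<-containedInPlace+{{typeOf:{child_type}}}"


# All counties in California, the same expression used in the hierarchy query.
expression = contained_in_expression("geoId/06", "County")
print(expression)
```

The resulting string can be passed as `entity_expression` to `client.observation.fetch(...)` in place of a hand-written literal.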

2. Node Endpoint - Knowledge Graph Exploration

Explore entity relationships and properties within the knowledge graph. See references/node.md for comprehensive documentation.
Primary use cases:
  • Discover available properties for entities
  • Navigate geographic hierarchies (parent/child relationships)
  • Retrieve entity names and metadata
  • Explore connections between entities
  • List all entity types in the graph
Common patterns:
python
# Discover properties
labels = client.node.fetch_property_labels(
    node_dcids=["geoId/06"],
    out=True
)

# Navigate hierarchy
children = client.node.fetch_place_children(
    node_dcids=["country/USA"]
)

# Get entity names
names = client.node.fetch_entity_names(
    node_dcids=["geoId/06", "geoId/48"]
)

3. Resolve Endpoint - Entity Identification

Translate entity names, coordinates, or external IDs into Data Commons IDs (DCIDs). See references/resolve.md for comprehensive documentation.
Primary use cases:
  • Convert place names to DCIDs for queries
  • Resolve coordinates to places
  • Map Wikidata IDs to Data Commons entities
  • Handle ambiguous entity names
Common patterns:
python
# Resolve by name
response = client.resolve.fetch_dcids_by_name(
    names=["California", "Texas"],
    entity_type="State"
)

# Resolve by coordinates
dcid = client.resolve.fetch_dcid_by_coordinates(
    latitude=37.7749,
    longitude=-122.4194
)

# Resolve Wikidata IDs
response = client.resolve.fetch_dcids_by_wikidata_id(
    wikidata_ids=["Q30", "Q99"]
)

Typical Workflow

Most Data Commons queries follow this pattern:
  1. Resolve entities (if starting with names):
    python
    resolve_response = client.resolve.fetch_dcids_by_name(
        names=["California", "Texas"]
    )
    dcids = [r["candidates"][0]["dcid"]
             for r in resolve_response.to_dict().values()
             if r["candidates"]]
  2. Discover available variables (optional):
    python
    variables = client.observation.fetch_available_statistical_variables(
        entity_dcids=dcids
    )
  3. Query statistical data:
    python
    response = client.observation.fetch(
        variable_dcids=["Count_Person", "UnemploymentRate_Person"],
        entity_dcids=dcids,
        date="latest"
    )
  4. Process results:
    python
    # As dictionary
    data = response.to_dict()
    
    # As a Pandas DataFrame (assumes the records are plain dicts)
    import pandas as pd
    df = pd.DataFrame(response.to_observations_as_records())
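Step 1 silently drops names that fail to resolve, which can hide typos. A minimal sketch of a stricter variant follows. It assumes the mapping shape implied by the step 1 snippet (each value carries a `candidates` list of dicts with a `dcid` key); the actual response layout may differ between client versions. The helper `first_candidates` and the sample data are hypothetical.

```python
from typing import Any


def first_candidates(resolved: dict[str, Any]) -> tuple[dict[str, str], list[str]]:
    """Split a resolve-style mapping into {name: first DCID} and unresolved names."""
    picked: dict[str, str] = {}
    unresolved: list[str] = []
    for name, entry in resolved.items():
        candidates = entry.get("candidates") or []
        if candidates:
            # Keep only the top-ranked candidate for each name.
            picked[name] = candidates[0]["dcid"]
        else:
            unresolved.append(name)
    return picked, unresolved


# Hand-written sample in the assumed shape; this is not real API output.
sample = {
    "California": {"candidates": [{"dcid": "geoId/06"}]},
    "Texas": {"candidates": [{"dcid": "geoId/48"}]},
    "Atlantis": {"candidates": []},
}
picked, unresolved = first_candidates(sample)
print(picked)
print(unresolved)
```

Reporting the unresolved names separately makes it possible to warn the user or retry with an `entity_type` hint before querying observations.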

Finding Statistical Variables

Statistical variables use specific naming patterns in Data Commons:
Common variable patterns:
  • Count_Person - Total population
  • Count_Person_Female - Female population
  • UnemploymentRate_Person - Unemployment rate
  • Median_Income_Household - Median household income
  • Count_Death - Death count
  • Median_Age_Person - Median age
Discovery methods:
python
# Check what variables are available for an entity
available = client.observation.fetch_available_statistical_variables(
    entity_dcids=["geoId/06"]
)

# Or explore via the web interface
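Once a list of variable DCIDs is in hand, the naming patterns make it easy to scan. The sketch below groups names by the text before the first underscore. The helper `group_by_prefix` is hypothetical, and the prefix split is only a naming heuristic applied to the example variables listed in this section, not a rule enforced by Data Commons.

```python
from collections import defaultdict


def group_by_prefix(variable_dcids: list[str]) -> dict[str, list[str]]:
    """Group variable DCIDs by the token before the first underscore."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for dcid in variable_dcids:
        prefix = dcid.split("_", 1)[0]
        grouped[prefix].append(dcid)
    return dict(grouped)


# The example variables from the list in this section.
variables = [
    "Count_Person",
    "Count_Person_Female",
    "UnemploymentRate_Person",
    "Median_Income_Household",
    "Count_Death",
    "Median_Age_Person",
]
groups = group_by_prefix(variables)
for prefix, names in groups.items():
    print(prefix, names)
```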

Working with Pandas

All observation responses integrate with Pandas:
python
response = client.observation.fetch(
    variable_dcids=["Count_Person"],
    entity_dcids=["geoId/06", "geoId/48"],
    date="all"
)

# Convert to DataFrame (assumes to_observations_as_records() yields plain dict records)
import pandas as pd
df = pd.DataFrame(response.to_observations_as_records())
# Columns: date, entity, variable, value

# Reshape for analysis
pivot = df.pivot_table(
    values='value',
    index='date',
    columns='entity'
)
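To make the reshape step concrete without a network call or a pandas install, the sketch below performs the same date-by-entity pivot on plain dictionaries. The record layout follows the columns named in this section (`date`, `entity`, `variable`, `value`). The helper `pivot_records` is hypothetical, and the values are made-up placeholders, not real statistics.

```python
def pivot_records(records: list[dict]) -> dict[str, dict[str, float]]:
    """Reshape flat records into {date: {entity: value}}, sorted by date."""
    table: dict[str, dict[str, float]] = {}
    for row in records:
        table.setdefault(row["date"], {})[row["entity"]] = row["value"]
    return dict(sorted(table.items()))


# Placeholder records in the assumed shape; values are illustrative only.
records = [
    {"date": "2020", "entity": "geoId/06", "variable": "Count_Person", "value": 110},
    {"date": "2019", "entity": "geoId/06", "variable": "Count_Person", "value": 100},
    {"date": "2019", "entity": "geoId/48", "variable": "Count_Person", "value": 200},
    {"date": "2020", "entity": "geoId/48", "variable": "Count_Person", "value": 220},
]
pivot = pivot_records(records)
for date, row in pivot.items():
    print(date, row)
```

With pandas available, `pivot_table(values='value', index='date', columns='entity')` expresses the same reshape in one call.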

API Authentication

For datacommons.org (default):
  • An API key is required
  • Set via environment variable:
    export DC_API_KEY="your_key"
  • Or pass when initializing:
    client = DataCommonsClient(api_key="your_key")
  • Request keys at: https://apikeys.datacommons.org/
For custom Data Commons instances:
  • No API key required
  • Specify custom endpoint:
    client = DataCommonsClient(url="https://custom.datacommons.org")
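The two cases above can be folded into one small configuration step. The sketch below chooses the constructor arguments from the environment. The helper `client_kwargs` is hypothetical; only the `api_key` and `url` parameter names and the `DC_API_KEY` variable come from this section.

```python
import os
from typing import Mapping, Optional


def client_kwargs(env: Optional[Mapping[str, str]] = None,
                  custom_url: Optional[str] = None) -> dict[str, str]:
    """Return keyword arguments for DataCommonsClient (hypothetical helper)."""
    env = os.environ if env is None else env
    if custom_url:
        # Custom instances are addressed by URL and need no API key.
        return {"url": custom_url}
    api_key = env.get("DC_API_KEY")
    if not api_key:
        raise RuntimeError("Set DC_API_KEY or pass a custom instance URL")
    return {"api_key": api_key}


# A plain dict stands in for the real environment in this example.
kwargs = client_kwargs(env={"DC_API_KEY": "your_key"})
print(kwargs)
# With datacommons-client installed: client = DataCommonsClient(**kwargs)
```

Failing early when no key is configured gives a clearer message than an authentication error on the first query.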

Reference Documentation

Comprehensive documentation for each endpoint is available in the references/ directory:
  • references/observation.md: Complete Observation API documentation with all methods, parameters, response formats, and common use cases
  • references/node.md: Complete Node API documentation for graph exploration, property queries, and hierarchy navigation
  • references/resolve.md: Complete Resolve API documentation for entity identification and DCID resolution
  • references/getting_started.md: Quickstart guide with end-to-end examples and common patterns

Additional Resources

Tips for Effective Use

  1. Always start with resolution: Convert names to DCIDs before querying data
  2. Use relation expressions for hierarchies: Query all children at once instead of issuing individual queries
  3. Check data availability first: Use fetch_available_statistical_variables() to see what's queryable
  4. Leverage Pandas integration: Convert responses to DataFrames for analysis
  5. Cache resolutions: If querying the same entities repeatedly, store name→DCID mappings
  6. Filter by facet for consistency: Use filter_facet_domains to ensure data comes from the same source
  7. Read reference docs: Each endpoint has extensive documentation in the references/ directory
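Tip 5 can be sketched as a thin wrapper around whatever function performs the lookup. The class `ResolutionCache` and the stand-in `fake_resolver` below are hypothetical; in practice the resolver would wrap a call such as `client.resolve.fetch_dcids_by_name(...)` and return a plain name→DCID dictionary.

```python
from typing import Callable


class ResolutionCache:
    """Remember name -> DCID lookups so repeated names skip the resolver."""

    def __init__(self, resolver: Callable[[list[str]], dict[str, str]]):
        self._resolver = resolver
        self._known: dict[str, str] = {}

    def resolve(self, names: list[str]) -> dict[str, str]:
        # Only names not seen before are sent to the resolver.
        missing = [n for n in names if n not in self._known]
        if missing:
            self._known.update(self._resolver(missing))
        return {n: self._known[n] for n in names if n in self._known}


calls: list[list[str]] = []


def fake_resolver(names: list[str]) -> dict[str, str]:
    """Stand-in for a real resolve call; records each batch it receives."""
    calls.append(list(names))
    table = {"California": "geoId/06", "Texas": "geoId/48"}
    return {n: table[n] for n in names if n in table}


cache = ResolutionCache(fake_resolver)
first = cache.resolve(["California"])
second = cache.resolve(["California", "Texas"])
print(second)
print(calls)
```

In this sketch, names that fail to resolve are not cached, so they are retried on the next call; whether that is desirable depends on how often the underlying data changes.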