# neo4j-aura-graph-analytics-skill

## When to Use

- Running GDS algorithms on Aura Business Critical (BC) or Virtual Dedicated Cloud (VDC)
- Processing graph data from non-Neo4j sources (Pandas, Spark, CSV)
- On-demand / pipeline workloads — ephemeral sessions, pay per session-minute
- Full isolation from the live database during analytics

## When NOT to Use

- Aura Pro with embedded GDS plugin → neo4j-gds-skill
- Self-managed Neo4j with embedded GDS plugin → neo4j-gds-skill
- Writing Cypher queries → neo4j-cypher-skill
- Snowflake Graph Analytics → neo4j-snowflake-graph-analytics-skill

## Deployment Decision Table

| Deployment | Skill |
|---|---|
| Aura Free | ❌ AGA not available |
| Aura Pro | neo4j-gds-skill (embedded GDS) |
| Aura Business Critical | this skill |
| Aura Virtual Dedicated Cloud | this skill |
| Non-Neo4j data (Pandas, Spark) | this skill (standalone mode) |

## Defaults

- `graphdatascience >= 1.15` required; `>= 1.18` for Spark
- Always call `gds.verify_connectivity()` after session creation
- Always estimate memory before creating a session for large graphs
- Always set TTL; default is 1 hour idle, max 7 days
- Close the session when done — `sessions.delete(name)` or `gds.delete()` stops billing
- Use `AuraAPICredentials.from_env()` — never hardcode credentials
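TTL values are plain `datetime.timedelta` objects, so the bounds mentioned above can be checked directly. A small sketch:

```python
from datetime import timedelta

default_ttl = timedelta(hours=1)  # idle default
max_ttl = timedelta(days=7)       # maximum allowed

# timedelta floor division gives how many idle windows fit in the cap.
windows = max_ttl // default_ttl
print(windows)  # 168
```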

## Installation

```bash
pip install "graphdatascience>=1.15"
```

## Key Patterns

### Step 1 — Authenticate

```python
import os
from graphdatascience.session import AuraAPICredentials, GdsSessions

sessions = GdsSessions(api_credentials=AuraAPICredentials.from_env())
```

Reads: `AURA_CLIENT_ID`, `AURA_CLIENT_SECRET`, `AURA_PROJECT_ID` (optional).
Create API credentials in Aura Console → Account → API credentials.
If you are a member of multiple projects, set `AURA_PROJECT_ID` or pass `project_id=` explicitly.
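Since `from_env()` fails only at construction time, it can help to validate the environment up front. A minimal sketch — the `require_env` helper is illustrative, not part of the client:

```python
import os

def require_env(*names):
    """Raise early if any required environment variable is missing or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise EnvironmentError(f"missing environment variables: {', '.join(missing)}")

# Fail fast before building GdsSessions; AURA_PROJECT_ID is optional, so it is
# not listed. For illustration we only report instead of aborting.
try:
    require_env("AURA_CLIENT_ID", "AURA_CLIENT_SECRET")
except EnvironmentError as exc:
    print(exc)
```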

### Step 2 — Estimate Memory

```python
from graphdatascience.session import AlgorithmCategory, SessionMemory

memory = sessions.estimate(
    node_count=1_000_000,
    relationship_count=5_000_000,
    algorithm_categories=[
        AlgorithmCategory.CENTRALITY,
        AlgorithmCategory.NODE_EMBEDDING,
        AlgorithmCategory.COMMUNITY_DETECTION,
    ],
)
```

Returns a `SessionMemory` tier, e.g. `SessionMemory.m_8GB`.
Fixed tiers: `m_2GB` … `m_256GB` — see references/limitations.md.
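Because the tiers are fixed sizes, a raw memory figure is effectively rounded up to the next tier that fits. A hypothetical sketch of that mapping — the tier list and helper are illustrative only; use `sessions.estimate(...)` for the real answer:

```python
# Illustrative fixed tiers in GB (see references/limitations.md for the real list).
TIERS_GB = [2, 4, 8, 16, 32, 64, 128, 256]

def pick_tier(required_gb: float) -> int:
    """Round a raw memory requirement up to the smallest fixed tier that fits."""
    for tier in TIERS_GB:
        if required_gb <= tier:
            return tier
    raise ValueError(f"{required_gb} GB exceeds the largest tier ({TIERS_GB[-1]} GB)")

print(pick_tier(6.5))   # 8
print(pick_tier(32.0))  # 32
```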

### Step 3 — Create Session
Mode A — AuraDB connected:

```python
from graphdatascience.session import DbmsConnectionInfo, SessionMemory, CloudLocation
from datetime import timedelta

db_connection = DbmsConnectionInfo(
    username=os.environ["NEO4J_USERNAME"],
    password=os.environ["NEO4J_PASSWORD"],
    aura_instance_id=os.environ["AURA_INSTANCEID"],  # from Aura Console URL
)
gds = sessions.get_or_create(
    session_name="my-analysis",
    memory=memory,
    db_connection=db_connection,
    ttl=timedelta(hours=2),
)
gds.verify_connectivity()
```

Mode B — Self-managed Neo4j:

```python
db_connection = DbmsConnectionInfo(
    uri=os.environ["NEO4J_URI"],  # e.g. "bolt://my-server:7687"
    username=os.environ["NEO4J_USERNAME"],
    password=os.environ["NEO4J_PASSWORD"],
)
gds = sessions.get_or_create(
    session_name="my-analysis-sm",
    memory=SessionMemory.m_8GB,
    db_connection=db_connection,
    ttl=timedelta(hours=2),
    cloud_location=CloudLocation("gcp", "europe-west1"),
)
gds.verify_connectivity()
```

Mode C — Standalone (no Neo4j DB):

```python
gds = sessions.get_or_create(
    session_name="my-standalone",
    memory=SessionMemory.m_4GB,
    ttl=timedelta(hours=1),
    cloud_location=CloudLocation("gcp", "europe-west1"),
)
gds.verify_connectivity()
```

### Step 4 — Project Graph
From connected Neo4j (remote projection):

```python
G, result = gds.graph.project(
    "my-graph",
    """
    CALL () {
        MATCH (p:Person)
        OPTIONAL MATCH (p)-[r:KNOWS]->(p2:Person)
        RETURN p AS source, r AS rel, p2 AS target,
               p {.age, .score} AS sourceNodeProperties,
               p2 {.age, .score} AS targetNodeProperties
    }
    RETURN gds.graph.project.remote(source, target, {
        sourceNodeLabels: labels(source),
        targetNodeLabels: labels(target),
        sourceNodeProperties: sourceNodeProperties,
        targetNodeProperties: targetNodeProperties,
        relationshipType: type(rel)
    })
    """,
)
print(f"Projected {G.node_count()} nodes, {G.relationship_count()} relationships")
```

Multi-pattern `MATCH` requires `CALL () { ... }`. For multiple labels or relationship types, use `UNION` inside the `CALL`.
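For multiple relationship types, the inner `CALL` combines one `MATCH` per pattern with `UNION`. A sketch of the query shape — the labels and relationship types here are illustrative:

```python
# Illustrative remote-projection query combining two relationship types via UNION.
union_projection_query = """
CALL () {
    MATCH (p:Person)-[r:KNOWS]->(p2:Person)
    RETURN p AS source, r AS rel, p2 AS target
    UNION
    MATCH (p:Person)-[r:WORKS_WITH]->(p2:Person)
    RETURN p AS source, r AS rel, p2 AS target
}
RETURN gds.graph.project.remote(source, target, {
    relationshipType: type(rel)
})
"""
# Passed exactly like the single-pattern query:
# gds.graph.project("my-graph", union_projection_query)
```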
From Pandas DataFrames (standalone mode):

```python
import pandas as pd

nodes_df = pd.DataFrame([
    {"nodeId": 0, "labels": "Person", "age": 30},
    {"nodeId": 1, "labels": "Person", "age": 25},
])
rels_df = pd.DataFrame([
    {"sourceNodeId": 0, "targetNodeId": 1, "relationshipType": "KNOWS"},
])
G = gds.graph.construct("my-graph", nodes_df, rels_df)
```
Multiple DataFrames: `gds.graph.construct("g", [nodes1, nodes2], [rels1, rels2])`

Required columns — nodes: `nodeId` (int), `labels` (str). Relationships: `sourceNodeId`, `targetNodeId`, `relationshipType`. String node properties are not supported — drop them before `construct()`.
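Since `construct()` rejects string node properties while the string `labels` column is required, only the string *property* columns need stripping. A minimal sketch — the `name` column is illustrative:

```python
import pandas as pd

nodes_df = pd.DataFrame([
    {"nodeId": 0, "labels": "Person", "age": 30, "name": "Ann"},
    {"nodeId": 1, "labels": "Person", "age": 25, "name": "Bob"},
])

# "labels" is a required string column; drop only string property columns.
string_props = [
    col for col in nodes_df.columns
    if col not in ("nodeId", "labels") and nodes_df[col].dtype == object
]
clean_df = nodes_df.drop(columns=string_props)
print(list(clean_df.columns))  # ['nodeId', 'labels', 'age']
```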

### Step 5 — Run Algorithms

```python
# Mutate — chain results without writing to the DB
gds.pageRank.mutate(G, mutateProperty="pagerank", dampingFactor=0.85)
gds.fastRP.mutate(G,
    mutateProperty="embedding",
    embeddingDimension=128,
    featureProperties=["pagerank"],
    randomSeed=42,
)

# Stream — inspect results as a DataFrame
df = gds.pageRank.stream(G)
print(df.sort_values("score", ascending=False).head(10))

# Write — persist to the connected Neo4j DB (connected modes only)
gds.louvain.write(G, writeProperty="community")
```

All GDS algorithms work in AGA except topological link prediction. See `neo4j-gds-skill` for the full algorithm reference.

### Step 6 — Async Job Polling
Algorithm calls may return a job handle for long-running computations. Poll until done:

```python
import time

job = gds.pageRank.mutate(G, mutateProperty="pagerank")
```

If a job object is returned (async mode), poll explicitly:

```python
if hasattr(job, "status"):
    while job.status() not in ("RUNNING_DONE", "FAILED", "CANCELLED"):
        time.sleep(5)
        print(f"Job status: {job.status()}")
    if job.status() != "RUNNING_DONE":
        raise RuntimeError(f"Algorithm job failed: {job.status()}")
```

Do NOT assume immediate completion on large graphs. Check `.status()` before reading results.
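The polling loop above can be packaged with a timeout so an unattended pipeline never spins forever. A hypothetical helper, not part of the client; the injectable `get_status` and `sleep` parameters exist so it can be exercised without a live session:

```python
import time

def wait_for_job(get_status, poll_seconds=5.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Poll get_status() until it reports a terminal state, or raise on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in ("RUNNING_DONE", "FAILED", "CANCELLED"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_seconds}s")
        sleep(poll_seconds)

# Demonstrate with a stubbed status sequence instead of a live session:
statuses = iter(["RUNNING", "RUNNING", "RUNNING_DONE"])
result = wait_for_job(lambda: next(statuses), sleep=lambda s: None)
print(result)  # RUNNING_DONE
```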

### Step 7 — Retrieve Results

```python
# Stream node properties — one column per property
result_df = gds.graph.nodeProperties.stream(
    G,
    node_properties=["pagerank", "embedding"],
    separate_property_columns=True,
    db_node_properties=["name"],  # pull from connected DB for context (connected modes only)
)
result_df.head(10)
```

Standalone mode — no `db_node_properties`; join back to the source DataFrame:

```python
result_df = gds.graph.nodeProperties.stream(G, ["pagerank"], separate_property_columns=True)
result_df.merge(nodes_df[["nodeId", "name"]], how="left", on="nodeId")
```

### Step 8 — Write Back and Clean Up

```python
# Write multiple node properties to the connected Neo4j DB
gds.graph.nodeProperties.write(G, ["pagerank", "embedding"])

# Write relationship properties
gds.graph.relationshipProperties.write(G, G.relationship_types(), ["score"])

# Run Cypher against the connected DB from within the session
gds.run_cypher("MATCH (n:Person) RETURN count(n)")

# Drop the projected graph (frees session memory)
G.drop()

# Delete the session — stops billing
sessions.delete(session_name="my-analysis")
# or: gds.delete()
```

Write before deleting — results not written back are lost when the session closes.

## Session Management

```python
# List active sessions
from pandas import DataFrame
DataFrame(sessions.list())

# Reconnect to an existing session
gds = sessions.get_or_create(session_name="my-analysis", memory=..., db_connection=...)
```

---

## Common Errors
| Error | Cause | Fix |
|---|---|---|
| Authentication failed | Wrong API credentials | Regenerate in Aura Console → Account → API credentials |
| Session not found | Session expired (TTL exceeded) or name typo | Re-create with `sessions.get_or_create(...)` |
| Graph not found | Projection dropped or session reconnected without re-projecting | Re-run the projection |
| Algorithm job `FAILED` | Memory limit exceeded or unsupported algorithm | Increase the memory tier |
| Out of memory | Graph larger than estimated | Re-estimate with actual counts; pick the next tier up |
| Results empty after session reconnect | Results not written before session was closed | Always write/stream before deleting |
| `construct()` fails | String column in nodes DataFrame | Drop string property columns before `construct()` |
| Session creation fails | AGA feature not activated | Enable in Aura Console → project settings |

## References

Load on demand:

- references/workflows.md — full AuraDB and standalone workflow examples, Spark integration
- references/limitations.md — AGA vs embedded GDS feature table, SessionMemory tiers, cloud locations

## WebFetch

| Need | URL |
|---|---|
| AGA Python client docs | |
| AuraDB tutorial notebook | |
| GDS algorithm reference | |

## Checklist

- Aura API credentials created and set in environment (`AURA_CLIENT_ID`, `AURA_CLIENT_SECRET`)
- AGA feature enabled for the Aura project (Aura Console → project settings)
- Memory estimated before session creation (`sessions.estimate(...)`)
- Cloud location chosen near the data source
- `gds.verify_connectivity()` called after session creation
- TTL set to avoid unexpected costs on idle sessions
- Async algorithm jobs polled until `RUNNING_DONE` before reading results
- Results written back (connected modes) or streamed and persisted (standalone) before deletion
- Session deleted when done (`sessions.delete(...)` or `gds.delete()`)