neo4j-aura-graph-analytics-skill


When to Use


  • Running GDS algorithms on Aura Business Critical (BC) or Virtual Dedicated Cloud (VDC)
  • Processing graph data from non-Neo4j sources (Pandas, Spark, CSV)
  • On-demand / pipeline workloads — ephemeral sessions, pay per session-minute
  • Full isolation from the live database during analytics

When NOT to Use


  • Aura Pro with embedded GDS plugin
    neo4j-gds-skill
  • Self-managed Neo4j with embedded GDS plugin
    neo4j-gds-skill
  • Writing Cypher queries
    neo4j-cypher-skill
  • Snowflake Graph Analytics
    neo4j-snowflake-graph-analytics-skill


Deployment Decision Table


| Deployment | Skill |
| --- | --- |
| Aura Free | ❌ AGA not available |
| Aura Pro | `neo4j-gds-skill` (embedded plugin) |
| Aura Business Critical | this skill |
| Aura Virtual Dedicated Cloud | this skill |
| Non-Neo4j data (Pandas, Spark) | this skill (standalone mode) |


Defaults


  • `graphdatascience >= 1.15` required; `>= 1.18` for Spark
  • Always call `gds.verify_connectivity()` after session creation
  • Always estimate memory before creating a session for large graphs
  • Always set TTL; default is 1 hour idle, max 7 days
  • Close session when done — `gds.delete()` or `sessions.delete(name)` stops billing
  • Use `AuraAPICredentials.from_env()` — never hardcode credentials


Installation


```bash
pip install "graphdatascience>=1.15"
```

Key Patterns


Step 1 — Authenticate


```python
import os
from graphdatascience.session import AuraAPICredentials, GdsSessions

sessions = GdsSessions(api_credentials=AuraAPICredentials.from_env())
```

Reads: AURA_CLIENT_ID, AURA_CLIENT_SECRET, AURA_PROJECT_ID (optional)


Create API credentials in Aura Console → Account → API credentials



If member of multiple projects, set `AURA_PROJECT_ID` or pass `project_id=` explicitly.
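As a minimal pre-flight sketch (the helper name is illustrative, not part of the client API), you can fail fast before `from_env()` when the required environment variables are missing:

```python
import os

def require_env(*names: str) -> None:
    """Fail fast when required environment variables are missing."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(f"Missing Aura credentials: {', '.join(missing)}")

# Before creating GdsSessions:
# require_env("AURA_CLIENT_ID", "AURA_CLIENT_SECRET")
```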


Step 2 — Estimate Memory


```python
from graphdatascience.session import AlgorithmCategory, SessionMemory

memory = sessions.estimate(
    node_count=1_000_000,
    relationship_count=5_000_000,
    algorithm_categories=[
        AlgorithmCategory.CENTRALITY,
        AlgorithmCategory.NODE_EMBEDDING,
        AlgorithmCategory.COMMUNITY_DETECTION,
    ],
)
```

Returns a SessionMemory tier, e.g. SessionMemory.m_8GB


Fixed tiers: m_2GB … m_256GB — see references/limitations.md
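To illustrate tier selection only: the ladder below is an assumption for the sketch (the real tiers are in references/limitations.md), and `smallest_tier` is a hypothetical helper, not part of the client.

```python
# Assumed tier ladder for illustration — verify against references/limitations.md.
ASSUMED_TIERS_GB = [2, 4, 8, 16, 32, 64, 128, 256]

def smallest_tier(required_gb: float) -> str:
    """Return the name of the smallest fixed tier that fits the estimate."""
    for gb in ASSUMED_TIERS_GB:
        if gb >= required_gb:
            return f"m_{gb}GB"
    raise ValueError(f"No tier fits {required_gb} GB (max is m_256GB)")
```

In practice `sessions.estimate(...)` already returns a `SessionMemory` tier; this sketch only shows the rounding-up behavior.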



Step 3 — Create Session


Mode A — AuraDB connected:

```python
from graphdatascience.session import DbmsConnectionInfo, SessionMemory, CloudLocation
from datetime import timedelta

db_connection = DbmsConnectionInfo(
    username=os.environ["NEO4J_USERNAME"],
    password=os.environ["NEO4J_PASSWORD"],
    aura_instance_id=os.environ["AURA_INSTANCEID"],  # from Aura Console URL
)

gds = sessions.get_or_create(
    session_name="my-analysis",
    memory=memory,
    db_connection=db_connection,
    ttl=timedelta(hours=2),
)
gds.verify_connectivity()
```

Mode B — Self-managed Neo4j:

```python
db_connection = DbmsConnectionInfo(
    uri=os.environ["NEO4J_URI"],          # e.g. "bolt://my-server:7687"
    username=os.environ["NEO4J_USERNAME"],
    password=os.environ["NEO4J_PASSWORD"],
)
gds = sessions.get_or_create(
    session_name="my-analysis-sm",
    memory=SessionMemory.m_8GB,
    db_connection=db_connection,
    ttl=timedelta(hours=2),
    cloud_location=CloudLocation("gcp", "europe-west1"),
)
gds.verify_connectivity()
```

Mode C — Standalone (no Neo4j DB):

```python
gds = sessions.get_or_create(
    session_name="my-standalone",
    memory=SessionMemory.m_4GB,
    ttl=timedelta(hours=1),
    cloud_location=CloudLocation("gcp", "europe-west1"),
)
gds.verify_connectivity()
```

`get_or_create()` is idempotent — reconnects to an existing session by name.

Step 4 — Project Graph


From connected Neo4j (remote projection):

```python
G, result = gds.graph.project(
    "my-graph",
    """
    CALL () {
        MATCH (p:Person)
        OPTIONAL MATCH (p)-[r:KNOWS]->(p2:Person)
        RETURN p AS source, r AS rel, p2 AS target,
               p {.age, .score} AS sourceNodeProperties,
               p2 {.age, .score} AS targetNodeProperties
    }
    RETURN gds.graph.project.remote(source, target, {
        sourceNodeLabels:     labels(source),
        targetNodeLabels:     labels(target),
        sourceNodeProperties: sourceNodeProperties,
        targetNodeProperties: targetNodeProperties,
        relationshipType:     type(rel)
    })
    """,
)
print(f"Projected {G.node_count()} nodes, {G.relationship_count()} relationships")
```

`CALL () { ... }` is required for multi-pattern MATCH. Use `UNION` inside `CALL` for multiple labels/rel types.
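As a hedged sketch of the UNION-inside-CALL pattern (the `WORKS_WITH` type and graph name are illustrative, not from this skill's schema):

```python
# Hypothetical projection query combining two relationship types via UNION.
union_query = """
CALL () {
    MATCH (p:Person)-[r:KNOWS]->(p2:Person)
    RETURN p AS source, r AS rel, p2 AS target
    UNION
    MATCH (p:Person)-[r:WORKS_WITH]->(p2:Person)
    RETURN p AS source, r AS rel, p2 AS target
}
RETURN gds.graph.project.remote(source, target, {
    sourceNodeLabels: labels(source),
    targetNodeLabels: labels(target),
    relationshipType: type(rel)
})
"""
# Requires an active session:
# G, result = gds.graph.project("multi-rel-graph", union_query)
```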
From Pandas DataFrames (standalone mode):

```python
import pandas as pd

nodes_df = pd.DataFrame([
    {"nodeId": 0, "labels": "Person", "age": 30},
    {"nodeId": 1, "labels": "Person", "age": 25},
])
rels_df = pd.DataFrame([
    {"sourceNodeId": 0, "targetNodeId": 1, "relationshipType": "KNOWS"},
])

G = gds.graph.construct("my-graph", nodes_df, rels_df)
```

Multiple DataFrames: gds.graph.construct("g", [nodes1, nodes2], [rels1, rels2])



Required columns — nodes: `nodeId` (int), `labels` (str). Relationships: `sourceNodeId`, `targetNodeId`, `relationshipType`. String node properties not supported — drop before `construct()`.
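A small sketch of the "drop string properties" rule (the `name` column is illustrative); join such columns back after streaming results:

```python
import pandas as pd

nodes_df = pd.DataFrame([
    {"nodeId": 0, "labels": "Person", "age": 30, "name": "Ada"},
    {"nodeId": 1, "labels": "Person", "age": 25, "name": "Bob"},
])

# Keep the required nodeId/labels columns; drop any other string-typed
# property column before gds.graph.construct().
string_props = [
    c for c in nodes_df.columns
    if c not in ("nodeId", "labels") and pd.api.types.is_object_dtype(nodes_df[c])
]
safe_nodes_df = nodes_df.drop(columns=string_props)
```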


Step 5 — Run Algorithms


Mutate — chain results without writing to DB:

```python
gds.pageRank.mutate(G, mutateProperty="pagerank", dampingFactor=0.85)
gds.fastRP.mutate(
    G,
    mutateProperty="embedding",
    embeddingDimension=128,
    featureProperties=["pagerank"],
    randomSeed=42,
)
```

Stream — inspect results as DataFrame:

```python
df = gds.pageRank.stream(G)
print(df.sort_values("score", ascending=False).head(10))
```

Write — persist to connected Neo4j DB (connected modes only):

```python
gds.louvain.write(G, writeProperty="community")
```

All GDS algorithms work in AGA except topological link prediction. See `neo4j-gds-skill` for the full algorithm reference.

Step 6 — Async Job Polling


Algorithm calls may return a job handle for long-running computations. Poll until done:

```python
import time

job = gds.pageRank.mutate(G, mutateProperty="pagerank")
```

If a job object is returned (async mode), poll explicitly:

```python
if hasattr(job, "status"):
    while job.status() not in ("RUNNING_DONE", "FAILED", "CANCELLED"):
        time.sleep(5)
        print(f"Job status: {job.status()}")
    if job.status() != "RUNNING_DONE":
        raise RuntimeError(f"Algorithm job failed: {job.status()}")
```

Do NOT assume immediate completion on large graphs. Check `.status()` before reading results.
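The polling loop above can be wrapped with a timeout so a stuck job cannot spin forever. A sketch, assuming only that the job handle exposes the `.status()` states shown above (`wait_for_job` is a hypothetical helper, not client API):

```python
import time

def wait_for_job(job, poll_seconds: float = 5.0, timeout_seconds: float = 3600.0) -> str:
    """Poll a job handle until it reaches a terminal state; raise on failure."""
    deadline = time.monotonic() + timeout_seconds
    status = job.status()
    while status not in ("RUNNING_DONE", "FAILED", "CANCELLED"):
        if time.monotonic() > deadline:
            raise TimeoutError(f"Job still {status} after {timeout_seconds}s")
        time.sleep(poll_seconds)
        status = job.status()
    if status != "RUNNING_DONE":
        raise RuntimeError(f"Algorithm job failed: {status}")
    return status
```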

Step 7 — Retrieve Results


Stream node properties — one column per property:

```python
result_df = gds.graph.nodeProperties.stream(
    G,
    node_properties=["pagerank", "embedding"],
    separate_property_columns=True,
    db_node_properties=["name"],  # pull from connected DB for context (connected modes only)
)
result_df.head(10)
```

Standalone mode — no `db_node_properties`; join back to the source DataFrame:

```python
result_df = gds.graph.nodeProperties.stream(G, ["pagerank"], separate_property_columns=True)
result_df = result_df.merge(nodes_df[["nodeId", "name"]], on="nodeId", how="left")
```

Step 8 — Write Back and Clean Up


```python
# Write multiple node properties to connected Neo4j
gds.graph.nodeProperties.write(G, ["pagerank", "embedding"])

# Write relationship properties
gds.graph.relationshipProperties.write(G, G.relationship_types(), ["score"])

# Run Cypher against connected DB from within session
gds.run_cypher("MATCH (n:Person) RETURN count(n)")

# Drop projected graph (frees session memory)
G.drop()

# Delete session — stops billing
sessions.delete(session_name="my-analysis")
# or: gds.delete()
```


Write before deleting — results not written back are lost when session closes.
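One way to make the write-then-delete discipline hard to forget is a small context manager around the documented `get_or_create()`/`delete()` calls (a sketch, not part of the client):

```python
from contextlib import contextmanager

@contextmanager
def aura_session(sessions, **kwargs):
    """Yield a GDS session and guarantee deletion (billing stop) on exit."""
    gds = sessions.get_or_create(**kwargs)
    try:
        yield gds
    finally:
        gds.delete()  # runs even if the analysis raises

# Usage sketch — write results back before the block exits:
# with aura_session(sessions, session_name="my-analysis", memory=memory,
#                   db_connection=db_connection, ttl=timedelta(hours=2)) as gds:
#     ...
```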


Session Management


```python
# List active sessions
from pandas import DataFrame
DataFrame(sessions.list())

# Reconnect to existing session
gds = sessions.get_or_create(session_name="my-analysis", memory=..., db_connection=...)
```

---

Common Errors


| Error | Cause | Fix |
| --- | --- | --- |
| `AuthenticationError` / 401 | Wrong `CLIENT_ID` / `CLIENT_SECRET` | Regenerate in Aura Console → Account → API credentials |
| `SessionNotFoundError` | Session expired (TTL exceeded) or name typo | `sessions.list()` to check; recreate session |
| `GraphNotFoundError` | Projection dropped or session reconnected without re-projecting | Re-run `gds.graph.project()` or `gds.graph.construct()` |
| Algorithm job `FAILED` | Memory limit exceeded or unsupported algorithm | Increase `SessionMemory`; check topological link prediction not used |
| `MemoryEstimationExceeded` | Graph larger than estimated | Re-estimate with actual counts; pick next tier up |
| Results empty after session reconnect | Results not written before session was closed | Always write/stream before `gds.delete()` |
| String node properties not supported | String column in nodes DataFrame | Drop string columns before `gds.graph.construct()` |
| AGA not enabled for project | AGA feature not activated | Enable in Aura Console → project settings |


References


Load on demand:
  • references/workflows.md — full AuraDB and standalone workflow examples, Spark integration
  • references/limitations.md — AGA vs embedded GDS feature table, SessionMemory tiers, cloud locations

WebFetch


| Need | URL |
| --- | --- |
| AGA Python client docs | https://neo4j.com/docs/graph-data-science-client/current/aura-graph-analytics/ |
| AuraDB tutorial notebook | https://github.com/neo4j/graph-data-science-client/blob/main/examples/graph-analytics-serverless.ipynb |
| GDS algorithm reference | https://neo4j.com/docs/graph-data-science/current/algorithms/ |


Checklist


  • Aura API credentials created and set in environment (`AURA_CLIENT_ID`, `AURA_CLIENT_SECRET`)
  • AGA feature enabled for Aura project (Aura Console → project settings)
  • Memory estimated before session creation (`sessions.estimate(...)`)
  • Cloud location chosen near data source
  • `gds.verify_connectivity()` called after session creation
  • TTL set to avoid unexpected costs on idle sessions
  • Async algorithm jobs polled until `RUNNING_DONE` before reading results
  • Results written back (connected modes) or streamed and persisted (standalone) before deletion
  • Session deleted when done (`sessions.delete(...)` or `gds.delete()`)