# neo4j-aura-graph-analytics-skill

## When to Use

- Running GDS algorithms on Aura Business Critical (BC) or Virtual Dedicated Cloud (VDC)
- Processing graph data from non-Neo4j sources (Pandas, Spark, CSV)
- On-demand / pipeline workloads — ephemeral sessions, pay per session-minute
- Full isolation from the live database during analytics

## When NOT to Use

- Aura Pro with embedded GDS plugin → neo4j-gds-skill
- Self-managed Neo4j with embedded GDS plugin → neo4j-gds-skill
- Writing Cypher queries → neo4j-cypher-skill
- Snowflake Graph Analytics → neo4j-snowflake-graph-analytics-skill

## Deployment Decision Table

| Deployment | Skill |
|---|---|
| Aura Free | ❌ AGA not available |
| Aura Pro | neo4j-gds-skill (embedded GDS) |
| Aura Business Critical | this skill |
| Aura Virtual Dedicated Cloud | this skill |
| Non-Neo4j data (Pandas, Spark) | this skill (standalone mode) |

## Defaults

- `graphdatascience >= 1.15` required; `>= 1.18` for Spark
- Always call `gds.verify_connectivity()` after session creation
- Always estimate memory before creating a session for large graphs
- Always set TTL; default is 1 hour idle, max 7 days
- Close the session when done — `sessions.delete(name)` or `gds.delete()` stops billing
- Use `AuraAPICredentials.from_env()` — never hardcode credentials
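TTL values are plain `datetime.timedelta` objects, so the bounds mentioned above can be checked directly. A small sketch:

```python
from datetime import timedelta

default_ttl = timedelta(hours=1)  # idle default
max_ttl = timedelta(days=7)       # maximum allowed

# timedelta floor division gives how many idle windows fit in the cap.
windows = max_ttl // default_ttl
print(windows)  # 168
```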

## Installation

```bash
pip install "graphdatascience>=1.15"
```

## Key Patterns

### Step 1 — Authenticate

```python
import os
from graphdatascience.session import AuraAPICredentials, GdsSessions

sessions = GdsSessions(api_credentials=AuraAPICredentials.from_env())
```

Reads: `AURA_CLIENT_ID`, `AURA_CLIENT_SECRET`, `AURA_PROJECT_ID` (optional).
Create API credentials in Aura Console → Account → API credentials.
If you are a member of multiple projects, set `AURA_PROJECT_ID` or pass `project_id=` explicitly.
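Since `from_env()` fails only at construction time, it can help to validate the environment up front. A minimal sketch — the `require_env` helper is illustrative, not part of the client:

```python
import os

def require_env(*names):
    """Raise early if any required environment variable is missing or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise EnvironmentError(f"missing environment variables: {', '.join(missing)}")

# Fail fast before building GdsSessions; AURA_PROJECT_ID is optional, so it is
# not listed. For illustration we only report instead of aborting.
try:
    require_env("AURA_CLIENT_ID", "AURA_CLIENT_SECRET")
except EnvironmentError as exc:
    print(exc)
```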

### Step 2 — Estimate Memory

```python
from graphdatascience.session import AlgorithmCategory, SessionMemory

memory = sessions.estimate(
    node_count=1_000_000,
    relationship_count=5_000_000,
    algorithm_categories=[
        AlgorithmCategory.CENTRALITY,
        AlgorithmCategory.NODE_EMBEDDING,
        AlgorithmCategory.COMMUNITY_DETECTION,
    ],
)
```

Returns a `SessionMemory` tier, e.g. `SessionMemory.m_8GB`.
Fixed tiers: `m_2GB` … `m_256GB` — see references/limitations.md.
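Because the tiers are fixed sizes, a raw memory figure is effectively rounded up to the next tier that fits. A hypothetical sketch of that mapping — the tier list and helper are illustrative only; use `sessions.estimate(...)` for the real answer:

```python
# Illustrative fixed tiers in GB (see references/limitations.md for the real list).
TIERS_GB = [2, 4, 8, 16, 32, 64, 128, 256]

def pick_tier(required_gb: float) -> int:
    """Round a raw memory requirement up to the smallest fixed tier that fits."""
    for tier in TIERS_GB:
        if required_gb <= tier:
            return tier
    raise ValueError(f"{required_gb} GB exceeds the largest tier ({TIERS_GB[-1]} GB)")

print(pick_tier(6.5))   # 8
print(pick_tier(32.0))  # 32
```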

### Step 3 — Create Session
Mode A — AuraDB connected:

```python
from graphdatascience.session import DbmsConnectionInfo, SessionMemory, CloudLocation
from datetime import timedelta

db_connection = DbmsConnectionInfo(
    username=os.environ["NEO4J_USERNAME"],
    password=os.environ["NEO4J_PASSWORD"],
    aura_instance_id=os.environ["AURA_INSTANCEID"],  # from Aura Console URL
)
gds = sessions.get_or_create(
    session_name="my-analysis",
    memory=memory,
    db_connection=db_connection,
    ttl=timedelta(hours=2),
)
gds.verify_connectivity()
```

Mode B — Self-managed Neo4j:

```python
db_connection = DbmsConnectionInfo(
    uri=os.environ["NEO4J_URI"],  # e.g. "bolt://my-server:7687"
    username=os.environ["NEO4J_USERNAME"],
    password=os.environ["NEO4J_PASSWORD"],
)
gds = sessions.get_or_create(
    session_name="my-analysis-sm",
    memory=SessionMemory.m_8GB,
    db_connection=db_connection,
    ttl=timedelta(hours=2),
    cloud_location=CloudLocation("gcp", "europe-west1"),
)
gds.verify_connectivity()
```

Mode C — Standalone (no Neo4j DB):

```python
gds = sessions.get_or_create(
    session_name="my-standalone",
    memory=SessionMemory.m_4GB,
    ttl=timedelta(hours=1),
    cloud_location=CloudLocation("gcp", "europe-west1"),
)
gds.verify_connectivity()
```

### Step 4 — Project Graph
From connected Neo4j (remote projection):

```python
G, result = gds.graph.project(
    "my-graph",
    """
    CALL () {
        MATCH (p:Person)
        OPTIONAL MATCH (p)-[r:KNOWS]->(p2:Person)
        RETURN p AS source, r AS rel, p2 AS target,
               p {.age, .score} AS sourceNodeProperties,
               p2 {.age, .score} AS targetNodeProperties
    }
    RETURN gds.graph.project.remote(source, target, {
        sourceNodeLabels: labels(source),
        targetNodeLabels: labels(target),
        sourceNodeProperties: sourceNodeProperties,
        targetNodeProperties: targetNodeProperties,
        relationshipType: type(rel)
    })
    """,
)
print(f"Projected {G.node_count()} nodes, {G.relationship_count()} relationships")
```

Multi-pattern `MATCH` requires `CALL () { ... }`. For multiple labels or relationship types, use `UNION` inside the `CALL`.
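For multiple relationship types, the inner `CALL` combines one `MATCH` per pattern with `UNION`. A sketch of the query shape — the labels and relationship types here are illustrative:

```python
# Illustrative remote-projection query combining two relationship types via UNION.
union_projection_query = """
CALL () {
    MATCH (p:Person)-[r:KNOWS]->(p2:Person)
    RETURN p AS source, r AS rel, p2 AS target
    UNION
    MATCH (p:Person)-[r:WORKS_WITH]->(p2:Person)
    RETURN p AS source, r AS rel, p2 AS target
}
RETURN gds.graph.project.remote(source, target, {
    relationshipType: type(rel)
})
"""
# Passed exactly like the single-pattern query:
# gds.graph.project("my-graph", union_projection_query)
```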
From Pandas DataFrames (standalone mode):

```python
import pandas as pd

nodes_df = pd.DataFrame([
    {"nodeId": 0, "labels": "Person", "age": 30},
    {"nodeId": 1, "labels": "Person", "age": 25},
])
rels_df = pd.DataFrame([
    {"sourceNodeId": 0, "targetNodeId": 1, "relationshipType": "KNOWS"},
])
G = gds.graph.construct("my-graph", nodes_df, rels_df)
```
Multiple DataFrames: `gds.graph.construct("g", [nodes1, nodes2], [rels1, rels2])`

Required columns — nodes: `nodeId` (int), `labels` (str). Relationships: `sourceNodeId`, `targetNodeId`, `relationshipType`. String node properties are not supported — drop them before `construct()`.
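Since `construct()` rejects string node properties while the string `labels` column is required, only the string *property* columns need stripping. A minimal sketch — the `name` column is illustrative:

```python
import pandas as pd

nodes_df = pd.DataFrame([
    {"nodeId": 0, "labels": "Person", "age": 30, "name": "Ann"},
    {"nodeId": 1, "labels": "Person", "age": 25, "name": "Bob"},
])

# "labels" is a required string column; drop only string property columns.
string_props = [
    col for col in nodes_df.columns
    if col not in ("nodeId", "labels") and nodes_df[col].dtype == object
]
clean_df = nodes_df.drop(columns=string_props)
print(list(clean_df.columns))  # ['nodeId', 'labels', 'age']
```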

### Step 5 — Run Algorithms

```python
# Mutate — chain results without writing to the DB
gds.pageRank.mutate(G, mutateProperty="pagerank", dampingFactor=0.85)
gds.fastRP.mutate(G,
    mutateProperty="embedding",
    embeddingDimension=128,
    featureProperties=["pagerank"],
    randomSeed=42,
)

# Stream — inspect results as a DataFrame
df = gds.pageRank.stream(G)
print(df.sort_values("score", ascending=False).head(10))

# Write — persist to the connected Neo4j DB (connected modes only)
gds.louvain.write(G, writeProperty="community")
```

All GDS algorithms work in AGA except topological link prediction. See `neo4j-gds-skill` for the full algorithm reference.

### Step 6 — Async Job Polling
Algorithm calls may return a job handle for long-running computations. Poll until done:

```python
import time

job = gds.pageRank.mutate(G, mutateProperty="pagerank")
```

If a job object is returned (async mode), poll explicitly:

```python
if hasattr(job, "status"):
    while job.status() not in ("RUNNING_DONE", "FAILED", "CANCELLED"):
        time.sleep(5)
        print(f"Job status: {job.status()}")
    if job.status() != "RUNNING_DONE":
        raise RuntimeError(f"Algorithm job failed: {job.status()}")
```

Do NOT assume immediate completion on large graphs. Check `.status()` before reading results.
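The polling loop above can be packaged with a timeout so an unattended pipeline never spins forever. A hypothetical helper, not part of the client; the injectable `get_status` and `sleep` parameters exist so it can be exercised without a live session:

```python
import time

def wait_for_job(get_status, poll_seconds=5.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Poll get_status() until it reports a terminal state, or raise on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in ("RUNNING_DONE", "FAILED", "CANCELLED"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_seconds}s")
        sleep(poll_seconds)

# Demonstrate with a stubbed status sequence instead of a live session:
statuses = iter(["RUNNING", "RUNNING", "RUNNING_DONE"])
result = wait_for_job(lambda: next(statuses), sleep=lambda s: None)
print(result)  # RUNNING_DONE
```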

### Step 7 — Retrieve Results

```python
# Stream node properties — one column per property
result_df = gds.graph.nodeProperties.stream(
    G,
    node_properties=["pagerank", "embedding"],
    separate_property_columns=True,
    db_node_properties=["name"],  # pull from connected DB for context (connected modes only)
)
result_df.head(10)
```

Standalone mode — no `db_node_properties`; join back to the source DataFrame:

```python
result_df = gds.graph.nodeProperties.stream(G, ["pagerank"], separate_property_columns=True)
result_df.merge(nodes_df[["nodeId", "name"]], how="left", on="nodeId")
```

### Step 8 — Write Back and Clean Up

```python
# Write multiple node properties to the connected Neo4j DB
gds.graph.nodeProperties.write(G, ["pagerank", "embedding"])

# Write relationship properties
gds.graph.relationshipProperties.write(G, G.relationship_types(), ["score"])

# Run Cypher against the connected DB from within the session
gds.run_cypher("MATCH (n:Person) RETURN count(n)")

# Drop the projected graph (frees session memory)
G.drop()

# Delete the session — stops billing
sessions.delete(session_name="my-analysis")
# or: gds.delete()
```

Write before deleting — results not written back are lost when the session closes.

## Session Management

```python
# List active sessions
from pandas import DataFrame
DataFrame(sessions.list())

# Reconnect to an existing session
gds = sessions.get_or_create(session_name="my-analysis", memory=..., db_connection=...)
```

---

## Common Errors
| Error | Cause | Fix |
|---|---|---|
| Authentication failed | Wrong API credentials | Regenerate in Aura Console → Account → API credentials |
| Session not found | Session expired (TTL exceeded) or name typo | Re-create with `sessions.get_or_create(...)` |
| Graph not found | Projection dropped or session reconnected without re-projecting | Re-run the projection |
| Algorithm job `FAILED` | Memory limit exceeded or unsupported algorithm | Increase the memory tier |
| Out of memory | Graph larger than estimated | Re-estimate with actual counts; pick the next tier up |
| Results empty after session reconnect | Results not written before session was closed | Always write/stream before deleting |
| `construct()` fails | String column in nodes DataFrame | Drop string property columns before `construct()` |
| Session creation fails | AGA feature not activated | Enable in Aura Console → project settings |

## References

Load on demand:

- references/workflows.md — full AuraDB and standalone workflow examples, Spark integration
- references/limitations.md — AGA vs embedded GDS feature table, SessionMemory tiers, cloud locations

## WebFetch

| Need | URL |
|---|---|
| AGA Python client docs | |
| AuraDB tutorial notebook | |
| GDS algorithm reference | |

## Checklist

- Aura API credentials created and set in environment (`AURA_CLIENT_ID`, `AURA_CLIENT_SECRET`)
- AGA feature enabled for the Aura project (Aura Console → project settings)
- Memory estimated before session creation (`sessions.estimate(...)`)
- Cloud location chosen near the data source
- `gds.verify_connectivity()` called after session creation
- TTL set to avoid unexpected costs on idle sessions
- Async algorithm jobs polled until `RUNNING_DONE` before reading results
- Results written back (connected modes) or streamed and persisted (standalone) before deletion
- Session deleted when done (`sessions.delete(...)` or `gds.delete()`)