neo4j-graphrag-skill


Neo4j GraphRAG Skill


When to Use


  • Building GraphRAG retrieval pipelines with the neo4j-graphrag Python package
  • Choosing between VectorRetriever, HybridRetriever, VectorCypherRetriever, and HybridCypherRetriever
  • Writing retrieval_query Cypher fragments that traverse the graph after vector lookup
  • Wiring a retriever + LLM into a GraphRAG pipeline
  • Debugging low retrieval quality (when to use graph traversal vs. plain vector search)
  • Integrating Neo4j with LangChain (langchain-neo4j), LlamaIndex, or Haystack

When NOT to Use


  • KG construction from documents → neo4j-document-import-skill
  • Plain vector/semantic search without graph traversal → neo4j-vector-index-skill
  • GDS algorithms (PageRank, Louvain, node embeddings) → neo4j-gds-skill
  • Agent long-term memory → neo4j-agent-memory-skill
  • Writing raw Cypher queries → neo4j-cypher-skill


Step 1 — Install


```bash
pip install neo4j-graphrag
```

LLM/embedder extras (choose one or more):


```bash
pip install neo4j-graphrag[openai]                 # OpenAI + AzureOpenAI
pip install neo4j-graphrag[google]                 # VertexAI
pip install neo4j-graphrag[anthropic]              # Anthropic
pip install neo4j-graphrag[ollama]                 # Ollama (local)
pip install neo4j-graphrag[cohere]                 # Cohere
pip install neo4j-graphrag[sentence-transformers]  # local embeddings
```

BREAKING: the old package neo4j-genai is deprecated — imports also changed:


```bash
pip uninstall neo4j-genai
```

```
neo4j_genai.retrievers → neo4j_graphrag.retrievers
neo4j_genai.generation → neo4j_graphrag.generation
```


Requires: Python ≥ 3.10, Neo4j ≥ 5.18.1 or Aura ≥ 5.18.0.

---


Step 2 — Choose Retriever


```
Has fulltext index? YES → Hybrid variants (better recall)
                    NO  → Vector variants (baseline)

Needs graph context after vector lookup? YES → Cypher variants
                                         NO  → plain variants

For natural-language-to-Cypher? → Text2CypherRetriever (no embedder needed)
For multi-tool LLM routing?     → ToolsRetriever
Using external vector DB?       → WeaviateNeo4jRetriever / PineconeNeo4jRetriever / QdrantNeo4jRetriever
```
| Retriever | Vector | Fulltext | Graph | When to use |
|---|---|---|---|---|
| VectorRetriever | ✓ | — | — | Baseline; quick start |
| HybridRetriever | ✓ | ✓ | — | Better recall; no graph context |
| VectorCypherRetriever | ✓ | — | ✓ | GraphRAG without fulltext |
| HybridCypherRetriever | ✓ | ✓ | ✓ | Production GraphRAG — default choice |
| Text2CypherRetriever | — | — | ✓ | LLM generates Cypher; no embedder |
| ToolsRetriever | varies | varies | varies | Multi-retriever LLM routing |
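The two yes/no questions above can be encoded as a tiny chooser. Illustrative only — `choose_retriever` is not part of neo4j-graphrag:

```python
# Illustrative helper — encodes the decision tree above; not library API.
def choose_retriever(has_fulltext_index: bool, needs_graph_context: bool) -> str:
    """Map the two yes/no questions to a retriever class name."""
    if has_fulltext_index:
        return "HybridCypherRetriever" if needs_graph_context else "HybridRetriever"
    return "VectorCypherRetriever" if needs_graph_context else "VectorRetriever"

print(choose_retriever(True, True))  # → HybridCypherRetriever
```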


Step 3 — Create Indexes (run once)


```cypher
// Vector index (all retrievers need this)
CREATE VECTOR INDEX chunk_embedding IF NOT EXISTS
FOR (c:Chunk) ON (c.embedding)
OPTIONS { indexConfig: {
  `vector.dimensions`: 1536,
  `vector.similarity_function`: 'cosine'
} };

// Fulltext index (Hybrid retrievers only)
CREATE FULLTEXT INDEX chunk_fulltext IF NOT EXISTS
FOR (c:Chunk) ON EACH [c.text];

// Confirm ONLINE before ingesting:
SHOW INDEXES YIELD name, state
WHERE name IN ['chunk_embedding', 'chunk_fulltext']
RETURN name, state;
// Both must show state = 'ONLINE'
```

If an index is not ONLINE: wait and poll every 5 s. Do NOT start ingestion until both are ONLINE.
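The poll-until-ONLINE advice can be sketched with the official neo4j Python driver. `wait_for_indexes` and `all_online` are hypothetical helper names, not library API:

```python
import time

def all_online(rows, names):
    """rows: (name, state) pairs from SHOW INDEXES; True when every named index is ONLINE."""
    states = dict(rows)
    return all(states.get(n) == "ONLINE" for n in names)

def wait_for_indexes(driver, names, poll_seconds=5, timeout=300):
    """Block until the named indexes report state = 'ONLINE' (assumes a neo4j.Driver)."""
    query = ("SHOW INDEXES YIELD name, state "
             "WHERE name IN $names RETURN name, state")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with driver.session() as session:
            rows = [(r["name"], r["state"]) for r in session.run(query, names=names)]
        if all_online(rows, names):
            return
        time.sleep(poll_seconds)
    raise TimeoutError(f"indexes not ONLINE within {timeout}s: {names}")
```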


Step 4 — Core Pattern (HybridCypherRetriever)


```python
from neo4j import GraphDatabase
from neo4j_graphrag.retrievers import HybridCypherRetriever
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.generation import GraphRAG
from neo4j_graphrag.llm import OpenAILLM

driver = GraphDatabase.driver("neo4j+s://<host>:7687", auth=("neo4j", "<password>"))
embedder = OpenAIEmbeddings(model="text-embedding-3-small")  # 1536 dims — match index

# retrieval_query: Cypher fragment executed after vector lookup.
# `node`  = matched node from vector index (AUTO-INJECTED — do NOT declare)
# `score` = similarity float (AUTO-INJECTED — do NOT declare)
# MUST include a RETURN clause. MUST return the `score` column.
retrieval_query = """
MATCH (node)<-[:HAS_CHUNK]-(article:Article)
OPTIONAL MATCH (article)-[:MENTIONS]->(org:Organization)
RETURN node.text AS chunk_text,
       article.title AS article_title,
       collect(DISTINCT org.name) AS mentioned_organizations,
       score
"""

retriever = HybridCypherRetriever(
    driver=driver,
    vector_index_name="chunk_embedding",
    fulltext_index_name="chunk_fulltext",
    retrieval_query=retrieval_query,
    embedder=embedder,
)

llm = OpenAILLM(model_name="gpt-4o", model_params={"temperature": 0})
rag = GraphRAG(retriever=retriever, llm=llm)

response = rag.search(query_text="Who does Alice work for?", retriever_config={"top_k": 5})
print(response.answer)
```
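The retrieval_query rules above (must contain a RETURN clause, must return `score`) can be checked before a round-trip. `lint_retrieval_query` is a hypothetical helper, not library API:

```python
def lint_retrieval_query(q: str) -> list[str]:
    """Flag the two most common retrieval_query mistakes before running a search."""
    problems = []
    if "RETURN" not in q.upper():
        problems.append("missing RETURN clause")
    if "score" not in q:
        problems.append("score column not returned")
    return problems

print(lint_retrieval_query("MATCH (node) RETURN node.text, score"))  # → []
```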

---

Step 5 — query_params (Parameterized retrieval_query)

Pass runtime parameters into retrieval_query via retriever_config:

```python
retrieval_query = """
MATCH (node)<-[:HAS_CHUNK]-(article:Article)-[:MENTIONS]->(org:Organization)
WHERE org.name = $entity_name
RETURN node.text AS chunk_text, article.title AS title, score
"""

retriever = VectorCypherRetriever(
    driver=driver,
    index_name="chunk_embedding",
    retrieval_query=retrieval_query,
    embedder=embedder,
)

# Pass query_params inside retriever_config on each search:
response = rag.search(
    query_text="What happened at Apple?",
    retriever_config={"top_k": 10, "query_params": {"entity_name": "Apple"}},
)

# Direct retriever call (without the GraphRAG wrapper):
results = retriever.search(
    query_text="What happened at Apple?",
    top_k=10,
    query_params={"entity_name": "Apple"},
)
```

---

Step 6 — Filters (Pre-filter before vector search)

Filters reduce the candidate pool BEFORE vector similarity ranking:

```python
results = retriever.search(
    query_text="quarterly results",
    top_k=5,
    filters={"date": {"$gte": "2024-01-01"}},
)
```

Supported operators: $eq $ne $lt $lte $gt $gte $between $in $like $ilike
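A quick way to catch an unsupported operator before the search runs. `validate_filters` is an illustrative helper, not part of neo4j-graphrag, and assumes the flat `{field: {op: value}}` shape shown above:

```python
SUPPORTED_OPS = {"$eq", "$ne", "$lt", "$lte", "$gt", "$gte",
                 "$between", "$in", "$like", "$ilike"}

def validate_filters(filters: dict) -> list[str]:
    """Return any operators not in the supported set (empty list = OK)."""
    return [op for cond in filters.values() for op in cond if op not in SUPPORTED_OPS]

print(validate_filters({"date": {"$gte": "2024-01-01"}}))  # → []
```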


Step 7 — VectorRetriever (return_properties)

```python
from neo4j_graphrag.retrievers import VectorRetriever

retriever = VectorRetriever(
    driver=driver,
    index_name="chunk_embedding",
    embedder=embedder,
    return_properties=["text", "source", "page_number"],  # subset of node props
)
```

No retrieval_query needed — returns node properties directly.


Step 8 — Text2CypherRetriever (no embedder)

```python
from neo4j_graphrag.retrievers import Text2CypherRetriever

# LLM generates Cypher from natural language; no vector index needed
retriever = Text2CypherRetriever(
    driver=driver,
    llm=OpenAILLM(model_name="gpt-4o"),
    neo4j_schema=None,   # auto-fetched from the db; or pass a string
    examples=["Q: Who works at Neo4j? A: MATCH (p:Person)-[:WORKS_AT]->(c:Company {name:'Neo4j'}) RETURN p.name"],
)
results = retriever.search(query_text="Which people work at Neo4j?")
```

If `neo4j_schema=None`: the retriever fetches the schema automatically. For large schemas, pass a trimmed string to reduce LLM prompt size.

---

Step 9 — Custom Prompt Template

```python
from neo4j_graphrag.generation.prompts import RagTemplate

custom_template = RagTemplate(
    template="""Answer the question using ONLY the context below.
Context: {context}
Question: {query_text}
Answer:""",
    expected_inputs=["context", "query_text"],
)

rag = GraphRAG(retriever=retriever, llm=llm, prompt_template=custom_template)
```


Common Errors

| Error | Cause | Fix |
|---|---|---|
| `ModuleNotFoundError: neo4j_genai` | Old package installed | `pip uninstall neo4j-genai && pip install neo4j-graphrag` |
| `retrieval_query` returns 0 rows | Missing `MATCH` or wrong relationship direction | Add `EXPLAIN` prefix; verify node/rel names with `CALL db.schema.visualization()` |
| `KeyError: 'score'` in results | `retrieval_query` missing `score` in RETURN | Add `score` to every `retrieval_query` RETURN clause |
| `score` variable not found | Declared `score` as a Cypher variable | Remove it — `score` is auto-injected; never re-declare |
| `node` variable not found | Wrong variable name in `retrieval_query` | Use exactly `node` (lowercase); auto-injected by the retriever |
| Embedding dimension mismatch | Index created with different dims | Drop the index, recreate with correct `vector.dimensions`, re-embed all chunks |
| `IndexNotFoundError` | Index name typo or index not ONLINE | `SHOW INDEXES YIELD name, state` — verify name and state = ONLINE |
| Low recall on hybrid search | Fulltext index not on the right property | Fulltext index must cover the same property as `node.text` in `retrieval_query` |
| `perform_entity_resolution` slow | Large corpus with many entities | Set `perform_entity_resolution=False` for initial testing; enable in production |
| `TypeError: coroutine` | Calling `pipeline.run_async()` without `await`/`asyncio.run()` | Wrap in `asyncio.run(pipeline.run_async(...))` |
| Empty KG after pipeline run | `on_error="IGNORE"` masks extraction failures | Temporarily set `on_error="RAISE"` to see LLM extraction errors |

Embedder Quick Reference

```python
from neo4j_graphrag.embeddings import (
    OpenAIEmbeddings,           # OpenAI text-embedding-3-*
    AzureOpenAIEmbeddings,      # Azure-hosted OpenAI
    VertexAIEmbeddings,         # Google Vertex AI
    MistralAIEmbeddings,        # Mistral
    CohereEmbeddings,           # Cohere embed-v3
    OllamaEmbeddings,           # Local via Ollama
    SentenceTransformerEmbeddings,  # Local HuggingFace
)
```

Dimension mapping (must match vector index):

```
text-embedding-3-small → 1536
text-embedding-3-large → 3072
text-embedding-ada-002 → 1536
all-MiniLM-L6-v2       → 384
```

All embedders include automatic rate limiting with exponential backoff.
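The mapping above can double as a pre-flight check before creating the vector index. `EXPECTED_DIMS` and `check_dims` are illustrative names, not library API:

```python
# Illustrative pre-flight check — dimension values from the mapping above.
EXPECTED_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
    "all-MiniLM-L6-v2": 384,
}

def check_dims(model: str, index_dims: int) -> bool:
    """True when the embedder's output dimension matches the vector index config."""
    return EXPECTED_DIMS.get(model) == index_dims

print(check_dims("text-embedding-3-small", 1536))  # → True
```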

---

LLM Quick Reference

```python
from neo4j_graphrag.llm import (
    OpenAILLM,
    AzureOpenAILLM,
    AnthropicLLM,
    VertexAILLM,
    MistralAILLM,
    CohereLLM,
    OllamaLLM,
)
```

Any LangChain chat model is also accepted by GraphRAG.


---

GraphRAG.search() Full Signature

```python
response = rag.search(
    query_text="...",
    retriever_config={
        "top_k": 5,              # candidates per search (default 5)
        "query_params": {...},   # passed to retrieval_query Cypher
        "filters": {...},        # pre-filter before vector search
    },
    return_context=False,        # True: include retrieved chunks in response
    response_fallback="No context found.",  # returned when retriever yields nothing
)
# response.answer → str
# response.retriever_result → RawSearchResult (if return_context=True)
```


---

Failure Recovery

  • 0 results from retrieval: run retriever.search() directly (skip the LLM); check top_k, index name, embedding dims
  • LLM hallucinating: reduce top_k; improve retrieval_query to return more specific context
  • Slow queries: add LIMIT inside retrieval_query on expensive expansions; use filters to pre-reduce candidates
  • Embedding dimension mismatch: SHOW INDEXES YIELD name, options — check vector.dimensions

Checklist

  • neo4j-genai uninstalled; neo4j-graphrag installed; import paths updated
  • Vector index ONLINE before ingesting or querying
  • Fulltext index ONLINE if using a Hybrid retriever
  • Embedding dims match vector.dimensions in index config
  • retrieval_query includes node and score in the RETURN clause (both required)
  • node and score NOT re-declared in retrieval_query — they are auto-injected
  • query_params passed via retriever_config or as a direct retriever.search() arg
  • retriever_config={"top_k": N} set on rag.search() (default 5)
  • Credentials in env vars; never hardcoded