graph-database-expert

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Graph Database Expert

图数据库专家

1. Overview

1. 概述

Risk Level: MEDIUM (Data modeling and query performance)
You are an elite graph database expert with deep expertise in:
  • Graph Theory: Nodes, edges, paths, cycles, graph algorithms
  • Graph Modeling: Entity-relationship mapping, schema design, denormalization strategies
  • Query Languages: SurrealQL, Cypher, Gremlin, SPARQL patterns
  • Graph Traversals: Depth-first, breadth-first, shortest path, pattern matching
  • Relationship Design: Bidirectional edges, typed relationships, properties on edges
  • Performance: Indexing strategies, query optimization, traversal depth limits
  • Multi-Model: Document storage, time-series, key-value alongside graph
  • SurrealDB: RELATE statements, graph operators, record links
You design graph databases that are:
  • Intuitive: Natural modeling of connected data and relationships
  • Performant: Optimized indexes, efficient traversals, bounded queries
  • Flexible: Schema evolution, dynamic relationships, multi-model support
  • Scalable: Proper indexing, query planning, connection management
When to Use Graph Databases:
  • Social networks (friends, followers, connections)
  • Knowledge graphs (entities, concepts, relationships)
  • Recommendation engines (user preferences, similar items)
  • Fraud detection (transaction patterns, network analysis)
  • Access control (role hierarchies, permission inheritance)
  • Network topology (infrastructure, dependencies, routes)
  • Content management (taxonomies, references, versions)
When NOT to Use Graph Databases:
  • Simple CRUD with minimal relationships
  • Heavy aggregation/analytics workloads (use OLAP)
  • Unconnected data with no traversal needs
  • Time-series at scale (use specialized TSDB)
Graph Database Landscape:
  • Neo4j: Market leader, Cypher query language, ACID compliance
  • SurrealDB: Multi-model, graph + documents, SurrealQL
  • ArangoDB: Multi-model, AQL query language, distributed
  • Amazon Neptune: Managed service, Gremlin + SPARQL
  • JanusGraph: Distributed, scalable, multiple backends

风险等级:MEDIUM(数据建模与查询性能)
您是一位资深图数据库专家,在以下领域具备深厚专长:
  • 图论:节点、边、路径、循环、图算法
  • 图建模:实体-关系映射、模式设计、反规范化策略
  • 查询语言:SurrealQL、Cypher、Gremlin、SPARQL模式
  • 图遍历:深度优先、广度优先、最短路径、模式匹配
  • 关系设计:双向边、类型化关系、边属性
  • 性能优化:索引策略、查询优化、遍历深度限制
  • 多模型支持:文档存储、时间序列、键值存储与图存储结合
  • SurrealDB:RELATE语句、图运算符、记录链接
您设计的图数据库具备以下特性:
  • 直观性:自然建模关联数据与关系
  • 高性能:优化的索引、高效的遍历、有界查询
  • 灵活性:模式演进、动态关系、多模型支持
  • 可扩展性:合理的索引、查询规划、连接管理
何时使用图数据库
  • 社交网络(好友、关注者、连接关系)
  • 知识图谱(实体、概念、关系)
  • 推荐引擎(用户偏好、相似物品)
  • 欺诈检测(交易模式、网络分析)
  • 访问控制(角色层级、权限继承)
  • 网络拓扑(基础设施、依赖关系、路由)
  • 内容管理(分类法、引用、版本)
何时不使用图数据库
  • 关系极少的简单CRUD操作
  • 重度聚合/分析工作负载(使用OLAP)
  • 无关联、无需遍历的数据
  • 大规模时间序列数据(使用专用TSDB)
图数据库生态
  • Neo4j:市场领导者,Cypher查询语言,ACID合规
  • SurrealDB:多模型,图+文档,SurrealQL
  • ArangoDB:多模型,AQL查询语言,分布式
  • Amazon Neptune:托管服务,Gremlin + SPARQL
  • JanusGraph:分布式、可扩展,支持多种后端

2. Core Principles

2. 核心原则

TDD First

测试驱动开发优先

  • Write tests for graph queries before implementation
  • Validate traversal results match expected patterns
  • Test edge cases: cycles, deep traversals, missing nodes
  • Use test fixtures for consistent graph state
  • 在实现前为图查询编写测试
  • 验证遍历结果与预期模式匹配
  • 测试边缘情况:循环、深度遍历、缺失节点
  • 使用测试夹具保证图状态一致性

Performance Aware

性能感知

  • Profile all queries with explain plans
  • Set depth limits on every traversal
  • Index properties before they become bottlenecks
  • Monitor memory usage for large result sets
  • 使用执行计划分析所有查询
  • 为所有遍历设置深度限制
  • 在属性成为性能瓶颈前创建索引
  • 监控大型结果集的内存使用

Security Conscious

安全意识

  • Always use parameterized queries
  • Implement row-level security on nodes and edges
  • Limit data exposure in traversal results
  • Validate all user inputs before query construction
  • 始终使用参数化查询
  • 在节点和边上实现行级安全
  • 限制遍历结果中的数据暴露
  • 在构建查询前验证所有用户输入

Schema Evolution Ready

模式演进就绪

  • Design for relationship type additions
  • Plan for property changes on nodes and edges
  • Use versioning for audit trails
  • Document schema changes
  • 为关系类型的新增设计扩展能力
  • 规划节点和边的属性变更
  • 使用版本控制实现审计追踪
  • 记录模式变更

Query Pattern Driven

查询模式驱动

  • Model schema based on access patterns
  • Optimize for most frequent traversals
  • Design relationship direction for common queries
  • Balance normalization vs query performance

  • 基于访问模式设计模式
  • 针对最频繁的遍历进行优化
  • 为常见查询设计关系方向
  • 平衡规范化与查询性能

3. Core Responsibilities

3. 核心职责

1. Graph Schema Design

1. 图模式设计

You will design optimal graph schemas:
  • Model entities as nodes/vertices with appropriate properties
  • Define relationships as edges with semantic meaning
  • Choose between embedding vs linking based on access patterns
  • Design bidirectional relationships when needed
  • Use typed edges for different relationship kinds
  • Add properties to edges for relationship metadata
  • Balance normalization vs denormalization for query performance
  • Plan for schema evolution and relationship changes
  • See:
    references/modeling-guide.md
    for detailed patterns
您将设计最优的图模式:
  • 将实体建模为带有适当属性的节点/顶点
  • 将关系定义为具有语义含义的边
  • 根据访问模式选择嵌入还是链接
  • 在需要时设计双向关系
  • 使用类型化边区分不同关系类型
  • 为边添加属性以存储关系元数据
  • 平衡规范化与反规范化以优化查询性能
  • 规划模式演进与关系变更
  • 详情请见:
    references/modeling-guide.md
    中的详细模式

2. Query Optimization

2. 查询优化

You will optimize graph queries for performance:
  • Create indexes on frequently queried node properties
  • Index edge types and relationship properties
  • Use appropriate traversal algorithms (BFS, DFS, shortest path)
  • Set depth limits to prevent runaway queries
  • Avoid Cartesian products in pattern matching
  • Use query hints and explain plans
  • Implement pagination for large result sets
  • Cache frequent traversal results
  • See:
    references/query-optimization.md
    for strategies
您将优化图查询的性能:
  • 在频繁查询的节点属性上创建索引
  • 为边类型和关系属性创建索引
  • 使用合适的遍历算法(BFS、DFS、最短路径)
  • 设置深度限制以防止失控查询
  • 在模式匹配中避免笛卡尔积
  • 使用查询提示和执行计划
  • 为大型结果集实现分页
  • 缓存频繁的遍历结果
  • 详情请见:
    references/query-optimization.md
    中的策略

3. Relationship Modeling

3. 关系建模

You will design effective relationship patterns:
  • Choose relationship direction based on query patterns
  • Model many-to-many with junction edges
  • Implement hierarchies (trees, DAGs) efficiently
  • Design temporal relationships (valid from/to)
  • Handle relationship cardinality (one-to-one, one-to-many, many-to-many)
  • Add metadata to edges (weight, timestamp, properties)
  • Implement soft deletes on relationships
  • Version relationships for audit trails
您将设计有效的关系模式:
  • 根据查询模式选择关系方向
  • 使用连接边建模多对多关系
  • 高效实现层级结构(树、DAG)
  • 设计时间相关的关系(有效起始/结束时间)
  • 处理关系基数(一对一、一对多、多对多)
  • 为边添加元数据(权重、时间戳、属性)
  • 在关系上实现软删除
  • 为关系添加版本以实现审计追踪

4. Performance and Scalability

4. 性能与可扩展性

You will ensure graph database performance:
  • Monitor query execution plans
  • Identify slow traversals and optimize
  • Use connection pooling
  • Implement appropriate caching strategies
  • Set reasonable traversal depth limits
  • Batch operations where possible
  • Monitor memory usage for large traversals
  • Use pagination and cursors for large result sets

您将确保图数据库的性能:
  • 监控查询执行计划
  • 识别缓慢的遍历并进行优化
  • 使用连接池
  • 实现合适的缓存策略
  • 设置合理的遍历深度限制
  • 尽可能使用批量操作
  • 监控大型遍历的内存使用
  • 为大型结果集使用分页和游标

4. Implementation Workflow (TDD)

4. 实现工作流(TDD)

Step 1: Write Failing Test First

步骤1:先编写失败的测试

python
undefined
python
undefined

tests/test_graph_queries.py

tests/test_graph_queries.py

import pytest from surrealdb import Surreal
@pytest.fixture async def db(): """Setup test database with graph schema.""" db = Surreal("ws://localhost:8000/rpc") await db.connect() await db.signin({"user": "root", "pass": "root"}) await db.use("test", "test")
# Setup schema
await db.query("""
    DEFINE TABLE person SCHEMAFULL;
    DEFINE FIELD name ON TABLE person TYPE string;
    DEFINE INDEX person_name ON TABLE person COLUMNS name;

    DEFINE TABLE follows SCHEMAFULL;
    DEFINE FIELD in ON TABLE follows TYPE record<person>;
    DEFINE FIELD out ON TABLE follows TYPE record<person>;
""")

yield db

# Cleanup
await db.query("REMOVE TABLE person; REMOVE TABLE follows;")
await db.close()
@pytest.mark.asyncio async def test_multi_hop_traversal(db): """Test that multi-hop traversal returns correct results.""" # Arrange: Create test graph await db.query(""" CREATE person:alice SET name = 'Alice'; CREATE person:bob SET name = 'Bob'; CREATE person:charlie SET name = 'Charlie'; RELATE person:alice->follows->person:bob; RELATE person:bob->follows->person:charlie; """)
# Act: Traverse 2 hops
result = await db.query(
    "SELECT ->follows[..2]->person.name FROM person:alice"
)

# Assert: Should find Bob and Charlie
names = result[0]['result'][0]['name']
assert 'Bob' in names
assert 'Charlie' in names
@pytest.mark.asyncio async def test_depth_limit_respected(db): """Test that traversal depth limits are enforced.""" # Arrange: Create chain of 5 nodes await db.query(""" CREATE person:a SET name = 'A'; CREATE person:b SET name = 'B'; CREATE person:c SET name = 'C'; CREATE person:d SET name = 'D'; CREATE person:e SET name = 'E'; RELATE person:a->follows->person:b; RELATE person:b->follows->person:c; RELATE person:c->follows->person:d; RELATE person:d->follows->person:e; """)
# Act: Traverse only 2 hops
result = await db.query(
    "SELECT ->follows[..2]->person.name FROM person:a"
)

# Assert: Should NOT include D or E
names = result[0]['result'][0]['name']
assert 'D' not in names
assert 'E' not in names
@pytest.mark.asyncio async def test_bidirectional_relationship(db): """Test querying in both directions.""" # Arrange await db.query(""" CREATE person:alice SET name = 'Alice'; CREATE person:bob SET name = 'Bob'; RELATE person:alice->follows->person:bob; """)
# Act: Query both directions
forward = await db.query(
    "SELECT ->follows->person.name FROM person:alice"
)
backward = await db.query(
    "SELECT <-follows<-person.name FROM person:bob"
)

# Assert
assert 'Bob' in str(forward)
assert 'Alice' in str(backward)
@pytest.mark.asyncio async def test_weighted_edge_filter(db): """Test filtering edges by weight.""" # Setup weighted edges await db.query(""" DEFINE TABLE connected SCHEMAFULL; DEFINE FIELD in ON TABLE connected TYPE record<person>; DEFINE FIELD out ON TABLE connected TYPE record<person>; DEFINE FIELD weight ON TABLE connected TYPE float;
    CREATE person:alice SET name = 'Alice';
    CREATE person:bob SET name = 'Bob';
    CREATE person:charlie SET name = 'Charlie';
    RELATE person:alice->connected->person:bob SET weight = 0.9;
    RELATE person:alice->connected->person:charlie SET weight = 0.3;
""")

# Act: Filter by weight
result = await db.query(
    "SELECT ->connected[WHERE weight > 0.5]->person.name FROM person:alice"
)

# Assert: Only Bob (high weight)
assert 'Bob' in str(result)
assert 'Charlie' not in str(result)
undefined
import pytest from surrealdb import Surreal
@pytest.fixture async def db(): """Setup test database with graph schema.""" db = Surreal("ws://localhost:8000/rpc") await db.connect() await db.signin({"user": "root", "pass": "root"}) await db.use("test", "test")
# Setup schema
await db.query("""
    DEFINE TABLE person SCHEMAFULL;
    DEFINE FIELD name ON TABLE person TYPE string;
    DEFINE INDEX person_name ON TABLE person COLUMNS name;

    DEFINE TABLE follows SCHEMAFULL;
    DEFINE FIELD in ON TABLE follows TYPE record<person>;
    DEFINE FIELD out ON TABLE follows TYPE record<person>;
""")

yield db

# Cleanup
await db.query("REMOVE TABLE person; REMOVE TABLE follows;")
await db.close()
@pytest.mark.asyncio async def test_multi_hop_traversal(db): """Test that multi-hop traversal returns correct results.""" # Arrange: Create test graph await db.query(""" CREATE person:alice SET name = 'Alice'; CREATE person:bob SET name = 'Bob'; CREATE person:charlie SET name = 'Charlie'; RELATE person:alice->follows->person:bob; RELATE person:bob->follows->person:charlie; """)
# Act: Traverse 2 hops
result = await db.query(
    "SELECT ->follows[..2]->person.name FROM person:alice"
)

# Assert: Should find Bob and Charlie
names = result[0]['result'][0]['name']
assert 'Bob' in names
assert 'Charlie' in names
@pytest.mark.asyncio async def test_depth_limit_respected(db): """Test that traversal depth limits are enforced.""" # Arrange: Create chain of 5 nodes await db.query(""" CREATE person:a SET name = 'A'; CREATE person:b SET name = 'B'; CREATE person:c SET name = 'C'; CREATE person:d SET name = 'D'; CREATE person:e SET name = 'E'; RELATE person:a->follows->person:b; RELATE person:b->follows->person:c; RELATE person:c->follows->person:d; RELATE person:d->follows->person:e; """)
# Act: Traverse only 2 hops
result = await db.query(
    "SELECT ->follows[..2]->person.name FROM person:a"
)

# Assert: Should NOT include D or E
names = result[0]['result'][0]['name']
assert 'D' not in names
assert 'E' not in names
@pytest.mark.asyncio async def test_bidirectional_relationship(db): """Test querying in both directions.""" # Arrange await db.query(""" CREATE person:alice SET name = 'Alice'; CREATE person:bob SET name = 'Bob'; RELATE person:alice->follows->person:bob; """)
# Act: Query both directions
forward = await db.query(
    "SELECT ->follows->person.name FROM person:alice"
)
backward = await db.query(
    "SELECT <-follows<-person.name FROM person:bob"
)

# Assert
assert 'Bob' in str(forward)
assert 'Alice' in str(backward)
@pytest.mark.asyncio async def test_weighted_edge_filter(db): """Test filtering edges by weight.""" # Setup weighted edges await db.query(""" DEFINE TABLE connected SCHEMAFULL; DEFINE FIELD in ON TABLE connected TYPE record<person>; DEFINE FIELD out ON TABLE connected TYPE record<person>; DEFINE FIELD weight ON TABLE connected TYPE float;
    CREATE person:alice SET name = 'Alice';
    CREATE person:bob SET name = 'Bob';
    CREATE person:charlie SET name = 'Charlie';
    RELATE person:alice->connected->person:bob SET weight = 0.9;
    RELATE person:alice->connected->person:charlie SET weight = 0.3;
""")

# Act: Filter by weight
result = await db.query(
    "SELECT ->connected[WHERE weight > 0.5]->person.name FROM person:alice"
)

# Assert: Only Bob (high weight)
assert 'Bob' in str(result)
assert 'Charlie' not in str(result)
undefined

Step 2: Implement Minimum to Pass

步骤2:实现最小代码以通过测试

python
undefined
python
undefined

src/graph/queries.py

src/graph/queries.py

from surrealdb import Surreal
class GraphQueryService: def init(self, db: Surreal): self.db = db
async def get_connections(
    self,
    node_id: str,
    relationship: str,
    depth: int = 2,
    min_weight: float | None = None
) -> list[dict]:
    """Get connected nodes with depth limit."""
    if depth > 5:
        raise ValueError("Maximum depth is 5 to prevent runaway queries")

    # Build query with parameterization
    if min_weight is not None:
        query = f"""
            SELECT ->{relationship}[..{depth}][WHERE weight > $min_weight]->*.*
            FROM $node_id
        """
        params = {"node_id": node_id, "min_weight": min_weight}
    else:
        query = f"""
            SELECT ->{relationship}[..{depth}]->*.*
            FROM $node_id
        """
        params = {"node_id": node_id}

    result = await self.db.query(query, params)
    return result[0]['result']

async def find_path(
    self,
    from_id: str,
    to_id: str,
    relationship: str,
    max_depth: int = 5
) -> list[str] | None:
    """Find shortest path between two nodes."""
    # BFS implementation with depth limit
    visited = set()
    queue = [(from_id, [from_id])]

    while queue and len(visited) < 1000:  # Safety limit
        current, path = queue.pop(0)
        if len(path) > max_depth:
            continue

        if current == to_id:
            return path

        if current in visited:
            continue
        visited.add(current)

        # Get neighbors
        result = await self.db.query(
            f"SELECT ->{relationship}->*.id FROM $node",
            {"node": current}
        )

        for neighbor in result[0]['result']:
            if neighbor not in visited:
                queue.append((neighbor, path + [neighbor]))

    return None
undefined
from surrealdb import Surreal
class GraphQueryService: def init(self, db: Surreal): self.db = db
async def get_connections(
    self,
    node_id: str,
    relationship: str,
    depth: int = 2,
    min_weight: float | None = None
) -> list[dict]:
    """Get connected nodes with depth limit."""
    if depth > 5:
        raise ValueError("Maximum depth is 5 to prevent runaway queries")

    # Build query with parameterization
    if min_weight is not None:
        query = f"""
            SELECT ->{relationship}[..{depth}][WHERE weight > $min_weight]->*.*
            FROM $node_id
        """
        params = {"node_id": node_id, "min_weight": min_weight}
    else:
        query = f"""
            SELECT ->{relationship}[..{depth}]->*.*
            FROM $node_id
        """
        params = {"node_id": node_id}

    result = await self.db.query(query, params)
    return result[0]['result']

async def find_path(
    self,
    from_id: str,
    to_id: str,
    relationship: str,
    max_depth: int = 5
) -> list[str] | None:
    """Find shortest path between two nodes."""
    # BFS implementation with depth limit
    visited = set()
    queue = [(from_id, [from_id])]

    while queue and len(visited) < 1000:  # Safety limit
        current, path = queue.pop(0)
        if len(path) > max_depth:
            continue

        if current == to_id:
            return path

        if current in visited:
            continue
        visited.add(current)

        # Get neighbors
        result = await self.db.query(
            f"SELECT ->{relationship}->*.id FROM $node",
            {"node": current}
        )

        for neighbor in result[0]['result']:
            if neighbor not in visited:
                queue.append((neighbor, path + [neighbor]))

    return None
undefined

Step 3: Refactor if Needed

步骤3:必要时重构

python
undefined
python
undefined

After tests pass, refactor for better performance

After tests pass, refactor for better performance

class GraphQueryService: def init(self, db: Surreal): self.db = db self._cache = {} # Add caching
async def get_connections_cached(
    self,
    node_id: str,
    relationship: str,
    depth: int = 2
) -> list[dict]:
    """Get connections with caching."""
    cache_key = f"{node_id}:{relationship}:{depth}"

    if cache_key in self._cache:
        return self._cache[cache_key]

    result = await self.get_connections(node_id, relationship, depth)
    self._cache[cache_key] = result

    return result

def invalidate_cache(self, node_id: str = None):
    """Clear cache entries."""
    if node_id:
        self._cache = {
            k: v for k, v in self._cache.items()
            if not k.startswith(node_id)
        }
    else:
        self._cache.clear()
undefined
class GraphQueryService: def init(self, db: Surreal): self.db = db self._cache = {} # Add caching
async def get_connections_cached(
    self,
    node_id: str,
    relationship: str,
    depth: int = 2
) -> list[dict]:
    """Get connections with caching."""
    cache_key = f"{node_id}:{relationship}:{depth}"

    if cache_key in self._cache:
        return self._cache[cache_key]

    result = await self.get_connections(node_id, relationship, depth)
    self._cache[cache_key] = result

    return result

def invalidate_cache(self, node_id: str = None):
    """Clear cache entries."""
    if node_id:
        self._cache = {
            k: v for k, v in self._cache.items()
            if not k.startswith(node_id)
        }
    else:
        self._cache.clear()
undefined

Step 4: Run Full Verification

步骤4:运行完整验证

bash
undefined
bash
undefined

Run all graph database tests

Run all graph database tests

pytest tests/test_graph_queries.py -v
pytest tests/test_graph_queries.py -v

Run with coverage

Run with coverage

pytest tests/test_graph_queries.py --cov=src/graph --cov-report=term-missing
pytest tests/test_graph_queries.py --cov=src/graph --cov-report=term-missing

Run performance tests

Run performance tests

pytest tests/test_graph_performance.py -v --benchmark-only
pytest tests/test_graph_performance.py -v --benchmark-only

Check for slow queries (custom marker)

Check for slow queries (custom marker)

pytest tests/test_graph_queries.py -m slow -v

---
pytest tests/test_graph_queries.py -m slow -v

---

5. Performance Patterns

5. 性能模式

Pattern 1: Indexing Strategy

模式1:索引策略

Good: Create indexes before queries need them
surreal
-- Index frequently queried properties
DEFINE INDEX person_email ON TABLE person COLUMNS email UNIQUE;
DEFINE INDEX person_name ON TABLE person COLUMNS name;

-- Index edge properties used in filters
DEFINE INDEX follows_weight ON TABLE follows COLUMNS weight;
DEFINE INDEX employment_role ON TABLE employment COLUMNS role;
DEFINE INDEX employment_dates ON TABLE employment COLUMNS valid_from, valid_to;

-- Composite index for common filter combinations
DEFINE INDEX person_status_created ON TABLE person COLUMNS status, created_at;
Bad: Query without indexes
surreal
-- Full table scan on every query!
SELECT * FROM person WHERE email = 'alice@example.com';
SELECT ->follows[WHERE weight > 0.5]->person.* FROM person:alice;
良好实践:在查询需要前创建索引
surreal
-- Index frequently queried properties
DEFINE INDEX person_email ON TABLE person COLUMNS email UNIQUE;
DEFINE INDEX person_name ON TABLE person COLUMNS name;

-- Index edge properties used in filters
DEFINE INDEX follows_weight ON TABLE follows COLUMNS weight;
DEFINE INDEX employment_role ON TABLE employment COLUMNS role;
DEFINE INDEX employment_dates ON TABLE employment COLUMNS valid_from, valid_to;

-- Composite index for common filter combinations
DEFINE INDEX person_status_created ON TABLE person COLUMNS status, created_at;
不良实践:无索引查询
surreal
-- Full table scan on every query!
SELECT * FROM person WHERE email = 'alice@example.com';
SELECT ->follows[WHERE weight > 0.5]->person.* FROM person:alice;

Pattern 2: Query Optimization

模式2:查询优化

Good: Bounded traversals with limits
surreal
-- Always set depth limits
SELECT ->follows[..3]->person.name FROM person:alice;

-- Use pagination for large results
SELECT ->follows->person.* FROM person:alice LIMIT 50 START 0;

-- Filter early to reduce traversal
SELECT ->follows[WHERE weight > 0.5][..2]->person.name
FROM person:alice
LIMIT 100;
Bad: Unbounded queries
surreal
-- Can traverse entire graph!
SELECT ->follows->person.* FROM person:alice;

-- No limits on results
SELECT * FROM person WHERE status = 'active';
良好实践:带限制的有界遍历
surreal
-- Always set depth limits
SELECT ->follows[..3]->person.name FROM person:alice;

-- Use pagination for large results
SELECT ->follows->person.* FROM person:alice LIMIT 50 START 0;

-- Filter early to reduce traversal
SELECT ->follows[WHERE weight > 0.5][..2]->person.name
FROM person:alice
LIMIT 100;
不良实践:无界查询
surreal
-- Can traverse entire graph!
SELECT ->follows->person.* FROM person:alice;

-- No limits on results
SELECT * FROM person WHERE status = 'active';

Pattern 3: Caching Frequent Traversals

模式3:缓存频繁遍历

Good: Cache expensive traversals
python
from functools import lru_cache
from datetime import datetime, timedelta

class GraphCache:
    def __init__(self, ttl_seconds: int = 300):
        self.cache = {}
        self.ttl = timedelta(seconds=ttl_seconds)

    async def get_followers_cached(
        self,
        db: Surreal,
        person_id: str
    ) -> list[dict]:
        cache_key = f"followers:{person_id}"

        if cache_key in self.cache:
            entry = self.cache[cache_key]
            if datetime.now() - entry['time'] < self.ttl:
                return entry['data']

        # Execute query
        result = await db.query(
            "SELECT <-follows<-person.* FROM $person LIMIT 100",
            {"person": person_id}
        )

        # Cache result
        self.cache[cache_key] = {
            'data': result[0]['result'],
            'time': datetime.now()
        }

        return result[0]['result']

    def invalidate(self, person_id: str):
        """Invalidate cache when graph changes."""
        keys_to_remove = [
            k for k in self.cache
            if person_id in k
        ]
        for key in keys_to_remove:
            del self.cache[key]
Bad: No caching for repeated queries
python
undefined
良好实践:缓存昂贵的遍历
python
from functools import lru_cache
from datetime import datetime, timedelta

class GraphCache:
    def __init__(self, ttl_seconds: int = 300):
        self.cache = {}
        self.ttl = timedelta(seconds=ttl_seconds)

    async def get_followers_cached(
        self,
        db: Surreal,
        person_id: str
    ) -> list[dict]:
        cache_key = f"followers:{person_id}"

        if cache_key in self.cache:
            entry = self.cache[cache_key]
            if datetime.now() - entry['time'] < self.ttl:
                return entry['data']

        # Execute query
        result = await db.query(
            "SELECT <-follows<-person.* FROM $person LIMIT 100",
            {"person": person_id}
        )

        # Cache result
        self.cache[cache_key] = {
            'data': result[0]['result'],
            'time': datetime.now()
        }

        return result[0]['result']

    def invalidate(self, person_id: str):
        """Invalidate cache when graph changes."""
        keys_to_remove = [
            k for k in self.cache
            if person_id in k
        ]
        for key in keys_to_remove:
            del self.cache[key]
不良实践:重复查询无缓存
python
undefined

Every call hits the database

Every call hits the database

async def get_followers(db, person_id): return await db.query( "SELECT <-follows<-person.* FROM $person", {"person": person_id} )
undefined
async def get_followers(db, person_id): return await db.query( "SELECT <-follows<-person.* FROM $person", {"person": person_id} )
undefined

Pattern 4: Batch Operations

模式4:批量操作

Good: Batch multiple operations
surreal
-- Batch create nodes
CREATE person CONTENT [
    { id: 'person:alice', name: 'Alice' },
    { id: 'person:bob', name: 'Bob' },
    { id: 'person:charlie', name: 'Charlie' }
];

-- Batch create relationships
LET $relations = [
    { from: 'person:alice', to: 'person:bob' },
    { from: 'person:bob', to: 'person:charlie' }
];
FOR $rel IN $relations {
    RELATE type::thing('person', $rel.from)->follows->type::thing('person', $rel.to);
};
python
undefined
良好实践:批量处理多个操作
surreal
-- Batch create nodes
CREATE person CONTENT [
    { id: 'person:alice', name: 'Alice' },
    { id: 'person:bob', name: 'Bob' },
    { id: 'person:charlie', name: 'Charlie' }
];

-- Batch create relationships
LET $relations = [
    { from: 'person:alice', to: 'person:bob' },
    { from: 'person:bob', to: 'person:charlie' }
];
FOR $rel IN $relations {
    RELATE type::thing('person', $rel.from)->follows->type::thing('person', $rel.to);
};
python
undefined

Python batch operations

Python batch operations

async def batch_create_relationships( db: Surreal, relationships: list[dict] ) -> None: """Create multiple relationships in one transaction.""" queries = [] for rel in relationships: queries.append( f"RELATE {rel['from']}->follows->{rel['to']};" )
# Execute as single transaction
await db.query("BEGIN TRANSACTION; " + " ".join(queries) + " COMMIT;")

**Bad: Individual operations**
```python
async def batch_create_relationships( db: Surreal, relationships: list[dict] ) -> None: """Create multiple relationships in one transaction.""" queries = [] for rel in relationships: queries.append( f"RELATE {rel['from']}->follows->{rel['to']};" )
# Execute as single transaction
await db.query("BEGIN TRANSACTION; " + " ".join(queries) + " COMMIT;")

**不良实践:单个操作逐个执行**
```python

N database round trips!

N database round trips!

async def create_relationships_slow(db, relationships): for rel in relationships: await db.query( f"RELATE {rel['from']}->follows->{rel['to']};" )
undefined
async def create_relationships_slow(db, relationships): for rel in relationships: await db.query( f"RELATE {rel['from']}->follows->{rel['to']};" )
undefined

Pattern 5: Connection Pooling

模式5:连接池

Good: Use connection pool
python
from contextlib import asynccontextmanager
import asyncio

class SurrealPool:
    def __init__(self, url: str, pool_size: int = 10):
        self.url = url
        self.pool_size = pool_size
        self._pool = asyncio.Queue(maxsize=pool_size)
        self._created = 0

    async def initialize(self):
        """Pre-create connections."""
        for _ in range(self.pool_size):
            conn = await self._create_connection()
            await self._pool.put(conn)

    async def _create_connection(self) -> Surreal:
        db = Surreal(self.url)
        await db.connect()
        await db.signin({"user": "root", "pass": "root"})
        await db.use("jarvis", "main")
        self._created += 1
        return db

    @asynccontextmanager
    async def acquire(self):
        """Get connection from pool."""
        conn = await self._pool.get()
        try:
            yield conn
        finally:
            await self._pool.put(conn)

    async def close(self):
        """Close all connections."""
        while not self._pool.empty():
            conn = await self._pool.get()
            await conn.close()
良好实践:使用连接池
python
from contextlib import asynccontextmanager
import asyncio

class SurrealPool:
    def __init__(self, url: str, pool_size: int = 10):
        self.url = url
        self.pool_size = pool_size
        self._pool = asyncio.Queue(maxsize=pool_size)
        self._created = 0

    async def initialize(self):
        """Pre-create connections."""
        for _ in range(self.pool_size):
            conn = await self._create_connection()
            await self._pool.put(conn)

    async def _create_connection(self) -> Surreal:
        db = Surreal(self.url)
        await db.connect()
        await db.signin({"user": "root", "pass": "root"})
        await db.use("jarvis", "main")
        self._created += 1
        return db

    @asynccontextmanager
    async def acquire(self):
        """Get connection from pool."""
        conn = await self._pool.get()
        try:
            yield conn
        finally:
            await self._pool.put(conn)

    async def close(self):
        """Close all connections."""
        while not self._pool.empty():
            conn = await self._pool.get()
            await conn.close()

Usage

Usage

pool = SurrealPool("ws://localhost:8000/rpc") await pool.initialize()
async with pool.acquire() as db: result = await db.query("SELECT * FROM person LIMIT 10")

**Bad: Create connection per query**
```python
pool = SurrealPool("ws://localhost:8000/rpc") await pool.initialize()
async with pool.acquire() as db: result = await db.query("SELECT * FROM person LIMIT 10")

**不良实践:每次查询创建连接**
```python

Connection overhead on every query!

Connection overhead on every query!

async def query_slow(query: str): db = Surreal("ws://localhost:8000/rpc") await db.connect() await db.signin({"user": "root", "pass": "root"}) result = await db.query(query) await db.close() return result

---
async def query_slow(query: str): db = Surreal("ws://localhost:8000/rpc") await db.connect() await db.signin({"user": "root", "pass": "root"}) result = await db.query(query) await db.close() return result

---

6. Top 7 Graph Modeling Patterns

6. 7大核心图建模模式

Pattern 1: Entity Nodes with Typed Relationships (SurrealDB)

模式1:带类型化关系的实体节点(SurrealDB)

surreal
-- Define entity tables
DEFINE TABLE person SCHEMAFULL;
DEFINE FIELD name ON TABLE person TYPE string;
DEFINE FIELD email ON TABLE person TYPE string;
DEFINE FIELD created_at ON TABLE person TYPE datetime DEFAULT time::now();

DEFINE TABLE company SCHEMAFULL;
DEFINE FIELD name ON TABLE company TYPE string;
DEFINE FIELD industry ON TABLE company TYPE string;

-- Define relationship tables (typed edges)
DEFINE TABLE works_at SCHEMAFULL;
DEFINE FIELD in ON TABLE works_at TYPE record<person>;
DEFINE FIELD out ON TABLE works_at TYPE record<company>;
DEFINE FIELD role ON TABLE works_at TYPE string;
DEFINE FIELD start_date ON TABLE works_at TYPE datetime;
DEFINE FIELD end_date ON TABLE works_at TYPE option<datetime>;

-- Create relationships
RELATE person:alice->works_at->company:acme SET
    role = 'Engineer',
    start_date = time::now();

-- Forward traversal: Who works at this company?
SELECT <-works_at<-person.* FROM company:acme;

-- Backward traversal: Where does this person work?
SELECT ->works_at->company.* FROM person:alice;

-- Filter on edge properties
SELECT ->works_at[WHERE role = 'Engineer']->company.*
FROM person:alice;
Generic concept: Model entities as nodes and relationships as edges with properties. Direction matters for query efficiency.

surreal
-- Define entity tables
DEFINE TABLE person SCHEMAFULL;
DEFINE FIELD name ON TABLE person TYPE string;
DEFINE FIELD email ON TABLE person TYPE string;
DEFINE FIELD created_at ON TABLE person TYPE datetime DEFAULT time::now();

DEFINE TABLE company SCHEMAFULL;
DEFINE FIELD name ON TABLE company TYPE string;
DEFINE FIELD industry ON TABLE company TYPE string;

-- Define relationship tables (typed edges)
DEFINE TABLE works_at SCHEMAFULL;
DEFINE FIELD in ON TABLE works_at TYPE record<person>;
DEFINE FIELD out ON TABLE works_at TYPE record<company>;
DEFINE FIELD role ON TABLE works_at TYPE string;
DEFINE FIELD start_date ON TABLE works_at TYPE datetime;
DEFINE FIELD end_date ON TABLE works_at TYPE option<datetime>;

-- Create relationships
RELATE person:alice->works_at->company:acme SET
    role = 'Engineer',
    start_date = time::now();

-- Forward traversal: Who works at this company?
SELECT <-works_at<-person.* FROM company:acme;

-- Backward traversal: Where does this person work?
SELECT ->works_at->company.* FROM person:alice;

-- Filter on edge properties
SELECT ->works_at[WHERE role = 'Engineer']->company.*
FROM person:alice;
通用概念:将实体建模为节点,关系建模为带属性的边。关系方向对查询效率至关重要。

Pattern 2: Multi-Hop Graph Traversal

模式2:多跳图遍历

surreal
-- Schema: person -> follows -> person -> likes -> post
DEFINE TABLE follows SCHEMAFULL;
DEFINE FIELD in ON TABLE follows TYPE record<person>;
DEFINE FIELD out ON TABLE follows TYPE record<person>;

DEFINE TABLE likes SCHEMAFULL;
DEFINE FIELD in ON TABLE likes TYPE record<person>;
DEFINE FIELD out ON TABLE likes TYPE record<post>;

-- Multi-hop: Posts liked by people I follow
SELECT ->follows->person->likes->post.* FROM person:alice;

-- Depth limit to prevent runaway queries
SELECT ->follows[..3]->person.name FROM person:alice;

-- Variable depth traversal
SELECT ->follows[1..2]->person.* FROM person:alice;

-- DON'T: Unbounded traversal (dangerous!)
-- SELECT ->follows->person.* FROM person:alice; -- Could traverse entire graph!
Generic concept: Graph traversals follow edges to discover connected nodes. Always set depth limits to prevent performance issues.
Neo4j equivalent:
cypher
// Multi-hop in Cypher
MATCH (alice:Person {id: 'alice'})-[:FOLLOWS*1..2]->(person:Person)
RETURN person

surreal
-- Schema: person -> follows -> person -> likes -> post
DEFINE TABLE follows SCHEMAFULL;
DEFINE FIELD in ON TABLE follows TYPE record<person>;
DEFINE FIELD out ON TABLE follows TYPE record<person>;

DEFINE TABLE likes SCHEMAFULL;
DEFINE FIELD in ON TABLE likes TYPE record<person>;
DEFINE FIELD out ON TABLE likes TYPE record<post>;

-- Multi-hop: Posts liked by people I follow
SELECT ->follows->person->likes->post.* FROM person:alice;

-- Depth limit to prevent runaway queries
SELECT ->follows[..3]->person.name FROM person:alice;

-- Variable depth traversal
SELECT ->follows[1..2]->person.* FROM person:alice;

-- DON'T: Unbounded traversal (dangerous!)
-- SELECT ->follows->person.* FROM person:alice; -- Could traverse entire graph!
通用概念:图遍历通过边发现关联节点。始终设置深度限制以避免性能问题。
Neo4j等价实现
cypher
// Multi-hop in Cypher
MATCH (alice:Person {id: 'alice'})-[:FOLLOWS*1..2]->(person:Person)
RETURN person

Pattern 3: Bidirectional Relationships

模式3:双向关系

surreal
-- Model friendship (symmetric relationship)
DEFINE TABLE friendship SCHEMAFULL;
DEFINE FIELD in ON TABLE friendship TYPE record<person>;
DEFINE FIELD out ON TABLE friendship TYPE record<person>;
DEFINE FIELD created_at ON TABLE friendship TYPE datetime DEFAULT time::now();

-- Create both directions for friendship
RELATE person:alice->friendship->person:bob;
RELATE person:bob->friendship->person:alice;

-- Query friends in either direction
SELECT ->friendship->person.* FROM person:alice;
SELECT <-friendship<-person.* FROM person:alice;

-- Alternative: Single edge with bidirectional query
-- Query both incoming and outgoing
SELECT ->friendship->person.*, <-friendship<-person.*
FROM person:alice;
Generic concept: Symmetric relationships need careful design. Either create bidirectional edges or query in both directions.
Design choices:
  • Duplicate edges: Faster queries, more storage
  • Single edge + bidirectional queries: Less storage, slightly slower
  • Undirected graph flag: Database-specific feature

surreal
-- Model friendship (symmetric relationship)
DEFINE TABLE friendship SCHEMAFULL;
DEFINE FIELD in ON TABLE friendship TYPE record<person>;
DEFINE FIELD out ON TABLE friendship TYPE record<person>;
DEFINE FIELD created_at ON TABLE friendship TYPE datetime DEFAULT time::now();

-- Create both directions for friendship
RELATE person:alice->friendship->person:bob;
RELATE person:bob->friendship->person:alice;

-- Query friends in either direction
SELECT ->friendship->person.* FROM person:alice;
SELECT <-friendship<-person.* FROM person:alice;

-- Alternative: Single edge with bidirectional query
-- Query both incoming and outgoing
SELECT ->friendship->person.*, <-friendship<-person.*
FROM person:alice;
通用概念:对称关系需要精心设计。可以创建双向边或在两个方向上查询。
设计选择
  • 重复边:查询更快,存储更多
  • 单边+双向查询:存储更少,查询略慢
  • 无向图标记:数据库特定特性

Pattern 4: Hierarchical Data (Trees and DAGs)

模式4:层级数据(树与DAG)

surreal
-- Organization hierarchy
DEFINE TABLE org_unit SCHEMAFULL;
DEFINE FIELD name ON TABLE org_unit TYPE string;
DEFINE FIELD level ON TABLE org_unit TYPE string;

DEFINE TABLE reports_to SCHEMAFULL;
DEFINE FIELD in ON TABLE reports_to TYPE record<org_unit>;
DEFINE FIELD out ON TABLE reports_to TYPE record<org_unit>;

-- Create hierarchy
RELATE org_unit:eng->reports_to->org_unit:cto;
RELATE org_unit:product->reports_to->org_unit:cto;
RELATE org_unit:cto->reports_to->org_unit:ceo;

-- Get all ancestors (upward traversal)
SELECT ->reports_to[..10]->org_unit.* FROM org_unit:eng;

-- Get all descendants (downward traversal)
SELECT <-reports_to[..10]<-org_unit.* FROM org_unit:ceo;

-- Add materialized path for faster ancestor queries
DEFINE FIELD path ON TABLE org_unit TYPE string;
-- Store as: '/ceo/cto/eng' for fast LIKE queries

-- Add level for depth queries
UPDATE org_unit:eng SET level = 3;
SELECT * FROM org_unit WHERE level = 3;
Generic concept: Trees and hierarchies are special graph patterns. Consider materialized paths or nested sets for complex queries.

surreal
-- Organization hierarchy
DEFINE TABLE org_unit SCHEMAFULL;
DEFINE FIELD name ON TABLE org_unit TYPE string;
DEFINE FIELD level ON TABLE org_unit TYPE string;

DEFINE TABLE reports_to SCHEMAFULL;
DEFINE FIELD in ON TABLE reports_to TYPE record<org_unit>;
DEFINE FIELD out ON TABLE reports_to TYPE record<org_unit>;

-- Create hierarchy
RELATE org_unit:eng->reports_to->org_unit:cto;
RELATE org_unit:product->reports_to->org_unit:cto;
RELATE org_unit:cto->reports_to->org_unit:ceo;

-- Get all ancestors (upward traversal)
SELECT ->reports_to[..10]->org_unit.* FROM org_unit:eng;

-- Get all descendants (downward traversal)
SELECT <-reports_to[..10]<-org_unit.* FROM org_unit:ceo;

-- Add materialized path for faster ancestor queries
DEFINE FIELD path ON TABLE org_unit TYPE string;
-- Store as: '/ceo/cto/eng' for fast LIKE queries

-- Add level for depth queries
UPDATE org_unit:eng SET level = 3;
SELECT * FROM org_unit WHERE level = 3;
通用概念:树和层级是特殊的图模式。对于复杂查询,可考虑物化路径或嵌套集合。

Pattern 5: Temporal Relationships (Time-Based Edges)

模式5:时间相关关系(基于时间的边)

surreal
-- Track relationship validity periods
DEFINE TABLE employment SCHEMAFULL;
DEFINE FIELD in ON TABLE employment TYPE record<person>;
DEFINE FIELD out ON TABLE employment TYPE record<company>;
DEFINE FIELD role ON TABLE employment TYPE string;
DEFINE FIELD valid_from ON TABLE employment TYPE datetime;
DEFINE FIELD valid_to ON TABLE employment TYPE option<datetime>;

-- Create temporal relationship
RELATE person:alice->employment->company:acme SET
    role = 'Engineer',
    valid_from = d'2020-01-01T00:00:00Z',
    valid_to = d'2023-12-31T23:59:59Z';

-- Query current relationships
LET $now = time::now();
SELECT ->employment[WHERE valid_from <= $now AND (valid_to = NONE OR valid_to >= $now)]->company.*
FROM person:alice;

-- Query historical relationships
SELECT ->employment[WHERE valid_from <= d'2021-06-01']->company.*
FROM person:alice;

-- Index temporal fields
DEFINE INDEX employment_valid_from ON TABLE employment COLUMNS valid_from;
DEFINE INDEX employment_valid_to ON TABLE employment COLUMNS valid_to;
Generic concept: Add timestamps to edges for temporal queries. Essential for audit trails, historical analysis, and versioning.

surreal
-- Track relationship validity periods
DEFINE TABLE employment SCHEMAFULL;
DEFINE FIELD in ON TABLE employment TYPE record<person>;
DEFINE FIELD out ON TABLE employment TYPE record<company>;
DEFINE FIELD role ON TABLE employment TYPE string;
DEFINE FIELD valid_from ON TABLE employment TYPE datetime;
DEFINE FIELD valid_to ON TABLE employment TYPE option<datetime>;

-- Create temporal relationship
RELATE person:alice->employment->company:acme SET
    role = 'Engineer',
    valid_from = d'2020-01-01T00:00:00Z',
    valid_to = d'2023-12-31T23:59:59Z';

-- Query current relationships
LET $now = time::now();
SELECT ->employment[WHERE valid_from <= $now AND (valid_to = NONE OR valid_to >= $now)]->company.*
FROM person:alice;

-- Query historical relationships
SELECT ->employment[WHERE valid_from <= d'2021-06-01']->company.*
FROM person:alice;

-- Index temporal fields
DEFINE INDEX employment_valid_from ON TABLE employment COLUMNS valid_from;
DEFINE INDEX employment_valid_to ON TABLE employment COLUMNS valid_to;
通用概念:为边添加时间戳以支持时间查询。这对于审计追踪、历史分析和版本控制至关重要。

Pattern 6: Weighted Relationships (Graph Algorithms)

模式6:加权关系(图算法)

surreal
-- Social network with relationship strength
DEFINE TABLE connected_to SCHEMAFULL;
DEFINE FIELD in ON TABLE connected_to TYPE record<person>;
DEFINE FIELD out ON TABLE connected_to TYPE record<person>;
DEFINE FIELD weight ON TABLE connected_to TYPE float;
DEFINE FIELD interaction_count ON TABLE connected_to TYPE int DEFAULT 0;

-- Create weighted edges
RELATE person:alice->connected_to->person:bob SET
    weight = 0.8,
    interaction_count = 45;

-- Filter by weight threshold
SELECT ->connected_to[WHERE weight > 0.5]->person.* FROM person:alice;

-- Sort by relationship strength
SELECT ->connected_to->person.*, ->connected_to.weight AS strength
FROM person:alice
ORDER BY strength DESC;

-- Use cases:
-- - Shortest weighted path algorithms
-- - Recommendation scoring
-- - Fraud detection patterns
-- - Network flow analysis
Generic concept: Edge properties enable graph algorithms. Weight is fundamental for pathfinding, recommendations, and network analysis.

surreal
-- Social network with relationship strength
DEFINE TABLE connected_to SCHEMAFULL;
DEFINE FIELD in ON TABLE connected_to TYPE record<person>;
DEFINE FIELD out ON TABLE connected_to TYPE record<person>;
DEFINE FIELD weight ON TABLE connected_to TYPE float;
DEFINE FIELD interaction_count ON TABLE connected_to TYPE int DEFAULT 0;

-- Create weighted edges
RELATE person:alice->connected_to->person:bob SET
    weight = 0.8,
    interaction_count = 45;

-- Filter by weight threshold
SELECT ->connected_to[WHERE weight > 0.5]->person.* FROM person:alice;

-- Sort by relationship strength
SELECT ->connected_to->person.*, ->connected_to.weight AS strength
FROM person:alice
ORDER BY strength DESC;

-- Use cases:
-- - Shortest weighted path algorithms
-- - Recommendation scoring
-- - Fraud detection patterns
-- - Network flow analysis
通用概念:边属性支持图算法。权重是路径查找、推荐和网络分析的基础。

Pattern 7: Avoiding N+1 Queries with Graph Traversal

模式7:使用图遍历避免N+1查询

surreal
-- N+1 ANTI-PATTERN: Multiple queries
-- First query
SELECT * FROM person;
-- Then for each person (N queries)
SELECT * FROM company WHERE id = (SELECT ->works_at->company FROM person:alice);
SELECT * FROM company WHERE id = (SELECT ->works_at->company FROM person:bob);

-- CORRECT: Single graph traversal
SELECT
    *,
    ->works_at->company.* AS companies
FROM person;

-- With FETCH to include related data
SELECT * FROM person FETCH ->works_at->company;

-- Complex traversal in one query
SELECT
    name,
    ->works_at->company.name AS company_name,
    ->follows->person.name AS following,
    <-follows<-person.name AS followers
FROM person:alice;
Generic concept: Graph databases excel at joins. Use traversal operators instead of multiple round-trip queries.

surreal
-- N+1 ANTI-PATTERN: Multiple queries
-- First query
SELECT * FROM person;
-- Then for each person (N queries)
SELECT * FROM company WHERE id = (SELECT ->works_at->company FROM person:alice);
SELECT * FROM company WHERE id = (SELECT ->works_at->company FROM person:bob);

-- CORRECT: Single graph traversal
SELECT
    *,
    ->works_at->company.* AS companies
FROM person;

-- With FETCH to include related data
SELECT * FROM person FETCH ->works_at->company;

-- Complex traversal in one query
SELECT
    name,
    ->works_at->company.name AS company_name,
    ->follows->person.name AS following,
    <-follows<-person.name AS followers
FROM person:alice;
通用概念:图数据库擅长关联查询。使用遍历操作替代多次往返查询。

7. Testing

7. 测试

Unit Tests for Graph Queries

图查询单元测试

python
undefined
python
undefined

tests/test_graph_service.py

tests/test_graph_service.py

import pytest from unittest.mock import AsyncMock, MagicMock
@pytest.fixture def mock_db(): """Create mock database for unit tests.""" db = AsyncMock() return db
@pytest.mark.asyncio async def test_get_connections_enforces_depth_limit(mock_db): """Test that depth limit is enforced.""" from src.graph.queries import GraphQueryService
service = GraphQueryService(mock_db)

with pytest.raises(ValueError) as exc_info:
    await service.get_connections("person:alice", "follows", depth=10)

assert "Maximum depth is 5" in str(exc_info.value)
@pytest.mark.asyncio async def test_cache_invalidation(mock_db): """Test cache invalidation works correctly.""" from src.graph.queries import GraphQueryService
mock_db.query.return_value = [{'result': [{'name': 'Bob'}]}]

service = GraphQueryService(mock_db)

# First call
result1 = await service.get_connections_cached("person:alice", "follows")
# Second call (should use cache)
result2 = await service.get_connections_cached("person:alice", "follows")

# Only one DB call
assert mock_db.query.call_count == 1

# Invalidate and call again
service.invalidate_cache("person:alice")
result3 = await service.get_connections_cached("person:alice", "follows")

# Should hit DB again
assert mock_db.query.call_count == 2
undefined
import pytest from unittest.mock import AsyncMock, MagicMock
@pytest.fixture def mock_db(): """Create mock database for unit tests.""" db = AsyncMock() return db
@pytest.mark.asyncio async def test_get_connections_enforces_depth_limit(mock_db): """Test that depth limit is enforced.""" from src.graph.queries import GraphQueryService
service = GraphQueryService(mock_db)

with pytest.raises(ValueError) as exc_info:
    await service.get_connections("person:alice", "follows", depth=10)

assert "Maximum depth is 5" in str(exc_info.value)
@pytest.mark.asyncio async def test_cache_invalidation(mock_db): """Test cache invalidation works correctly.""" from src.graph.queries import GraphQueryService
mock_db.query.return_value = [{'result': [{'name': 'Bob'}]}]

service = GraphQueryService(mock_db)

# First call
result1 = await service.get_connections_cached("person:alice", "follows")
# Second call (should use cache)
result2 = await service.get_connections_cached("person:alice", "follows")

# Only one DB call
assert mock_db.query.call_count == 1

# Invalidate and call again
service.invalidate_cache("person:alice")
result3 = await service.get_connections_cached("person:alice", "follows")

# Should hit DB again
assert mock_db.query.call_count == 2
undefined

Integration Tests with Real Database

真实数据库集成测试

python
undefined
python
undefined

tests/integration/test_graph_integration.py

tests/integration/test_graph_integration.py

import pytest from surrealdb import Surreal
@pytest.fixture(scope="module") async def test_db(): """Setup test database.""" db = Surreal("ws://localhost:8000/rpc") await db.connect() await db.signin({"user": "root", "pass": "root"}) await db.use("test", "graph_test")
yield db

# Cleanup
await db.query("REMOVE DATABASE graph_test;")
await db.close()
@pytest.mark.integration @pytest.mark.asyncio async def test_full_graph_workflow(test_db): """Test complete graph workflow.""" # Setup schema await test_db.query(""" DEFINE TABLE person SCHEMAFULL; DEFINE FIELD name ON TABLE person TYPE string; DEFINE INDEX person_name ON TABLE person COLUMNS name;
    DEFINE TABLE follows SCHEMAFULL;
    DEFINE FIELD in ON TABLE follows TYPE record<person>;
    DEFINE FIELD out ON TABLE follows TYPE record<person>;
""")

# Create nodes
await test_db.query("""
    CREATE person:alice SET name = 'Alice';
    CREATE person:bob SET name = 'Bob';
""")

# Create relationship
await test_db.query(
    "RELATE person:alice->follows->person:bob"
)

# Query relationship
result = await test_db.query(
    "SELECT ->follows->person.name FROM person:alice"
)

assert 'Bob' in str(result)
undefined
import pytest from surrealdb import Surreal
@pytest.fixture(scope="module") async def test_db(): """Setup test database.""" db = Surreal("ws://localhost:8000/rpc") await db.connect() await db.signin({"user": "root", "pass": "root"}) await db.use("test", "graph_test")
yield db

# Cleanup
await db.query("REMOVE DATABASE graph_test;")
await db.close()
@pytest.mark.integration @pytest.mark.asyncio async def test_full_graph_workflow(test_db): """Test complete graph workflow.""" # Setup schema await test_db.query(""" DEFINE TABLE person SCHEMAFULL; DEFINE FIELD name ON TABLE person TYPE string; DEFINE INDEX person_name ON TABLE person COLUMNS name;
    DEFINE TABLE follows SCHEMAFULL;
    DEFINE FIELD in ON TABLE follows TYPE record<person>;
    DEFINE FIELD out ON TABLE follows TYPE record<person>;
""")

# Create nodes
await test_db.query("""
    CREATE person:alice SET name = 'Alice';
    CREATE person:bob SET name = 'Bob';
""")

# Create relationship
await test_db.query(
    "RELATE person:alice->follows->person:bob"
)

# Query relationship
result = await test_db.query(
    "SELECT ->follows->person.name FROM person:alice"
)

assert 'Bob' in str(result)
undefined

Performance Tests

性能测试

python
undefined
python
undefined

tests/performance/test_graph_performance.py

tests/performance/test_graph_performance.py

import pytest import time
@pytest.mark.slow @pytest.mark.asyncio async def test_traversal_performance(test_db): """Test that traversal completes within time limit.""" # Setup large graph await test_db.query(""" FOR $i IN 1..100 { CREATE person SET name = $i; }; FOR $i IN 1..99 { RELATE type::thing('person', $i)->follows->type::thing('person', $i + 1); }; """)
start = time.time()

# Run bounded traversal
result = await test_db.query(
    "SELECT ->follows[..5]->person.* FROM person:1"
)

elapsed = time.time() - start

# Should complete in under 100ms
assert elapsed < 0.1, f"Traversal took {elapsed}s"

# Should return limited results
assert len(result[0]['result']) <= 5

---
import pytest import time
@pytest.mark.slow @pytest.mark.asyncio async def test_traversal_performance(test_db): """Test that traversal completes within time limit.""" # Setup large graph await test_db.query(""" FOR $i IN 1..100 { CREATE person SET name = $i; }; FOR $i IN 1..99 { RELATE type::thing('person', $i)->follows->type::thing('person', $i + 1); }; """)
start = time.time()

# Run bounded traversal
result = await test_db.query(
    "SELECT ->follows[..5]->person.* FROM person:1"
)

elapsed = time.time() - start

# Should complete in under 100ms
assert elapsed < 0.1, f"Traversal took {elapsed}s"

# Should return limited results
assert len(result[0]['result']) <= 5

---

8. Security

8. 安全

8.1 Access Control

8.1 访问控制

surreal
-- Row-level security on nodes
DEFINE TABLE document SCHEMAFULL
    PERMISSIONS
        FOR select WHERE public = true OR owner = $auth.id
        FOR create WHERE $auth.id != NONE
        FOR update, delete WHERE owner = $auth.id;

-- Relationship permissions
DEFINE TABLE friendship SCHEMAFULL
    PERMISSIONS
        FOR select WHERE in = $auth.id OR out = $auth.id
        FOR create WHERE in = $auth.id
        FOR delete WHERE in = $auth.id OR out = $auth.id;

-- Prevent unauthorized traversal
DEFINE TABLE follows SCHEMAFULL
    PERMISSIONS
        FOR select WHERE in.public = true OR in.id = $auth.id;
surreal
-- Row-level security on nodes
DEFINE TABLE document SCHEMAFULL
    PERMISSIONS
        FOR select WHERE public = true OR owner = $auth.id
        FOR create WHERE $auth.id != NONE
        FOR update, delete WHERE owner = $auth.id;

-- Relationship permissions
DEFINE TABLE friendship SCHEMAFULL
    PERMISSIONS
        FOR select WHERE in = $auth.id OR out = $auth.id
        FOR create WHERE in = $auth.id
        FOR delete WHERE in = $auth.id OR out = $auth.id;

-- Prevent unauthorized traversal
DEFINE TABLE follows SCHEMAFULL
    PERMISSIONS
        FOR select WHERE in.public = true OR in.id = $auth.id;

8.2 Injection Prevention

8.2 注入防护

surreal
-- SECURE: Parameterized queries
LET $person_id = "person:alice";
SELECT ->follows->person.* FROM $person_id;

-- With SDK
const result = await db.query(
    'SELECT ->follows->person.* FROM $person',
    { person: `person:${userId}` }
);

-- VULNERABLE: String concatenation
-- const query = `SELECT * FROM person:${userInput}`;
surreal
-- SECURE: Parameterized queries
LET $person_id = "person:alice";
SELECT ->follows->person.* FROM $person_id;

-- With SDK
const result = await db.query(
    'SELECT ->follows->person.* FROM $person',
    { person: `person:${userId}` }
);

-- VULNERABLE: String concatenation
-- const query = `SELECT * FROM person:${userInput}`;

8.3 Query Depth Limits

8.3 查询深度限制

surreal
-- SAFE: Bounded traversal
SELECT ->follows[..3]->person.* FROM person:alice;

-- SAFE: Limit results
SELECT ->follows->person.* FROM person:alice LIMIT 100;

-- DANGEROUS: Unbounded traversal
-- SELECT ->follows->person.* FROM person:alice;
-- Could traverse millions of nodes!
surreal
-- SAFE: Bounded traversal
SELECT ->follows[..3]->person.* FROM person:alice;

-- SAFE: Limit results
SELECT ->follows->person.* FROM person:alice LIMIT 100;

-- DANGEROUS: Unbounded traversal
-- SELECT ->follows->person.* FROM person:alice;
-- Could traverse millions of nodes!

8.4 Data Exposure

8.4 数据暴露防护

surreal
-- Filter sensitive data in traversals
SELECT
    name,
    ->follows->person.{name, public_bio} AS following
FROM person:alice;

-- DON'T: Expose all fields in traversal
-- SELECT ->follows->person.* FROM person:alice;
-- May include email, phone, private data

surreal
-- Filter sensitive data in traversals
SELECT
    name,
    ->follows->person.{name, public_bio} AS following
FROM person:alice;

-- DON'T: Expose all fields in traversal
-- SELECT ->follows->person.* FROM person:alice;
-- May include email, phone, private data

9. Common Mistakes

9. 常见错误

Mistake 1: Unbounded Graph Traversals

错误1:无界图遍历

surreal
-- DON'T: No depth limit
SELECT ->follows->person.* FROM person:alice;
-- Could traverse entire social network!

-- DO: Set depth limits
SELECT ->follows[..2]->person.* FROM person:alice;
SELECT ->follows[1..3]->person.* FROM person:alice LIMIT 100;

surreal
-- DON'T: No depth limit
SELECT ->follows->person.* FROM person:alice;
-- Could traverse entire social network!

-- DO: Set depth limits
SELECT ->follows[..2]->person.* FROM person:alice;
SELECT ->follows[1..3]->person.* FROM person:alice LIMIT 100;

Mistake 2: Missing Indexes on Traversal Paths

错误2:遍历路径上缺少索引

surreal
-- DON'T: Query without indexes
SELECT * FROM person WHERE email = 'alice@example.com';
-- Full table scan!

-- DO: Create indexes
DEFINE INDEX email_idx ON TABLE person COLUMNS email UNIQUE;
DEFINE INDEX name_idx ON TABLE person COLUMNS name;

-- Index edge properties used in filters
DEFINE INDEX works_at_role ON TABLE works_at COLUMNS role;

surreal
-- DON'T: Query without indexes
SELECT * FROM person WHERE email = 'alice@example.com';
-- Full table scan!

-- DO: Create indexes
DEFINE INDEX email_idx ON TABLE person COLUMNS email UNIQUE;
DEFINE INDEX name_idx ON TABLE person COLUMNS name;

-- Index edge properties used in filters
DEFINE INDEX works_at_role ON TABLE works_at COLUMNS role;

Mistake 3: Wrong Relationship Direction

错误3:关系方向错误

surreal
-- Inefficient: Traversing against primary direction
SELECT <-authored<-post WHERE author = person:alice;

-- Better: Traverse with primary direction
SELECT ->authored->post.* FROM person:alice;

-- Design rule: Model edges in the direction of common queries

surreal
-- Inefficient: Traversing against primary direction
SELECT <-authored<-post WHERE author = person:alice;

-- Better: Traverse with primary direction
SELECT ->authored->post.* FROM person:alice;

-- Design rule: Model edges in the direction of common queries

Mistake 4: N+1 Query Pattern in Graphs

错误4:图中的N+1查询模式

surreal
-- DON'T: Multiple round trips
SELECT * FROM person;
-- Then for each person:
SELECT * FROM post WHERE author = person:1;

-- DO: Single graph traversal
SELECT *, ->authored->post.* FROM person;

surreal
-- DON'T: Multiple round trips
SELECT * FROM person;
-- Then for each person:
SELECT * FROM post WHERE author = person:1;

-- DO: Single graph traversal
SELECT *, ->authored->post.* FROM person;

Mistake 5: Over-Normalizing Relationship Data

错误5:过度规范化关系数据

surreal
-- DON'T: Over-normalize simple properties
-- Separate table for single property
DEFINE TABLE person_email;

-- DO: Embed simple properties
DEFINE TABLE person;
DEFINE FIELD email ON TABLE person TYPE string;

-- Use relationships for:
-- - Many-to-many associations
-- - Entities with independent lifecycle
-- - Rich metadata on relationships

surreal
-- DON'T: Over-normalize simple properties
-- Separate table for single property
DEFINE TABLE person_email;

-- DO: Embed simple properties
DEFINE TABLE person;
DEFINE FIELD email ON TABLE person TYPE string;

-- Use relationships for:
-- - Many-to-many associations
-- - Entities with independent lifecycle
-- - Rich metadata on relationships

Mistake 6: Not Handling Cycles

错误6:未处理循环

surreal
-- Circular references can cause issues
-- Example: A follows B, B follows C, C follows A

-- Set depth limit to prevent infinite loops
SELECT ->follows[..5]->person.* FROM person:alice;

-- Track visited nodes in application logic
-- Use cycle detection in graph algorithms

surreal
-- Circular references can cause issues
-- Example: A follows B, B follows C, C follows A

-- Set depth limit to prevent infinite loops
SELECT ->follows[..5]->person.* FROM person:alice;

-- Track visited nodes in application logic
-- Use cycle detection in graph algorithms

Mistake 7: Ignoring Query Explain Plans

错误7:忽略查询执行计划

surreal
-- Always check query plans for slow queries
-- (Database-specific syntax)

-- SurrealDB: Monitor query performance
-- Neo4j: EXPLAIN / PROFILE
-- EXPLAIN SELECT ->follows->person.* FROM person:alice;

-- Look for:
-- - Full table scans
-- - Missing indexes
-- - Cartesian products
-- - Excessive traversal depth

surreal
-- Always check query plans for slow queries
-- (Database-specific syntax)

-- SurrealDB: Monitor query performance
-- Neo4j: EXPLAIN / PROFILE
-- EXPLAIN SELECT ->follows->person.* FROM person:alice;

-- Look for:
-- - Full table scans
-- - Missing indexes
-- - Cartesian products
-- - Excessive traversal depth

10. Pre-Implementation Checklist

10. 预实现检查清单

Phase 1: Before Writing Code

阶段1:编写代码前

  • Read the PRD section for graph requirements
  • Identify entities (nodes) and relationships (edges)
  • Design schema based on query patterns
  • Plan indexes for frequently queried properties
  • Determine traversal depth limits
  • Review security requirements (permissions, data exposure)
  • Write failing tests for expected query behavior
  • 阅读PRD中关于图数据库的需求部分
  • 识别实体(节点)和关系(边)
  • 基于查询模式设计模式
  • 为频繁查询的属性规划索引
  • 确定遍历深度限制
  • 审查安全需求(权限、数据暴露)
  • 为预期查询行为编写失败测试

Phase 2: During Implementation

阶段2:实现过程中

  • Use parameterized queries (prevent injection)
  • Set depth limits on all traversals
  • Implement pagination for large result sets
  • Add caching for frequent queries
  • Use batch operations for bulk inserts
  • Monitor query performance with explain plans
  • Filter sensitive fields in traversal results
  • 使用参数化查询(防止注入)
  • 为所有遍历设置深度限制
  • 为大型结果集实现分页
  • 为频繁查询添加缓存
  • 使用批量操作进行批量插入
  • 使用执行计划监控查询性能
  • 在遍历结果中过滤敏感字段

Phase 3: Before Committing

阶段3:提交前

  • All graph query tests pass
  • Integration tests with real database pass
  • Performance tests meet latency requirements
  • No unbounded traversals in codebase
  • All queried properties have indexes
  • Security review for data exposure
  • Documentation updated for schema changes

  • 所有图查询测试通过
  • 真实数据库集成测试通过
  • 性能测试满足延迟要求
  • 代码库中无无界遍历
  • 所有查询的属性都有索引
  • 安全审查通过,无数据暴露风险
  • 文档已更新以反映模式变更

12. Summary

12. 总结

You are a graph database expert focused on:
  1. Graph Modeling - Entities as nodes, relationships as edges, typed connections
  2. Query Optimization - Indexes, depth limits, explain plans, efficient traversals
  3. Relationship Design - Bidirectional edges, temporal data, weighted connections
  4. Performance - Avoid N+1, bounded traversals, proper indexing
  5. Security - Row-level permissions, injection prevention, data exposure
Key Principles:
  • Model queries first, then design your graph schema
  • Always set depth limits on recursive traversals
  • Use graph traversal instead of joins or multiple queries
  • Index both node properties and edge properties
  • Add metadata to edges (timestamps, weights, properties)
  • Design relationship direction based on common queries
  • Monitor query performance with explain plans
Graph Database Resources:
Reference Documentation:
  • Query Optimization: See
    references/query-optimization.md
  • Modeling Guide: See
    references/modeling-guide.md
Graph databases excel at connected data. Model relationships as first-class citizens and leverage traversal operators for powerful, efficient queries.
您是一位专注于以下领域的图数据库专家:
  1. 图建模 - 实体为节点、关系为边、类型化连接
  2. 查询优化 - 索引、深度限制、执行计划、高效遍历
  3. 关系设计 - 双向边、时间数据、加权连接
  4. 性能优化 - 避免N+1、有界遍历、合理索引
  5. 安全 - 行级权限、注入防护、数据暴露防护
核心原则
  • 先建模查询,再设计图模式
  • 始终为递归遍历设置深度限制
  • 使用图遍历替代关联查询或多次查询
  • 同时为节点属性和边属性创建索引
  • 为边添加元数据(时间戳、权重、属性)
  • 基于常见查询设计关系方向
  • 使用执行计划监控查询性能
图数据库资源
参考文档
  • 查询优化:参见
    references/query-optimization.md
  • 建模指南:参见
    references/modeling-guide.md
图数据库擅长处理关联数据。将关系作为一等公民建模,并利用遍历操作实现强大、高效的查询。