grepai-storage-postgres

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

GrepAI Storage with PostgreSQL

基于PostgreSQL的GrepAI存储

This skill covers using PostgreSQL with the pgvector extension as the storage backend for GrepAI.
本技能介绍如何使用带有pgvector扩展的PostgreSQL作为GrepAI的存储后端。

When to Use This Skill

何时使用此技能

  • Team environments with shared index
  • Large codebases (10K+ files)
  • Need concurrent access
  • Integration with existing PostgreSQL infrastructure
  • 带有共享索引的团队环境
  • 大型代码库(10K+文件)
  • 需要并发访问
  • 与现有PostgreSQL基础设施集成

Prerequisites

前提条件

  1. PostgreSQL 14+ with pgvector extension
  2. Database user with create table permissions
  3. Network access to PostgreSQL server
  1. 带有pgvector扩展的PostgreSQL 14+
  2. 拥有创建表权限的数据库用户
  3. 可访问PostgreSQL服务器的网络权限

Advantages

优势

BenefitDescription
👥 Team sharingMultiple users can access same index
📏 ScalableHandles large codebases
🔄 ConcurrentMultiple simultaneous searches
💾 PersistentData survives machine restarts
🔧 FamiliarStandard database tooling
优势描述
👥 团队共享多个用户可访问同一索引
📏 可扩展支持大型代码库
🔄 并发处理支持多用户同时搜索
💾 持久化数据在机器重启后仍保留
🔧 易上手采用标准数据库工具

Setting Up PostgreSQL with pgvector

配置带有pgvector的PostgreSQL

Option 1: Docker (Recommended for Development)

选项1:Docker(开发环境推荐)

bash
undefined
bash
undefined

Run PostgreSQL with pgvector

Run PostgreSQL with pgvector

docker run -d
--name grepai-postgres
-e POSTGRES_USER=grepai
-e POSTGRES_PASSWORD=grepai
-e POSTGRES_DB=grepai
-p 5432:5432
pgvector/pgvector:pg16
undefined
docker run -d
--name grepai-postgres
-e POSTGRES_USER=grepai
-e POSTGRES_PASSWORD=grepai
-e POSTGRES_DB=grepai
-p 5432:5432
pgvector/pgvector:pg16
undefined

Option 2: Install on Existing PostgreSQL

选项2:在现有PostgreSQL上安装

bash
undefined
bash
undefined

Install pgvector extension (Ubuntu/Debian)

Install pgvector extension (Ubuntu/Debian)

sudo apt install postgresql-16-pgvector
sudo apt install postgresql-16-pgvector

Or compile from source

Or compile from source

git clone https://github.com/pgvector/pgvector.git cd pgvector make sudo make install

Then enable the extension:

```sql
-- Connect to your database
CREATE EXTENSION IF NOT EXISTS vector;
git clone https://github.com/pgvector/pgvector.git cd pgvector make sudo make install

然后启用扩展:

```sql
-- Connect to your database
CREATE EXTENSION IF NOT EXISTS vector;

Option 3: Managed Services

选项3:托管服务

  • Supabase: pgvector included by default
  • Neon: pgvector available
  • AWS RDS: Install pgvector extension
  • Azure Database: pgvector available
  • Supabase: 默认包含pgvector
  • Neon: 支持pgvector
  • AWS RDS: 可安装pgvector扩展
  • Azure Database: 支持pgvector

Configuration

配置

Basic Configuration

基础配置

yaml
undefined
yaml
undefined

.grepai/config.yaml

.grepai/config.yaml

store: backend: postgres postgres: dsn: postgres://user:password@localhost:5432/grepai
undefined
store: backend: postgres postgres: dsn: postgres://user:password@localhost:5432/grepai
undefined

With Environment Variable

使用环境变量

yaml
store:
  backend: postgres
  postgres:
    dsn: ${DATABASE_URL}
Set the environment variable:
bash
export DATABASE_URL="postgres://user:password@localhost:5432/grepai"
yaml
store:
  backend: postgres
  postgres:
    dsn: ${DATABASE_URL}
设置环境变量:
bash
export DATABASE_URL="postgres://user:password@localhost:5432/grepai"

Full DSN Options

完整DSN选项

yaml
store:
  backend: postgres
  postgres:
    dsn: postgres://user:password@host:5432/database?sslmode=require
DSN components:
  • user
    : Database username
  • password
    : Database password
  • host
    : Server hostname or IP
  • 5432
    : Port (default: 5432)
  • database
    : Database name
  • sslmode
    : SSL mode (disable, require, verify-full)
yaml
store:
  backend: postgres
  postgres:
    dsn: postgres://user:password@host:5432/database?sslmode=require
DSN组件:
  • user
    :数据库用户名
  • password
    :数据库密码
  • host
    :服务器主机名或IP
  • 5432
    :端口(默认:5432)
  • database
    :数据库名称
  • sslmode
    :SSL模式(disable、require、verify-full)

SSL Modes

SSL模式

ModeDescriptionUse Case
disable
No SSLLocal development
require
SSL requiredProduction
verify-full
SSL + verify certificateHigh security
yaml
undefined
模式描述使用场景
disable
不使用SSL本地开发
require
必须使用SSL生产环境
verify-full
SSL + 证书验证高安全场景
yaml
undefined

Production with SSL

Production with SSL

store: backend: postgres postgres: dsn: postgres://user:pass@prod.db.com:5432/grepai?sslmode=require
undefined
store: backend: postgres postgres: dsn: postgres://user:pass@prod.db.com:5432/grepai?sslmode=require
undefined

Database Schema

数据库架构

GrepAI automatically creates these tables:
sql
-- Vector embeddings table
CREATE TABLE IF NOT EXISTS embeddings (
    id SERIAL PRIMARY KEY,
    file_path TEXT NOT NULL,
    chunk_index INTEGER NOT NULL,
    content TEXT NOT NULL,
    start_line INTEGER,
    end_line INTEGER,
    embedding vector(768),  -- Dimension matches your model
    created_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(file_path, chunk_index)
);

-- Index for vector similarity search
CREATE INDEX ON embeddings USING ivfflat (embedding vector_cosine_ops);
GrepAI会自动创建以下表:
sql
-- Vector embeddings table
CREATE TABLE IF NOT EXISTS embeddings (
    id SERIAL PRIMARY KEY,
    file_path TEXT NOT NULL,
    chunk_index INTEGER NOT NULL,
    content TEXT NOT NULL,
    start_line INTEGER,
    end_line INTEGER,
    embedding vector(768),  -- Dimension matches your model
    created_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(file_path, chunk_index)
);

-- Index for vector similarity search
CREATE INDEX ON embeddings USING ivfflat (embedding vector_cosine_ops);

Verifying Setup

验证配置

Check pgvector Extension

检查pgvector扩展

sql
-- Connect to database
psql -U grepai -d grepai

-- Check extension is installed
SELECT * FROM pg_extension WHERE extname = 'vector';

-- Check GrepAI tables exist (after first grepai watch)
\dt
sql
-- Connect to database
psql -U grepai -d grepai

-- Check extension is installed
SELECT * FROM pg_extension WHERE extname = 'vector';

-- Check GrepAI tables exist (after first grepai watch)\dt

Test Connection from GrepAI

从GrepAI测试连接

bash
undefined
bash
undefined

Check status

Check status

grepai status
grepai status

Should show PostgreSQL backend info

Should show PostgreSQL backend info

undefined
undefined

Performance Tuning

性能调优

PostgreSQL Configuration

PostgreSQL配置

For better vector search performance:
sql
-- Increase work memory for vector operations
SET work_mem = '256MB';

-- Adjust for your hardware
SET effective_cache_size = '4GB';
SET shared_buffers = '1GB';
为提升向量搜索性能:
sql
-- Increase work memory for vector operations
SET work_mem = '256MB';

-- Adjust for your hardware
SET effective_cache_size = '4GB';
SET shared_buffers = '1GB';

Index Tuning

索引调优

For large indices, tune the IVFFlat index:
sql
-- More lists = faster search, more memory
CREATE INDEX ON embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);  -- Adjust based on row count
Rule of thumb:
lists = sqrt(rows)
对于大型索引,调优IVFFlat索引:
sql
-- More lists = faster search, more memory
CREATE INDEX ON embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);  -- Adjust based on row count
经验法则:
lists = sqrt(rows)

Concurrent Access

并发访问

PostgreSQL handles concurrent access automatically:
  • Multiple
    grepai search
    commands work simultaneously
  • One
    grepai watch
    daemon per codebase
  • Many users can share the same index
PostgreSQL会自动处理并发访问:
  • 多个
    grepai search
    命令可同时运行
  • 每个代码库对应一个
    grepai watch
    守护进程
  • 多用户可共享同一索引

Team Setup

团队配置

Shared Database

共享数据库

All team members point to the same database:
yaml
undefined
所有团队成员指向同一数据库:
yaml
undefined

Each developer's .grepai/config.yaml

Each developer's .grepai/config.yaml

store: backend: postgres postgres: dsn: postgres://team:secret@shared-db.company.com:5432/grepai
undefined
store: backend: postgres postgres: dsn: postgres://team:secret@shared-db.company.com:5432/grepai
undefined

Per-Project Databases

按项目划分数据库

For isolated projects, use separate databases:
bash
undefined
对于隔离的项目,使用独立数据库:
bash
undefined

Create databases

Create databases

createdb -U postgres grepai_projecta createdb -U postgres grepai_projectb

```yaml
createdb -U postgres grepai_projecta createdb -U postgres grepai_projectb

```yaml

Project A config

Project A config

store: backend: postgres postgres: dsn: postgres://user:pass@localhost:5432/grepai_projecta
undefined
store: backend: postgres postgres: dsn: postgres://user:pass@localhost:5432/grepai_projecta
undefined

Backup and Restore

备份与恢复

Backup

备份

bash
pg_dump -U grepai -d grepai > grepai_backup.sql
bash
pg_dump -U grepai -d grepai > grepai_backup.sql

Restore

恢复

bash
psql -U grepai -d grepai < grepai_backup.sql
bash
psql -U grepai -d grepai < grepai_backup.sql

Migrating from GOB

从GOB迁移

  1. Set up PostgreSQL with pgvector
  2. Update configuration:
yaml
store:
  backend: postgres
  postgres:
    dsn: postgres://user:pass@localhost:5432/grepai
  1. Delete old index:
bash
rm .grepai/index.gob
  1. Re-index:
bash
grepai watch
  1. 配置带有pgvector的PostgreSQL
  2. 更新配置:
yaml
store:
  backend: postgres
  postgres:
    dsn: postgres://user:pass@localhost:5432/grepai
  1. 删除旧索引:
bash
rm .grepai/index.gob
  1. 重新索引:
bash
grepai watch

Common Issues

常见问题

Problem:
FATAL: password authentication failed
Solution: Check DSN credentials and pg_hba.conf
Problem:
ERROR: extension "vector" is not available
Solution: Install pgvector:
bash
sudo apt install postgresql-16-pgvector
问题:
FATAL: password authentication failed
解决方案: 检查DSN凭据和pg_hba.conf
问题:
ERROR: extension "vector" is not available
解决方案: 安装pgvector:
bash
sudo apt install postgresql-16-pgvector

Then: CREATE EXTENSION vector;

Then: CREATE EXTENSION vector;


❌ **Problem:** `ERROR: type "vector" does not exist`
✅ **Solution:** Enable extension in the database:
```sql
CREATE EXTENSION IF NOT EXISTS vector;
Problem: Connection refused ✅ Solution:
  • Check PostgreSQL is running
  • Verify host and port
  • Check firewall rules
Problem: Slow searches ✅ Solution:
  • Add IVFFlat index
  • Increase
    work_mem
  • Vacuum and analyze tables

❌ **问题:** `ERROR: type "vector" does not exist`
✅ **解决方案:** 在数据库中启用扩展:
```sql
CREATE EXTENSION IF NOT EXISTS vector;
问题: 连接被拒绝 ✅ 解决方案:
  • 检查PostgreSQL是否正在运行
  • 验证主机和端口
  • 检查防火墙规则
问题: 搜索缓慢 ✅ 解决方案:
  • 添加IVFFlat索引
  • 增大
    work_mem
  • 清理并分析表

Best Practices

最佳实践

  1. Use environment variables: Don't commit credentials
  2. Enable SSL: For remote databases
  3. Regular backups: pg_dump before major changes
  4. Monitor performance: Check query times
  5. Index maintenance: Regular VACUUM ANALYZE
  1. 使用环境变量: 不要提交凭据
  2. 启用SSL: 对于远程数据库
  3. 定期备份: 重大变更前使用pg_dump
  4. 监控性能: 检查查询时间
  5. 索引维护: 定期执行VACUUM ANALYZE

Output Format

输出格式

PostgreSQL storage status:
✅ PostgreSQL Storage Configured

   Backend: PostgreSQL + pgvector
   Host: localhost:5432
   Database: grepai
   SSL: disabled

   Contents:
   - Files: 2,450
   - Chunks: 12,340
   - Vector dimension: 768

   Performance:
   - Connection: OK
   - IVFFlat index: Yes
   - Search latency: ~50ms
PostgreSQL存储状态:
✅ PostgreSQL Storage Configured

   Backend: PostgreSQL + pgvector
   Host: localhost:5432
   Database: grepai
   SSL: disabled

   Contents:
   - Files: 2,450
   - Chunks: 12,340
   - Vector dimension: 768

   Performance:
   - Connection: OK
   - IVFFlat index: Yes
   - Search latency: ~50ms