grepai-storage-gob

GrepAI Storage with GOB

This skill covers using GOB (Go Binary) as the storage backend for GrepAI, the default and simplest option.

When to Use This Skill

  • Single developer projects
  • Small to medium codebases
  • Simple setup without external dependencies
  • Local development environments

What is GOB Storage?

GOB is Go's native binary serialization format, provided by the standard library's `encoding/gob` package. GrepAI uses it to store:
  • Vector embeddings
  • File metadata
  • Chunk information
The index is kept in a single local file, `.grepai/index.gob`.

Advantages

| Benefit | Description |
|---|---|
| 🚀 Simple | No external services needed |
| Fast setup | Works immediately |
| 📁 Portable | Single file, easy to backup |
| 💰 Free | No infrastructure costs |
| 🔒 Private | Data stays local |

Limitations

| Limitation | Description |
|---|---|
| 📏 Scalability | Not ideal for very large codebases |
| 👤 Single user | No concurrent access |
| 🔄 No sharing | Can't share index across machines |
| 💾 Memory | Loads into RAM for searches |

Configuration

Default Configuration

GOB is the default backend. Minimal config:
```yaml
# .grepai/config.yaml
store:
  backend: gob
```

Explicit Configuration

```yaml
store:
  backend: gob
  # Index stored in .grepai/index.gob (automatic)
```

Storage Location

GOB storage creates files in your project's `.grepai/` directory:
```
.grepai/
├── config.yaml    # Configuration
├── index.gob      # Vector embeddings
└── symbols.gob    # Symbol index for trace
```

File Sizes

Approximate `.grepai/index.gob` sizes:

| Codebase | Files | Chunks | Index Size |
|---|---|---|---|
| Small | 100 | 500 | ~5 MB |
| Medium | 1,000 | 5,000 | ~50 MB |
| Large | 10,000 | 50,000 | ~500 MB |
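
The figures in the table work out to roughly 10 KB of index per chunk (500 chunks ≈ 5 MB). As a rough planning aid, that ratio can be wrapped in a small shell helper. The function name `estimate_index_mb` and the flat 10 KB-per-chunk figure are assumptions read off the table, not values reported by GrepAI; real sizes will vary with the embedding model and chunk length.

```bash
# Rough index-size estimate (in MB) from a chunk count.
# Assumption: ~10 KB per chunk, inferred from the table (500 chunks ≈ 5 MB).
estimate_index_mb() {
    chunks=$1
    # 10 KB per chunk, 1000 KB per MB
    echo $(( chunks * 10 / 1000 ))
}
```

For example, `estimate_index_mb 50000` prints `500`, matching the "Large" row.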

Operations

Creating the Index

```bash
# Initialize project
grepai init

# Start indexing (creates index.gob)
grepai watch
```

Checking Index Status

```bash
grepai status

# Output:
# Index: .grepai/index.gob
# Files: 245
# Chunks: 1,234
# Size: 12.5 MB
# Last updated: 2025-01-28 10:30:00
```

Backing Up the Index

```bash
# Simple file copy
cp .grepai/index.gob .grepai/index.gob.backup
```
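
A single fixed backup name gets overwritten on every copy. If several generations are useful, a timestamped variant is one option. This is a minimal sketch: the function name `backup_index` and the `.backup-<timestamp>` suffix are illustrative choices, not something GrepAI provides.

```bash
# Timestamped backup of the GOB index.
# Usage: backup_index [grepai_dir]   (defaults to .grepai)
backup_index() {
    dir=${1:-.grepai}
    src="$dir/index.gob"
    if [ ! -f "$src" ]; then
        echo "no index found at $src" >&2
        return 1
    fi
    dest="$src.backup-$(date +%Y%m%d-%H%M%S)"
    # -p preserves the modification time; print the backup path on success
    cp -p "$src" "$dest" && echo "$dest"
}
```

Running `backup_index` from the project root prints the path of the new copy, which can be copied back over `index.gob` to restore it.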

Clearing the Index

```bash
# Delete and re-index
rm .grepai/index.gob
grepai watch
```

Moving to a New Machine

```bash
# Copy entire .grepai directory
cp -r .grepai /path/to/new/location/

# Note: Only works if using same embedding model
```

Performance Considerations

Memory Usage

GOB loads the entire index into RAM for searches:

| Index Size | RAM Usage |
|---|---|
| 10 MB | ~20 MB |
| 50 MB | ~100 MB |
| 500 MB | ~1 GB |
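
The table suggests RAM usage of about twice the on-disk index size. Under that assumption, the expected footprint can be estimated from the file itself. The helper name `estimate_search_ram_mb` is hypothetical, and the 2× factor is only the ratio read from the table, not a measured guarantee.

```bash
# Estimate RAM needed for searches from the on-disk index size.
# Assumption: RAM ≈ 2 × file size, the ratio shown in the table.
estimate_search_ram_mb() {
    index=${1:-.grepai/index.gob}
    if [ ! -f "$index" ]; then
        echo "no index found at $index" >&2
        return 1
    fi
    # wc -c reports the size in bytes; awk doubles it and converts to MiB.
    wc -c < "$index" | LC_ALL=C awk '{ printf "%.1f\n", ($1 * 2) / (1024 * 1024) }'
}
```

Called with no arguments from the project root, it reads `.grepai/index.gob` and prints the estimate in MiB.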

Search Speed

GOB provides fast searches for typical codebases:

| Codebase Size | Search Time |
|---|---|
| Small (100 files) | <50ms |
| Medium (1K files) | <200ms |
| Large (10K files) | <1s |

When to Upgrade

Consider PostgreSQL or Qdrant when:
  • The index exceeds 1 GB
  • You need concurrent access
  • You want to share the index across a team
  • The codebase has 50K+ files
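
The two measurable criteria in this list (index size and file count) can be checked mechanically; concurrent access and team sharing remain judgment calls. The sketch below assumes a hypothetical helper named `should_upgrade` and takes the file count as an argument, since how files are counted depends on your ignore patterns.

```bash
# Returns 0 (true) if the index has outgrown the GOB backend.
# Usage: should_upgrade <index_file> <file_count> [max_bytes]
should_upgrade() {
    index=$1
    files=$2
    max_bytes=${3:-1073741824}   # 1 GiB, the size threshold from the list
    size=0
    if [ -f "$index" ]; then
        # Arithmetic expansion strips any padding wc adds on some systems
        size=$(( $(wc -c < "$index") ))
    fi
    [ "$size" -gt "$max_bytes" ] || [ "$files" -ge 50000 ]
}
```

A typical call would be `should_upgrade .grepai/index.gob 12000 && echo "consider PostgreSQL or Qdrant"`.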

.gitignore Configuration

Add `.grepai/` to your `.gitignore`:
```gitignore
# GrepAI (machine-specific index)
.grepai/
```

**Why:** The index is machine-specific:
- It contains binary embeddings
- It is tied to the embedding model used
- Each machine should generate its own
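
To add the entry from a setup script without duplicating it on repeated runs, an idempotent append is one approach. This is a sketch using standard `grep` flags; the function name `ignore_grepai` is made up for illustration.

```bash
# Append ".grepai/" to .gitignore unless an identical line is already there.
# Usage: ignore_grepai [path/to/.gitignore]
ignore_grepai() {
    file=${1:-.gitignore}
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    if ! grep -qxF '.grepai/' "$file" 2>/dev/null; then
        printf '%s\n' '# GrepAI (machine-specific index)' '.grepai/' >> "$file"
    fi
}
```

Calling `ignore_grepai` twice leaves a single `.grepai/` line in the file.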

Sharing Index (Not Recommended)

While you can copy the index file, it's not recommended because:
  1. Both machines must use the identical embedding model
  2. File paths in the index are absolute
  3. Different machines may have different code versions
Better approach: Each developer runs their own `grepai watch`.

Migrating to Other Backends

To PostgreSQL

  1. Update config:
```yaml
store:
  backend: postgres
  postgres:
    dsn: postgres://user:pass@localhost:5432/grepai
```
  2. Re-index:
```bash
rm .grepai/index.gob
grepai watch
```

To Qdrant

  1. Update config:
```yaml
store:
  backend: qdrant
  qdrant:
    endpoint: localhost
    port: 6334
```
  2. Re-index:
```bash
rm .grepai/index.gob
grepai watch
```
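
Before deleting `index.gob`, it can be worth confirming that the config edit actually took effect. The sketch below is a plain text scan, not a YAML parser and not a GrepAI command; `configured_backend` is a hypothetical name, and it assumes the simple `store:` / `backend:` layout used in this document.

```bash
# Print the backend configured under "store:" in a GrepAI config file.
# Usage: configured_backend [path/to/config.yaml]
configured_backend() {
    config=${1:-.grepai/config.yaml}
    # Take the first uncommented "backend:" key and print its value.
    awk '$1 == "backend:" { print $2; exit }' "$config"
}
```

If it still prints `gob` after the edit, the config change was not saved where GrepAI reads it.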

Common Issues

Problem: Index file too large
✅ Solution: Add more ignore patterns or migrate to PostgreSQL/Qdrant

Problem: Slow searches on large codebase
✅ Solution: Migrate to Qdrant for better performance

Problem: Corrupted index
✅ Solution: Delete and re-index:
```bash
rm .grepai/index.gob .grepai/symbols.gob
grepai watch
```

Problem: "Index not found" error
✅ Solution: Run `grepai watch` to create the index
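
For the corrupted-index case, a more cautious alternative to `rm` is to move the files aside so they can still be inspected, then re-index as usual. This is only a sketch: `quarantine_index` and the `.corrupt-<timestamp>` suffix are illustrative names, not GrepAI features.

```bash
# Move possibly corrupted index files aside instead of deleting them.
# Usage: quarantine_index [grepai_dir]   (defaults to .grepai)
quarantine_index() {
    dir=${1:-.grepai}
    stamp=$(date +%Y%m%d-%H%M%S)
    for name in index.gob symbols.gob; do
        if [ -f "$dir/$name" ]; then
            mv "$dir/$name" "$dir/$name.corrupt-$stamp"
            echo "moved $dir/$name"
        fi
    done
}
```

After `quarantine_index`, run `grepai watch` to build a fresh index; delete the `.corrupt-*` files once the new index works.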

Best Practices

  1. Use for small/medium projects: Up to ~10K files
  2. Add to .gitignore: Don't commit the index
  3. Backup before major changes: Copy index.gob before experiments
  4. Re-index after model changes: If you change embedding models
  5. Monitor file size: Migrate if index exceeds 1 GB

Output Format

GOB storage status:
```
✅ GOB Storage Configured

   Backend: GOB (local file)
   Index: .grepai/index.gob
   Size: 12.5 MB

   Contents:
   - Files: 245
   - Chunks: 1,234
   - Vectors: 1,234 × 768 dimensions

   Performance:
   - Search latency: <100ms
   - Memory usage: ~25 MB
```
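
The "Vectors" line gives enough information to estimate how much of the index is raw embedding data. The sketch below assumes each dimension is stored as a 4-byte float32, which this document does not state; the function name `raw_vector_bytes` is likewise hypothetical.

```bash
# Raw size of the embedding vectors alone, in bytes.
# Assumption: each dimension is stored as a 4-byte float32.
raw_vector_bytes() {
    vectors=$1
    dims=$2
    echo $(( vectors * dims * 4 ))
}
```

`raw_vector_bytes 1234 768` prints `3790848`, about 3.8 MB of the 12.5 MB reported. Under this assumption, the rest would be chunk text, file metadata, and encoding overhead.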