
GrepAI Embeddings with Ollama

Configure Ollama as the embedding provider for GrepAI

This skill covers using Ollama as the embedding provider for GrepAI, enabling 100% private, local code search.
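Under the hood, embedding-based code search ranks snippets by vector similarity between the query embedding and each indexed chunk's embedding. As a rough illustration only (not GrepAI's actual implementation), assuming you already have embedding vectors:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors standing in for real 768-dimensional embeddings:
query = [0.9, 0.1, 0.0]
snippet_auth = [0.8, 0.2, 0.1]   # semantically close to the query
snippet_css = [0.0, 0.1, 0.9]    # unrelated

# The closer snippet scores higher and would rank first in search results.
assert cosine_similarity(query, snippet_auth) > cosine_similarity(query, snippet_css)
```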

When to Use This Skill

  • Setting up private, local embeddings
  • Choosing the right Ollama model
  • Optimizing Ollama performance
  • Troubleshooting Ollama connection issues

Why Ollama?

| Advantage | Description |
|---|---|
| 🔒 Privacy | Code never leaves your machine |
| 💰 Free | No API costs or usage limits |
| ⚡ Speed | No network latency |
| 🔌 Offline | Works without internet |
| 🔧 Control | Choose your model |

Prerequisites

  1. Ollama installed and running
  2. An embedding model downloaded

```bash
# Install Ollama
brew install ollama  # macOS
# or
curl -fsSL https://ollama.com/install.sh | sh  # Linux

# Start Ollama
ollama serve

# Download model
ollama pull nomic-embed-text
```

Configuration

Basic Configuration

```yaml
# .grepai/config.yaml
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://localhost:11434
```

With Custom Endpoint

```yaml
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://192.168.1.100:11434  # Remote Ollama server
```

With Explicit Dimensions

```yaml
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://localhost:11434
  dimensions: 768  # Usually auto-detected
```

Available Models

Recommended: nomic-embed-text

```bash
ollama pull nomic-embed-text
```

| Property | Value |
|---|---|
| Dimensions | 768 |
| Size | ~274 MB |
| Speed | Fast |
| Quality | Excellent for code |
| Language | English-optimized |

Configuration:

```yaml
embedder:
  provider: ollama
  model: nomic-embed-text
```

Multilingual: nomic-embed-text-v2-moe

```bash
ollama pull nomic-embed-text-v2-moe
```

| Property | Value |
|---|---|
| Dimensions | 768 |
| Size | ~500 MB |
| Speed | Medium |
| Quality | Excellent |
| Language | Multilingual |

Best for codebases with non-English comments/documentation.

Configuration:

```yaml
embedder:
  provider: ollama
  model: nomic-embed-text-v2-moe
```

High Quality: bge-m3

```bash
ollama pull bge-m3
```

| Property | Value |
|---|---|
| Dimensions | 1024 |
| Size | ~1.2 GB |
| Speed | Slower |
| Quality | Very high |
| Language | Multilingual |

Best for large, complex codebases where accuracy is critical.

Configuration:

```yaml
embedder:
  provider: ollama
  model: bge-m3
  dimensions: 1024
```

Maximum Quality: mxbai-embed-large

```bash
ollama pull mxbai-embed-large
```

| Property | Value |
|---|---|
| Dimensions | 1024 |
| Size | ~670 MB |
| Speed | Medium |
| Quality | Highest |
| Language | English |

Configuration:

```yaml
embedder:
  provider: ollama
  model: mxbai-embed-large
  dimensions: 1024
```

Model Comparison

| Model | Dims | Size | Speed | Quality | Use Case |
|---|---|---|---|---|---|
| `nomic-embed-text` | 768 | 274 MB | ⚡⚡⚡ | ⭐⭐⭐ | General use |
| `nomic-embed-text-v2-moe` | 768 | 500 MB | ⚡⚡ | ⭐⭐⭐⭐ | Multilingual |
| `bge-m3` | 1024 | 1.2 GB | ⚡ | ⭐⭐⭐⭐⭐ | Large codebases |
| `mxbai-embed-large` | 1024 | 670 MB | ⚡⚡ | ⭐⭐⭐⭐⭐ | Maximum accuracy |

Performance Optimization

Memory Management

Models load into RAM. Ensure sufficient memory:

| Model | RAM Required |
|---|---|
| `nomic-embed-text` | ~500 MB |
| `nomic-embed-text-v2-moe` | ~800 MB |
| `bge-m3` | ~1.5 GB |
| `mxbai-embed-large` | ~1 GB |

GPU Acceleration

Ollama automatically uses:

  • macOS: Metal (Apple Silicon)
  • Linux/Windows: CUDA (NVIDIA GPUs)

Check GPU usage:

```bash
ollama ps
```

Keeping Model Loaded

By default, Ollama unloads models after 5 minutes of inactivity. Keep it loaded:

```bash
# Keep model loaded indefinitely
curl http://localhost:11434/api/generate -d '{ "model": "nomic-embed-text", "keep_alive": -1 }'
```

Verifying Connection

Check Ollama is Running

```bash
curl http://localhost:11434/api/tags
```
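The same check can be scripted. This is a minimal Python sketch; the `is_ollama_running` helper is ours, not part of GrepAI or Ollama, and it simply probes the documented `/api/tags` endpoint:

```python
import urllib.request
import urllib.error

def is_ollama_running(endpoint: str = "http://localhost:11434",
                      timeout: float = 2.0) -> bool:
    """Return True if the Ollama HTTP API answers at `endpoint`."""
    try:
        with urllib.request.urlopen(f"{endpoint}/api/tags", timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: server not reachable.
        return False

if __name__ == "__main__":
    if is_ollama_running():
        print("Ollama is up")
    else:
        print("Ollama is not reachable -- try `ollama serve`")
```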

List Available Models

```bash
ollama list
```

Test Embedding

```bash
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "function authenticate(user, password)"
}'
```
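The same request can be made from Python. This sketch uses only the request fields shown in the curl example above (`model`, `prompt`); the helper names are ours, and a running Ollama is assumed for the `embed` call itself:

```python
import json
import urllib.request

def build_embedding_request(prompt: str, model: str = "nomic-embed-text") -> bytes:
    """JSON body for Ollama's /api/embeddings endpoint."""
    return json.dumps({"model": model, "prompt": prompt}).encode("utf-8")

def embed(prompt: str, model: str = "nomic-embed-text",
          endpoint: str = "http://localhost:11434") -> list[float]:
    """POST the prompt to a running Ollama server and return the embedding vector."""
    req = urllib.request.Request(
        f"{endpoint}/api/embeddings",
        data=build_embedding_request(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]

# With Ollama running:
#   vec = embed("function authenticate(user, password)")
#   len(vec) is the model's dimensionality (768 for nomic-embed-text)
```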

Running Ollama as a Service

macOS (launchd)

The Ollama app runs automatically on login.

Linux (systemd)

```bash
# Enable service
sudo systemctl enable ollama

# Start service
sudo systemctl start ollama

# Check status
sudo systemctl status ollama
```

Manual Background

```bash
nohup ollama serve > /dev/null 2>&1 &
```

Remote Ollama Server

Run Ollama on a powerful server and connect remotely:

On the Server

```bash
# Allow remote connections
OLLAMA_HOST=0.0.0.0 ollama serve
```

On the Client

```yaml
# .grepai/config.yaml
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://server-ip:11434
```

Common Issues

❌ **Problem:** Connection refused
✅ **Solution:**
```bash
# Start Ollama
ollama serve
```

❌ **Problem:** Model not found
✅ **Solution:**
```bash
# Pull the model
ollama pull nomic-embed-text
```

❌ **Problem:** Slow embedding generation
✅ **Solutions:**
- Use a smaller model (`nomic-embed-text`)
- Ensure the GPU is being used (`ollama ps`)
- Close memory-intensive applications
- Consider a remote server with better hardware

❌ **Problem:** Out of memory
✅ **Solutions:**
- Use a smaller model
- Close other applications
- Upgrade RAM
- Use a remote Ollama server

❌ **Problem:** Embeddings differ after a model update
✅ **Solution:** Re-index after model updates:
```bash
rm .grepai/index.gob
grepai watch
```

Best Practices

  1. Start with `nomic-embed-text`: Best balance of speed/quality
  2. Keep Ollama running: Background service recommended
  3. Match dimensions: Don't mix models with different dimensions
  4. Re-index on model change: Delete the index and re-run watch
  5. Monitor memory: Embedding models use significant RAM
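Points 3 and 4 boil down to one invariant: every vector in the index must have the dimensionality of the currently configured model, so switching models means rebuilding the index. A hypothetical guard (GrepAI auto-detects dimensions itself; `needs_reindex` and the stored-dimension value are illustrative) makes the rule concrete:

```python
# Dimensions from the model tables above.
KNOWN_DIMS = {
    "nomic-embed-text": 768,
    "nomic-embed-text-v2-moe": 768,
    "bge-m3": 1024,
    "mxbai-embed-large": 1024,
}

def needs_reindex(indexed_dims: int, model: str) -> bool:
    """True when the stored index was built with a different dimensionality
    than the currently configured model produces."""
    return KNOWN_DIMS[model] != indexed_dims

# An index built with nomic-embed-text (768 dims) cannot be queried with bge-m3:
assert needs_reindex(768, "bge-m3")
assert not needs_reindex(768, "nomic-embed-text")
```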

Output Format

Successful Ollama configuration:

```
✅ Ollama Embedding Provider Configured

   Provider: Ollama
   Model: nomic-embed-text
   Endpoint: http://localhost:11434
   Dimensions: 768 (auto-detected)
   Status: Connected

   Model Info:
   - Size: 274 MB
   - Loaded: Yes
   - GPU: Apple Metal
```