grepai-embeddings-openai

GrepAI Embeddings with OpenAI

This skill covers using OpenAI's embedding API with GrepAI for high-quality, cloud-based embeddings.

When to Use This Skill

  • You need the highest-quality embeddings
  • You work in a team environment with shared infrastructure
  • You don't want to manage a local embedding server
  • You are willing to trade privacy for quality and convenience

Considerations

| Aspect | Details |
|---|---|
| Quality | State-of-the-art embeddings |
| Speed | Fast, no local compute needed |
| Scalability | Handles any codebase size |
| ⚠️ Privacy | Code sent to OpenAI servers |
| ⚠️ Cost | Pay per token |
| ⚠️ Internet | Requires connection |

Prerequisites

  1. OpenAI API key
  2. Billing enabled on OpenAI account

Get your API key at: https://platform.openai.com/api-keys

Configuration

Basic Configuration

```yaml
# .grepai/config.yaml
embedder:
  provider: openai
  model: text-embedding-3-small
  api_key: ${OPENAI_API_KEY}
```

Set the environment variable:

```bash
export OPENAI_API_KEY="sk-..."
```
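The `${OPENAI_API_KEY}` placeholder keeps the secret out of the config file and resolves it from the environment when the config is read. How GrepAI performs this substitution internally is not shown here; the Python sketch below only illustrates the idea, and `expand_placeholders` is a hypothetical helper, not part of GrepAI.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} placeholders such as ${OPENAI_API_KEY}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each ${NAME} with its value, failing loudly if NAME is unset."""
    source = os.environ if env is None else env

    def substitute(match):
        name = match.group(1)
        value = source.get(name, "")
        if value == "":
            raise KeyError(f"environment variable {name} is not set")
        return value

    return _PLACEHOLDER.sub(substitute, text)


# Illustrative use with an explicit mapping instead of the real environment.
print(expand_placeholders("api_key: ${OPENAI_API_KEY}", {"OPENAI_API_KEY": "sk-example"}))
```

Failing on an unset variable, rather than substituting an empty string, surfaces a missing key before any request is sent.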

With Parallel Processing

```yaml
embedder:
  provider: openai
  model: text-embedding-3-small
  api_key: ${OPENAI_API_KEY}
  parallelism: 8  # Concurrent requests for speed
```

Direct API Key (Not Recommended)

```yaml
embedder:
  provider: openai
  model: text-embedding-3-small
  api_key: sk-your-api-key-here  # Avoid committing secrets!
```

Warning: Never commit API keys to version control.
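A small check run before committing can catch a literal key in the config. The Python sketch below is illustrative only: `find_hardcoded_keys` is a hypothetical helper, and the pattern (a value starting with `sk-` followed by at least eight key-like characters) is a heuristic assumption, not a format guaranteed by OpenAI.

```python
import re
from typing import List

# Heuristic assumption: a literal key looks like "sk-" plus 8+ key characters.
_LITERAL_KEY = re.compile(r"api_key:\s*['\"]?(sk-[A-Za-z0-9_\-]{8,})")


def find_hardcoded_keys(config_text: str) -> List[int]:
    """Return 1-based line numbers whose api_key looks like a literal secret."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        if _LITERAL_KEY.search(line):
            hits.append(number)
    return hits


unsafe = "embedder:\n  provider: openai\n  api_key: sk-your-api-key-here\n"
safe = "embedder:\n  provider: openai\n  api_key: ${OPENAI_API_KEY}\n"
print(find_hardcoded_keys(unsafe), find_hardcoded_keys(safe))
```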

Available Models

text-embedding-3-small (Recommended)

| Property | Value |
|---|---|
| Dimensions | 1536 |
| Price | $0.00002 / 1K tokens |
| Quality | Very high |
| Speed | Fast |

Best for: Most use cases, good balance of cost/quality.

```yaml
embedder:
  provider: openai
  model: text-embedding-3-small
```

text-embedding-3-large

| Property | Value |
|---|---|
| Dimensions | 3072 |
| Price | $0.00013 / 1K tokens |
| Quality | Highest |
| Speed | Fast |

Best for: Maximum accuracy, cost not a concern.

```yaml
embedder:
  provider: openai
  model: text-embedding-3-large
  dimensions: 3072
```

Dimension Reduction

You can reduce dimensions to save storage:

```yaml
embedder:
  provider: openai
  model: text-embedding-3-large
  dimensions: 1024  # Reduced from 3072
```
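At 1024 dimensions each chunk stores one third as many numbers as at 3072. As a rough illustration of what shortening an embedding means, the Python sketch below keeps the first `dimensions` values of a vector and rescales them to unit length. It assumes the input is an already-computed embedding; `shorten_embedding` is a hypothetical helper, and in normal use the `dimensions` setting presumably lets the API return the shorter vector directly.

```python
import math
from typing import List, Sequence


def shorten_embedding(vector: Sequence[float], dimensions: int) -> List[float]:
    """Truncate a vector to `dimensions` values and L2-normalize the result."""
    if not 0 < dimensions <= len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    head = list(vector[:dimensions])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # An all-zero vector cannot be normalized.
    return [x / norm for x in head]


# Toy 3-value vector standing in for a real embedding.
print(shorten_embedding([3.0, 4.0, 12.0], 2))
```

Renormalizing matters because similarity search usually assumes unit-length vectors; a truncated vector is shorter than 1 until it is rescaled.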

Model Comparison

| Model | Dimensions | Cost/1K tokens | Quality |
|---|---|---|---|
| `text-embedding-3-small` | 1536 | $0.00002 | ⭐⭐⭐⭐ |
| `text-embedding-3-large` | 3072 | $0.00013 | ⭐⭐⭐⭐⭐ |

Cost Estimation

Approximate costs by codebase size:

| Codebase Size | Chunks | Small Model | Large Model |
|---|---|---|---|
| Small (100 files) | ~500 | $0.01 | $0.06 |
| Medium (1000 files) | ~5,000 | $0.10 | $0.65 |
| Large (10000 files) | ~50,000 | $1.00 | $6.50 |

Note: Costs are one-time for initial indexing. Updates only re-embed changed files.
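The figures in the table are consistent with roughly 1,000 tokens per chunk at the per-1K-token prices listed under Available Models. The Python sketch below reproduces that arithmetic so you can plug in your own numbers. The 1,000 tokens-per-chunk value is an assumption inferred from the table, not a documented GrepAI setting, and `estimate_cost` is a hypothetical helper.

```python
# Prices per 1K tokens, as listed in this document.
PRICE_PER_1K_TOKENS = {
    "text-embedding-3-small": 0.00002,
    "text-embedding-3-large": 0.00013,
}

# Assumption: inferred from the cost table, not a GrepAI setting.
TOKENS_PER_CHUNK = 1000


def estimate_cost(chunks: int, model: str, tokens_per_chunk: int = TOKENS_PER_CHUNK) -> float:
    """Estimate the one-time indexing cost in US dollars."""
    total_tokens = chunks * tokens_per_chunk
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS[model]


for label, chunks in [("Small", 500), ("Medium", 5_000), ("Large", 50_000)]:
    small = estimate_cost(chunks, "text-embedding-3-small")
    large = estimate_cost(chunks, "text-embedding-3-large")
    print(f"{label}: small model ${small:.2f}, large model ${large:.3f}")
```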

Optimizing for Speed

Parallel Requests

GrepAI v0.24.0+ supports adaptive rate limiting and parallel requests:

```yaml
embedder:
  provider: openai
  model: text-embedding-3-small
  api_key: ${OPENAI_API_KEY}
  parallelism: 8  # Adjust based on your rate limit tier
```

Parallelism recommendations:
  • Free or Tier 1: 1-2
  • Tier 2: 4-8
  • Tier 3+: 8-16
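To make the effect of a `parallelism` cap concrete, the Python sketch below limits the number of requests in flight with a fixed-size worker pool and records the peak concurrency it observed. This is a sketch under assumptions, not GrepAI's implementation: `embed_all`, `InFlightCounter`, and `fake_embed` are hypothetical names, and `fake_embed` only sleeps instead of calling an API.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class InFlightCounter:
    """Tracks how many calls are running at once and the peak value seen."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        return self

    def __exit__(self, *exc_info):
        with self._lock:
            self.current -= 1
        return False


def fake_embed(batch_id):
    """Hypothetical stand-in for one embedding request."""
    time.sleep(0.005)
    return batch_id


def embed_all(batches, embed, parallelism):
    """Run embed() over batches with at most `parallelism` calls in flight."""
    counter = InFlightCounter()

    def worker(batch):
        with counter:
            return embed(batch)

    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        results = list(pool.map(worker, batches))  # map() preserves input order
    return results, counter.peak


results, peak = embed_all(range(20), fake_embed, parallelism=8)
print(f"embedded {len(results)} batches, peak concurrency {peak}")
```

A higher cap finishes sooner only while the account's rate limit allows it; past that point extra workers mostly produce `429` responses.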

Batching

GrepAI automatically batches chunks for efficient API usage.
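Batching means sending several chunks in one request instead of one request per chunk, which reduces the number of requests counted against the rate limit. The Python sketch below shows the grouping step only. The batch size GrepAI uses is not stated here, so the value 2 is purely illustrative, and `batched` is a hypothetical helper.

```python
from typing import Iterator, List, Sequence


def batched(chunks: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size chunks."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(chunks), batch_size):
        yield list(chunks[start:start + batch_size])


chunks = ["func a()", "func b()", "func c()", "func d()", "func e()"]
for batch in batched(chunks, batch_size=2):
    print(len(batch), batch)
```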

Rate Limits

OpenAI has rate limits based on your account tier:

| Tier | RPM (requests/min) | TPM (tokens/min) |
|---|---|---|
| Free | 3 | 150,000 |
| Tier 1 | 500 | 1,000,000 |
| Tier 2 | 5,000 | 5,000,000 |

GrepAI handles rate limiting automatically with adaptive backoff.
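The exact backoff strategy GrepAI uses is not described here. One common approach is exponential backoff with jitter, sketched below in Python: after each rate-limit error the wait ceiling doubles, and the actual wait is a random value up to that ceiling. `call_with_backoff` and `RateLimitError` are hypothetical names used only for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 response."""


def call_with_backoff(
    request: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying on RateLimitError with growing random waits."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimitError:
            if attempt == max_retries:
                raise  # Out of retries: surface the error to the caller.
            # Ceiling doubles each attempt (1s, 2s, 4s, ...), capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so the function can be exercised without real waiting, for example by passing `delays.append` and inspecting the recorded values.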

Environment Variables

Setting the API Key

**macOS/Linux:**
```bash
# In ~/.bashrc, ~/.zshrc, or ~/.profile
export OPENAI_API_KEY="sk-..."
```

**Windows (PowerShell):**
```powershell
$env:OPENAI_API_KEY = "sk-..."

# Or permanently
[System.Environment]::SetEnvironmentVariable('OPENAI_API_KEY', 'sk-...', 'User')
```

Using .env Files

Create `.env` in your project root:

```
OPENAI_API_KEY=sk-...
```

Add to `.gitignore`:

```gitignore
.env
```
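This page does not say whether GrepAI reads `.env` on its own. If the variables need to be exported by a wrapper script first, a minimal loader could look like the Python sketch below. `parse_env_file` and `load_env_file` are hypothetical helpers that handle only simple `KEY=value` lines, not the full syntax some `.env` tools accept.

```python
import os
from typing import Dict


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env_file(path: str = ".env") -> None:
    """Export values from the file without overriding variables already set."""
    with open(path, encoding="utf-8") as handle:
        for key, value in parse_env_file(handle.read()).items():
            os.environ.setdefault(key, value)
```

Using `setdefault` means a key exported in the shell takes precedence over the file.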

Azure OpenAI

For Azure-hosted OpenAI:

```yaml
embedder:
  provider: openai
  model: your-deployment-name
  api_key: ${AZURE_OPENAI_API_KEY}
  endpoint: https://your-resource.openai.azure.com
```

Security Best Practices

  1. Use environment variables: Never hardcode API keys
  2. Add to .gitignore: Exclude `.env` files
  3. Rotate keys: Regularly rotate API keys
  4. Monitor usage: Check OpenAI dashboard for unexpected usage
  5. Review code: Ensure sensitive code isn't being indexed

Common Issues

Problem: `401 Unauthorized`
✅ Solution: Check that the API key is correct and the environment variable is set:

```bash
echo $OPENAI_API_KEY
```

Problem: `429 Rate limit exceeded`
✅ Solution: Reduce parallelism or upgrade your OpenAI tier:

```yaml
embedder:
  parallelism: 2  # Lower value
```

Problem: High costs
✅ Solutions:
  • Use `text-embedding-3-small` instead of the large model
  • Reduce dimension size
  • Add more ignore patterns to reduce indexed files

Problem: Slow indexing
✅ Solution: Increase parallelism:

```yaml
embedder:
  parallelism: 8
```

Problem: Privacy concerns
✅ Solution: Use Ollama for local embeddings instead
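For the `401 Unauthorized` case, `echo` prints the full secret to the terminal. A masked check avoids that; the Python sketch below reports whether the variable is set and shows only the last four characters, in the same `sk-...xxxx` style as the output shown under Output Format. `describe_api_key` is a hypothetical helper, and the `sk-` prefix test is an assumption that would not apply to other key formats such as Azure keys.

```python
import os
from typing import Optional


def describe_api_key(value: Optional[str]) -> str:
    """Summarize the key's state without revealing more than 4 characters."""
    if not value:
        return "OPENAI_API_KEY is not set"
    if not value.startswith("sk-"):  # Assumption: OpenAI keys start with "sk-".
        return "OPENAI_API_KEY is set but does not start with 'sk-'"
    return f"OPENAI_API_KEY is set: sk-...{value[-4:]}"


print(describe_api_key(os.environ.get("OPENAI_API_KEY")))
```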

Migrating from Ollama to OpenAI

  1. Update configuration:

```yaml
embedder:
  provider: openai
  model: text-embedding-3-small
  api_key: ${OPENAI_API_KEY}
```

  2. Delete existing index:

```bash
rm .grepai/index.gob
```

  3. Re-index:

```bash
grepai watch
```

Important: You cannot mix embeddings from different models/providers.
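The reason for the full re-index is that similarity is only meaningful between vectors produced by the same model. Different models often produce different vector lengths (1536 and 3072 for the two OpenAI models above), and even vectors of equal length from different models live in unrelated spaces. The Python sketch below shows the first failure mode; `cosine_similarity` is a generic illustration, not GrepAI's search code.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; rejects mismatched dimensions."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


old_index_vector = [0.1] * 768     # hypothetical size from a previous local model
new_query_vector = [0.1] * 1536    # text-embedding-3-small
try:
    cosine_similarity(old_index_vector, new_query_vector)
except ValueError as error:
    print(error)
```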

Output Format

Successful OpenAI configuration:

```
✅ OpenAI Embedding Provider Configured

   Provider: OpenAI
   Model: text-embedding-3-small
   Dimensions: 1536
   Parallelism: 4
   API Key: sk-...xxxx (from environment)

   Estimated cost for this codebase:
   - Files: 245
   - Chunks: ~1,200
   - Cost: ~$0.02

   Note: Code will be sent to OpenAI servers.
```