grepai-embeddings-openai
GrepAI Embeddings with OpenAI
This skill covers using OpenAI's embedding API with GrepAI for high-quality, cloud-based embeddings.
When to Use This Skill
- Need highest quality embeddings
- Team environment with shared infrastructure
- Don't want to manage local embedding server
- Willing to trade privacy for quality/convenience
Considerations
| Aspect | Details |
|---|---|
| ✅ Quality | State-of-the-art embeddings |
| ✅ Speed | Fast, no local compute needed |
| ✅ Scalability | Handles any codebase size |
| ⚠️ Privacy | Code sent to OpenAI servers |
| ⚠️ Cost | Pay per token |
| ⚠️ Internet | Requires connection |
Prerequisites
- OpenAI API key
- Billing enabled on OpenAI account
Get your API key at: https://platform.openai.com/api-keys
Configuration
Basic Configuration
```yaml
# .grepai/config.yaml
embedder:
  provider: openai
  model: text-embedding-3-small
  api_key: ${OPENAI_API_KEY}
```
Set the environment variable:
```bash
export OPENAI_API_KEY="sk-..."
```
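The `${OPENAI_API_KEY}` placeholder implies that the key is read from the environment when the config is loaded. As a rough illustration of that idea (not GrepAI's actual loader), the Python sketch below expands such a placeholder and fails early when the variable is missing. `resolve_api_key` is a hypothetical helper introduced only for this example.

```python
import os
import re

# Matches a value that consists solely of a ${VAR_NAME} placeholder.
PLACEHOLDER = re.compile(r"^\$\{(\w+)\}$")


def resolve_api_key(raw_value, env=None):
    """Expand a ${VAR} placeholder (hypothetical helper, not GrepAI's loader)."""
    env = os.environ if env is None else env
    match = PLACEHOLDER.match(raw_value.strip())
    if match is None:
        # A literal key was written directly into the config.
        return raw_value
    name = match.group(1)
    value = env.get(name, "")
    if not value:
        raise KeyError(f"environment variable {name} is not set")
    return value


if __name__ == "__main__":
    print(resolve_api_key("${OPENAI_API_KEY}", {"OPENAI_API_KEY": "sk-example"}))
```

Failing at load time, rather than on the first API call, makes a missing variable easier to spot than a later `401 Unauthorized`.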
With Parallel Processing
```yaml
embedder:
  provider: openai
  model: text-embedding-3-small
  api_key: ${OPENAI_API_KEY}
  parallelism: 8 # Concurrent requests for speed
```
Direct API Key (Not Recommended)
```yaml
embedder:
  provider: openai
  model: text-embedding-3-small
  api_key: sk-your-api-key-here # Avoid committing secrets!
```
Warning: Never commit API keys to version control.
Available Models
text-embedding-3-small (Recommended)
| Property | Value |
|---|---|
| Dimensions | 1536 |
| Price | $0.00002 / 1K tokens |
| Quality | Very high |
| Speed | Fast |
Best for: Most use cases, good balance of cost/quality.
```yaml
embedder:
  provider: openai
  model: text-embedding-3-small
```
text-embedding-3-large
| Property | Value |
|---|---|
| Dimensions | 3072 |
| Price | $0.00013 / 1K tokens |
| Quality | Highest |
| Speed | Fast |
Best for: Maximum accuracy, cost not a concern.
```yaml
embedder:
  provider: openai
  model: text-embedding-3-large
  dimensions: 3072
```
Dimension Reduction
You can reduce dimensions to save storage:
```yaml
embedder:
  provider: openai
  model: text-embedding-3-large
  dimensions: 1024 # Reduced from 3072
```
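To get a feel for the storage saving, the Python sketch below estimates raw vector storage for a given chunk count and dimension size. It assumes each vector component is stored as a 32-bit float; that is an assumption for illustration, not a statement about GrepAI's index format, and `index_size_mib` is a hypothetical helper.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as 32-bit floats


def index_size_mib(chunks, dimensions):
    """Raw vector storage in MiB, ignoring metadata and index overhead."""
    return chunks * dimensions * BYTES_PER_FLOAT32 / (1024 * 1024)


if __name__ == "__main__":
    for dims in (3072, 1536, 1024):
        size = index_size_mib(50_000, dims)
        print(f"{dims} dimensions, 50,000 chunks: {size:.1f} MiB")
```

Under these assumptions, cutting `dimensions` from 3072 to 1024 reduces raw vector storage to one third.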
Model Comparison
| Model | Dimensions | Cost/1K tokens | Quality |
|---|---|---|---|
| `text-embedding-3-small` | 1536 | $0.00002 | ⭐⭐⭐⭐ |
| `text-embedding-3-large` | 3072 | $0.00013 | ⭐⭐⭐⭐⭐ |
Cost Estimation
Approximate one-time indexing costs by codebase size:
| Codebase Size | Chunks | Small Model | Large Model |
|---|---|---|---|
| Small (100 files) | ~500 | $0.01 | $0.06 |
| Medium (1000 files) | ~5,000 | $0.10 | $0.65 |
| Large (10000 files) | ~50,000 | $1.00 | $6.50 |
Note: Costs are one-time for initial indexing. Updates only re-embed changed files.
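The table's figures can be reproduced with simple arithmetic. The Python sketch below uses the per-1K-token prices listed under Available Models and assumes roughly 1,000 tokens per chunk; that number is inferred from the table (500 chunks costing about $0.01 on the small model), not a documented GrepAI constant. `estimate_cost` is a hypothetical helper.

```python
# Prices per 1K tokens, taken from the model tables in this document.
PRICE_PER_1K_TOKENS = {
    "text-embedding-3-small": 0.00002,
    "text-embedding-3-large": 0.00013,
}

TOKENS_PER_CHUNK = 1000  # assumption inferred from the cost table


def estimate_cost(chunks, model, tokens_per_chunk=TOKENS_PER_CHUNK):
    """Estimated one-time indexing cost in USD for a number of chunks."""
    tokens = chunks * tokens_per_chunk
    return tokens / 1000 * PRICE_PER_1K_TOKENS[model]


if __name__ == "__main__":
    for chunks in (500, 5_000, 50_000):
        small = estimate_cost(chunks, "text-embedding-3-small")
        large = estimate_cost(chunks, "text-embedding-3-large")
        print(f"{chunks} chunks: small ${small:.2f}, large ${large:.2f}")
```

If your chunks are larger or smaller than the assumed size, pass a different `tokens_per_chunk` to scale the estimate.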
Optimizing for Speed
Parallel Requests
GrepAI v0.24.0+ supports adaptive rate limiting and parallel requests:
```yaml
embedder:
  provider: openai
  model: text-embedding-3-small
  api_key: ${OPENAI_API_KEY}
  parallelism: 8 # Adjust based on your rate limit tier
```
Parallelism recommendations:
- Tier 1 (Free): 1-2
- Tier 2: 4-8
- Tier 3+: 8-16
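Conceptually, a `parallelism` setting caps how many embedding requests are in flight at once. The Python sketch below shows one way to do that with a thread pool. It illustrates the general technique and is not GrepAI's implementation; `embed_batches` and `fake_embed` are hypothetical names, and `fake_embed` stands in for a real API call.

```python
from concurrent.futures import ThreadPoolExecutor


def embed_batches(batches, embed_fn, parallelism=8):
    """Embed batches concurrently with at most `parallelism` requests in flight."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # Executor.map returns results in the same order as the input batches.
        results = pool.map(embed_fn, batches)
        return [vector for batch_vectors in results for vector in batch_vectors]


def fake_embed(batch):
    """Stand-in for a real API call: one 1-dimensional vector per chunk."""
    return [[float(len(text))] for text in batch]


if __name__ == "__main__":
    print(embed_batches([["a", "bb"], ["ccc"]], fake_embed, parallelism=2))
```

Raising the worker count speeds up indexing only until the account's rate limit is reached, which is why the recommended values grow with the tier.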
Batching
GrepAI automatically batches chunks for efficient API usage.
Rate Limits
OpenAI enforces rate limits based on your account tier, measured in requests per minute (RPM) and tokens per minute (TPM):
| Tier | RPM | TPM |
|---|---|---|
| Free | 3 | 150,000 |
| Tier 1 | 500 | 1,000,000 |
| Tier 2 | 5,000 | 5,000,000 |
GrepAI handles rate limiting automatically with adaptive backoff.
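The document does not describe GrepAI's backoff algorithm. As an illustration of the general idea, the Python sketch below retries a call with exponential backoff and a small random jitter whenever a rate-limit error occurs. `RateLimitError` and `call_with_backoff` are hypothetical names standing in for an HTTP 429 response and a retry wrapper.

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for an HTTP 429 response from the API."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponentially growing delays."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted, surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter so concurrent workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky():
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise RateLimitError()
        return "ok"

    # Pass a no-op sleep so the demo finishes instantly.
    print(call_with_backoff(flaky, sleep=lambda seconds: None))
```

The `sleep` parameter is injectable so the behaviour can be tested without actually waiting.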
Environment Variables
Setting the API Key
**macOS/Linux:**
```bash
# In ~/.bashrc, ~/.zshrc, or ~/.profile
export OPENAI_API_KEY="sk-..."
```
**Windows (PowerShell):**
```powershell
$env:OPENAI_API_KEY = "sk-..."
# Or permanently
[System.Environment]::SetEnvironmentVariable('OPENAI_API_KEY', 'sk-...', 'User')
```
Using .env Files
Create `.env` in your project root:
```
OPENAI_API_KEY=sk-...
```
Add to `.gitignore`:
```gitignore
.env
```
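A `.env` file is a list of `KEY=value` lines. The Python sketch below shows a minimal parser for that format, assuming simple unescaped values. It is only an illustration of what loading such a file involves, and `parse_dotenv` is a hypothetical helper; a dedicated library such as python-dotenv covers more edge cases.

```python
def parse_dotenv(text):
    """Parse KEY=value lines, skipping blanks, comments, and malformed lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


if __name__ == "__main__":
    sample = '# local secrets\nOPENAI_API_KEY="sk-example"\n'
    print(parse_dotenv(sample))
```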
Azure OpenAI
For Azure-hosted OpenAI:
```yaml
embedder:
  provider: openai
  model: your-deployment-name
  api_key: ${AZURE_OPENAI_API_KEY}
  endpoint: https://your-resource.openai.azure.com
```
Security Best Practices
- Use environment variables: Never hardcode API keys
- Add to .gitignore: Exclude `.env` files
- Rotate keys: Regularly rotate API keys
- Monitor usage: Check the OpenAI dashboard for unexpected usage
- Review code: Ensure sensitive code isn't being indexed
Common Issues
❌ Problem: `401 Unauthorized`
✅ Solution: Check that the API key is correct and the environment variable is set:
```bash
echo $OPENAI_API_KEY
```
❌ Problem: `429 Rate limit exceeded`
✅ Solution: Reduce parallelism or upgrade your OpenAI tier:
```yaml
embedder:
  parallelism: 2 # Lower value
```
❌ Problem: High costs
✅ Solutions:
- Use `text-embedding-3-small` instead of `text-embedding-3-large`
- Reduce dimension size
- Add more ignore patterns to reduce indexed files
❌ Problem: Slow indexing
✅ Solution: Increase parallelism:
```yaml
embedder:
  parallelism: 8
```
❌ Problem: Privacy concerns
✅ Solution: Use Ollama for local embeddings instead
Migrating from Ollama to OpenAI
- Update configuration:
```yaml
embedder:
  provider: openai
  model: text-embedding-3-small
  api_key: ${OPENAI_API_KEY}
```
- Delete existing index:
```bash
rm .grepai/index.gob
```
- Re-index:
```bash
grepai watch
```
Important: You cannot mix embeddings from different models/providers.
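One concrete reason an index cannot mix providers is that the vectors often differ in length, so similarity cannot even be computed. The Python sketch below demonstrates this with a plain cosine similarity. The 768-dimension vector is an assumed size for a local model, used only for illustration, and `cosine_similarity` is a helper written for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors of equal length."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


if __name__ == "__main__":
    old_vector = [0.1] * 768    # assumed size of a vector from a local model
    new_vector = [0.1] * 1536   # text-embedding-3-small
    try:
        cosine_similarity(old_vector, new_vector)
    except ValueError as error:
        print(f"Cannot compare: {error}")
```

Even when two models happen to produce vectors of the same length, they map text into different vector spaces, so their similarity scores are not meaningful. Deleting the index and re-embedding everything avoids both problems.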
Output Format
Successful OpenAI configuration:
```
✅ OpenAI Embedding Provider Configured

Provider: OpenAI
Model: text-embedding-3-small
Dimensions: 1536
Parallelism: 4
API Key: sk-...xxxx (from environment)

Estimated cost for this codebase:
- Files: 245
- Chunks: ~1,200
- Cost: ~$0.02

Note: Code will be sent to OpenAI servers.
```