osgrep-reference

osgrep: Semantic Code Search

Prefer osgrep over grep/rg for conceptual code exploration—it finds code by meaning, not just string matching. For exact identifier or literal string searches, grep/rg remains appropriate.

Overview

osgrep is a natural-language semantic code search tool that finds code by concept rather than keyword matching. Unlike `grep`, which matches literal strings, osgrep understands code semantics using local AI embeddings.
Version 0.5.16 (Dec 2025) highlights:
  • `skeleton` command: Compress files to function/class signatures (~85% token reduction)
  • `trace` command: Show the callers and callees of any symbol (call graph)
  • `symbols` command: List all indexed symbols with definitions
  • `doctor` command: Health/integrity verification
  • `list` command: Display all indexed repositories
  • Per-project `.osgrep/` directories (no longer global `~/.osgrep/data`)
  • V2 architecture with improved performance (~20% token savings, ~30% speedup)
  • Go language support
  • `--reset` flag for clean re-indexing
  • ColBERT reranking for better result relevance
  • Role detection: distinguishes orchestration logic from type definitions
  • Split searching: separate "Code" and "Docs" indices
When to use osgrep:
  • Exploring unfamiliar codebases ("where is the auth logic?")
  • Finding conceptual patterns ("show me error handling")
  • Locating cross-cutting concerns ("all database migrations")
  • User explicitly asks to search code semantically
When to use traditional tools:
  • Searching for exact strings or identifiers (use `Grep`)
  • Finding files by name pattern (use `Glob`)
  • Already know the exact location (use `Read`)
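The choice between semantic and literal search can be sketched as a small shell heuristic. `pick_search_tool` is a hypothetical helper, not part of osgrep, and the rule it applies (a single identifier-like token means an exact-match search) is an assumption made for illustration only.

```shell
# Hypothetical helper: suggest a tool from the shape of the query.
# Heuristic (an assumption, not osgrep behavior): a single token made only of
# identifier characters is treated as an exact-match search.
pick_search_tool() {
  case "$1" in
    "")                echo "none" ;;    # nothing to search for
    *[!A-Za-z0-9_.]*)  echo "osgrep" ;;  # spaces or punctuation: conceptual query
    *)                 echo "grep" ;;    # bare identifier such as useState
  esac
}
```

For example, `pick_search_tool "where is the auth logic?"` suggests osgrep, while `pick_search_tool "useState"` suggests grep.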

Quick Start

Run osgrep from within the project directory, since it uses per-project `.osgrep/` indexes.
bash
cd /path/to/project      # REQUIRED: cd into the project first
osgrep "your query"      # Now search works
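When working from a nested subdirectory, it can help to locate the directory that holds the index first. The sketch below walks upward until it finds a `.osgrep/` directory. `find_osgrep_root` is a hypothetical helper, and it assumes the index lives at the project root as described above.

```shell
# Hypothetical helper: walk upward from a starting directory (default: the
# current one) until a directory containing .osgrep/ is found, then print it.
find_osgrep_root() {
  dir=$(cd "${1:-.}" && pwd) || return 1
  while [ "$dir" != "/" ]; do
    if [ -d "$dir/.osgrep" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    dir=$(dirname "$dir")
  done
  return 1   # no index found above the starting directory
}
```

A possible use is `cd "$(find_osgrep_root)" && osgrep "your query"`, which fails cleanly when no index exists yet.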

Basic Search

bash
osgrep "your semantic query"
osgrep search "your query" path/to/scope    # Scope to subdirectory
osgrep skeleton src/file.py                 # Compress file to signatures
osgrep trace functionName                   # Show call graph
osgrep symbols                              # List all symbols
Examples:
bash
osgrep "user registration flow"
osgrep "webhook signature validation"
osgrep "database transaction handling"
osgrep "how are plugins loaded" packages/src

Output Format

Returns results in this format:
IMPLEMENTATION path/to/file:line
Score: 0.95

Preamble:
[code snippet or content preview]
...
  • IMPLEMENTATION: Tag indicating the type of match
  • Score: Relevance score (0-1, higher is better)
  • ...: Truncation marker; the snippet is incomplete, so use `Read` for full context
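Results in this shape can be reduced to a compact list for scripting. The following is a minimal sketch, assuming every result begins with a `<TAG> path:line` header followed by a `Score:` line exactly as shown above; `summarize_results` is a hypothetical name, and real output may differ between versions.

```shell
# Sketch: reduce osgrep output on stdin to "score<TAB>path:line" pairs.
# Assumes each result starts with a "<TAG> path:line" header followed by a
# "Score: N" line, as in the format described above.
summarize_results() {
  awk '
    /^[A-Z_]+ [^ ]+:[0-9]+$/ { location = $2; next }  # header: tag, then path:line
    /^Score: / && location != "" {
      printf "%s\t%s\n", $2, location
      location = ""
    }
  '
}
```

Piping a search into it, as in `osgrep "validation logic" | summarize_results`, would leave one line per result with the snippets dropped.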

Search Strategy

For Architectural/System-Level Questions

Use for: auth, integrations, file watching, cross-cutting concerns
  1. Search broadly first to map the landscape:
    bash
    osgrep "authentication authorization checks"
  2. Survey the results - look for patterns across multiple files:
    • Are checks in middleware? Decorators? Multiple services?
    • Do file paths suggest different layers (gateway, handlers, utils)?
  3. Read strategically - pick 2-4 files that represent different aspects:
    • Read the main entry point
    • Read representative middleware/util files
    • Follow imports if architecture is unclear
  4. Refine with specific searches if one aspect is unclear:
    bash
    osgrep "session validation logic"
    osgrep "API authentication middleware"
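The survey step above can be sketched as a tally of results per directory, which makes the layers a concern touches easier to see. `tally_by_directory` is a hypothetical helper, and the sketch assumes one file path per input line, which is what `--compact` (file paths only) is expected to print.

```shell
# Sketch: count result files per directory, most frequent first.
# Assumes one file path per input line, e.g. from `osgrep "..." --compact`.
tally_by_directory() {
  awk -F/ 'NF {
    dir = "."                                   # file with no directory part
    if (NF > 1) { dir = $1; for (i = 2; i < NF; i++) dir = dir "/" $i }
    count[dir]++
  }
  END { for (d in count) printf "%d %s\n", count[d], d }' | sort -k1,1nr -k2
}
```

Directories with several hits are good candidates for the 2-4 files to read in full.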

For Targeted Implementation Details

Use for: specific function, algorithm, single feature
  1. Search specifically about the precise logic:
    bash
    osgrep "logic for merging user and default configuration"
  2. Evaluate the semantic match:
    • Does the snippet look relevant?
    • If it ends in `...` or cuts off mid-logic, read the file
  3. One search, one read: Use osgrep to pinpoint the best file, then read it fully.
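The truncation check in step 2 can be automated in a small way. This is a sketch only: `snippet_is_truncated` is a hypothetical helper that reads a snippet on stdin and succeeds when its last non-blank line ends with the `...` marker.

```shell
# Sketch: succeed (exit 0) when the snippet on stdin ends with the "..."
# truncation marker, meaning the file should be read in full.
snippet_is_truncated() {
  last=$(awk 'NF { line = $0 } END { print line }')  # last non-blank line
  case "$last" in
    *...) return 0 ;;
    *)    return 1 ;;
  esac
}
```

A snippet that passes this check is a signal to open the file rather than rely on the preview.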

CLI Reference

Search Options

Control result count:
bash
osgrep "validation logic" -m 20           # Max 20 results total (default: 10)
osgrep "validation logic" --per-file 3    # Up to 3 matches per file (default: 1)
Output formats:
bash
osgrep "API endpoints" --compact           # File paths only
osgrep "API endpoints" --content           # Full chunk content (not just snippets)
osgrep "API endpoints" --scores            # Show relevance scores
osgrep "API endpoints" --plain             # Disable ANSI colors
Sync before search:
bash
osgrep "validation logic" -s               # Sync files to index before searching
osgrep "validation logic" -d               # Dry run (show what would sync)

Index Management

bash
osgrep index                    # Incremental update
osgrep index -r                 # Full re-index from scratch (--reset)
osgrep index -p /path/to/repo   # Index a specific directory
osgrep index -d                 # Preview what would be indexed (--dry-run)

Advanced Commands (v0.5+)

Skeleton - Compress files to signatures:
bash
osgrep skeleton src/server.py              # Show function/class signatures only
osgrep skeleton src/server.py --no-summary # Omit call/complexity summaries
osgrep skeleton "auth logic" -l 5          # Query mode: skeleton of top 5 matching files
Output shows function signatures with `# → calls | C:N | ORCH` summaries inside bodies.
Trace - Show call graph:
bash
osgrep trace handleRequest                 # Who calls this? What does it call?
Symbols - List all indexed symbols:
bash
osgrep symbols                             # All symbols (default limit: 20)
osgrep symbols "Request"                   # Filter by pattern
osgrep symbols -p src/api/ -l 50           # Filter by path, increase limit

Other Commands

bash
osgrep list                     # Show all indexed repositories
osgrep doctor                   # Check health and configuration
osgrep setup                    # Pre-download models (~150MB)
osgrep serve                    # Run background daemon (port 4444)
osgrep serve -p 8080            # Custom port (or OSGREP_PORT=8080)
osgrep serve -b                 # Run in background (--background)
osgrep serve status             # Check if daemon is running
osgrep serve stop               # Stop daemon
osgrep serve stop --all         # Stop all daemons
Serve endpoints:
  • `GET /health` - Health check
  • `POST /search` - Search with `{ query, limit, path, rerank }`
  • Lock file: `.osgrep/server.json` with `port` / `pid`
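A client for the daemon might look like the sketch below. Several details are assumptions rather than documented behavior: that `.osgrep/server.json` is a JSON object with a numeric `port` field, that the server listens on `localhost`, and that `/search` accepts a JSON body. `osgrep_port` and `osgrep_search` are hypothetical helper names, and the response format is not covered here.

```shell
# Sketch: read the daemon port from the lock file, then query /search.
# Assumption: the lock file looks like {"port": 4444, "pid": 12345}.
osgrep_port() {
  lock="${1:-.osgrep/server.json}"
  port=$(sed -n 's/.*"port"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' \
           "$lock" 2>/dev/null | head -n 1)
  printf '%s\n' "${port:-4444}"   # fall back to the default port
}

osgrep_search() {
  # $1 = query (must not contain double quotes), $2 = result limit (default 10)
  curl -s -X POST "http://localhost:$(osgrep_port)/search" \
    -H 'Content-Type: application/json' \
    -d "{\"query\": \"$1\", \"limit\": ${2:-10}}"
}
```

Reading the port from the lock file avoids hard-coding 4444 when the daemon was started with `-p` or `OSGREP_PORT`.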

Claude Code Integration

bash
osgrep install-claude-code      # Install as Claude Code plugin
osgrep install-opencode         # Install for Opencode
Both plugins automatically manage the background server lifecycle during sessions.

Common Search Patterns

Architecture Exploration

bash
# Mental processes (Open Souls / Daimonic)
osgrep "mental processes that orchestrate conversation flow"
osgrep "subprocesses that learn about the user"
osgrep "cognitive steps using structured output"

# React/Next.js
osgrep "where do we fetch data in components?"
osgrep "custom hooks for API calls"
osgrep "protected route implementation"

# Backend
osgrep "request validation middleware"
osgrep "authentication flow"
osgrep "rate limiting logic"

Business Logic

bash
osgrep "payment processing"
osgrep "notification sending"
osgrep "user permission checks"
osgrep "order fulfillment workflow"

Cross-Cutting Concerns

bash
osgrep "error handling patterns"
osgrep "logging configuration"
osgrep "database migrations"
osgrep "environment variable usage"

Tips for Effective Queries

Trust the Semantics

You don't need exact names. Conceptual queries work better:
bash
# Good - conceptual
osgrep "how does the server start"
osgrep "component state management"

# Less effective - too literal
osgrep "server.init"
osgrep "useState"

Be Specific

bash
# Too vague
osgrep "code"

# Clear intent
osgrep "user registration validation logic"

Use Natural Language

bash
osgrep "how do we handle payment failures?"
osgrep "what happens when a webhook arrives?"
osgrep "where is user input sanitized?"

Watch for Distributed Patterns

If results span 5+ files in different directories, the feature is likely architectural—survey before diving deep.

Don't Over-Rely on Snippets

For architectural questions, snippets are signposts, not answers. Read the key files.

Technical Details

  • 100% Local: Uses transformers.js embeddings (no remote API calls)
  • Auto-Isolated: Each repo gets its own index in a `.osgrep/` directory (v0.5+)
  • Adaptive Performance: Bounded concurrency keeps the system responsive
  • Index Location: `.osgrep/` in the project root (was `~/.osgrep/data/` in v0.4.x)
  • Model Download: ~150MB on first run (run `osgrep setup` to pre-download)
  • Chunking Strategy: Tree-sitter parses code into function/class boundaries
  • Deduplication: Identical code blocks are deduplicated
  • Dual Channels: Separate "Code" and "Docs" indices with ColBERT reranking
  • Structural Boosting: Functions/classes prioritized over test files
  • Skeleton Compression: ~85% token reduction when viewing file structure

Troubleshooting

"Still Indexing..." message:
  • Indexing is still in progress. Results will be partial until it completes.
  • Alert the user and ask if they wish to proceed.
Slow first search:
  • Expected: indexing takes 30-60s for medium repos
  • Use `osgrep setup` to pre-download models
Index out of date:
  • Run `osgrep index` to refresh
  • Run `osgrep index --reset` for a complete re-index
  • osgrep usually auto-detects changes
Installation issues:
bash
osgrep doctor              # Diagnose problems
npm install -g osgrep      # Reinstall if needed
No results found:
  • Try broader queries ("authentication" vs "JWT middleware")
  • Ensure the index is up to date (`osgrep index`)
  • Verify you're in the correct repository directory