semble — Fast Token-Efficient Code Search for Agents

~98% fewer tokens than grep+read. Index in ~250ms. Query in ~1.5ms. No GPU, no API key.
Semble returns only the relevant code snippets agents need, without grepping full files or reading directories. A natural-language or symbol query like "authentication flow" or "save_pretrained" returns exact chunks with file paths and line ranges, and nothing more.

Installation

MCP (Claude Code — recommended)

bash
claude mcp add semble -s user -- uvx --from "semble[mcp]" semble

MCP (Codex)

Add to `~/.codex/config.toml`:
toml
[mcp_servers.semble]
command = "uvx"
args = ["--from", "semble[mcp]", "semble"]

MCP (Cursor)

Add to `~/.cursor/mcp.json`:
json
{
  "mcpServers": {
    "semble": {
      "command": "uvx",
      "args": ["--from", "semble[mcp]", "semble"]
    }
  }
}
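
If `~/.cursor/mcp.json` already lists other servers, pasting the snippet over the file would drop them. The sketch below merges the entry instead. It is only an illustration: `add_semble_server` is a hypothetical helper, not part of semble or Cursor, and the demo writes to a temporary file rather than the real config.

```python
import json
import tempfile
from pathlib import Path

# Server entry copied from the Cursor config shown above.
SEMBLE_ENTRY = {"command": "uvx", "args": ["--from", "semble[mcp]", "semble"]}


def add_semble_server(config_path: Path) -> dict:
    """Merge the semble entry into an MCP config file, keeping other servers.

    Hypothetical helper for illustration; not part of semble or Cursor.
    """
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)
    # setdefault preserves any servers that are already registered.
    config.setdefault("mcpServers", {})["semble"] = SEMBLE_ENTRY
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo against a throwaway file, not the real ~/.cursor/mcp.json.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "mcp.json"
    demo_path.write_text('{"mcpServers": {"other": {"command": "echo"}}}', encoding="utf-8")
    merged = add_semble_server(demo_path)
    print(sorted(merged["mcpServers"]))  # ['other', 'semble']
```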

MCP (OpenCode)

Add to `~/.opencode/config.json`:
json
{
  "mcp": {
    "semble": {
      "type": "local",
      "command": ["uvx", "--from", "semble[mcp]", "semble"]
    }
  }
}

CLI / pip

bash
pip install semble        # pip
uv tool install semble    # uv (recommended for CLI use)

Skill (any platform)

bash
npx skills add https://github.com/akillness/oh-my-skills --skill semble

When to use

  • Search a codebase by describing behavior in natural language ("how is rate limiting handled")
  • Look up a symbol or identifier without knowing the exact file ("save_pretrained")
  • Discover code semantically similar to a known location (`find-related`)
  • Give an agent token-efficient access to any repo via MCP instead of letting it grep/read full files
  • Index a remote git repo without cloning first

Do not use when

  • You need to read a full file or directory listing → use native `Read`, `Glob` tools
  • You need regex or exact-string search → `Grep` is more appropriate
  • The repo is too small to justify indexing (a few files) — just read them directly
  • You need to run tests, build, or execute code — this is a search-only tool

CLI usage

bash
# Natural-language search in a local repo
semble search "authentication flow" ./my-project

# Symbol search
semble search "save_pretrained" ./my-project

# Search with a limit on returned chunks
semble search "save model to disk" ./my-project --top-k 10

# Search a remote git repo (no clone needed)
semble search "save model to disk" https://github.com/MinishLab/model2vec

# Find semantically similar code given a known file+line
semble find-related src/auth.py 42 ./my-project

# Show token savings vs grep+read for the last query
semble savings
semble savings --verbose
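
For scripting, the same commands can be assembled as argument lists and passed to `subprocess`. The sketch below only builds and prints the command line and uses only the subcommands and the `--top-k` flag shown above. The helper names `search_argv` and `find_related_argv` are hypothetical.

```python
import shlex
from typing import List, Optional


def search_argv(query: str, path: Optional[str] = None,
                top_k: Optional[int] = None) -> List[str]:
    """Build argv for `semble search` (hypothetical helper).

    `path` may be a local directory or a git URL; it is left out when None.
    """
    argv = ["semble", "search", query]
    if path is not None:
        argv.append(path)
    if top_k is not None:
        argv += ["--top-k", str(top_k)]
    return argv


def find_related_argv(file_path: str, line: int,
                      path: Optional[str] = None) -> List[str]:
    """Build argv for `semble find-related` (hypothetical helper)."""
    argv = ["semble", "find-related", file_path, str(line)]
    if path is not None:
        argv.append(path)
    return argv


cmd = search_argv("save model to disk", "./my-project", top_k=10)
print(shlex.join(cmd))  # semble search 'save model to disk' ./my-project --top-k 10
```

Passing the list to `subprocess.run(cmd, check=True, capture_output=True, text=True)` would execute it without any shell quoting concerns. The output format is not specified here, so any parsing of it would be an assumption.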

Python library

python
from semble import SembleIndex

# Index local directory
index = SembleIndex.from_path("./my-project")

# Index remote repository (no clone required)
index = SembleIndex.from_git("https://github.com/MinishLab/model2vec")

# Natural-language or symbol query
results = index.search("save model to disk", top_k=3)

# Find semantically similar code to a known chunk
related = index.find_related(results[0], top_k=3)

# Inspect results
result = results[0]
print(result.chunk.file_path)   # "model2vec/model.py"
print(result.chunk.start_line)  # 127
print(result.chunk.end_line)    # 150
print(result.chunk.content)     # the function/class body
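
One way to hand search results to an agent is to flatten them into a compact text block keyed by path and line range. The sketch below is an assumption-laden illustration: `Chunk` and `Result` are stand-in dataclasses that only mirror the attributes shown above (`file_path`, `start_line`, `end_line`, `content`) so the example runs without semble installed, and `to_context` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Chunk:
    """Stand-in with the chunk attributes listed above; not semble's own class."""
    file_path: str
    start_line: int
    end_line: int
    content: str


@dataclass
class Result:
    """Stand-in for a search result wrapping a chunk."""
    chunk: Chunk


def to_context(results: Iterable[Result], max_chars: int = 400) -> str:
    """Render results as 'path:start-end' headers followed by truncated content."""
    blocks: List[str] = []
    for result in results:
        chunk = result.chunk
        header = f"{chunk.file_path}:{chunk.start_line}-{chunk.end_line}"
        blocks.append(f"{header}\n{chunk.content[:max_chars]}")
    return "\n\n".join(blocks)


# Placeholder content; a real result would carry the function or class body.
demo = [Result(Chunk("model2vec/model.py", 127, 150, "<function body>"))]
print(to_context(demo))
```

With real results from `index.search(...)`, the same function should apply as long as the objects expose those four attributes.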

AGENTS.md / CLAUDE.md integration

Add this section to your project's `AGENTS.md` or `CLAUDE.md` to enable semble for all agents:

Code Search

Use `semble search` to find code by describing what it does or naming a symbol, instead of grep:
bash
semble search "authentication flow" ./my-project
semble search "save_pretrained" ./my-project
semble search "save model to disk" ./my-project --top-k 10

Use `semble find-related` to discover code similar to a known location (pass `file_path` and `line` from a prior search result):
bash
semble find-related src/auth.py 42 ./my-project

`path` defaults to the current directory when omitted; git URLs are accepted.
If `semble` is not on `$PATH`, use `uvx --from "semble[mcp]" semble` in its place.

For Claude Code sub-agents, initialize once in the project root:
```bash
semble init
```

Performance benchmarks

| Metric | Semble | grep+read |
|---|---|---|
| Indexing speed | ~250ms | n/a |
| Query speed | ~1.5ms | varies |
| Token use at 94% recall | ~2k tokens | ~100k tokens |
| NDCG@10 | 0.854 | |
| vs 137M-param CodeRankEmbed | 99% quality | |
| Indexing vs transformer | 218× faster | |
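
The token figures in the table line up with the headline "~98% fewer tokens" claim. The arithmetic below is a quick check, assuming the headline is derived from the ~2k versus ~100k row (the source does not state this explicitly). `token_reduction` is a hypothetical helper.

```python
def token_reduction(semble_tokens: float, baseline_tokens: float) -> float:
    """Fraction of tokens saved relative to the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 1.0 - semble_tokens / baseline_tokens


# ~2k tokens (semble) vs ~100k tokens (grep+read), both at 94% recall.
reduction = token_reduction(2_000, 100_000)
print(f"{reduction:.0%}")  # 98%
```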

Operating rules

  1. Prefer MCP installation for interactive agent use; prefer CLI/pip for scripting and CI.
  2. Use `--top-k` to limit results and keep context small; the default is often too generous for agent prompts.
  3. Use `find-related` after `search` when you need to expand from one known chunk into similar code.
  4. Use `semble init` in project roots to pre-warm the index for Claude Code sub-agents.
  5. If `semble` is not on `$PATH`, replace it with `uvx --from "semble[mcp]" semble` in scripts.
  6. Treat semble as the first pass: read full files only when the returned chunk provides insufficient context.
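
Rule 5 can be automated in a script by checking `$PATH` before choosing the command prefix. This is a minimal sketch, not something semble ships: `semble_prefix` is a hypothetical helper, and the `which` parameter exists only so the lookup can be swapped out for testing.

```python
import shutil
from typing import Callable, List, Optional


def semble_prefix(which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the argv prefix for invoking semble.

    Uses the `semble` executable when it is on $PATH, otherwise falls back
    to the uvx invocation given in rule 5.
    """
    if which("semble"):
        return ["semble"]
    return ["uvx", "--from", "semble[mcp]", "semble"]


# Simulate a machine where semble is not installed.
print(semble_prefix(lambda name: None))  # ['uvx', '--from', 'semble[mcp]', 'semble']
```

A script would then append the subcommand and arguments, for example `semble_prefix() + ["search", "authentication flow", "./my-project"]`.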

Examples

bash
# Search for how a feature is implemented
semble search "rate limiting middleware" ./api-service

# Find all code related to database migrations
semble search "database migration" ./backend --top-k 5

# Explore similar code patterns near a known function
semble find-related src/middleware/auth.py 88 ./api-service

# Index and query a remote library without cloning
semble search "tokenizer padding" https://github.com/huggingface/transformers

Source: [MinishLab/semble](https://github.com/MinishLab/semble) — MIT License