indexion-grep

indexion grep

KGF-aware token pattern search, structural queries, and vector similarity search.

When to Use

  • User asks to find specific code patterns (e.g. "nested for loops", "pub fn without docs")
  • User asks "where is this function used?" or "find all pub structs"
  • User wants structural search — not regex on raw text, but token-level matching
  • User asks for semantic code search ("find functions that parse configuration")
  • User wants to find proxy functions, long functions, or functions with many params
  • Replaces manual grep/ripgrep for code-aware searches
  • Use instead of Explore agent for targeted codebase queries

Token Pattern Search

Patterns are space-separated token matchers. KGF aliases resolve automatically (e.g. `pub` → `KW_pub`), so you write natural code keywords:

```bash
# Find all pub fn declarations
indexion grep "pub fn *" src/

# Find pub struct definitions
indexion grep "pub struct *" src/

# Nested for loops (O(n²) candidates)
indexion grep "for ... for" src/

# Functions named "sort"
indexion grep "fn Ident:sort" src/

# Any token except pub, followed by fn
indexion grep "!pub fn" src/

# Using raw KGF token kinds also works
indexion grep "KW_pub KW_fn Ident" src/
```

Pattern Syntax

| Pattern | Meaning |
| --- | --- |
| `pub` | Match keyword (auto-alias → `KW_pub`) |
| `KW_fn` | Match token kind exactly |
| `Ident:foo` | Match kind and text |
| `*` | Match any single token |
| `...` | Match zero or more tokens (non-greedy) |
| `!pub` | Negation: any token except this kind |
| `(`, `)`, `{`, `->` | Punctuation aliases |

Aliases are auto-generated from KGF specs, not hardcoded, so they work for all KGF-supported languages.
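
To make the matcher semantics in the table concrete, here is a minimal Python sketch of how such patterns could be evaluated over a token stream. It is not indexion's implementation: the `ALIASES` table, the `(kind, text)` token shape, and the function names are assumptions made for illustration, and the real aliases come from the KGF specs.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (kind, text), e.g. ("KW_pub", "pub")

# Hypothetical alias table; the real one is generated from KGF specs.
ALIASES = {"pub": "KW_pub", "fn": "KW_fn", "for": "KW_for", "struct": "KW_struct"}


def resolve(name: str) -> str:
    """Map a natural keyword to its token kind; raw kinds pass through unchanged."""
    return ALIASES.get(name, name)


def matches_one(matcher: str, token: Token) -> bool:
    """Test a single-token matcher (`*`, `!kind`, `Kind:text`, or a kind) against one token."""
    kind, text = token
    if matcher == "*":
        return True
    if matcher.startswith("!"):
        return kind != resolve(matcher[1:])
    if ":" in matcher:
        want_kind, want_text = matcher.split(":", 1)
        return kind == resolve(want_kind) and text == want_text
    return kind == resolve(matcher)


def match_at(pattern: List[str], tokens: List[Token], p: int, t: int) -> Optional[int]:
    """Match pattern[p:] anchored at tokens[t]; return the end index or None."""
    if p == len(pattern):
        return t
    if pattern[p] == "...":
        # Non-greedy: try skipping 0 tokens first, then 1, 2, and so on.
        for skip in range(len(tokens) - t + 1):
            end = match_at(pattern, tokens, p + 1, t + skip)
            if end is not None:
                return end
        return None
    if t < len(tokens) and matches_one(pattern[p], tokens[t]):
        return match_at(pattern, tokens, p + 1, t + 1)
    return None


def find_all(pattern_text: str, tokens: List[Token]) -> List[Tuple[int, int]]:
    """Return (start, end) token spans for every start position that matches."""
    pattern = pattern_text.split()
    spans = []
    for start in range(len(tokens)):
        end = match_at(pattern, tokens, 0, start)
        if end is not None:
            spans.append((start, end))
    return spans


if __name__ == "__main__":
    # Token kinds for punctuation are made up for this example.
    tokens = [
        ("KW_pub", "pub"), ("KW_fn", "fn"), ("Ident", "sort"),
        ("LParen", "("), ("RParen", ")"),
        ("KW_for", "for"), ("Ident", "x"), ("KW_for", "for"), ("Ident", "y"),
    ]
    for pat in ["pub fn *", "fn Ident:sort", "for ... for", "!pub fn"]:
        print(pat, find_all(pat, tokens))
```

Because `...` tries the smallest skip first, `for ... for` pairs each `for` with the nearest following `for`, which is the behaviour the table describes as non-greedy.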

Semantic Queries

Structural analysis beyond token patterns:

```bash
# Find proxy functions (wrappers that just delegate)
indexion grep --semantic=proxy src/

# Find long functions (30+ lines)
indexion grep --semantic=long:30 src/

# Find short functions (3 lines or less)
indexion grep --semantic=short:3 src/

# Find functions with 4+ parameters
indexion grep --semantic=params-gte:4 src/

# Find functions by name substring
indexion grep --semantic=name:sort src/

# Find undocumented pub declarations (also available as --undocumented)
indexion grep --undocumented src/
indexion grep --semantic=undocumented src/
```
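
The threshold-style queries above can be read as simple predicates over per-function metadata. The following Python sketch illustrates that reading under stated assumptions: `FnInfo`, `parse_query`, and `select` are hypothetical names, the record fields are invented for the example, and `proxy`, `undocumented`, and `similar:` are deliberately left out because they need real structural analysis.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FnInfo:
    """Hypothetical per-function record; not indexion's real data model."""
    name: str
    line_count: int
    param_count: int


def parse_query(query: str) -> Callable[[FnInfo], bool]:
    """Turn a `key:arg` query string into a predicate over FnInfo."""
    key, _, arg = query.partition(":")
    if key == "name":
        # Substring match on the function name.
        return lambda f: arg in f.name
    if key in ("long", "short", "params-gte"):
        n = int(arg)
        if key == "long":
            return lambda f: f.line_count >= n   # "30+ lines"
        if key == "short":
            return lambda f: f.line_count <= n   # "3 lines or less"
        return lambda f: f.param_count >= n      # "4+ parameters"
    raise ValueError(f"unsupported query in this sketch: {query!r}")


def select(query: str, functions: List[FnInfo]) -> List[str]:
    """Return the names of the functions that satisfy the query."""
    keep = parse_query(query)
    return [f.name for f in functions if keep(f)]


if __name__ == "__main__":
    fns = [
        FnInfo("sort_by_key", line_count=42, param_count=2),
        FnInfo("get", line_count=3, param_count=1),
        FnInfo("merge_sorted", line_count=12, param_count=5),
    ]
    for q in ["long:30", "short:3", "params-gte:4", "name:sort"]:
        print(q, select(q, fns))
```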

Vector Similarity Search

Find code by natural language description using TF-IDF embeddings (shared infrastructure with `digest`):

```bash
# Find functions related to "parse JSON configuration"
indexion grep --semantic="similar:parse JSON configuration" src/

# Find tokenization-related code
indexion grep --semantic="similar:tokenize source code into tokens" src/

# Find error handling patterns
indexion grep --semantic="similar:handle error and return" src/
```

Results are ranked by cosine similarity score.
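
For readers unfamiliar with TF-IDF ranking, the sketch below shows the general technique in plain Python: build TF-IDF vectors for a small corpus, embed the query with the same weights, and sort by cosine similarity. It is an illustration only. The tokenizer, the smoothed IDF formula, and the toy corpus are assumptions, and indexion's actual embedding provider may differ in all of them.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return [w for w in re.split(r"[^a-z0-9]+", text.lower()) if w]


def tfidf_vectors(docs: List[List[str]]) -> Tuple[List[Dict[str, float]], Dict[str, float]]:
    """Return one sparse TF-IDF vector per document plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption of this sketch): log((1 + n) / (1 + df)) + 1.
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = [{term: tf * idf[term] for term, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def rank(query: str, corpus: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank corpus entries by cosine similarity to the query, best first."""
    names = list(corpus)
    vectors, idf = tfidf_vectors([tokenize(corpus[name]) for name in names])
    # Query terms that never occur in the corpus carry no weight.
    q = {term: tf * idf[term] for term, tf in Counter(tokenize(query)).items() if term in idf}
    scored = [(name, cosine(q, vec)) for name, vec in zip(names, vectors)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy corpus: function name -> descriptive text (made up for the example).
    corpus = {
        "load_config": "parse json configuration file into settings",
        "tokenize": "tokenize source code into tokens",
        "handle_err": "handle error and return result",
    }
    for name, score in rank("parse JSON configuration", corpus):
        print(f"{score:.3f}  {name}")
```

A longer, more descriptive query contributes more distinguishing terms to the query vector, which is one reason descriptive phrases tend to rank better than single words in this kind of search.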

Output Control

```bash
# File paths only
indexion grep --files "pub fn *" src/

# Match count per file
indexion grep --count "pub fn *" src/

# Context lines around matches
indexion grep --context=3 "for ... for" src/

# Include/exclude patterns
indexion grep --include='*.mbt' --exclude='*_test.mbt' "pub fn *" src/
```

Options

| Option | Default | Description |
| --- | --- | --- |
| `--semantic=QUERY` | | Semantic query (see above) |
| `--undocumented` | `false` | Find pub declarations without doc comments |
| `--files` | `false` | Show matching file paths only |
| `--count` | `false` | Show match count per file only |
| `--context=N` | `0` | Lines of context around matches |
| `--include=GLOB` | | Include file pattern (repeatable) |
| `--exclude=GLOB` | | Exclude file pattern (repeatable) |
| `--specs-dir=DIR` | `kgfs` | KGF specs directory |

Relationship to Other Commands

| Command | Purpose | When to use |
| --- | --- | --- |
| `grep` | Find specific patterns/functions | "Find all nested for loops" |
| `explore` | File-level similarity matrix | "What files are similar?" |
| `plan refactor` | Actionable refactoring plan | "What duplicates should I fix?" |
| `plan unwrap` | Proxy function detection + auto-fix | "Remove unnecessary wrappers" |
| `digest query` | Purpose-based function index | "Find function that handles X" (requires build step) |

`grep --semantic=proxy` overlaps with `plan unwrap` in detection, but `plan unwrap` adds auto-fix capability. Use `grep` for quick discovery, `plan unwrap` for action.

`grep --semantic=similar:...` overlaps with `digest query` but requires no build step. `digest` is better for repeated queries on large codebases (cached index).

Dogfooding Workflow

```bash
# After writing new code, check for patterns that need attention:

# 1. Find potential O(n²) sorts (nested loops)
indexion grep "for ... for" src/

# 2. Check for undocumented public API
indexion grep --undocumented src/

# 3. Find proxy functions to consider unwrapping
indexion grep --semantic=proxy src/

# 4. Find overly long functions
indexion grep --semantic=long:50 src/

# 5. Search for specific refactoring targets by similarity
indexion grep --semantic="similar:extract substring" src/

# 6. Trace all references to a type before moving it
indexion grep "TypeIdent:TfidfEmbeddingProvider" src/

# 7. Find all sort-related functions across the codebase
indexion grep --semantic=name:sort src/

# 8. Verify a refactoring didn't leave orphan references
indexion grep "Ident:old_function_name" src/ cmd/indexion/
```

Dogfooding Lessons

  • Use instead of Explore agent: `grep "TypeIdent:X"` is faster and more precise than spawning an agent to search for a type definition.
  • Alias resolution is automatic: You don't need to know that `pub` maps to `KW_pub`; just write the keyword as it appears in source code.
  • `...` is non-greedy: `for ... for` finds the closest pair of for loops, not the furthest. This is usually what you want for finding nesting.
  • Vector search quality: `--semantic="similar:..."` works best with descriptive phrases. "parse JSON configuration" works better than just "json".
  • Combine with plan refactor: Use `grep --semantic=name:X` to find all instances before consolidating, then `plan refactor` to verify they're gone.