semtools

Semtools: Semantic Search

Perform semantic (meaning-based) search across code and documents using embedding-based similarity matching.

Purpose

The semtools skill provides access to Semtools, a high-performance Rust-based CLI for semantic search and document processing. Unlike traditional text search (ripgrep), which matches exact strings, or structural search (ast-grep), which matches syntax patterns, Semtools understands semantic meaning through embeddings.

Key capabilities:

- Semantic Search: Find code/text by meaning, not just keywords
- Workspace Management: Index large codebases for fast repeated searches
- Document Parsing: Convert PDFs, DOCX, PPTX to searchable text (requires API key)

Semtools excels at discovery: finding relevant code when you don't know the exact keywords, function names, or syntax patterns.

When to Use This Skill

Use the semtools skill when you need meaning-based search:

Semantic Code Discovery:

- Finding code that implements a concept ("error handling", "data validation")
- Discovering similar functionality across different modules
- Locating examples of a pattern when you don't know exact names
- Understanding what code does without reading everything

Documentation & Knowledge:

- Searching documentation by concept, not keywords
- Finding related discussions in comments or docs
- Discovering similar issues or solutions
- Analyzing technical documents (PDFs, reports)

Use Cases:

- "Find all authentication-related code" (without knowing function names)
- "Show me error handling patterns" (regardless of specific error types)
- "Find code similar to this implementation" (semantic similarity)
- "Search research papers for 'distributed consensus'" (document search)

Choose semtools over file-search (ripgrep/ast-grep) when:

- You know the concept but not the keywords
- Exact string matching misses relevant results
- You want semantically similar code, not exact matches
- You are searching across languages or mixed content

Still use file-search when:

- You know exact keywords, function names, or patterns
- You need structural code matching (ast-grep)
- Speed is critical (ripgrep is faster for exact matches)
- You're searching for specific symbols or references

Available Commands

Semtools provides three CLI commands you can use via `execute_command`:

- `search`: Semantic search across code and text files
- `workspace`: Manage workspaces for caching embeddings
- `parse`: Convert documents (PDF, DOCX, PPTX) to searchable text

All commands work out of the box in your execution environment. Document parsing requires the `LLAMA_CLOUD_API_KEY` environment variable to be set.

Core Operations

1. Semantic Search (`search`)

Find files and code sections by semantic meaning:

bash

# Basic semantic search
search "authentication logic" src/

# Search with more context (5 lines before/after)
search "error handling" --n-lines 5 src/

# Get more results (default: 3)
search "database queries" --top-k 10 src/

# Control the distance threshold (0.0-1.0, lower = stricter)
search "API endpoints" --max-distance 0.4 src/

**Parameters:**
- `--n-lines N`: Show N lines of context around matches (default: 3)
- `--top-k K`: Return top K most similar matches (default: 3)
- `--max-distance D`: Maximum embedding distance (0.0-1.0, default: 0.3)
- `-i`: Case-insensitive matching

**Output format:**

Match 1 (similarity: 0.12) File: src/auth/handlers.py Lines: 42-47

def authenticate_user(username: str, password: str) -> Optional[User]:
    """Authenticate user credentials against database."""
    user = get_user_by_username(username)
    if user and verify_password(password, user.password_hash):
        return user
    return None

Match 2 (similarity: 0.18) File: src/middleware/auth.py ...
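Because each result starts with a header carrying the score, file, and line range, the output lends itself to post-processing. The sketch below is one possible way to turn those headers into tab-separated rows. It assumes the headers look like the sample above (fields on one line or on consecutive lines) and that paths contain no spaces; `matches_to_tsv` is a hypothetical helper name, not part of Semtools.

```bash
# Convert `search` match headers into rows of: score <TAB> file <TAB> line range.
# Assumed header shape (taken from the sample output, not from a format spec):
#   Match 1 (similarity: 0.12) File: src/auth/handlers.py Lines: 42-47
matches_to_tsv() {
  awk '
    # Return the text after `prefix`, cut at the next space or closing paren.
    function field(prefix,    rest) {
      rest = substr($0, index($0, prefix) + length(prefix))
      sub(/[ )].*$/, "", rest)
      return rest
    }
    # A new header starts: remember its score.
    /^Match [0-9]+ \(similarity: / { score = field("(similarity: ") }
    # Only accept a file once a header has been seen.
    /File: /  { if (score != "") file = field("File: ") }
    # The line range completes the row, so print it and reset.
    /Lines: / {
      if (file != "") {
        printf "%s\t%s\t%s\n", score, file, field("Lines: ")
        score = ""; file = ""
      }
    }
  '
}
```

For instance, `search "error handling" src/ | matches_to_tsv | sort -n` would list matches from closest to furthest, provided the real output matches the assumed layout.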

2. Workspace Management (`workspace`)

For large codebases, create workspaces to cache embeddings and enable fast repeated searches:

bash

# Create/activate workspace
workspace use my-project

# Set workspace via environment variable
export SEMTOOLS_WORKSPACE=my-project

# Index files in workspace (workspace auto-detected from env var)
search "query" src/

# Check workspace status
workspace status

# Clean up old workspaces
workspace prune

**Benefits:**
- **Fast repeated searches**: Embeddings cached, no re-computation
- **Large codebases**: IVF_PQ indexing for scalability
- **Session persistence**: Maintain context across multiple searches

**When to use workspaces:**
- Searching the same codebase multiple times
- Very large projects (1000+ files)
- Interactive exploration sessions
- CI/CD pipelines with repeated searches
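For scripted use, such as a CI job that runs several queries, the workspace can be scoped to a batch of searches rather than exported for the whole shell. This is a minimal sketch under stated assumptions: `search_many` is a hypothetical helper, and `SEARCH_BIN` is an illustrative override variable (not a Semtools setting) that makes the function easy to dry-run.

```bash
# Run several queries against one path inside a single named workspace.
# Usage: search_many WORKSPACE PATH QUERY [QUERY...]
# SEARCH_BIN is a hypothetical override; it defaults to the `search` command.
search_many() {
  local workspace="$1" path="$2" query
  shift 2
  for query in "$@"; do
    # SEMTOOLS_WORKSPACE is set only for this invocation.
    SEMTOOLS_WORKSPACE="$workspace" "${SEARCH_BIN:-search}" "$query" "$path"
  done
}
```

A call such as `search_many my-analysis src/ "retry logic" "rate limiting"` would then reuse the same cached embeddings for both queries.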

3. Document Parsing (`parse`) ⚠️ Requires API Key

Convert documents to searchable markdown (requires LlamaParse API key):

bash

# Parse PDFs to markdown
parse research_papers/*.pdf

# Parse Word documents
parse reports/*.docx

# Parse presentations
parse slides/*.pptx

# Parse and pipe to search
parse docs/*.pdf | xargs search "neural networks"

**Supported formats:**
- PDF (.pdf)
- Word (.docx)
- PowerPoint (.pptx)

**Configuration:**
```bash
# Via environment variable
export LLAMA_CLOUD_API_KEY="llx-..."

# Via config file
cat > ~/.parse_config.json << EOF
{
  "api_key": "llx-...",
  "max_concurrent_requests": 10,
  "timeout_seconds": 3600
}
EOF
```

**Important:** Document parsing is **optional**. Semantic search works without it.

Workflow Patterns

Pattern 1: Concept Discovery

When you know what you're looking for conceptually but not by name:

bash

# Step 1: Broad semantic search
search "rate limiting implementation" src/

# Step 2: Review results, refine query
search "throttle requests per user" src/ --top-k 10

# Step 3: Use ripgrep for exact follow-up
rg "RateLimiter" --type py src/

Pattern 2: Similar Code Finder

When you want to find code similar to a reference implementation:

bash

# Step 1: Extract key concepts from reference code
# [Read example_auth.py and identify key concepts]

# Step 2: Search for similar implementations
search "user authentication with JWT tokens" src/

# Step 3: Compare implementations
# [Review semantic matches to find similar approaches]

Pattern 3: Documentation Search

When researching concepts in documentation or comments:

bash

# Search code comments semantically
search "thread safety guarantees" src/ --n-lines 10

# Search markdown documentation
search "deployment best practices" docs/

# Combined search across code and docs
search "performance optimization" --top-k 20 src/ docs/

Pattern 4: Cross-Language Search

When searching for concepts across different languages:

bash

# Semantic search works across languages
search "connection pooling" src/

# May find:
# - Java: "ConnectionPool manager"
# - Python: "database connection reuse"
# - Go: "pool of persistent connections"
# All semantically related despite different terminology

Pattern 5: Document Analysis (with API key)

When analyzing PDFs or documents:

bash

# Step 1: Parse documents to markdown
parse research/*.pdf > papers.md

# Step 2: Search converted content
search "transformer architecture" papers.md

# Step 3: Combine with code search
search "attention mechanism implementation" src/

Integration with file-search

Semtools and file-search (ripgrep/ast-grep) are complementary tools. Use them together for comprehensive search:

Search Strategy Matrix

| You Know | Use First | Then Use | Why |
| --- | --- | --- | --- |
| Exact keywords | ripgrep | search | Fast exact match, then find similar |
| Concept only | search | ripgrep | Find relevant code, then search specifics |
| Function name | ripgrep | search | Find definition, then find similar usage |
| Code pattern | ast-grep | search | Find structure, then find similar logic |
| Approximate idea | search | ripgrep + ast-grep | Discover, then drill down |

Layered Search Approach

bash

# Layer 1: Semantic discovery (what's related?)
search "user session management" --top-k 10

# Layer 2: Exact text search (what's the implementation?)
rg "SessionManager|session_store" --type py

# Layer 3: Structural search (how is it used?)
sg --pattern 'session.$METHOD($$$)' --lang python

# Layer 4: Reference tracking (where is it called?)
# [Use serena skill for symbol-level tracking]

Best Practices

1. Start Broad, Then Narrow

Use semantic search for discovery, then narrow with exact search:

bash

# GOOD: Broad semantic discovery first
search "authentication" src/ --top-k 10
# [Review results to learn terminology]
rg "authenticate|verify_credentials" --type py src/

# AVOID: Starting too narrow and missing variations
rg "authenticate" --type py  # Misses "verify_credentials", "check_auth", etc.

2. Adjust Similarity Threshold

Tune `--max-distance` based on results:

bash

# Too many irrelevant results? Decrease distance (more strict)
search "query" --max-distance 0.2

# Missing relevant results? Increase distance (more lenient)
search "query" --max-distance 0.5

# Default (0.3) works well for most cases
search "query"
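The tuning advice above can be automated by starting strict and widening only when nothing comes back. The following is a sketch, not a Semtools feature: `search_widening` is a hypothetical helper, `SEARCH_BIN` is an illustrative override variable, and the function assumes `search` prints nothing when no match falls within the threshold.

```bash
# Try increasingly lenient thresholds until `search` prints something.
# Usage: search_widening QUERY PATH
# Assumption: empty output means "no matches within this distance".
search_widening() {
  local query="$1" path="$2" distance output
  for distance in 0.2 0.3 0.4 0.5; do
    output="$("${SEARCH_BIN:-search}" "$query" --max-distance "$distance" "$path")"
    if [ -n "$output" ]; then
      # Report which threshold produced the results, then the results themselves.
      printf 'max-distance=%s\n%s\n' "$distance" "$output"
      return 0
    fi
  done
  return 1  # nothing found even at the most lenient threshold
}
```

Stopping at the first threshold that yields output keeps results as strict as the query allows, which matches the guidance that larger distances admit more false positives.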

3. Use Workspaces for Repeated Searches

For interactive exploration, always use workspaces:

bash

# GOOD: Create workspace once, search many times
export SEMTOOLS_WORKSPACE=my-analysis
search "concept1" src/
search "concept2" src/
search "concept3" src/

# INEFFICIENT: Re-compute embeddings every time
search "concept1" src/
search "concept2" src/

4. Combine with Context Tools

Get more context around semantic matches:

bash

# Find semantically similar code
search "retry logic" src/ --n-lines 2

# Get more context with ripgrep
rg -C 10 "retry" src/specific_file.py

# Or read the full file
cat src/specific_file.py

5. Phrase Queries Conceptually

Write queries as concepts, not exact keywords:

bash

# GOOD: Conceptual queries
search "handling network timeouts"
search "user input validation"
search "concurrent data access"

# LESS EFFECTIVE: Exact keyword queries (use ripgrep instead)
search "timeout"   # Use: rg "timeout"
search "validate"  # Use: rg "validate"
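The distinction above can be approximated with a rough heuristic: a single token reads as an exact keyword, while a multi-word phrase reads as a concept. The helper below is a hypothetical illustration of that rule of thumb, not behaviour built into Semtools, and it will misjudge some queries (for example, a one-word concept).

```bash
# Suggest a tool for a query, following the rule of thumb above.
# Prints "search" for multi-word phrases and "rg" for single tokens.
suggest_tool() {
  case "$1" in
    "")    return 1 ;;        # nothing to search for
    *" "*) echo "search" ;;   # contains a space: treat as a concept
    *)     echo "rg" ;;       # single token: treat as an exact keyword
  esac
}
```

With this rule, `suggest_tool "handling network timeouts"` prints `search` and `suggest_tool "timeout"` prints `rg`.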

Understanding Semantic Distance

Semtools uses embedding vectors to measure semantic similarity:

- Distance 0.0: Identical meaning
- Distance 0.1-0.2: Very similar (synonyms, paraphrases)
- Distance 0.2-0.3: Related concepts (default threshold)
- Distance 0.3-0.4: Loosely related
- Distance 0.5+: Weakly related or unrelated

Practical guidelines:

bash

# Strict matching (only close matches)
--max-distance 0.2

# Balanced matching (default, recommended)
--max-distance 0.3

# Lenient matching (exploratory search)
--max-distance 0.4

# Very lenient (may include false positives)
--max-distance 0.5
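To make the notion of distance concrete, here is a small worked example. The source does not say which metric Semtools uses; this sketch assumes cosine distance (1 minus cosine similarity), a common choice for embedding vectors, and `cosine_distance` is a hypothetical helper operating on toy vectors rather than real embeddings.

```bash
# Cosine distance between two space-separated vectors of equal length.
# Assumption: cosine distance stands in for whatever metric Semtools uses.
cosine_distance() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, " "); m = split(b, y, " ")
    if (n != m || n == 0) exit 1            # vectors must have the same, non-zero length
    for (i = 1; i <= n; i++) {
      dot += x[i] * y[i]
      na  += x[i] * x[i]
      nb  += y[i] * y[i]
    }
    if (na == 0 || nb == 0) exit 1          # undefined for zero vectors
    d = 1 - dot / (sqrt(na) * sqrt(nb))
    if (d < 0) d = 0                        # clamp tiny negative rounding errors
    printf "%.4f\n", d
  }'
}
```

Under this metric, vectors pointing the same way give 0 and orthogonal vectors give 1, so a `--max-distance` of 0.3 would keep only vectors that point in broadly the same direction as the query.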

Local vs. Cloud Embeddings

Semantic Search (Local):

- Uses local embeddings (model2vec, potion-multilingual-128M)
- No API calls or cloud dependencies
- Fast, private, no cost
- Works offline

Document Parsing (Cloud):

- Uses LlamaParse API (cloud-based)
- Requires API key and internet connection
- Processes PDFs, DOCX, PPTX
- Usage-based pricing (check LlamaIndex pricing)

Privacy consideration: Semantic search is 100% local. Only document parsing sends data to LlamaParse API.

Performance Considerations

Speed Characteristics

Without workspace:

- First search: ~2-5 seconds (embedding computation)
- Subsequent searches: ~2-5 seconds each (re-compute embeddings)

With workspace (cached embeddings):

- First search: ~2-5 seconds (builds index)
- Subsequent searches: ~0.1-0.5 seconds (cached)
- Large codebases: IVF_PQ indexing for scalability

Comparison:

- ripgrep: 0.01-0.1 seconds (fastest, exact match)
- ast-grep: 0.1-0.5 seconds (fast, structural)
- semtools (cached): 0.1-0.5 seconds (fast, semantic)
- semtools (uncached): 2-5 seconds (slower, semantic)
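The payoff of a workspace grows with the number of searches. The arithmetic can be sketched as below, using only the approximate per-search figures quoted above; `estimate_seconds` is a hypothetical helper, and real timings will depend on codebase size and hardware.

```bash
# Estimate the total time range for N searches from the rough figures above.
#   uncached: every search costs 2-5 s
#   cached:   the first search costs 2-5 s, each later one 0.1-0.5 s
# Usage: estimate_seconds N cached|uncached   (prints "low-high" in seconds)
estimate_seconds() {
  awk -v n="$1" -v mode="$2" 'BEGIN {
    n = n + 0                                         # force a numeric value
    if (n < 1 || (mode != "cached" && mode != "uncached")) exit 1
    if (mode == "uncached") { lo = n * 2; hi = n * 5 }
    else { lo = 2 + (n - 1) * 0.1; hi = 5 + (n - 1) * 0.5 }
    printf "%.1f-%.1f\n", lo, hi
  }'
}
```

For ten searches this gives roughly 20-50 seconds without a workspace versus about 3-10 seconds with one, which is why workspaces are recommended for repeated queries.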

Optimization Tips

bash

# 1. Use workspaces for repeated searches
export SEMTOOLS_WORKSPACE=my-project

# 2. Limit search scope to relevant directories
search "query" src/ --not tests/

# 3. Use --top-k to control result count
search "query" --top-k 5

# 4. Pipe to head for quick preview
search "query" | head -50

Unix Pipeline Integration

Semtools is designed for Unix-style composition:

bash

# Find and parse PDFs, then search
find docs/ -name "*.pdf" | xargs parse | xargs search "topic"

# Search and filter with grep
search "authentication" src/ | grep -i "jwt"

# Count matches
search "error handling" src/ | grep "Match" | wc -l

# Combine with other tools
search "API" src/ | xargs -I {} rg -l "REST" {}
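When chaining `search` into tools that expect file paths, it can help to reduce the output to a list of files first. The filter below is a sketch that assumes the header layout shown in the sample output (`Match N (similarity: X) File: PATH Lines: A-B`, possibly with `File:` on its own line) and paths without spaces; `matched_files` is a hypothetical name.

```bash
# Print the unique file paths named in `search` match headers, one per line.
# Only header lines are inspected, so "File: " inside matched code is ignored.
matched_files() {
  awk '
    /^(Match [0-9]+ \(|File: )/ && match($0, /File: [^ ]+/) {
      # Skip the 6 characters of "File: " and print the path that follows.
      print substr($0, RSTART + 6, RLENGTH - 6)
    }
  ' | sort -u
}
```

A pipeline such as `search "API" src/ | matched_files | xargs rg -l "REST"` would then pass only file paths to ripgrep, assuming the output format matches.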

Limitations

When NOT to Use Semtools

Exact keyword search: Use ripgrep for known keywords

bash

# WRONG TOOL: Semantic search for exact function name
search "authenticate_user"

# RIGHT TOOL: Use ripgrep for exact matches
rg "authenticate_user" --type py

Structural code patterns: Use ast-grep for syntax matching

bash

# WRONG TOOL: Semantic search for code structure
search "class with constructor"

# RIGHT TOOL: Use ast-grep for structure
sg --pattern 'class $NAME { constructor($$$) { $$$ } }'

Symbol references: Use serena for LSP-based tracking

bash

# WRONG TOOL: Semantic search for all usages
search "MyClass usage"

# RIGHT TOOL: Use serena for precise references
serena find_referencing_symbols --name 'MyClass'

Small codebases: Overhead not worth it for <100 files
- ripgrep is faster and simpler for small projects

Known Edge Cases

- Ambiguous queries: Vague concepts return broad results
- Technical jargon: Domain-specific terms may have lower accuracy
- Short code snippets: Limited context reduces embedding quality
- Mixed languages: Embeddings tuned for English (multilingual model used)
- Generated code: Repetitive patterns may cluster together

Troubleshooting

No Semantic Matches Found

If semantic search returns zero results:

Verify files exist: Use ripgrep to confirm content
bash
```
rg "concept" src/
```
Increase the distance threshold: Be more lenient
bash
```
search "query" --max-distance 0.5
```

Rephrase query: Try different terminology

bash

search "user authentication"
search "verify user credentials"
search "login validation"

Check file types: Ensure searching correct extensions
bash
```
search "query" src/*.py  # Target specific types
```

Too Many Irrelevant Results

If semantic search returns too much noise:

Decrease the distance threshold: Be more strict
bash
```
search "query" --max-distance 0.2
```
Limit result count: Review top matches only
bash
```
search "query" --top-k 3
```
Narrow directory scope: Search specific paths
bash
```
search "query" src/specific_module/
```

Refine query: Add more specific concepts

bash

# Vague
search "data"

# Specific
search "data validation with regex patterns"

Document Parsing Fails

If `parse` fails:

Verify API key is set:
bash
```
echo $LLAMA_CLOUD_API_KEY
```
Check file format: Ensure supported format (PDF, DOCX, PPTX)
bash
```
file document.pdf  # Verify file type
```
Check file size: Large files may timeout
bash
```
du -h document.pdf  # Check size
```
Review parse config: Adjust timeouts if needed
bash
```
cat ~/.parse_config.json
```
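The first two checks can be bundled into a preflight step that runs before `parse`. This is only a sketch: `parse_preflight` is a hypothetical helper, and it assumes the API key comes either from `LLAMA_CLOUD_API_KEY` or from `~/.parse_config.json`, the two configuration routes described earlier. It does not verify that the key is valid.

```bash
# Check common failure causes before calling `parse`.
# Returns 0 when the file looks parseable; otherwise prints a reason to
# stderr and returns 1 (no key), 2 (missing file), or 3 (unsupported type).
parse_preflight() {
  local file="$1" ext
  if [ -z "${LLAMA_CLOUD_API_KEY:-}" ] && [ ! -f "$HOME/.parse_config.json" ]; then
    echo "no API key: set LLAMA_CLOUD_API_KEY or create ~/.parse_config.json" >&2
    return 1
  fi
  if [ ! -f "$file" ]; then
    echo "not a file: $file" >&2
    return 2
  fi
  # Lowercase the extension so REPORT.PDF is treated like report.pdf.
  ext="$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    pdf|docx|pptx) return 0 ;;
    *) echo "unsupported format: $file" >&2; return 3 ;;
  esac
}
```

It could be used as `parse_preflight report.pdf && parse report.pdf`, so that configuration problems surface before any upload is attempted.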

Workspace Issues

If workspace commands fail:

bash

# Check workspace status
workspace status

# Prune corrupted workspaces
workspace prune

# Recreate workspace
rm -rf ~/.semtools/workspaces/my-workspace
export SEMTOOLS_WORKSPACE=my-workspace
