RLAMA - Local RAG System

RLAMA (Retrieval-Augmented Language Model Adapter) provides fully local, offline RAG for semantic search over your documents.

When to Use This Skill

- Building knowledge bases from local documents
- Searching personal notes, research papers, or code documentation
- Document-based Q&A without sending data to the cloud
- Indexing project documentation for quick semantic lookup
- Creating searchable archives of PDFs, markdown, or code files

Prerequisites

RLAMA requires Ollama running locally:

```bash
# Verify Ollama is running
ollama list

# If not running, start it
brew services start ollama   # macOS
# or: ollama serve
```
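
When driving RLAMA from scripts, a quick preflight check helps fail fast. A minimal sketch, assuming `ollama` is on `PATH` (the injectable `run` parameter is only there to make the helper easy to test):

```python
import subprocess

def ollama_running(run=subprocess.run) -> bool:
    """True if the Ollama CLI is installed and the daemon responds to `ollama list`."""
    try:
        return run(["ollama", "list"], capture_output=True).returncode == 0
    except FileNotFoundError:
        return False  # ollama binary not installed / not on PATH

if __name__ == "__main__":
    if not ollama_running():
        print("Start Ollama first: brew services start ollama (or: ollama serve)")
```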

Quick Reference

Query a RAG (Most Common)

Query an existing RAG system with a natural language question:

```bash
# Non-interactive query (returns answer and exits)
rlama run <rag-name> --query "your question here"

# With more context chunks for complex questions
rlama run <rag-name> --query "explain the authentication flow" --context-size 30

# Show which documents contributed to the answer
rlama run <rag-name> --query "what are the API endpoints?" --show-context

# Use a different model for answering
rlama run <rag-name> --query "summarize the architecture" -m deepseek-r1:8b
```

**Script wrapper** for cleaner output:

```bash
python3 ~/.claude/skills/rlama/scripts/rlama_query.py <rag-name> "your query"
python3 ~/.claude/skills/rlama/scripts/rlama_query.py my-docs "what is the main idea?" --show-sources
```

Retrieve-Only Mode (Claude Synthesizes)

Get raw chunks without local LLM generation. Claude reads the chunks directly and synthesizes a stronger answer than local models can produce.

When to use retrieve vs standard query:

| Scenario | Use |
|----------|-----|
| Quick lookup, local model sufficient | `rlama_query.py` (standard) |
| Complex synthesis, nuanced reasoning | `rlama_retrieve.py` (retrieve-only) |
| Claude needs raw evidence to cite | `rlama_retrieve.py` (retrieve-only) |
| Offline/no Ollama for generation | `rlama_retrieve.py` (retrieve-only) |

```bash
# Retrieve top 10 chunks (human-readable)
python3 ~/.claude/skills/rlama/scripts/rlama_retrieve.py <rag-name> "your query"

# Retrieve as JSON for programmatic use
python3 ~/.claude/skills/rlama/scripts/rlama_retrieve.py <rag-name> "your query" --json

# More chunks for broad queries
python3 ~/.claude/skills/rlama/scripts/rlama_retrieve.py <rag-name> "your query" -k 20

# Force rebuild embedding cache
python3 ~/.claude/skills/rlama/scripts/rlama_retrieve.py <rag-name> "your query" --rebuild-cache

# List RAGs with cache status
python3 ~/.claude/skills/rlama/scripts/rlama_retrieve.py --list
```
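
The `--json` flag makes the chunks easy to consume programmatically. A sketch of post-processing them in Python; note that the field names used here (`source`, `score`, `text`) are assumptions for illustration, so check the script's actual output schema before relying on them:

```python
import json

# Hypothetical --json output; the real schema emitted by rlama_retrieve.py may differ
SAMPLE = json.dumps([
    {"source": "docs/auth.md", "score": 0.82, "text": "Tokens are issued by the gateway..."},
    {"source": "docs/api.md", "score": 0.74, "text": "All endpoints require a bearer token."},
])

def top_sources(raw: str, n: int = 5):
    """Return the n highest-scoring source files from retrieved chunks."""
    chunks = json.loads(raw)
    ranked = sorted(chunks, key=lambda c: c["score"], reverse=True)
    return [c["source"] for c in ranked[:n]]

print(top_sources(SAMPLE, 1))  # ['docs/auth.md']
```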

**External LLM Synthesis** (optional—retrieve chunks AND synthesize via OpenRouter, TogetherAI, Ollama, or any OpenAI-compatible endpoint):

```bash
# Synthesize via OpenRouter (auto-detected from model with /)
python3 ~/.claude/skills/rlama/scripts/rlama_retrieve.py <rag-name> "your query" --synthesize --synth-model anthropic/claude-sonnet-4

# Synthesize via TogetherAI
python3 ~/.claude/skills/rlama/scripts/rlama_retrieve.py <rag-name> "your query" --synthesize --provider togetherai

# Synthesize via local Ollama (fully offline, uses research-grade system prompt)
python3 ~/.claude/skills/rlama/scripts/rlama_retrieve.py <rag-name> "your query" --synthesize --provider ollama

# Synthesize via custom endpoint
python3 ~/.claude/skills/rlama/scripts/rlama_retrieve.py <rag-name> "your query" --synthesize --endpoint https://my-api.com/v1/chat/completions
```

**Environment variables for synthesis:**

| Variable | Provider |
|----------|----------|
| `OPENROUTER_API_KEY` | OpenRouter (default, auto-detected first) |
| `TOGETHER_API_KEY` | TogetherAI |
| `SYNTH_API_KEY` | Custom endpoint (via `--endpoint`) |
| *(none needed)* | Ollama (local, no auth) |

Provider auto-detection: model names with `/` → OpenRouter, otherwise → TogetherAI. Falls back to whichever API key is set.
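
The auto-detection rule above can be sketched in Python. This is an illustration of the documented rule, not the script's actual code; the environment variable names match the table above:

```python
def detect_provider(model: str, env: dict) -> str:
    """Documented rule: model names with '/' prefer OpenRouter, otherwise TogetherAI,
    falling back to whichever provider actually has an API key set."""
    preferred = "openrouter" if "/" in model else "togetherai"
    keys = {"openrouter": "OPENROUTER_API_KEY", "togetherai": "TOGETHER_API_KEY"}
    if env.get(keys[preferred]):
        return preferred
    # Fall back to whichever provider has a key configured
    for provider, key in keys.items():
        if env.get(key):
            return provider
    return preferred

print(detect_provider("anthropic/claude-sonnet-4", {"OPENROUTER_API_KEY": "k"}))  # openrouter
```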

**Quality tiers:**

| Tier | Method | Quality | Latency |
|------|--------|---------|---------|
| Best | Retrieve-only → Claude synthesizes | Strongest synthesis | ~1s retrieve |
| Good | `--synthesize --synth-model anthropic/claude-sonnet-4` | Strong, cited | ~3s |
| Decent | `--synthesize --provider togetherai` (Llama 70B) | Solid for factual | ~2s |
| Local | `--synthesize --provider ollama` (Qwen 7B) | Basic, may hedge | ~5s |
| Baseline | `rlama_query.py` (RLAMA built-in) | Weakest, no prompt control | ~3s |

Small local models (7B) use a tuned prompt optimized for Qwen (structured output, anti-hedge, domain-keyword aware). Cloud providers use a strict research-grade prompt with mandatory citations.

First run builds an embedding cache (~30s for 3K chunks, ~10min for 25K chunks). Subsequent queries are <1s. Large RAGs use incremental checkpointing—if Ollama crashes mid-build, re-run to resume from the last checkpoint. Individual chunks are truncated to 5K chars to stay within nomic-embed-text's context window.

**Benchmarking:**

```bash
# Retrieval quality only
python3 ~/.claude/skills/rlama/scripts/rlama_bench.py <rag-name> --retrieval-only

# Full synthesis benchmark (8 test cases)
python3 ~/.claude/skills/rlama/scripts/rlama_bench.py <rag-name> --provider ollama --verbose

# Single test case
python3 ~/.claude/skills/rlama/scripts/rlama_bench.py <rag-name> --provider ollama --case 0

# JSON output for analysis
python3 ~/.claude/skills/rlama/scripts/rlama_bench.py <rag-name> --provider ollama --json
```

Scores: retrieval precision, topic coverage, grounding, directness (anti-hedge), composite (0-100).

Create a RAG

Index documents from a folder into a new RAG system:

```bash
# Basic creation (uses llama3.2 by default)
rlama rag llama3.2 <rag-name> <folder-path>

# Examples
rlama rag llama3.2 my-notes ~/Notes
rlama rag llama3.2 project-docs ./docs
rlama rag llama3.2 research-papers ~/Papers

# With exclusions
rlama rag llama3.2 codebase ./src --exclude-dir=node_modules,dist,.git --exclude-ext=.log,.tmp

# Only specific file types
rlama rag llama3.2 markdown-docs ./docs --process-ext=.md,.txt

# Custom chunking strategy
rlama rag llama3.2 my-rag ./docs --chunking=semantic --chunk-size=1500 --chunk-overlap=300
```

**Chunking strategies:**
- `hybrid` (default) - Combines semantic and fixed chunking
- `semantic` - Respects document structure (paragraphs, sections)
- `fixed` - Fixed character count chunks
- `hierarchical` - Preserves document hierarchy
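
To reason about `--chunk-size` and `--chunk-overlap` values before indexing, the arithmetic for fixed chunking can be sketched as follows (illustrative only, not RLAMA's actual chunker):

```python
import math

def split_fixed(text: str, size: int, overlap: int):
    """Split text into fixed-size chunks with the given character overlap."""
    chunks, start, stride = [], 0, size - overlap
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
        start += stride
    return chunks

def chunk_count(doc_len: int, size: int, overlap: int) -> int:
    """Approximate number of chunks for a document of doc_len characters."""
    if doc_len <= size:
        return 1
    return math.ceil((doc_len - size) / (size - overlap)) + 1

# A 6,000-char document with the 1500/300 settings shown above:
print(chunk_count(6000, 1500, 300))  # 5
```

Larger overlap improves recall at chunk boundaries but multiplies chunk count (and indexing time), which is worth checking before indexing a big corpus.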

List RAG Systems

```bash
# List all RAGs
rlama list

# List documents in a specific RAG
rlama list-docs <rag-name>

# Inspect chunks (debugging)
rlama list-chunks <rag-name> --document=filename.pdf
```

Manage Documents

Add documents to an existing RAG:

```bash
rlama add-docs <rag-name> <folder-or-file>

# Examples
rlama add-docs my-notes ~/Notes/new-notes
rlama add-docs research ./papers/new-paper.pdf
```

**Remove a document:**

```bash
rlama remove-doc <rag-name> <document-id>

# Document ID is typically the filename
rlama remove-doc my-notes old-note.md
rlama remove-doc research outdated-paper.pdf

# Force remove without confirmation
rlama remove-doc my-notes old-note.md --force
```

Delete a RAG

```bash
rlama delete <rag-name>

# Or manually remove the data directory
rm -rf ~/.rlama/<rag-name>
```

Advanced Features

Web Crawling

Create a RAG from website content:

```bash
# Crawl a website and create RAG
rlama crawl-rag llama3.2 docs-rag https://docs.example.com

# Add web content to existing RAG
rlama crawl-add-docs my-rag https://blog.example.com
```

Directory Watching

Automatically update a RAG when files change:

```bash
# Enable watching
rlama watch <rag-name> <folder-path>

# Check for new files manually
rlama check-watched <rag-name>

# Disable watching
rlama watch-off <rag-name>
```

Website Watching

Monitor websites for content updates:

```bash
rlama web-watch <rag-name> https://docs.example.com
rlama check-web-watched <rag-name>
rlama web-watch-off <rag-name>
```

Reranking

Improve result relevance with reranking:

```bash
# Add reranker to existing RAG
rlama add-reranker <rag-name>

# Configure reranker weight (0-1, default 0.7)
rlama update-reranker <rag-name> --reranker-weight=0.8

# Disable reranking at creation time
rlama rag llama3.2 my-rag ./docs --disable-reranker
```

API Server

Run RLAMA as an API server for programmatic access:

```bash
# Start API server
rlama api --port 11249

# Query via API
curl -X POST http://localhost:11249/rag \
  -H "Content-Type: application/json" \
  -d '{"rag_name": "my-docs", "prompt": "What are the key points?", "context_size": 20}'
```
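
The same request can be issued from Python with the standard library. A sketch, not an official client; the endpoint path and payload fields mirror the curl example above, and the response is returned as a raw string because the response schema is not documented here:

```python
import json
import urllib.request

def build_rag_request(rag_name: str, prompt: str, context_size: int = 20,
                      host: str = "http://localhost:11249"):
    """Build a POST request for a running `rlama api` server's /rag endpoint."""
    payload = {"rag_name": rag_name, "prompt": prompt, "context_size": context_size}
    return urllib.request.Request(
        f"{host}/rag",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def query(rag_name: str, prompt: str) -> str:
    """Send the query and return the raw response body (requires a running server)."""
    with urllib.request.urlopen(build_rag_request(rag_name, prompt)) as resp:
        return resp.read().decode("utf-8")

# Example: print(query("my-docs", "What are the key points?"))
```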

Model Management

```bash
# Update the model used by a RAG
rlama update-model <rag-name> <new-model>

# Example: switch to a more powerful model
rlama update-model my-rag deepseek-r1:8b

# Use Hugging Face models
rlama rag hf.co/username/repo my-rag ./docs
rlama rag hf.co/username/repo:Q4_K_M my-rag ./docs

# Use OpenAI models (requires OPENAI_API_KEY)
export OPENAI_API_KEY="your-key"
rlama rag gpt-4-turbo my-openai-rag ./docs
```

Configuration

Data Directory

By default, RLAMA stores data in `~/.rlama/`. Change this with `--data-dir`:

```bash
# Use custom data directory
rlama --data-dir=/path/to/custom list
rlama --data-dir=/projects/rag-data rag llama3.2 project-rag ./docs

# Or set via environment (add to ~/.zshrc)
export RLAMA_DATA_DIR="/path/to/custom"
```

Ollama Configuration

```bash
# Custom Ollama host
rlama --host=192.168.1.100 --port=11434 run my-rag

# Or via environment
export OLLAMA_HOST="http://192.168.1.100:11434"
```

Default Model

The skill uses `qwen2.5:7b` by default (changed from llama3.2 in Jan 2026). For legacy mode:

```bash
# Use the old llama3.2 default
python3 ~/.claude/skills/rlama/scripts/rlama_manage.py create my-rag ./docs --legacy

# Per-command model override
rlama rag deepseek-r1:8b my-rag ./docs

# For queries
rlama run my-rag --query "question" -m deepseek-r1:8b
```

**Recommended models:**

| Model | Size | Best For |
|-------|------|----------|
| `qwen2.5:7b` | 7B | Default - better reasoning (recommended) |
| `llama3.2` | 3B | Fast, legacy default (use `--legacy`) |
| `deepseek-r1:8b` | 8B | Complex questions |
| `llama3.3:70b` | 70B | Highest quality (slow) |

Supported File Types

RLAMA indexes these formats:
- Text: `.txt`, `.md`, `.markdown`
- Documents: `.pdf`, `.docx`, `.doc`
- Code: `.py`, `.js`, `.ts`, `.go`, `.rs`, `.java`, `.rb`, `.cpp`, `.c`, `.h`
- Data: `.json`, `.yaml`, `.yml`, `.csv`
- Web: `.html`, `.htm`
- Org-mode: `.org`

Example Workflows

Personal Knowledge Base

```bash
# Create from multiple folders
rlama rag llama3.2 personal-kb ~/Documents
rlama add-docs personal-kb ~/Notes
rlama add-docs personal-kb ~/Downloads/papers

# Query
rlama run personal-kb --query "what did I write about project management?"
```

Code Documentation

```bash
# Index project docs
rlama rag llama3.2 project-docs ./docs ./README.md

# Query architecture
rlama run project-docs --query "how does authentication work?" --context-size 25
```

Research Papers

```bash
# Create research RAG
rlama rag llama3.2 papers ~/Papers --exclude-ext=.bib

# Add specific paper
rlama add-docs papers ./new-paper.pdf

# Query with high context
rlama run papers --query "what methods are used for evaluation?" --context-size 30
```

Interactive Wizard

For guided RAG creation:

```bash
rlama wizard
```

Resilient Indexing (Skip Problem Files)

For folders with mixed content where some files may exceed embedding context limits (e.g., large PDFs), use the resilient script, which processes files individually and skips failures:

```bash
# Create RAG, skipping files that fail
python3 ~/.claude/skills/rlama/scripts/rlama_resilient.py create my-rag ~/Documents

# Add to existing RAG, skipping failures
python3 ~/.claude/skills/rlama/scripts/rlama_resilient.py add my-rag ~/MoreDocs

# With docs-only filter
python3 ~/.claude/skills/rlama/scripts/rlama_resilient.py create research ~/Papers --docs-only

# With legacy model
python3 ~/.claude/skills/rlama/scripts/rlama_resilient.py create my-rag ~/Docs --legacy
```

The script reports which files were added and which were skipped due to errors.

Progress Monitoring

Monitor long-running RLAMA operations in real-time using the logging system.

Tail the Log File

```bash
# Watch all operations in real-time
tail -f ~/.rlama/logs/rlama.log

# Filter by RAG name
tail -f ~/.rlama/logs/rlama.log | grep my-rag

# Pretty-print with jq
tail -f ~/.rlama/logs/rlama.log | jq -r '"\(.ts) [\(.cat)] \(.msg)"'

# Show only progress updates
tail -f ~/.rlama/logs/rlama.log | jq -r 'select(.data.i) | "\(.ts) [\(.cat)] \(.data.i)/\(.data.total) \(.data.file // .data.status)"'
```

Check Operation Status

```bash
# Show active operations
python3 ~/.claude/skills/rlama/scripts/rlama_status.py

# Show recent completed operations
python3 ~/.claude/skills/rlama/scripts/rlama_status.py --recent

# Show both active and recent
python3 ~/.claude/skills/rlama/scripts/rlama_status.py --all

# Follow mode (formatted tail -f)
python3 ~/.claude/skills/rlama/scripts/rlama_status.py --follow

# JSON output
python3 ~/.claude/skills/rlama/scripts/rlama_status.py --json
```

Log File Format

Logs are written in JSON Lines format to `~/.rlama/logs/rlama.log`:

```json
{"ts": "2026-02-03T12:34:56.789", "level": "info", "cat": "INGEST", "msg": "Progress 45/100", "data": {"op_id": "ingest_abc123", "i": 45, "total": 100, "file": "doc.pdf", "eta_sec": 85}}
```
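
Because each line is a standalone JSON object, custom monitoring is a few lines of Python. A minimal sketch that assumes only the fields shown in the sample line above:

```python
import json

# The sample log line from above
SAMPLE = '{"ts": "2026-02-03T12:34:56.789", "level": "info", "cat": "INGEST", "msg": "Progress 45/100", "data": {"op_id": "ingest_abc123", "i": 45, "total": 100, "file": "doc.pdf", "eta_sec": 85}}'

def progress_line(raw):
    """Format a progress update from one JSON Lines entry, or None if it isn't one."""
    entry = json.loads(raw)
    data = entry.get("data", {})
    if "i" not in data or "total" not in data:
        return None  # not a progress record
    pct = 100 * data["i"] / data["total"]
    return f"[{entry['cat']}] {data['i']}/{data['total']} ({pct:.0f}%) eta {data.get('eta_sec', '?')}s"

print(progress_line(SAMPLE))  # [INGEST] 45/100 (45%) eta 85s
```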

Operations State

Active and recent operations are tracked in `~/.rlama/logs/operations.json`:

```json
{
  "active": {
    "ingest_abc123": {
      "type": "ingest",
      "rag_name": "my-docs",
      "started": "2026-02-03T12:30:00",
      "processed": 45,
      "total": 100,
      "eta_sec": 85
    }
  },
  "recent": [...]
}
```
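
A small sketch of summarizing active operations from that file, using only the fields documented above (read the real file from `~/.rlama/logs/operations.json` in practice):

```python
import json

# Matches the documented operations.json structure
SAMPLE = """
{"active": {"ingest_abc123": {"type": "ingest", "rag_name": "my-docs",
 "started": "2026-02-03T12:30:00", "processed": 45, "total": 100, "eta_sec": 85}},
 "recent": []}
"""

def summarize_active(ops_json):
    """One-line summaries of active operations from operations.json content."""
    ops = json.loads(ops_json)
    return [
        f"{op_id}: {op['type']} {op['rag_name']} {op['processed']}/{op['total']} (eta {op['eta_sec']}s)"
        for op_id, op in ops.get("active", {}).items()
    ]

print(summarize_active(SAMPLE))
```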

Troubleshooting

"Ollama not found"

```bash
# Check Ollama status
ollama --version
ollama list

# Start Ollama
brew services start ollama   # macOS
ollama serve                 # Manual start
```

"Model not found"

```bash
# Pull the required model
ollama pull llama3.2
ollama pull nomic-embed-text   # Embedding model
```

Slow Indexing

索引速度慢

  • Use smaller embedding models
  • Exclude large binary files:
    --exclude-ext=.bin,.zip,.tar
  • Exclude build directories:
    --exclude-dir=node_modules,dist,build
  • 使用更小的嵌入模型
  • 排除大型二进制文件:
    --exclude-ext=.bin,.zip,.tar
  • 排除构建目录:
    --exclude-dir=node_modules,dist,build

Poor Query Results

1. Increase context size: `--context-size=30`
2. Use a better model: `-m deepseek-r1:8b`
3. Re-index with semantic chunking: `--chunking=semantic`
4. Enable reranking: `rlama add-reranker <rag-name>`

Index Corruption

```bash
# Delete and recreate
rm -rf ~/.rlama/<rag-name>
rlama rag llama3.2 <rag-name> <folder-path>
```

CLI Reference

Full command reference:

```bash
rlama --help
rlama <command> --help
```

Or see `references/rlama-commands.md` for complete documentation.