SparkSatchel 灵犀妙计

A Meta-Skill that provides intelligent skill retrieval and recommendation for Claude Code.

Core Philosophy

"Think twice before acting, keep the user burden-free"

Quick Start

python
from src.retriever import SparkSatchel

sparksatchel = SparkSatchel()
result = sparksatchel.retrieve("process this PDF")

Decision Mechanism

The system evaluates confidence and responds accordingly:
| Confidence Level | Threshold | Action |
| --- | --- | --- |
| High | >70% | Auto-recommend with reasoning |
| Medium | 40-70% | Recommend primary + alternatives |
| Low | <40% | Present candidates and ask user |
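
The thresholds listed above can be expressed as a small dispatch function. This is only a sketch: the names `Action` and `choose_action` are hypothetical, and how the exact boundary values (0.40 and 0.70) are treated is an assumption, since the source does not say which side they fall on.

```python
from enum import Enum


class Action(Enum):
    AUTO_RECOMMEND = "auto-recommend with reasoning"
    RECOMMEND_WITH_ALTERNATIVES = "recommend primary + alternatives"
    ASK_USER = "present candidates and ask user"


# Thresholds from the decision table; boundary handling is an assumption.
HIGH_THRESHOLD = 0.70
LOW_THRESHOLD = 0.40


def choose_action(confidence: float) -> Action:
    """Map a confidence score in [0, 1] to one of the three response modes."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence > HIGH_THRESHOLD:
        return Action.AUTO_RECOMMEND
    if confidence >= LOW_THRESHOLD:
        return Action.RECOMMEND_WITH_ALTERNATIVES
    return Action.ASK_USER
```

Here a score of exactly 0.70 or 0.40 is treated as medium confidence; a real implementation may draw the line differently.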

Key Features

1. Semantic Retrieval

  • Bilingual embeddings: Supports Chinese and English via paraphrase-multilingual-MiniLM-L12-v2
  • Sharded storage: Skills organized by category for efficient retrieval
  • Vector similarity: Matches user intent to skill descriptions
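
To illustrate the vector-similarity step, the sketch below ranks skills by cosine similarity between a query vector and each skill's description vector. The helper names and the toy 3-dimensional vectors are assumptions for illustration; in practice the vectors would come from the 384-dimensional embedding model, and the project may use a different metric or ChromaDB's built-in search.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_skills(
    query_vec: list[float], skill_vecs: dict[str, list[float]]
) -> list[tuple[str, float]]:
    """Return (skill name, similarity) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in skill_vecs.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy vectors standing in for real embeddings.
skills = {
    "pdf-skill": [0.9, 0.1, 0.0],
    "docx-skill": [0.2, 0.9, 0.1],
    "csv-skill": [0.0, 0.2, 0.9],
}
ranking = rank_skills([1.0, 0.2, 0.0], skills)
print(ranking[0][0])  # pdf-skill
```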

2. Intent Analysis

Extracts from user requests:
  • Primary intent
  • Keywords
  • Entities (filenames, formats, etc.)
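
A deliberately naive sketch of this extraction step follows. Everything in it (the `Intent` dataclass, the `analyze` function, the stopword and format lists, and the filename regex) is hypothetical; the source does not describe how SparkSatchel actually performs intent analysis.

```python
import re
from dataclasses import dataclass, field

# Hypothetical vocabularies; a real analyzer would likely be richer.
KNOWN_FORMATS = {"pdf", "docx", "xlsx", "csv", "json"}
STOPWORDS = {"this", "that", "the", "a", "an", "to", "me", "please", "my"}
# Naive filename pattern: a name, a dot, and a 2-5 character extension.
FILENAME_RE = re.compile(r"\b[\w-]+\.[A-Za-z0-9]{2,5}\b")


@dataclass
class Intent:
    primary: str
    keywords: list[str] = field(default_factory=list)
    entities: dict[str, list[str]] = field(default_factory=dict)


def analyze(request: str) -> Intent:
    filenames = FILENAME_RE.findall(request)
    # Remove filenames before tokenizing so "report.pdf" is not split up.
    text = FILENAME_RE.sub(" ", request).lower()
    tokens = re.findall(r"[a-z0-9]+", text)
    keywords = [t for t in tokens if t not in STOPWORDS]
    # Formats come from known format words plus any filename extensions.
    formats = sorted(
        {t for t in tokens if t in KNOWN_FORMATS}
        | {name.rsplit(".", 1)[1].lower() for name in filenames}
    )
    # Treat the first non-stopword as the primary intent (a crude heuristic).
    primary = keywords[0] if keywords else ""
    return Intent(primary, keywords, {"filenames": filenames, "formats": formats})
```

For "Process this PDF" this yields the primary intent `process`, the keywords `process` and `pdf`, and the format entity `pdf`.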

3. Historical Learning

  • Tracks all skill calls
  • Records success/failure feedback
  • Calculates skill success rates
  • Optimizes recommendation ranking
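
The sketch below shows one way call history could feed back into ranking, using SQLite as the history store. The table layout, the function names, the 0.5 prior for unseen skills, and the 0.3 blending weight are all assumptions; the source only states that success rates are calculated and used to optimize ranking.

```python
import sqlite3

# In-memory database for the sketch; the real store would be a file on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls (skill_name TEXT NOT NULL, success INTEGER NOT NULL)"
)


def record_call(skill_name: str, success: bool) -> None:
    """Store one skill call together with its success/failure feedback."""
    conn.execute("INSERT INTO calls VALUES (?, ?)", (skill_name, int(success)))


def success_rate(skill_name: str, prior: float = 0.5) -> float:
    """Fraction of successful calls; falls back to `prior` for unseen skills."""
    total, wins = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(success), 0) FROM calls WHERE skill_name = ?",
        (skill_name,),
    ).fetchone()
    return wins / total if total else prior


def rerank(
    candidates: list[tuple[str, float]], weight: float = 0.3
) -> list[tuple[str, float]]:
    """Blend similarity with historical success rate (weight is illustrative)."""
    blended = [
        (name, (1 - weight) * sim + weight * success_rate(name))
        for name, sim in candidates
    ]
    return sorted(blended, key=lambda item: item[1], reverse=True)
```

With this blend, a skill with slightly lower similarity but a much better track record can overtake a closer semantic match.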

4. Health Checking

  • Detects missing skills
  • Identifies corrupted skills
  • Handles version mismatches
  • Provides fallback strategies
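
As a rough illustration of the missing/corrupted checks and the fallback strategy, the sketch below assumes each skill lives in its own directory containing a non-empty `SKILL.md`. That layout and both function names are assumptions, and version-mismatch handling is left out because the source gives no detail about how versions are recorded.

```python
from pathlib import Path
from typing import Optional


def check_skill(skill_dir: Path) -> str:
    """Classify a skill directory as 'ok', 'missing', or 'corrupted'."""
    if not skill_dir.is_dir():
        return "missing"
    manifest = skill_dir / "SKILL.md"
    try:
        # An absent, empty, or unreadable manifest counts as corrupted.
        if not manifest.is_file() or not manifest.read_text(encoding="utf-8").strip():
            return "corrupted"
    except (OSError, UnicodeDecodeError):
        return "corrupted"
    return "ok"


def pick_with_fallback(ranked: list[str], root: Path) -> Optional[str]:
    """Return the first healthy skill from a ranked list, or None if none is usable."""
    for name in ranked:
        if check_skill(root / name) == "ok":
            return name
    return None
```

Walking down the ranked list means an unhealthy top recommendation degrades to the next-best healthy skill instead of failing outright.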

5. Cache Management

  • Monitors database size
  • Tracks record count
  • Suggests cleanup when needed
  • Supports auto/manual cleanup
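
The cleanup suggestion could be derived from the two monitored quantities as sketched below. The limits and the `cache_status` function are hypothetical (the source does not state the actual thresholds); the returned keys mirror the `needs_cleanup` and `suggestion` fields used in the Check Health example.

```python
# Illustrative limits; the project's real thresholds are not documented here.
MAX_DB_BYTES = 50 * 1024 * 1024
MAX_RECORDS = 10_000


def cache_status(
    size_bytes: int,
    record_count: int,
    max_bytes: int = MAX_DB_BYTES,
    max_records: int = MAX_RECORDS,
) -> dict:
    """Report whether cleanup is advisable and why."""
    reasons = []
    if size_bytes > max_bytes:
        reasons.append(f"database is {size_bytes / 1_048_576:.1f} MB")
    if record_count > max_records:
        reasons.append(f"{record_count} records stored")
    suggestion = ("Consider cleanup: " + "; ".join(reasons)) if reasons else ""
    return {"needs_cleanup": bool(reasons), "suggestion": suggestion}
```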

Usage Examples

High Confidence (Auto-recommend)

User: "Process this PDF"
SparkSatchel: "I recommend pdf-skill because it specializes in PDF documents (92% historical success rate)"

Medium Confidence (With alternatives)

User: "Create a document"
SparkSatchel: "I suggest docx-skill. pdf-skill is also available. Want me to compare them?"

Low Confidence (Ask user)

User: "Process data"
SparkSatchel: "Found several matching skills. Which one fits best?
- xlsx-skill: Excel spreadsheet processing
- pandas-skill: Data analysis with Python
- csv-skill: CSV file handling"

Embedding Models

Pre-installed Model (Ready to Use)

SparkSatchel comes with a pre-downloaded bilingual embedding model:
  • Model: paraphrase-multilingual-MiniLM-L12-v2
  • Size: ~470MB
  • Languages: 50+ including Chinese and English
  • Dimension: 384
  • Status: ✅ Pre-downloaded, ready to use out of the box
  • Location: ~/.cache/huggingface/hub/
The default model offers a good balance of:
  • ✅ Bilingual support (Chinese + English)
  • ✅ Lightweight size
  • ✅ Fast inference
  • ✅ Offline capability

Model Selection Guide

Choose the right model based on your scenario:

Model Comparison

| Model | Size | Languages | Speed + Accuracy | Best For |
| --- | --- | --- | --- | --- |
| paraphrase-multilingual-MiniLM-L12-v2 | 470MB | 50+ | ⭐⭐⭐⭐⭐⭐⭐⭐⭐ | Default choice - Balanced performance |
| shibing624/text2vec-base-chinese | 110MB | Chinese | ⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐ | Chinese-only - Faster & more accurate |
| intfloat/multilingual-e5-large | 1.3GB | 100+ | ⭐⭐⭐⭐⭐⭐⭐⭐ | High accuracy - Best for complex queries |
| BAAI/bge-large-zh-v1.5 | 390MB | Chinese | ⭐⭐⭐⭐⭐⭐⭐⭐⭐ | Chinese advanced - State-of-the-art |
| all-MiniLM-L6-v2 | 23MB | English | ⭐⭐⭐⭐⭐⭐⭐⭐ | English only - Ultra lightweight |

Scenario Recommendations

Quick Download with Script

Use the provided download script for convenience:

```bash
# List available models
python scripts/download_model.py --list

# Download default model (already downloaded ✅)
python scripts/download_model.py default

# Download Chinese-optimized model
python scripts/download_model.py chinese

# Download high-accuracy multilingual model
python scripts/download_model.py large

# Download ultra-lightweight English model
python scripts/download_model.py english
```

**Scenario 1: Chinese-dominant environment**

```bash
# Option A: Use download script
python scripts/download_model.py chinese

# Option B: Manual download
pip install sentence-transformers
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('shibing624/text2vec-base-chinese')"
```

**Scenario 2: English-only (fastest)**

```bash
# Option A: Use download script
python scripts/download_model.py english

# Option B: Manual download
pip install sentence-transformers
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('all-MiniLM-L6-v2')"
```

**Scenario 3: Maximum accuracy (multilingual)**

```bash
# Download high-accuracy model
pip install sentence-transformers
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('intfloat/multilingual-e5-large')"
```

**Scenario 4: Cloud-based (no local storage)**

```bash
# Use OpenAI API (requires API key)
pip install openai
```

How to Switch Models

**Option 1: Modify code (permanent)**

Edit src/models/embedding.py:

```python
class EmbeddingModel:
    # Change default model
    DEFAULT_MODEL = "shibing624/text2vec-base-chinese"  # Your choice
```

**Option 2: Pass model name (temporary)**

```python
from src.models.embedding import EmbeddingModel
from src.retriever import SparkSatchel

# Use custom model
custom_model = EmbeddingModel(
    model_name="shibing624/text2vec-base-chinese",
    device="cpu",  # or "cuda" for GPU acceleration
)

# Pass to SparkSatchel
sparksatchel = SparkSatchel(embedding_model=custom_model)
```

**Option 3: Use OpenAI API**

```python
from openai import OpenAI

# Uses the openai>=1.0 client interface; reads OPENAI_API_KEY from the environment
client = OpenAI()

def openai_embedding(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding
```

Model Performance Tips

  1. GPU Acceleration: If you have an NVIDIA GPU, use device="cuda" for a 5-10x speedup
  2. Batch Processing: Process multiple texts at once for better throughput
  3. Caching: Models are cached after the first download, so they are not downloaded again
  4. Quantization: For memory-constrained environments, use 8-bit quantized models

Project Structure

SparkSatchel/
├── SKILL.md              # This file
├── requirements.txt      # Dependencies
├── src/
│   ├── retriever.py      # Main entry point
│   ├── models/           # Embedding models
│   ├── storage/          # Vector DB + history
│   ├── analysis/         # Intent + confidence
│   └── maintenance/      # Health + lifecycle + cache
└── data/                 # Data storage
    ├── collections/      # Vector databases
    └── history.db        # Call history

API Reference

Main Interface

python
class SparkSatchel:
    def retrieve(self, user_request: str) -> RetrievalResult:
        """Search and recommend skills"""

    def feedback(self, skill_name: str, success: bool, feedback: str = ""):
        """Record user feedback"""

    def check_health(self) -> Dict:
        """Check system health"""

    def cleanup(self, strategy: dict = None):
        """Execute cache cleanup"""

Retrieval Result

python
@dataclass
class RetrievalResult:
    confidence: float              # 0-1
    recommended_skill: str         # Skill name
    reasoning: str                 # Explanation
    alternative_skills: List[str]  # For medium confidence
    candidate_skills: List[Dict]   # For low confidence
    requires_confirmation: bool    # Needs user input?
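
A caller might turn a `RetrievalResult` into a user-facing message along the following lines. This is a sketch only: the dataclass is re-declared locally (with defaults added) so the block runs on its own, `render` is a hypothetical helper, and the `name` and `description` keys of each candidate dictionary are assumptions, since the source does not specify the dictionary layout.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievalResult:  # local copy of the fields listed above
    confidence: float
    recommended_skill: str
    reasoning: str
    alternative_skills: list[str] = field(default_factory=list)
    candidate_skills: list[dict] = field(default_factory=list)
    requires_confirmation: bool = False


def render(result: RetrievalResult) -> str:
    """Build a reply for the high, medium, or low confidence case."""
    if result.requires_confirmation:
        # Low confidence: list the candidates and let the user choose.
        options = "\n".join(
            f"- {c['name']}: {c['description']}" for c in result.candidate_skills
        )
        return f"Found several matching skills. Which one fits best?\n{options}"
    message = f"I recommend {result.recommended_skill} because {result.reasoning}"
    if result.alternative_skills:
        # Medium confidence: mention the alternatives as well.
        message += f" Also available: {', '.join(result.alternative_skills)}."
    return message
```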

Maintenance

Check Health

python
health = sparksatchel.check_health()
if health["cache"]["needs_cleanup"]:
    print(health["suggestion"])

Cleanup Cache

python
from src.maintenance.cache import CleanupStrategy

# By age (delete records older than 30 days)
sparksatchel.cleanup(CleanupStrategy.by_age(days=30))

# By count (keep recent 1000 records)
sparksatchel.cleanup(CleanupStrategy.by_count(keep=1000))
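
For intuition, the two strategies could translate into SQL against the SQLite history store roughly as follows. The schema, column names, and function names here are assumptions made for the sketch; they are not taken from the project's `CleanupStrategy` implementation.

```python
import sqlite3
import time

# Hypothetical history schema: one row per skill call, timestamped in epoch seconds.
SCHEMA = (
    "CREATE TABLE calls ("
    "id INTEGER PRIMARY KEY, skill_name TEXT NOT NULL, called_at REAL NOT NULL)"
)


def cleanup_by_age(conn: sqlite3.Connection, days: int) -> int:
    """Delete rows older than `days` days; returns the number of rows removed."""
    cutoff = time.time() - days * 86_400
    cur = conn.execute("DELETE FROM calls WHERE called_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


def cleanup_by_count(conn: sqlite3.Connection, keep: int) -> int:
    """Keep only the `keep` most recent rows; returns the number of rows removed."""
    cur = conn.execute(
        "DELETE FROM calls WHERE id NOT IN "
        "(SELECT id FROM calls ORDER BY called_at DESC, id DESC LIMIT ?)",
        (keep,),
    )
    conn.commit()
    return cur.rowcount
```

Age-based cleanup bounds how stale the history can get, while count-based cleanup bounds its size regardless of call frequency.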

Tech Stack

  • Python: 3.10+
  • Vector DB: ChromaDB
  • Embedding: sentence-transformers
  • History: SQLite

Performance

| Metric | Target |
| --- | --- |
| Retrieval latency | <500ms (100k skills) |
| Memory usage | <500MB |
| Startup time | <3s |
| Accuracy | >85% |