Split text into contextual chunks for RAG/embedding pipelines. Performs document segmentation and section extraction using a window, tfidf, punctuation, or hybrid strategy, chosen according to intent.
```bash
npx skill4agent add trkbt10/indexion-skills indexion-segment
```

```bash
# Default window divergence strategy
indexion segment <input-file> <output-dir>

# TF-IDF based segmentation
indexion segment --strategy=tfidf <input-file> <output-dir>

# Punctuation-based segmentation
indexion segment --strategy=punctuation <input-file> <output-dir>

# Custom segment sizes
indexion segment --min-size=200 --max-size=3000 --target-size=800 document.txt output/

# Custom divergence threshold
indexion segment --threshold=0.5 document.txt output/

# Adaptive threshold mode (default)
indexion segment --adaptive document.txt output/

# Hybrid NCD+TF-IDF mode
indexion segment --hybrid --ncd-weight=0.6 --tfidf-weight=0.4 document.txt output/

# Custom window size
indexion segment --window-size=5 document.txt output/

# Custom output prefix
indexion segment --prefix=chunk document.txt output/
```

| Option | Default | Description |
|---|---|---|
| `--strategy` | window | Strategy: window, tfidf, punctuation |
| `--min-size` | 100 | Minimum segment characters |
| `--max-size` | 2000 | Maximum segment characters |
| `--target-size` | 500 | Target segment characters |
| `--threshold` | 0.42 | Divergence threshold |
| `--window-size` | 3 | Window size |
| `--adaptive` | true | Adaptive threshold mode |
| `--hybrid` | false | NCD+TF-IDF hybrid mode |
| `--ncd-weight` | 0.5 | NCD weight in hybrid mode |
| `--tfidf-weight` | 0.5 | TF-IDF weight in hybrid mode |
| `--prefix` | segment | Output file prefix |
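
When the CLI is called from a script, for example inside an embedding pipeline, the options in the table above can be assembled programmatically. The following is a minimal sketch, assuming `indexion` is on `PATH` and that boolean flags are passed bare, as in the `--adaptive` and `--hybrid` examples. `build_segment_command` and `run_segment` are hypothetical helpers, not part of indexion.

```python
import subprocess
from typing import List, Union

OptionValue = Union[str, int, float, bool]


def build_segment_command(input_file: str, output_dir: str,
                          **options: OptionValue) -> List[str]:
    """Build an argv list for `indexion segment` (hypothetical helper).

    Keyword names use underscores: min_size=200 becomes --min-size=200.
    True emits a bare flag (--hybrid); False omits the flag entirely.
    """
    argv = ["indexion", "segment"]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if isinstance(value, bool):
            if value:
                argv.append(flag)
        else:
            argv.append(f"{flag}={value}")
    argv += [input_file, output_dir]
    return argv


def run_segment(input_file: str, output_dir: str, **options: OptionValue) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_segment_command(input_file, output_dir, **options), check=True)


if __name__ == "__main__":
    cmd = build_segment_command("document.txt", "output/",
                                hybrid=True, ncd_weight=0.6, tfidf_weight=0.4)
    print(" ".join(cmd))
    # prints: indexion segment --hybrid --ncd-weight=0.6 --tfidf-weight=0.4 document.txt output/
```

The source does not show how to switch off a default-on flag such as `--adaptive`, so the helper does not attempt it: passing `adaptive=False` simply leaves the flag out.
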

| Strategy | Description |
|---|---|
| `window` | Sliding window divergence detection |
| `tfidf` | TF-IDF based topic change detection |
| `punctuation` | Punctuation/sentence boundary based |
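
To give a feel for what "sliding window divergence detection" can mean, here is a rough sketch that scores each candidate boundary by comparing the sentences just before it with the sentences just after it. It is not indexion's implementation: the use of normalized compression distance (NCD) with `zlib` as the compressor, and the function names `ncd` and `window_divergences`, are assumptions made for illustration.

```python
import zlib
from typing import List, Tuple


def ncd(x: str, y: str) -> float:
    """Normalized compression distance, with zlib as the compressor."""
    cx = len(zlib.compress(x.encode("utf-8")))
    cy = len(zlib.compress(y.encode("utf-8")))
    cxy = len(zlib.compress((x + y).encode("utf-8")))
    return (cxy - min(cx, cy)) / max(cx, cy)


def window_divergences(sentences: List[str],
                       window_size: int = 3) -> List[Tuple[int, float]]:
    """Score each candidate boundary i by comparing the window_size sentences
    before index i with the window_size sentences starting at index i."""
    scores = []
    for i in range(window_size, len(sentences) - window_size + 1):
        left = " ".join(sentences[i - window_size:i])
        right = " ".join(sentences[i:i + window_size])
        scores.append((i, ncd(left, right)))
    return scores


if __name__ == "__main__":
    sentences = (["The cat sat on the mat."] * 4
                 + ["Stock markets fell sharply today."] * 4)
    for index, score in window_divergences(sentences, window_size=2):
        print(index, round(score, 2))
```

In a scheme like this, a segment boundary would be placed wherever the score exceeds a threshold. Because zlib adds fixed overhead to short inputs, scores from this sketch are not directly comparable with the tool's `--threshold` default of 0.42.
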

`indexion segment <input-file> <output-dir>`, `--threshold`, `--target-size`, `--hybrid`
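
The hybrid mode is described only as "NCD+TF-IDF" with one weight per signal. One plausible reading is a weighted sum of two divergence scores, sketched below. This is an assumption rather than indexion's documented behaviour: the linear combination, the smoothed IDF formula, the use of cosine distance, and all function names are illustrative.

```python
import math
import re
import zlib
from collections import Counter
from typing import Dict, List


def ncd(x: str, y: str) -> float:
    """Normalized compression distance, with zlib as the compressor."""
    cx = len(zlib.compress(x.encode("utf-8")))
    cy = len(zlib.compress(y.encode("utf-8")))
    cxy = len(zlib.compress((x + y).encode("utf-8")))
    return (cxy - min(cx, cy)) / max(cx, cy)


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Sparse TF-IDF vectors, one dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF keeps terms shared by every document at a non-zero weight.
        vectors.append({t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors


def cosine_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def hybrid_divergence(left: str, right: str,
                      ncd_weight: float = 0.5,
                      tfidf_weight: float = 0.5) -> float:
    """Weighted sum of NCD and TF-IDF cosine distance (assumed combination)."""
    vec_left, vec_right = tfidf_vectors([left, right])
    return (ncd_weight * ncd(left, right)
            + tfidf_weight * cosine_distance(vec_left, vec_right))
```

Under this reading, `--ncd-weight=0.6 --tfidf-weight=0.4` would lean the boundary decision towards compression-based similarity, which reacts to repeated phrasing, and away from vocabulary overlap.
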