Search data using vector similarity, full-text keywords, or hybrid methods with Reciprocal Rank Fusion (RRF). Use when setting up embeddings for search, configuring full-text indexing, writing vector_search/text_search/rrf SQL queries, using the /v1/search HTTP API, or configuring vector engines like S3 Vectors.
npx skill4agent add spiceai/skills spice-search

| Method | When to Use | Requires |
|---|---|---|
| Vector search | Semantic similarity, RAG, recommendations | Embedding model + column embeddings |
| Full-text search | Keyword/phrase matching, exact terms | `full_text_search` enabled on the searched columns |
| Hybrid (RRF) | Best of both: combines rankings from multiple methods | Multiple search methods configured |
| Lexical (LIKE/=) | Exact pattern or value matching | Nothing extra |
embeddings:
  - name: local_embeddings
    from: huggingface:huggingface.co/sentence-transformers/all-MiniLM-L6-v2
  - name: openai_embeddings
    from: openai:text-embedding-3-small
    params:
      openai_api_key: ${ secrets:OPENAI_API_KEY }

| Provider | From Format | Status |
|---|---|---|
| OpenAI | `openai:<model>` | Release Candidate |
| HuggingFace | `huggingface:huggingface.co/<org>/<model>` | Release Candidate |
| Local file | | Release Candidate |
| Azure OpenAI | | Alpha |
| Google AI | | Alpha |
| Amazon Bedrock | | Alpha |
| Databricks | | Alpha |
| Model2Vec | | Alpha |
datasets:
  - from: postgres:documents
    name: docs
    acceleration:
      enabled: true
    columns:
      - name: content
        embeddings:
          - from: local_embeddings
            row_id: id
            chunking:
              enabled: true
              target_chunk_size: 512
              overlap_size: 64

| Method | Description | When to Use |
|---|---|---|
| Accelerated | Precomputed and stored | Faster queries, frequently searched datasets |
| JIT (Just-in-Time) | Computed at query time (no acceleration) | Large or rarely queried datasets |
| Passthrough | Pre-existing embeddings used directly | Source already has precomputed embeddings |
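The `chunking` settings in the dataset configuration above (`target_chunk_size: 512`, `overlap_size: 64`) suggest a sliding window over the column text. The Python sketch below only illustrates that idea. It assumes sizes are counted in characters, which the source does not state, and `chunk_text` is a hypothetical helper, not part of Spice.

```python
def chunk_text(text: str, target_chunk_size: int = 512, overlap_size: int = 64) -> list[str]:
    """Split text into overlapping windows (illustrative only; units are an assumption)."""
    if target_chunk_size <= 0:
        raise ValueError("target_chunk_size must be positive")
    if not 0 <= overlap_size < target_chunk_size:
        raise ValueError("overlap_size must be in [0, target_chunk_size)")

    # Each new chunk starts this many characters after the previous one,
    # so consecutive chunks share `overlap_size` characters.
    step = target_chunk_size - overlap_size
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + target_chunk_size])
        if start + target_chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks
```

With overlap, a phrase that straddles a chunk boundary still appears whole in at least one chunk, which is the usual reason to set a non-zero `overlap_size`.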
curl -X POST http://localhost:8090/v1/search \
  -H 'Content-Type: application/json' \
  -d '{
    "datasets": ["docs"],
    "text": "cutting edge AI",
    "where": "author=\"jeadie\"",
    "additional_columns": ["title", "state"],
    "limit": 5
  }'

| Field | Required | Description |
|---|---|---|
| `text` | Yes | Search text |
| `datasets` | No | Datasets to search (null = all searchable) |
| `additional_columns` | No | Extra columns to return |
| `where` | No | SQL filter predicate |
| `limit` | No | Max results per dataset |
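The request fields above map directly onto a JSON body. As a minimal sketch, the same call as the `curl` example could be assembled from Python with only the standard library. `build_search_request` is a hypothetical helper name, and the base URL is taken from the example rather than from any required default.

```python
import json
import urllib.request


def build_search_request(text, datasets=None, where=None,
                         additional_columns=None, limit=None,
                         base_url="http://localhost:8090"):
    """Build (but do not send) a POST request for the /v1/search endpoint."""
    payload = {"text": text}  # the only required field
    # Optional fields are included only when the caller sets them.
    if datasets is not None:
        payload["datasets"] = datasets
    if where is not None:
        payload["where"] = where
    if additional_columns is not None:
        payload["additional_columns"] = additional_columns
    if limit is not None:
        payload["limit"] = limit

    return urllib.request.Request(
        url=f"{base_url}/v1/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it to a running Spice instance. The response shape is not described here, so the sketch stops at building the request.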
SELECT id, title, score
FROM vector_search(docs, 'cutting edge AI')
WHERE state = 'Open'
ORDER BY score DESC
LIMIT 5;

vector_search(
  table STRING,          -- Dataset name (required)
  query STRING,          -- Search text (required)
  col STRING,            -- Column (optional if single embedding column)
  limit INTEGER,         -- Max results (default: 1000)
  include_score BOOLEAN  -- Include score column (default: TRUE)
) RETURNS TABLE

Limitation: The `vector_search` UDTF does not yet support chunked embedding columns. Use the HTTP API for chunked data.
datasets:
  - from: postgres:articles
    name: articles
    acceleration:
      enabled: true
    columns:
      - name: title
        full_text_search:
          enabled: true
          row_id:
            - id
      - name: body
        full_text_search:
          enabled: true

SELECT id, title, score
FROM text_search(articles, 'search keywords', body)
ORDER BY score DESC
LIMIT 5;

text_search(
  table STRING,          -- Dataset name (required)
  query STRING,          -- Keywords/phrase (required)
  col STRING,            -- Column (required if multiple indexed columns)
  limit INTEGER,         -- Max results (default: 1000)
  include_score BOOLEAN  -- Include score column (default: TRUE)
) RETURNS TABLE

RRF Score = Σ(rank_weight / (k + rank))

SELECT id, title, content, fused_score
FROM rrf(
vector_search(documents, 'machine learning algorithms'),
text_search(documents, 'neural networks deep learning', content),
join_key => 'id'
)
ORDER BY fused_score DESC
LIMIT 5;

SELECT fused_score, title, content
FROM rrf(
text_search(posts, 'artificial intelligence', rank_weight => 50.0),
vector_search(posts, 'AI machine learning', rank_weight => 200.0)
)
ORDER BY fused_score DESC
LIMIT 10;

-- Exponential decay (1-hour scale)
SELECT fused_score, title, created_at
FROM rrf(
text_search(news, 'breaking news'),
vector_search(news, 'latest updates'),
time_column => 'created_at',
recency_decay => 'exponential',
decay_constant => 0.05,
decay_scale_secs => 3600
)
ORDER BY fused_score DESC
LIMIT 10;
-- Linear decay (24-hour window)
SELECT fused_score, content
FROM rrf(
text_search(posts, 'trending'),
vector_search(posts, 'viral popular'),
time_column => 'created_at',
recency_decay => 'linear',
decay_window_secs => 86400
)
ORDER BY fused_score DESC;

SELECT fused_score, text, langs
FROM rrf(
vector_search(posts, 'ultimas noticias', rank_weight => 100),
text_search(posts, 'news'),
time_column => 'created_at',
recency_decay => 'exponential',
decay_constant => 0.05,
decay_scale_secs => 3600
)
WHERE trim(text) != ''
ORDER BY fused_score DESC LIMIT 15;

| Parameter | Type | Required | Description |
|---|---|---|---|
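| (positional search queries) | Search UDTF | Yes (2+) | The `vector_search` / `text_search` calls whose rankings are fused |
| `join_key` | String | No | Column for joining results (default: auto-hash) |
| `k` | Float | No | Smoothing parameter (default: 60.0, lower = more aggressive) |
| `time_column` | String | No | Timestamp column for recency boosting |
| `recency_decay` | String | No | Decay function, e.g. 'exponential' or 'linear' |
| `decay_constant` | Float | No | Rate for exponential decay (default: 0.01) |
| `decay_scale_secs` | Float | No | Time scale in seconds for exponential decay (default: 86400) |
| `decay_window_secs` | Float | No | Window in seconds for linear decay (default: 86400) |
| `rank_weight` | Float | No | Per-query weight (specified inside search calls) |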
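To make the formula `RRF Score = Σ(rank_weight / (k + rank))` concrete, here is a small Python sketch of the fusion step. It is an illustration under stated assumptions, not Spice's implementation: ranks are assumed to start at 1, every ranking must supply its own `rank_weight`, and the two decay shapes in `recency_multiplier` are common textbook forms chosen to match the parameter names. The exact way Spice applies decay is not specified here.

```python
import math
from collections import defaultdict


def rrf(rankings, k=60.0):
    """Fuse ranked ID lists with Reciprocal Rank Fusion.

    rankings: list of (ranked_ids, rank_weight) pairs, best result first.
    Returns (id, fused_score) pairs sorted by descending fused score.
    """
    fused = defaultdict(float)
    for ranked_ids, rank_weight in rankings:
        # ASSUMPTION: the top result of each ranking has rank 1.
        for rank, key in enumerate(ranked_ids, start=1):
            fused[key] += rank_weight / (k + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


def recency_multiplier(age_secs, recency_decay=None, decay_constant=0.01,
                       decay_scale_secs=86400.0, decay_window_secs=86400.0):
    """Hypothetical recency factor in [0, 1]; the formulas are assumptions."""
    if recency_decay is None:
        return 1.0
    age = max(age_secs, 0.0)
    if recency_decay == "exponential":
        # Decays smoothly; age is measured in units of decay_scale_secs.
        return math.exp(-decay_constant * age / decay_scale_secs)
    if recency_decay == "linear":
        # Falls to zero once the row is older than the window.
        return max(0.0, 1.0 - age / decay_window_secs)
    raise ValueError(f"unknown recency_decay: {recency_decay!r}")
```

A lower `k` makes the gap between rank 1 and rank 2 larger, which is why the table calls it "more aggressive". A higher `rank_weight` on one search lets its ordering dominate the fused result.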
datasets:
  - from: postgres:documents
    name: docs
    acceleration:
      enabled: true
    columns:
      - name: content
        embeddings:
          - from: embed_model
            row_id: id
        metadata:
          vectors: non-filterable
      - name: category
        metadata:
          vectors: filterable # enable filtering on this column
    vectors:
      enabled: true
      engine: s3_vectors
      params:
        s3_vectors_bucket: my-bucket
        s3_vectors_region: us-east-1

SELECT * FROM my_table WHERE column LIKE '%substring%';
SELECT * FROM my_table WHERE column = 'exact value';
SELECT * FROM my_table WHERE regexp_like(column, '^spice.*ai$');

spice search "cutting edge AI" --dataset docs --limit 5
spice search --cache-control no-cache "search terms"

version: v1
kind: Spicepod
name: search_app
secrets:
  - from: env
    name: env
embeddings:
  - name: embeddings
    from: huggingface:huggingface.co/sentence-transformers/all-MiniLM-L6-v2
datasets:
  - from: file:articles.parquet
    name: articles
    acceleration:
      enabled: true
      engine: duckdb
    columns:
      - name: title
        full_text_search:
          enabled: true
          row_id:
            - id
      - name: content
        embeddings:
          - from: embeddings
        full_text_search:
          enabled: true

SELECT id, title, content, fused_score
FROM rrf(
vector_search(articles, 'machine learning best practices'),
text_search(articles, 'neural network training', content),
join_key => 'id',
time_column => 'published_at',
recency_decay => 'exponential',
decay_constant => 0.01,
decay_scale_secs => 86400
)
WHERE fused_score > 0.01
ORDER BY fused_score DESC
LIMIT 10;

| Issue | Solution |
|---|---|
| | Verify embeddings configured on column and model is loaded |
| | Check |
| Poor hybrid search relevance | Tune `rank_weight` and `k` |
| Results missing recent content | Add `time_column` and `recency_decay` |
| Chunked vector search not working via SQL | Use HTTP API instead (UDTF doesn't support chunked columns yet) |