how-to-do-full-text-search-with-sqlite

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

How to do Full Text Search with SQLite

如何在SQLite中实现全文搜索

SQLite has a powerful way to add new functionality via loadable extensions. The first-party ones include fts5json1 and a couple others.
When building applications it is common to add searching features based on data coming from tables and you may already have queries for fuzzy searching with LIKE. You may be excited to hear that SQLite can easily add fully query capabilities over a dataset all with just a simple MATCH keyword. 👀
SQLite 提供了一种通过可加载扩展添加新功能的强大方式。官方提供的扩展包括fts5json1以及其他几个扩展。
在开发应用程序时,基于表中数据添加搜索功能是很常见的需求,你可能已经使用LIKE关键字实现了模糊查询。值得高兴的是,SQLite 只需通过简单的MATCH关键字,就能轻松为数据集添加完整的全文查询功能。👀

Creating your first search index 

创建你的第一个搜索索引

Full text search in SQLite requires storing the index in VIRTUAL tables, which allow for optimized storage of the index based on the queries we will execute against it.
You can create the virtual table for the index making sure to include the USING directive for the fts5 target.
CREATE VIRTUAL TABLE posts_fts USING fts5 (
    title,
    description,
    content,
    content=posts,
    content_rowid=id
);
Text IDs are also supported instead of just INTEGERS.
This is a standard callout. You can customize its content and even the icon.
SQLite 中的全文搜索需要将索引存储在VIRTUAL(虚拟)表中,这类表可以根据我们将要执行的查询对索引进行优化存储。
你可以为索引创建虚拟表,确保在语句中包含指向fts5的USING子句。
CREATE VIRTUAL TABLE posts_fts USING fts5 (
    title,
    description,
    content,
    content=posts,
    content_rowid=id
);
除了整数类型外,也支持文本类型的ID。
这是一个标准提示,你可以自定义它的内容甚至图标。

Contentless tables 

无内容表

You can also create a contentless table that will not be based on any existing tables:
CREATE VIRTUAL TABLE example_fts USING fts5 (
    name,
    description,
    content=''
);
你也可以创建不基于任何现有表的无内容表:
CREATE VIRTUAL TABLE example_fts USING fts5 (
    name,
    description,
    content=''
);

Keeping the index up to date 

保持索引更新

By having the source content be stored in another table we need to make sure to keep both tables in sync and avoid updating the index in a hot path when trying to make a query.
By default when you create table it will be empty, even if the source table is populated. You do have various options for populating the index.
由于源内容存储在另一个表中,我们需要确保两个表保持同步,并且避免在查询的关键路径中更新索引。
默认情况下,即使源表中已有数据,你创建的虚拟表初始也是空的。你可以通过多种方式填充索引。

Update by query 

通过查询更新

If you use a contentless table or want to pull in data from a view you can update by query.
INSERT INTO posts_fts (id, title, description, content)
SELECT id, title, description, content FROM posts;
如果你使用无内容表,或者想要从视图中提取数据,可以通过查询来更新索引。
INSERT INTO posts_fts (id, title, description, content)
SELECT id, title, description, content FROM posts;

Rebuild command 

重建命令

Using the rebuild command it will update the index based on the content table specified.
INSERT INTO posts_fts(posts_fts) VALUES('rebuild');
使用重建命令,系统会根据指定的内容表更新索引。
INSERT INTO posts_fts(posts_fts) VALUES('rebuild');

Triggers 

触发器

We can use SQLite triggers to automatically keep the records updated:
CREATE TRIGGER posts_insert AFTER INSERT ON posts BEGIN
  INSERT INTO posts_fts(id, title, description, content)
  VALUES (new.id, new.title, new.description, new.content);
END;

CREATE TRIGGER posts_delete AFTER DELETE ON posts BEGIN
  INSERT INTO posts_fts(posts_fts, id, title, description, content)
  VALUES ('delete', old.id, old.title, old.description, old.content);
END;

CREATE TRIGGER posts_update AFTER UPDATE ON posts BEGIN
  INSERT INTO posts_fts(posts_fts, id, title, description, content)
  VALUES ('delete', old.id, old.title, old.description, old.content);
  INSERT INTO posts_fts(id, title, description, content)
  VALUES (new.id, new.title, new.description, new.content);
END;
This will always ensure the two tables are in sync for any CRUD actions on the source table.
我们可以使用SQLite触发器来自动保持记录同步:
CREATE TRIGGER posts_insert AFTER INSERT ON posts BEGIN
  INSERT INTO posts_fts(id, title, description, content)
  VALUES (new.id, new.title, new.description, new.content);
END;

CREATE TRIGGER posts_delete AFTER DELETE ON posts BEGIN
  INSERT INTO posts_fts(posts_fts, id, title, description, content)
  VALUES ('delete', old.id, old.title, old.description, old.content);
END;

CREATE TRIGGER posts_update AFTER UPDATE ON posts BEGIN
  INSERT INTO posts_fts(posts_fts, id, title, description, content)
  VALUES ('delete', old.id, old.title, old.description, old.content);
  INSERT INTO posts_fts(id, title, description, content)
  VALUES (new.id, new.title, new.description, new.content);
END;
这样就能确保对源表执行任何CRUD操作时,两个表始终保持同步。

Searching the index 

搜索索引

Query syntax 

查询语法

Here is the supported query syntax:
<phrase>    := string [*]
<phrase>    := <phrase> + <phrase>
<neargroup> := NEAR ( <phrase> <phrase> ... [, N] )
<query>     := [ [-] <colspec> :] [^] <phrase>
<query>     := [ [-] <colspec> :] <neargroup>
<query>     := [ [-] <colspec> :] ( <query> )
<query>     := <query> AND <query>
<query>     := <query> OR <query>
<query>     := <query> NOT <query>
<colspec>   := colname
<colspec>   := { colname1 colname2 ... }
To preform an actual query on the index we will need to use the MATCH keyword and order by the rank.
SELECT posts.* FROM posts_fts
INNER JOIN posts ON posts.id = posts_fts.rowid
WHERE posts_fts MATCH :query
ORDER BY rank;
以下是支持的查询语法:
<phrase>    := string [*]
<phrase>    := <phrase> + <phrase>
<neargroup> := NEAR ( <phrase> <phrase> ... [, N] )
<query>     := [ [-] <colspec> :] [^] <phrase>
<query>     := [ [-] <colspec> :] <neargroup>
<query>     := [ [-] <colspec> :] ( <query> )
<query>     := <query> AND <query>
<query>     := <query> OR <query>
<query>     := <query> NOT <query>
<colspec>   := colname
<colspec>   := { colname1 colname2 ... }
要对索引执行实际查询,我们需要使用MATCH关键字,并按rank排序。
SELECT posts.* FROM posts_fts
INNER JOIN posts ON posts.id = posts_fts.rowid
WHERE posts_fts MATCH :query
ORDER BY rank;

Demo

演示

Reference 

参考资料