Full-Text Search

Token-based search across string fields

The FTS plugin indexes string fields for fast search queries.

Setup

import { Database, ftsPlugin } from "ctrodb"

const db = new Database({
  schema: {
    version: 1,
    collections: {
      articles: {
        fields: {
          title: { type: "string" },
          body: { type: "string" },
        },
        searchable: ["title", "body"],
      },
    },
  },
  plugins: [ftsPlugin()],
})

How it works

The plugin stores a token-to-document mapping in a _ctrodb_fts collection.

Indexing:

On create, update, and delete, the plugin extracts tokens from the searchable fields.
Each token maps to the set of IDs of the documents that contain it.
Tokens are lowercased, deduplicated, and filtered for stop words.
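
The indexing steps above can be pictured as a small in-memory inverted index. This is a minimal sketch under stated assumptions, not the plugin's actual implementation: the class InvertedIndexSketch and its methods are hypothetical names, the data lives in a Map rather than in the _ctrodb_fts collection, and stop-word filtering is left out for brevity.

```typescript
// Hypothetical sketch: these names are illustrative and not part of ctrodb.
class InvertedIndexSketch {
  // token -> set of IDs of the documents containing that token
  private postings = new Map<string, Set<string>>();

  // Simplified tokenization: lowercase, then split on non-alphanumeric
  // (ASCII) characters. Stop-word filtering is omitted here.
  private tokens(text: string): string[] {
    return text
      .toLowerCase()
      .split(/[^a-z0-9]+/)
      .filter((t) => t.length > 0);
  }

  // On create: record the ID under every token of every searchable value.
  add(id: string, searchableValues: string[]): void {
    for (const value of searchableValues) {
      for (const token of this.tokens(value)) {
        const ids = this.postings.get(token) ?? new Set<string>();
        ids.add(id); // the Set absorbs repeated tokens
        this.postings.set(token, ids);
      }
    }
  }

  // On delete: drop the ID from every posting set and prune empty sets.
  remove(id: string): void {
    this.postings.forEach((ids, token) => {
      ids.delete(id);
      if (ids.size === 0) this.postings.delete(token);
    });
  }

  // On update: re-index from scratch so stale tokens disappear.
  update(id: string, searchableValues: string[]): void {
    this.remove(id);
    this.add(id, searchableValues);
  }

  // Sorted IDs for one token, or an empty array if the token is unknown.
  idsFor(token: string): string[] {
    return Array.from(this.postings.get(token) ?? new Set<string>()).sort();
  }
}
```

Treating an update as remove-then-add keeps the sketch short. A real index could instead diff the old and new token sets to avoid rewriting unchanged entries.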

Basic search (substring matching)

Without the FTS index, .search() does case-insensitive substring matching:

// `articles` is the handle for the articles collection defined in Setup
const results = await articles
  .query()
  .search("body", "database schema")
  .fetch()

This checks whether the body field contains the string "database schema" anywhere. It does not use the tokenized FTS index.
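
To make the substring behavior concrete, here is a minimal sketch of a case-insensitive substring check. The helper containsIgnoreCase is a hypothetical name used for illustration only. It shows the semantics described above, not how .search() is implemented internally.

```typescript
// Hypothetical helper, not part of ctrodb: true when `needle` occurs
// anywhere inside `fieldValue`, ignoring case.
function containsIgnoreCase(fieldValue: string, needle: string): boolean {
  return fieldValue.toLowerCase().indexOf(needle.toLowerCase()) !== -1;
}

// The whole query string must appear as one contiguous run of characters.
const contiguousMatch = containsIgnoreCase(
  "Designing a Database Schema for search",
  "database schema",
); // true

// Same words, different order: no match.
const reorderedMatch = containsIgnoreCase(
  "A schema for your database",
  "database schema",
); // false
```

The second call is where substring matching and tokenized search diverge: both words are present, but not next to each other in that order.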

Indexed search (FTSIndexer)

To use the inverted index directly:

import { FTSIndexer } from "ctrodb"

// `adapter` is assumed to be the storage adapter that backs your database
const indexer = new FTSIndexer(adapter)
const ids = await indexer.search("articles", "database schema")
// ids contains IDs of documents matching ALL tokens (AND logic)

FTSIndexer.search() tokenizes the query and returns only the IDs of documents that contain every token, which makes it a true tokenized AND search.
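
AND logic over an inverted index amounts to intersecting the ID sets of the query tokens. The sketch below illustrates that idea under stated assumptions: intersectPostings is a hypothetical function, the postings are passed in as a plain Map, and returning an empty result for an empty token list is a choice made for this sketch rather than documented ctrodb behavior.

```typescript
// Hypothetical sketch of AND search: keep only the IDs that appear in the
// posting set of every query token. Not ctrodb's actual implementation.
function intersectPostings(
  postings: Map<string, Set<string>>,
  queryTokens: string[],
): string[] {
  // Assumption for this sketch: an empty query matches nothing.
  if (queryTokens.length === 0) return [];

  let result: string[] | null = null;
  for (const token of queryTokens) {
    const ids = postings.get(token);
    // A token that no document contains makes the whole AND query empty.
    if (!ids) return [];
    result =
      result === null ? Array.from(ids) : result.filter((id) => ids.has(id));
  }
  return (result ?? []).sort();
}

// Example postings: "database" occurs in a1 and a2, "schema" only in a1.
const examplePostings = new Map<string, Set<string>>();
examplePostings.set("database", new Set(["a1", "a2"]));
examplePostings.set("schema", new Set(["a1"]));

const andMatches = intersectPostings(examplePostings, ["database", "schema"]);
// andMatches is ["a1"]
```

Because only token membership matters, a document matches regardless of the order or distance between the query words in its text.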

Built-in stop words

Common English words are excluded: a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with.

Tokenizer

import { tokenize } from "ctrodb"

tokenize("Hello World!") // ["hello", "world"]

The tokenizer splits the input on non-alphanumeric characters and lowercases each token. It also removes duplicates and filters out the built-in stop words.
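
For readers who want to see those rules in one place, here is a minimal sketch of a tokenizer with the same described behavior. The names tokenizeSketch and STOP_WORDS_SKETCH are hypothetical. The sketch assumes ASCII-only alphanumerics and first-occurrence ordering, neither of which is confirmed for the library's own tokenize.

```typescript
// Hypothetical re-implementation for illustration; use ctrodb's `tokenize`
// in real code. The stop words are copied from the built-in list above.
const STOP_WORDS_SKETCH = new Set<string>([
  "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
  "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
  "the", "their", "then", "there", "these", "they", "this", "to", "was",
  "will", "with",
]);

function tokenizeSketch(text: string): string[] {
  const seen = new Set<string>();
  const tokens: string[] = [];
  // Assumption: "alphanumeric" means ASCII letters and digits only.
  for (const raw of text.toLowerCase().split(/[^a-z0-9]+/)) {
    if (raw.length === 0) continue; // empty pieces from leading/trailing separators
    if (STOP_WORDS_SKETCH.has(raw)) continue; // drop stop words
    if (seen.has(raw)) continue; // drop duplicates, keep the first occurrence
    seen.add(raw);
    tokens.push(raw);
  }
  return tokens;
}

const helloTokens = tokenizeSketch("Hello World!");
// helloTokens is ["hello", "world"]

const schemaTokens = tokenizeSketch("The schema of the Database schema");
// schemaTokens is ["schema", "database"]
```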

Last updated on Jun 20, 2026