Full-Text Search in the Browser

Full-text search is one of those features that feels simple until you implement it. This post explains how ctrodb's FTS plugin works under the hood.

The naive approach

Without an index, searching means iterating every record and running String.includes():

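// allRecords: array of { title } objects; query: the search string
// Linear scan: every title is lowercased and checked on every search
const needle = query.toLowerCase()
const results = allRecords.filter((r) => r.title.toLowerCase().includes(needle))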

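This works for small datasets, but every search scans the full text of every record, so its cost grows linearly with the size of the collection. As collections grow into the thousands of records, that scan becomes noticeable.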

Token-based indexing

ctrodb's FTS plugin instead maintains an inverted index. When a record is created or updated, the plugin extracts tokens from its searchable fields and stores a token-to-document mapping:

Token "database" -> [doc1, doc3, doc7]
Token "schema"   -> [doc1, doc4]
Token "react"    -> [doc2, doc5]

A search for "database schema" returns only the docs that contain both tokens (AND logic). In the mapping above, that is doc1.
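The token-to-document mapping is an inverted index, and the interesting part is keeping it correct as records change. Below is a minimal in-memory sketch of that maintenance, assuming a simplified tokenizer (no stop words) and an update modeled as "remove old tokens, add new ones". The class and method names (`InvertedIndex`, `add`, `remove`, `update`, `lookup`) are hypothetical and not part of ctrodb's API.

```javascript
// Hypothetical sketch of an inverted index: token -> Set of doc IDs.
// Not ctrodb's implementation; names are illustrative only.
class InvertedIndex {
  constructor(tokenize) {
    this.tokenize = tokenize // function: string -> array of unique tokens
    this.postings = new Map() // token -> Set of doc IDs
  }

  // Record that docId contains each token of `text`
  add(docId, text) {
    for (const token of this.tokenize(text)) {
      if (!this.postings.has(token)) this.postings.set(token, new Set())
      this.postings.get(token).add(docId)
    }
  }

  // Remove docId from the posting list of each token in `text`
  remove(docId, text) {
    for (const token of this.tokenize(text)) {
      const ids = this.postings.get(token)
      if (!ids) continue
      ids.delete(docId)
      // Drop empty entries so deleted vocabulary does not linger
      if (ids.size === 0) this.postings.delete(token)
    }
  }

  // Update = remove the old tokens, then add the new ones
  update(docId, oldText, newText) {
    this.remove(docId, oldText)
    this.add(docId, newText)
  }

  // Doc IDs for one token (empty array if the token is unknown)
  lookup(token) {
    return [...(this.postings.get(token) ?? [])]
  }
}

// Simplified tokenizer for this sketch: lowercase, split, dedupe (no stop words)
const simpleTokenize = (text) =>
  [...new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean))]

const index = new InvertedIndex(simpleTokenize)
index.add(1, "Database schema design")
index.add(3, "Choosing a database")
index.add(2, "React hooks")
console.log(index.lookup("database")) // → [1, 3]

index.update(3, "Choosing a database", "Choosing a framework")
console.log(index.lookup("database")) // → [1]
```

Treating an update as remove-then-add requires the record's previous field values; how ctrodb obtains them is not covered in this post.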

Tokenizer behavior

The tokenizer:

Lowercases the input
Splits on non-alphanumeric characters
Filters out stop words (a, an, the, and, ...)
Removes duplicates, returning each token once

import { tokenize } from "ctrodb"

tokenize("Hello World! The database")
// ["hello", "world", "database"]
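The tokenizer steps above can be sketched in a few lines. This is an illustrative reimplementation, not ctrodb's `tokenize`: the stop-word list is an assumed sample (the post elides the full list), and "alphanumeric" is taken to mean ASCII letters and digits.

```javascript
// Illustrative tokenizer following the listed steps.
// NOT ctrodb's source; STOP_WORDS is a small assumed sample.
const STOP_WORDS = new Set(["a", "an", "the", "and", "or", "of", "to", "in"])

function tokenizeSketch(input) {
  const tokens = input
    .toLowerCase() // lowercase the input
    .split(/[^a-z0-9]+/) // split on runs of non-alphanumeric (ASCII) characters
    .filter((t) => t.length > 0) // drop empty strings from leading/trailing separators
    .filter((t) => !STOP_WORDS.has(t)) // filter out stop words
  return [...new Set(tokens)] // dedupe; a Set keeps first-seen order
}

console.log(tokenizeSketch("Hello World! The database"))
// → ["hello", "world", "database"]
console.log(tokenizeSketch("The database and the DATABASE schema"))
// → ["database", "schema"]
```

With an ASCII-only split, a word like "café" would become "caf". A Unicode-aware variant would split on `/[^\p{L}\p{N}]+/u` instead; whether ctrodb's tokenizer does this is not stated here.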

Storage strategy

The index lives in a _ctrodb_fts collection alongside your data. Each entry is:

{
  id: "articles:database",
  token: "database",
  collection: "articles",
  docIds: [1, 3, 7]
}

Because the index is stored as an ordinary collection, it works with both MemoryAdapter and IndexedDBAdapter.
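The entry shape above implies a composite key of collection name and token, so a query token can be resolved with one keyed read. A minimal sketch, assuming a plain `Map` stands in for the `_ctrodb_fts` collection (the adapters' real read/write APIs are not shown in this post); `entryId`, `indexToken`, and `lookupToken` are hypothetical helpers.

```javascript
// Hypothetical sketch: a Map plays the role of the _ctrodb_fts collection.
// Entries follow the shape shown above: { id, token, collection, docIds }.
const ftsCollection = new Map()

// Composite key: one entry per (collection, token) pair
const entryId = (collection, token) => `${collection}:${token}`

// Add docId to the entry for (collection, token), creating the entry if needed
function indexToken(collection, token, docId) {
  const id = entryId(collection, token)
  const entry = ftsCollection.get(id) ?? { id, token, collection, docIds: [] }
  // docIds is a plain array here, so guard against pushing duplicates
  if (!entry.docIds.includes(docId)) entry.docIds.push(docId)
  ftsCollection.set(id, entry)
}

// Resolve one token with a single keyed lookup
function lookupToken(collection, token) {
  return ftsCollection.get(entryId(collection, token))?.docIds ?? []
}

for (const docId of [1, 3, 7]) indexToken("articles", "database", docId)
indexToken("articles", "schema", 1)
indexToken("notes", "database", 2) // same token, different collection: separate entry

console.log(JSON.stringify(ftsCollection.get("articles:database")))
// → {"id":"articles:database","token":"database","collection":"articles","docIds":[1,3,7]}
console.log(lookupToken("notes", "database")) // → [2]
```

Prefixing the ID with the collection name lets several collections share one index collection without their tokens colliding.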

Query execution

When you call .search("title", "database schema"):

The tokenizer splits the query into tokens: ["database", "schema"]
For each token, the indexer looks up the corresponding index entry
The doc ID lists are intersected: only IDs that appear in every list are kept, so a query token with no entry yields no results
The surviving IDs are passed to the adapter, which fetches the matching records
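The AND step amounts to intersecting the doc ID lists. A minimal sketch, assuming each token's IDs have already been fetched into arrays; `intersectPostings` is a hypothetical helper, and starting from the shortest list is a common optimization assumed here, not something the post says ctrodb does.

```javascript
// Intersect posting lists: keep only IDs present in every list (AND semantics).
// Hypothetical helper, not ctrodb's code.
function intersectPostings(lists) {
  if (lists.length === 0) return [] // no query tokens -> no results (assumed)
  // Start from the shortest list: the result can never be larger than it
  const [shortest, ...rest] = [...lists].sort((a, b) => a.length - b.length)
  const others = rest.map((ids) => new Set(ids))
  return shortest.filter((id) => others.every((set) => set.has(id)))
}

// Posting lists using the numeric IDs from the storage example
const postings = {
  database: [1, 3, 7],
  schema: [1, 4],
  react: [2, 5],
}
const lookup = (token) => postings[token] ?? [] // unknown token -> empty list

console.log(intersectPostings(["database", "schema"].map(lookup))) // → [1]
console.log(intersectPostings(["database", "react"].map(lookup))) // → []
console.log(intersectPostings(["database", "graphql"].map(lookup))) // → []
```

Filtering the shortest list bounds the number of membership checks by that list's length. A query made entirely of stop words produces no tokens and returns nothing in this sketch; ctrodb's behavior in that case is not stated here.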

Performance characteristics

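Indexing: O(tokens per record) index entries touched, on every create, update, and delete
Search: one keyed lookup per query token, plus an intersection whose cost grows with the length of the doc ID lists involved, plus fetching the matching records
Storage: proportional to the total number of (token, record) pairs, i.e. the unique tokens per record summed over all records; the number of entries grows with vocabulary size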
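Index size is easiest to reason about by counting stored doc IDs: each record contributes one ID per unique token it contains, while the number of entries equals the vocabulary size. A small sketch of that count; `indexSize` is a hypothetical helper and the sample records are made up for illustration.

```javascript
// Count index size for a set of already-tokenized records:
// entries = distinct tokens, storedIds = sum of unique tokens per record.
function indexSize(tokenLists) {
  const vocabulary = new Set()
  let storedIds = 0
  for (const tokens of tokenLists) {
    for (const t of tokens) vocabulary.add(t)
    storedIds += tokens.length // tokens are already unique per record
  }
  return { entries: vocabulary.size, storedIds }
}

// Illustrative records, already tokenized
const records = [
  ["database", "schema"],
  ["react", "hooks"],
  ["database", "design"],
]
console.log(indexSize(records)) // → { entries: 5, storedIds: 6 }
```

By the same arithmetic, a hypothetical 10,000 records averaging 50 unique tokens each would store about 500,000 doc IDs, spread across however many entries the vocabulary produces.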

When to use it

The FTS plugin works best for:

Searchable text fields (titles, descriptions, body content)
Datasets up to tens of thousands of records
Apps where exact whole-token matching is sufficient (a search for "data" does not match "database")

It does not support fuzzy search, stemming, or relevance scoring. For those, consider a dedicated search service.

Related posts

Client-Side Full-Text Search with ctrodb

Extending ctrodb with Custom Plugins

Transactions and Data Integrity in ctrodb
