LanceDB Cloud and Enterprise provide performant full-text search based on BM25 so you can incorporate keyword-based search into retrieval solutions.
The `create_fts_index` API returns immediately, but index building happens asynchronously.
Creating FTS Indexes
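As a minimal sketch of index creation, the helper below merges the defaults listed in the parameter table of this page with any overrides and forwards them to `create_fts_index`. The helper name `create_text_index`, the `FTS_DEFAULTS` dictionary, and the connection values in the comments are illustrative assumptions, not part of the LanceDB API.

```python
# Defaults copied from the FTS parameter table on this page.
FTS_DEFAULTS = {
    "with_position": False,
    "base_tokenizer": "simple",
    "language": "English",
    "max_token_length": 40,
    "lower_case": True,
    "stem": True,
    "remove_stop_words": True,
    "ascii_folding": True,
}


def create_text_index(table, column, **overrides):
    """Start an FTS index build on `column` and return the options used.

    `table` is expected to be a LanceDB table object. The call returns as
    soon as the request is accepted; the index is built in the background.
    """
    unknown = set(overrides) - set(FTS_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown FTS options: {sorted(unknown)}")
    options = {**FTS_DEFAULTS, **overrides}
    table.create_fts_index(column, **options)
    return options


# Hypothetical usage (URI, key, region, and table name are placeholders):
# import lancedb
# db = lancedb.connect(uri="db://your-project", api_key="...", region="us-east-1")
# table = db.open_table("docs")
# create_text_index(table, "text", with_position=True)
```

Rejecting unknown option names up front is a design choice of this sketch; it catches typos such as `remove_stopwords` before the request is sent.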
Check FTS index status using the API:
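Because the build is asynchronous, a caller usually has to poll before relying on the index. The sketch below assumes the table object exposes `list_indices()` and `index_stats(name)`, that each listed index has a `name` attribute, and that the stats object reports `num_unindexed_rows`. The helper name `wait_for_fts_index` and the index name `text_idx` are assumptions for illustration; check the actual index name in your deployment.

```python
import time


def wait_for_fts_index(table, index_name, timeout_s=300.0, poll_s=2.0,
                       sleep=time.sleep):
    """Poll until `index_name` exists and has no unindexed rows.

    Returns the final stats object, or raises TimeoutError if the index
    is not ready within `timeout_s` seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        # The index may not be listed at all right after creation.
        names = [idx.name for idx in table.list_indices()]
        if index_name in names:
            stats = table.index_stats(index_name)
            if stats is not None and stats.num_unindexed_rows == 0:
                return stats
        if time.monotonic() >= deadline:
            raise TimeoutError(f"index {index_name!r} not ready after {timeout_s}s")
        sleep(poll_s)


# Hypothetical usage, assuming an index named "text_idx" on column "text":
# stats = wait_for_fts_index(table, "text_idx")
# print(stats)
```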
Configuration Options
FTS Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `with_position` | bool | False | Store token positions (required for phrase queries) |
| `base_tokenizer` | str | "simple" | Text splitting method (`simple`, `whitespace`, or `raw`) |
| `language` | str | "English" | Language for stemming/stop words |
| `max_token_length` | int | 40 | Maximum token size; longer tokens are omitted |
| `lower_case` | bool | True | Lowercase tokens |
| `stem` | bool | True | Apply stemming (running → run) |
| `remove_stop_words` | bool | True | Drop common stop words |
| `ascii_folding` | bool | True | Normalize accented characters |
- `max_token_length` can filter out base64 blobs or long URLs.
- Disabling `with_position` reduces index size but disables phrase queries.
- `ascii_folding` helps with international text (e.g., "café" → "cafe").
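To make the effect of these options concrete, here is a rough pure-Python approximation of the token filters. It is not LanceDB's tokenizer: the function name `analyze`, the regex-based splitting, and the tiny stop-word set are assumptions made for illustration, and stemming is left out.

```python
import re
import unicodedata

# Tiny illustrative stop-word set; the real list is language-specific.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is"})


def analyze(text, *, max_token_length=40, lower_case=True,
            remove_stop_words=True, ascii_folding=True):
    """Approximate the documented token filters on a piece of text."""
    tokens = []
    for tok in re.findall(r"\w+", text):  # split on non-word characters
        # Over-long tokens (base64 blobs, long URLs) are dropped, not cut.
        if max_token_length is not None and len(tok) > max_token_length:
            continue
        if lower_case:
            tok = tok.lower()
        if remove_stop_words and tok in STOP_WORDS:
            continue
        if ascii_folding:
            # Decompose accented characters, then drop the combining marks.
            tok = (unicodedata.normalize("NFKD", tok)
                   .encode("ascii", "ignore")
                   .decode("ascii"))
        if tok:
            tokens.append(tok)
    return tokens


print(analyze("The Café is open"))  # ['cafe', 'open']
```

With `remove_stop_words=False` the same input keeps "the" and "is", which matters for the phrase-query settings described on this page.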
Phrase Query Configuration
Enable phrase queries by setting:
| Parameter | Required Value | Purpose |
|---|---|---|
| `with_position` | True | Track token positions for phrase matching |
| `remove_stop_words` | False | Preserve stop words for exact phrase matching |
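The reason both settings matter can be shown with a toy positional index. This is a conceptual sketch in plain Python, not how LanceDB stores or queries its index; the function names `build_positional_index` and `phrase_match` are made up for the example.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each token to {doc_id: [positions]} for whitespace-split text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, tok in enumerate(text.lower().split()):
            index[tok].setdefault(doc_id, []).append(pos)
    return index


def phrase_match(index, phrase):
    """Return ids of documents containing the phrase tokens consecutively."""
    tokens = phrase.lower().split()
    if not tokens:
        return set()
    postings = [index.get(t, {}) for t in tokens]
    # Only documents containing every token can match.
    candidates = set(postings[0])
    for p in postings[1:]:
        candidates &= set(p)
    hits = set()
    for doc_id in candidates:
        for start in postings[0][doc_id]:
            # Token i of the phrase must sit exactly i positions after the first.
            if all(start + i in postings[i][doc_id] for i in range(1, len(tokens))):
                hits.add(doc_id)
                break
    return hits


docs = {1: "the quick brown fox", 2: "brown quick fox"}
index = build_positional_index(docs)
print(phrase_match(index, "quick brown"))  # {1}
```

Without stored positions, both documents would look identical for the query "quick brown", since each contains both words. Likewise, if stop words were removed at index time, a phrase such as "the quick" could not be matched exactly because "the" would never have been indexed.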