Option 1: Self-Hosted Indexing
Manual, synchronous or asynchronous: When using LanceDB Open Source, you build indexes manually, reindex, and tune parameters yourself. The Python SDK supports both synchronous and asynchronous workflows.
Option 2: Automated Indexing
Automatic and async: LanceDB Cloud/Enterprise handles indexing automatically. When a table contains a single vector column named `vector`, LanceDB automatically:
- Infers the vector column from the schema
- Creates an optimized `IVF_PQ` index with default `l2` distance
- Tunes indexing parameters based on data distribution
You can create a new index with different parameters using `create_index` (this replaces any existing index). Index building is asynchronous; call `wait_for_index()` to block until indexing completes.
Example: Construct an IVF Index
This example creates an index for a table containing 1,536-dimensional vectors. Ensure the table contains at least a few thousand rows for effective index training.
Index Configuration
You can configure the index beyond default parameters:
- `index_type`:
  - `IVF_PQ`: Default index optimized for high-dimensional vectors
  - `IVF_HNSW_SQ`: Combines IVF clustering with an HNSW graph
- `metric`: Defaults to `l2`; you can switch to `cosine` or `dot`
- `num_partitions`: Number of IVF partitions (choose to target a desired rows-per-partition)
- `num_sub_vectors`: Number of subvectors created during PQ (controls recall vs. memory)
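The sizing guidance for `num_partitions` and `num_sub_vectors` can be sketched as a small helper. This is a minimal sketch under assumptions: the function name `suggest_ivf_pq_params`, the target of 4,096 rows per partition, and the 16 dimensions per subvector are illustrative choices, not LanceDB defaults. The only hard constraint it encodes is that the number of subvectors must divide the vector dimension evenly.

```python
import math


def suggest_ivf_pq_params(num_rows, dim, rows_per_partition=4096, dims_per_sub_vector=16):
    """Hypothetical helper: derive IVF_PQ parameters from table size.

    The default targets are illustrative assumptions, not LanceDB defaults.
    """
    # One partition per `rows_per_partition` rows, with at least one partition.
    num_partitions = max(1, math.ceil(num_rows / rows_per_partition))

    # PQ splits each vector into equal slices, so the count must divide `dim`.
    num_sub_vectors = max(1, dim // dims_per_sub_vector)
    while dim % num_sub_vectors != 0:
        num_sub_vectors -= 1

    return {"num_partitions": num_partitions, "num_sub_vectors": num_sub_vectors}


params = suggest_ivf_pq_params(num_rows=1_000_000, dim=1536)
print(params)  # {'num_partitions': 245, 'num_sub_vectors': 96}
```

Fewer dimensions per subvector means more subvectors, which generally improves recall at the cost of a larger index; treat the numbers above as a starting point for tuning.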
1. Setup
Connect to LanceDB and open the table you want to index.
2. Construct an IVF Index
Create an `IVF_PQ` index with cosine similarity. Specify `vector_column_name` if you use multiple vector columns or non-default names. By default LanceDB uses Product Quantization; switch to `IVF_SQ` for scalar quantization.
3. Query the IVF Index
Search using a random 1,536-dimensional embedding.
Search Configuration
The previous query uses:
- `limit`: number of results to return
- `nprobes`: number of IVF partitions to scan (5–10% of partitions often balances recall and latency)
- `refine_factor`: reads additional candidates and reranks in memory
- `.to_pandas()`: converts the results to a pandas DataFrame
Example: Construct an HNSW Index
Index Configuration
Set the following when constructing an HNSW index:
- `metric`: defaults to `l2`; `dot` and `cosine` are also available
- `m`: number of neighbors per vector in the HNSW graph
- `ef_construction`: number of candidates to evaluate when building the graph
1. Construct an HNSW Index
2. Query the HNSW Index
Example: Construct a Binary Vector Index
Binary vectors are useful for hash-based retrieval, fingerprinting, or any scenario modeled as bits.
Index Configuration
- Store binary vectors as fixed-size binary data (uint8 arrays, 8 bits per byte)
- Use `IVF_FLAT` for binary vectors
- Use the `hamming` distance metric
- Ensure the dimension is a multiple of 8 (e.g., a 128-D vector becomes 16 bytes)
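The packing rule and the Hamming metric can be illustrated with NumPy alone. The helper names `pack_bits` and `hamming_distance` are hypothetical and exist only for this sketch; it shows that a 128-bit vector packs into 16 bytes and that the Hamming distance counts the bits that differ.

```python
import numpy as np


def pack_bits(bits):
    """Pack a 0/1 array whose length is a multiple of 8 into uint8 bytes."""
    bits = np.asarray(bits, dtype=np.uint8)
    if bits.size % 8 != 0:
        raise ValueError("dimension must be a multiple of 8")
    return np.packbits(bits)


def hamming_distance(a, b):
    """Count the differing bits between two packed uint8 vectors."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())


rng = np.random.default_rng(0)
bits_a = rng.integers(0, 2, size=128)
bits_b = bits_a.copy()
bits_b[:5] ^= 1  # flip the first five bits

packed_a, packed_b = pack_bits(bits_a), pack_bits(bits_b)
print(packed_a.nbytes)                       # 16
print(hamming_distance(packed_a, packed_b))  # 5
```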
1. Create Table and Schema
2. Generate and Add Data
3. Construct the Binary Index
4. Vector Search
Check Index Status
Vector index creation is fast, usually a few minutes for a million 1,536-D vectors. Check status via the UI or APIs. The index name is the column name with `_idx` appended. Use `wait_for_index()` to block until completion.