Ingesting Data in LanceDB Tables

Once you have created a table, there are several ways to modify its data. You can:

Ingest and add new records to your table;
Update existing records that match specific conditions;
Use the powerful Merge Insert function for more complex operations like upserting or replacing ranges of data.

These operations allow you to keep your table data current and maintain it exactly as needed for your use case. Let’s look at each of these operations in detail.

These examples demonstrate common usage patterns. For complete API details and advanced options, refer to our SDK documentation page and navigate to your client language of choice.

Connecting to LanceDB

Before performing any operations, you’ll need to connect to LanceDB. The connection method depends on whether you’re using LanceDB Cloud or the open source version.

import lancedb

# Connect to LanceDB Cloud
db = lancedb.connect(
    uri="db://your-project-slug",
    api_key="your-api-key",
    region="us-east-1"
)

You can also connect locally using LanceDB OSS:

import lancedb

# Connect to local LanceDB
db = lancedb.connect("./data")  # Local directory for data storage

Data Insertion

Adding data to a table

Say you created a LanceDB table by passing in a schema. This is an empty table, with no data in it. To add or append data to a table, you can use the table.add(data), as shown below.

The vector column needs to be a pyarrow.FixedSizeList type.

Using Pydantic Models

Pydantic models provide a more structured way to define your table schema:

Using Nested Models

You can use nested Pydantic models to represent complex data structures. For example, you may want to store the document string and the document source name as a nested Document object:

from pydantic import BaseModel

class Document(BaseModel):
    content: str
    source: str

This can be used as the type of a LanceDB table column: This creates a struct column called document that has two subfields called content and source:

In [28]: table.schema
Out[28]:
id: string not null
vector: fixed_size_list<item: float>[128] not null
    child 0, item: float
document: struct<content: string not null, source: string not null> not null
    child 0, content: string not null
    child 1, source: string not null

Batch Data Insertion

It is recommended to use iterators to add large datasets in batches when creating your table in one go. Data will be automatically compacted for the best query performance.

Python Batch Insertion

LanceDB Cloud is a multi-tenant environment with a 100MB payload limit. Adjust your batch size accordingly.

Data Modification

Update Operations

This can be used to update zero to all rows depending on how many rows match the where clause. The update queries follow the form of a SQL UPDATE statement. The where parameter is a SQL filter that matches on the metadata columns. The values or values_sql parameters are used to provide the new values for the columns.

Updating nested columns is not yet supported.

Parameter	Type	Description
`where`	`str`	The SQL where clause to use when updating rows. For example, `'x = 2'` or `'x IN (1, 2, 3)'`. The filter must not be empty, or it will error.
`values`	`dict`	The values to update. The keys are the column names and the values are the values to set.
`values_sql`	`dict`	The values to update. The keys are the column names and the values are the SQL expressions to set. For example, `{'x': 'x + 1'}` will increment the value of the `x` column by 1.

See the SQL queries page for more information on the supported SQL syntax.

Output:

    x  vector
1  [1.0, 2.0]
3  [5.0, 6.0]
2  [10.0, 10.0]

Updating Using SQL

The values parameter is used to provide the new values for the columns as literal values. You can also use the values_sql / valuesSql parameter to provide SQL expressions for the new values. For example, you can use values_sql="x + 1" to increment the value of the x column by 1. Output:

    x  vector
2  [1.0, 2.0]
4  [5.0, 6.0]
3  [10.0, 10.0]

When rows are updated, they are moved out of the index. The row will still show up in ANN queries, but the query will not be as fast as it would be if the row was in the index. If you update a large proportion of rows, consider rebuilding the index afterwards.

Delete Operations

Remove rows that match a condition.

Delete operations soft delete rows. Rows are hard deleted later by compaction and cleanup operations that happen in the background on LanceDB Cloud and Enterprise. The default retention on Cloud is 30 days. During this time, these rows are still accessible to query or restore by accessing old table versions (see Versioning & Reproducibility in LanceDB).

If a table is emptied, its existing indexes are removed. Recreate indexes after ingesting new data.

Merge Operations

The merge insert command is a flexible API that can be used to perform upsert, insert_if_not_exists, and replace_range_ operations.

The merge insert command performs a join between the input data and the target table on the key you provide. This requires scanning that entire column, which can be expensive for large tables. To speed up this operation, create a scalar index on the join column, which will allow LanceDB to find matches without scanning the whole table.Read more about scalar indices in the Scalar Index guide.

You may receive an HTTP 400 error from merge insert: Bad request: Merge insert cannot be performed because the number of unindexed rows exceeds the maximum of 10000. Verify that the scalar index on the join column is up to date before retrying.

Like the create table and add APIs, the merge insert API will automatically compute embeddings if the table has an embedding definition in its schema. If the input data doesn’t contain the source column, or the vector column is already filled, the embeddings won’t be computed.

Upsert

upsert updates rows if they exist and inserts them if they don’t. To do this with merge insert, enable both when_matched_update_all() and when_not_matched_insert_all().

Setting Up the Example Table and Performing Upsert

Insert-if-not-exists

This will only insert rows that do not have a match in the target table, thus preventing duplicate rows. To do this with merge insert, enable just when_not_matched_insert_all().

Setting Up the Example Table and Performing Insert-if-not-exists

Replace Range

You can also replace a range of rows in the target table with the input data. For example, if you have a table of document chunks, where each chunk has both a doc_id and a chunk_id, you can replace all chunks for a given doc_id with updated chunks. This can be tricky otherwise because if you try to use upsert when the new data has fewer chunks you will end up with extra chunks. To avoid this, add another clause to delete any chunks for the document that are not in the new data, with when_not_matched_by_source_delete.

Setting Up the Example Table and Performing Replace Range

We suggest the best batch size to be 500k for optimal performance.

Get started

User guide

API & SDK Reference

Ingesting Data in LanceDB Tables

Connecting to LanceDB

Data Insertion

Adding data to a table

Using Pydantic Models

Using Nested Models

Batch Data Insertion

Python Batch Insertion

Data Modification

Update Operations

Updating Using SQL

Delete Operations

Merge Operations

Upsert

Setting Up the Example Table and Performing Upsert

Insert-if-not-exists

Setting Up the Example Table and Performing Insert-if-not-exists

Replace Range

Setting Up the Example Table and Performing Replace Range

Get started

User guide

API & SDK Reference

​Connecting to LanceDB

​Data Insertion

​Adding data to a table

​Using Pydantic Models

​Using Nested Models

​Batch Data Insertion

​Python Batch Insertion

​Data Modification

​Update Operations

​Updating Using SQL

​Delete Operations

​Merge Operations

​Upsert

​Setting Up the Example Table and Performing Upsert

​Insert-if-not-exists

​Setting Up the Example Table and Performing Insert-if-not-exists

​Replace Range

​Setting Up the Example Table and Performing Replace Range

Connecting to LanceDB

Data Insertion

Adding data to a table

Using Pydantic Models

Using Nested Models

Batch Data Insertion

Python Batch Insertion

Data Modification

Update Operations

Updating Using SQL

Delete Operations

Merge Operations

Upsert

Setting Up the Example Table and Performing Upsert

Insert-if-not-exists

Setting Up the Example Table and Performing Insert-if-not-exists

Replace Range

Setting Up the Example Table and Performing Replace Range