- Ingest and add new records to your table;
- Update existing records that match specific conditions;
- Use the powerful Merge Insert function for more complex operations like upserting or replacing ranges of data.
These examples demonstrate common usage patterns. For complete API details and advanced options, refer to our SDK documentation page and navigate to your client language of choice.
Connecting to LanceDB
Before performing any operations, you’ll need to connect to LanceDB. The connection method depends on whether you’re using LanceDB Cloud or the open source version.

Data Insertion
Adding data to a table
Say you created a LanceDB table by passing in a schema.
This is an empty table with no data in it. To add or append data, call table.add(data), as shown below.
The vector column needs to be a pyarrow.FixedSizeList type.

Using Pydantic Models
Pydantic models provide a more structured way to define your table schema.

Using Nested Models
You can use nested Pydantic models to represent complex data structures. For example, you may want to store the document string and the document source name in a nested Document object that has two subfields, content and source:
Batch Data Insertion
It is recommended to use iterators to add large datasets in batches when creating your table in one go. Data will be automatically compacted for the best query performance.

Python Batch Insertion
Data Modification
Update Operations
Update operations can modify anywhere from zero rows to the entire table, depending on how many rows match the where clause. Update queries follow the form of a SQL UPDATE statement. The where parameter is a SQL filter that matches on the metadata columns. The values or values_sql parameters provide the new values for the columns.
| Parameter | Type | Description |
|---|---|---|
| where | str | The SQL where clause to use when updating rows. For example, 'x = 2' or 'x IN (1, 2, 3)'. The filter must not be empty, or it will error. |
| values | dict | The values to update. The keys are the column names and the values are the values to set. |
| values_sql | dict | The values to update. The keys are the column names and the values are the SQL expressions to set. For example, {'x': 'x + 1'} will increment the value of the x column by 1. |
See the SQL queries page for more information on the supported SQL syntax.
Updating Using SQL
The values parameter provides the new values for the columns as literal values. You can also use the values_sql / valuesSql parameter to provide SQL expressions for the new values. For example, values_sql={'x': 'x + 1'} will increment the value of the x column by 1.
When rows are updated, they are moved out of the index. The row will still show up in ANN queries, but the query will not be as fast as it would be if the row was in the index. If you update a large proportion of rows, consider rebuilding the index afterwards.
Delete Operations
Remove rows that match a condition.

Merge Operations
The merge insert command is a flexible API that can be used to perform upsert, insert_if_not_exists, and replace_range operations.
Like the create table and add APIs, the merge insert API will automatically compute embeddings if the table has an embedding definition in its schema. If the input data doesn’t contain the source column, or the vector column is already filled, the embeddings won’t be computed.
Upsert
upsert updates rows if they exist and inserts them if they don’t. To do this with merge insert,
enable both when_matched_update_all() and when_not_matched_insert_all().
Setting Up the Example Table and Performing Upsert
Insert-if-not-exists
This will only insert rows that do not have a match in the target table, thus preventing duplicate rows. To do this with merge insert, enable just when_not_matched_insert_all().
Setting Up the Example Table and Performing Insert-if-not-exists
Replace Range
You can also replace a range of rows in the target table with the input data. For example, if you have a table of document chunks, where each chunk has both a doc_id and a chunk_id, you can replace all chunks for a given doc_id with updated chunks.
This can be tricky with a plain upsert: if the new data has fewer chunks than the old data, the leftover chunks are never removed. To avoid this, add another clause with when_not_matched_by_source_delete to delete any chunks for the document that are not present in the new data.