Schema and Data Evolution

Schema evolution enables non-breaking modifications to a database table’s structure, such as adding columns, altering data types, or dropping fields, so the table can adapt to evolving data requirements without service interruptions. LanceDB supports ACID-compliant schema evolution through granular operations (add/alter/drop columns), allowing you to:

Iterate Safely: Modify schemas in production with versioned datasets and backward compatibility
Scale Seamlessly: Handle ML model iterations, regulatory changes, or feature additions
Optimize Continuously: Remove unused fields or enforce new constraints without downtime

Schema evolution operations

LanceDB supports three primary schema evolution operations:

Adding new columns: Extend your table with additional attributes
Altering existing columns: Change column names, data types, or nullability
Dropping columns: Remove unnecessary columns from your schema

Schema evolution operations take effect immediately and typically do not require rewriting all data. The main exception is a data type change, which rewrites the data of the affected column.

Each schema evolution operation commits a new table version and returns status metadata such as the committed version. Run these operations from a mutable table handle; if you checked out an older version for reads, call checkout_latest / checkoutLatest before modifying the schema.

Add new columns

You can add new columns to a table with the add_columns method in Python, addColumns in TypeScript/JavaScript, or add_columns in Rust. New columns are populated based on SQL expressions you provide.

Set up the example table

First, let’s create a sample table with product data to demonstrate schema evolution:

Add derived columns

You can add new columns that are derived from existing data using SQL expressions. For feature engineering on large existing tables, group related derived features into one add_columns operation instead of running many separate writes. This creates one new table version for the schema change and computes the new columns from the existing rows, which avoids growing the table’s version history with many small updates. The same call can add multiple derived columns at once. For example, if you are building several lightweight features from existing product fields, pass all of the new column expressions together:

LanceDB add_columns does not currently accept Python callables, batch UDFs, or PyArrow RecordBatch iterators for populating new columns. New column values must be defined with SQL expressions, or added as NULL columns from an Arrow field or schema. If your transformation cannot be expressed in SQL, compute the values outside add_columns before writing them back through another workflow.

Add columns with default values

Add boolean columns with default values for status tracking:

Add nullable columns

Add timestamp columns that can contain NULL values:

When adding columns that should contain NULL values, be sure to cast the NULL to the appropriate type, e.g., cast(NULL as timestamp).

Alter existing columns

You can alter columns using the alter_columns method in Python, alterColumns in TypeScript/JavaScript, or alter_columns in Rust. This allows you to:

Rename a column
Change a column’s data type
Modify nullability (whether a column can contain NULL values)

Set up the example table

Create a table with a custom schema to demonstrate column alterations:

Rename columns

Change column names to better reflect their purpose:

Change data types

Convert column data types for better performance or compatibility:

Make columns nullable

You can alter a column so that it accepts NULL values. Changing a column to nullable also affects future writes and merges: missing values are accepted only when the target column is nullable.

Multiple changes at once

Apply several alterations in a single operation:

Expression-based type changes

For transformations that are not simple casts (for example, converting "$100" to an integer), use a SQL-expression column add, then drop and rename:

Alter embedding types and dimensions

Changing an embedding column’s schema is a common need, for example when a new model with a different embedding dimension becomes available.

An in-place type update with alter_columns / alterColumns works when the cast is compatible.
A dimension change (384 -> 1024) cannot be cast in-place.

For dimension changes, use this 3-step pattern: add a new column with the target type, drop the old column, then rename the new column to the original name.

FixedSizeList Dimension Changes in TypeScript and Rust

alterColumns / alter_columns can cast between compatible types, but changing FixedSizeList dimensions (for example 384 -> 1024) is not a compatible cast. For such cases, use addColumns / add_columns (with arrow_cast), then dropColumns / drop_columns, then rename the replacement column.

Changing data types requires rewriting the column data and may be resource-intensive for large tables. Renaming columns or changing nullability is more efficient as it only updates metadata.

Drop columns

You can remove columns using the drop_columns method in Python, dropColumns in TypeScript/JavaScript, or drop_columns in Rust.

Set up the example table

Create a table with temporary columns that we’ll remove:

Drop single columns

Remove individual columns that are no longer needed:

Drop multiple columns

Remove several columns at once for efficiency:

Dropping columns cannot be undone. Make sure you have backups or are certain before removing columns.
