- Iterate Safely: Modify schemas in production with versioned datasets and backward compatibility
- Scale Seamlessly: Handle ML model iterations, regulatory changes, or feature additions
- Optimize Continuously: Remove unused fields or enforce new constraints without downtime
Schema evolution operations
LanceDB supports three primary schema evolution operations:
- Adding new columns: Extend your table with additional attributes
- Altering existing columns: Change column names, data types, or nullability
- Dropping columns: Remove unnecessary columns from your schema
version. Run these operations from a mutable table handle; if you checked out an
older version for reads, call checkout_latest / checkoutLatest before modifying the schema.
Add new columns
You can add new columns to a table with the add_columns
method in Python, addColumns in TypeScript/JavaScript, or add_columns in Rust.
New columns are populated based on SQL expressions you provide.
Set up the example table
First, let’s create a sample table with product data to demonstrate schema evolution:
Add derived columns
You can add new columns that are derived from existing data using SQL expressions. For feature engineering on large existing tables, group related derived features into one add_columns operation instead of running many separate writes. This creates one
new table version for the schema change and computes the new columns from the existing
rows, which avoids growing the table’s version history with many small updates.
The same call can add multiple derived columns at once. For example, if you are
building several lightweight features from existing product fields, pass all of the
new column expressions together:
LanceDB add_columns does not currently accept Python callables, batch UDFs, or
PyArrow RecordBatch iterators for populating new columns. New column values must be
defined with SQL expressions, or added as NULL columns from an Arrow field or schema.
If your transformation cannot be expressed in SQL, compute the values outside
add_columns before writing them back through another workflow.
Add columns with default values
Add boolean columns with default values for status tracking:
Add nullable columns
Add timestamp columns that can contain NULL values:
Alter existing columns
You can alter columns using the alter_columns
method in Python, alterColumns in TypeScript/JavaScript, or alter_columns in Rust. This allows you to:
- Rename a column
- Change a column’s data type
- Modify nullability (whether a column can contain NULL values)
Set up the example table
Create a table with a custom schema to demonstrate column alterations:
Rename columns
Change column names to better reflect their purpose:
Change data types
Convert column data types for better performance or compatibility:
Make columns nullable
You can alter columns to contain NULL values:
Changing a column to nullable affects future writes and merges too: missing values are accepted only when the target column is nullable.
Multiple changes at once
Apply several alterations in a single operation:
Expression-based type changes
For transformations that are not simple casts (for example, converting "$100" to an integer), use a SQL-expression column add, then drop and rename:
Alter embedding types and dimensions
Embedding columns often need schema changes, for example when a new model produces vectors with a different dimension.
- In Python, the example shows an in-place type update when the cast is compatible.
- In TypeScript and Rust, the example shows a dimension change (384 -> 1024), which cannot be cast in-place.
Drop columns
You can remove columns using the drop_columns
method in Python, dropColumns in TypeScript/JavaScript, or drop_columns in Rust.