Migration Patterns Reference

Expand-Contract Pattern

The fundamental pattern for zero-downtime schema changes. Every breaking change goes through three phases: expand (add new), migrate (backfill), contract (remove old).

Phase 1 — Expand

Add the new structure. The old structure still exists and the application still writes to it. Both old and new must be valid at this point.

-- Example: rename column users.full_name to users.display_name

-- Phase 1: add new column, make it nullable (no default required)
ALTER TABLE users ADD COLUMN display_name TEXT;

Phase 2 — Dual-Write (Application Deploy)

Deploy application code that writes to both old and new column. Reads still use the old column.

-- Backfill existing rows: chunk it to avoid long locks
DO $$
DECLARE
  batch_size INT := 10000;
  last_id BIGINT := 0;
  max_id BIGINT;
BEGIN
  SELECT MAX(id) INTO max_id FROM users;
  WHILE last_id < max_id LOOP
    UPDATE users
    SET display_name = full_name
    WHERE id > last_id AND id <= last_id + batch_size
      AND display_name IS NULL;
    last_id := last_id + batch_size;
    PERFORM pg_sleep(0.01);  -- brief pause to reduce lock contention
  END LOOP;
END $$;

Phase 2b — Cutover (Application Deploy)

Deploy application code that reads from the new column. Still writes to both.

Phase 3 — Contract (Application Deploy + Migration)

Deploy application code that reads and writes only to the new column. Then drop the old.

-- Phase 3: safe to drop old column now
ALTER TABLE users DROP COLUMN full_name;

Backward-Compatible Changes (Safe, Single Migration)

These changes do not require expand-contract:

Change	Safe?	Notes
Add nullable column	Yes	No default = no table rewrite
Add column with constant default	Yes (Postgres 11+)	Postgres 11+ stores default in catalog, no rewrite
Add column with dynamic default	No	Causes table rewrite
Add index (CONCURRENTLY)	Yes	Non-blocking
Add constraint (NOT VALID)	Yes	Validates existing rows later
Increase varchar length	Yes	No rewrite
Drop column (mark unused first)	Careful	App must stop referencing it
Rename column	No	Breaking — use expand-contract
Change column type	No	Breaking — use expand-contract
Drop NOT NULL	Yes
Add NOT NULL	No	Requires backfill + constraint

Adding NOT NULL Without Downtime

Adding NOT NULL to an existing column requires all existing rows to have a non-null value. The naive ALTER TABLE ... SET NOT NULL locks the table while it validates every row.

-- Step 1: Add column as nullable, backfill, then add constraint as NOT VALID
ALTER TABLE orders ADD COLUMN confirmed_at TIMESTAMPTZ;

-- Step 2: backfill (chunked)
UPDATE orders SET confirmed_at = created_at WHERE confirmed_at IS NULL;

-- Step 3: Add NOT NULL constraint as NOT VALID (skips existing row validation)
ALTER TABLE orders ADD CONSTRAINT orders_confirmed_at_not_null
  CHECK (confirmed_at IS NOT NULL) NOT VALID;

-- Step 4: Validate constraint in background (ShareUpdateExclusiveLock, non-blocking)
ALTER TABLE orders VALIDATE CONSTRAINT orders_confirmed_at_not_null;

-- Step 5: (Optional) Convert to true NOT NULL — only safe after validation
ALTER TABLE orders ALTER COLUMN confirmed_at SET NOT NULL;
ALTER TABLE orders DROP CONSTRAINT orders_confirmed_at_not_null;

Lock Timeout Settings

Always set a lock timeout before DDL statements in production migrations. Without it, a migration waiting for a lock can block all reads and writes on the table indefinitely.

-- Set per-session lock timeout before each DDL statement
SET lock_timeout = '5s';
SET statement_timeout = '30s';

-- The migration will fail fast if it can't acquire a lock in 5 seconds
-- Retry later when the blocking query completes
ALTER TABLE orders ADD COLUMN status TEXT;

Retry strategy: If a migration fails due to lock timeout, wait for the blocking query (visible in pg_stat_activity) to complete and retry. Do not increase the timeout to avoid blocking the table longer.

Index Migration Safety

-- Always use CONCURRENTLY for new indexes on live tables
-- CONCURRENTLY takes 2-3x longer but does not block reads or writes
CREATE INDEX CONCURRENTLY idx_orders_status ON orders (status);

-- If CONCURRENTLY fails partway through, the index is left as INVALID
-- Find invalid indexes:
SELECT indexname, tablename FROM pg_indexes
JOIN pg_class ON pg_class.relname = indexname
WHERE pg_class.relkind = 'i'
  AND NOT EXISTS (
    SELECT 1 FROM pg_index WHERE pg_index.indexrelid = pg_class.oid
      AND pg_index.indisvalid
  );

-- Drop the invalid index and recreate
DROP INDEX CONCURRENTLY idx_orders_status;
CREATE INDEX CONCURRENTLY idx_orders_status ON orders (status);

Migration File Organization

migrations/
├── 0001_create_users.sql
├── 0002_create_orders.sql
├── 0003_add_status_to_orders.sql      # expand phase
├── 0004_backfill_order_status.sql     # data migration
└── 0005_drop_old_status_column.sql    # contract phase (deploy after app)

Convention: Separate structural changes (DDL) from data migrations (DML). Run structural migrations at deploy time. Run data migrations as background jobs or chunked scripts when the table is large.

Rollback Strategies

Not all migrations are trivially reversible. Plan rollback before executing forward.

Migration Type	Rollback Strategy
Add column	`ALTER TABLE DROP COLUMN`
Add index	`DROP INDEX CONCURRENTLY`
Add constraint NOT VALID	`DROP CONSTRAINT`
Rename column (expand phase)	Drop new column — old still exists
Drop column	Restore from backup — plan before executing
Data backfill	Re-run inverse transformation on affected rows

Rule: Never drop a column in the same deploy window as the application code change that stops using it. Wait one deploy cycle to confirm the old column is unused, then drop it in a follow-up migration. This gives you a clean rollback path if the deploy goes wrong.

Testing Migrations

# Test migration against a clone of production data
pg_dump $PROD_URL | psql $TEST_URL
psql $TEST_URL -f migrations/0042_new_change.sql

# Time the migration — if it takes > 1s, it may cause issues in production
time psql $TEST_URL -f migrations/0042_new_change.sql

# Check for lock contention by running under load
# Use pgbench to simulate concurrent queries while migration runs
pgbench -T 30 -c 50 $TEST_URL &
psql $TEST_URL -f migrations/0042_new_change.sql

Expand-Contract Pattern​

Phase 1 — Expand​

Phase 2 — Dual-Write (Application Deploy)​

Phase 2b — Cutover (Application Deploy)​

Phase 3 — Contract (Application Deploy + Migration)​

Backward-Compatible Changes (Safe, Single Migration)​

Adding NOT NULL Without Downtime​

Lock Timeout Settings​

Index Migration Safety​

Migration File Organization​

Rollback Strategies​

Testing Migrations​