Database migrations
Every Scrydon upgrade may include database migrations. This runbook covers what runs, how to verify, and how to handle a failed migration.
How migrations run
On upgrade, each Scrydon service that owns a schema runs its own migrations on startup. The first pod in each rolling update applies the migration; subsequent pods see the migration as applied and skip it.
| Service | Schema owner |
|---|---|
| Platform | Authentication, organisations, workspaces, audit, secrets metadata |
| Agentic | Workflows, automations, knowledge bases, chats |
| Analytics | Managed-table catalogue, profiles, policy bundles |
| Ontology | Ontology schema, bindings, branches |
Migrations are forward-only by design. Down migrations are not provided; if you need to roll back, restore from backup.
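The apply-once, forward-only behaviour can be pictured with a toy model. This is a sketch, not Scrydon's runner: the function name is hypothetical, and a plain file stands in for the migrations table. It also ignores locking, which a real runner needs so that two pods starting at the same moment do not both apply the same migration; this runbook does not say how Scrydon handles that.

```shell
# Toy model of a startup migration runner (hypothetical, for illustration).
# The "migrations table" is a file with one applied version per line.
# Arguments: path to the applied-versions file, then the versions in order.
run_migrations() {
  applied_file="$1"; shift
  touch "$applied_file"
  for version in "$@"; do
    if grep -q -x -F -e "$version" "$applied_file"; then
      # Already recorded: this is what every pod after the first one sees.
      echo "skip $version"
    else
      echo "apply $version"
      echo "$version" >> "$applied_file"
    fi
  done
}
```

Calling the function twice with the same file and versions applies everything on the first call and skips everything on the second. There is no path that removes a version from the file, which mirrors the absence of down migrations.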
What a migration does
Migrations are typically:
- Adding a new column (ADD COLUMN ... NULL).
- Backfilling derived data.
- Adding an index in CONCURRENTLY mode.
- Soft-deprecating a column (renamed, then read from new + old, then read-from-new only, then drop).
Migrations that would block writes for more than a few seconds are split across multiple releases — the platform never ships a single-step lock-the-table migration on a large table.
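To illustrate the shape of an additive change, the sketch below prints the PostgreSQL statements for a nullable column plus an index built with CONCURRENTLY. It is not taken from a Scrydon migration: the function name and its table, column, and type arguments are hypothetical, and the statements are printed rather than executed.

```shell
# Hypothetical helper: print an additive, low-lock migration for PostgreSQL.
# Arguments: table name, column name, column type.
print_additive_migration() {
  table="$1"; column="$2"; coltype="$3"
  # A nullable column with no default does not rewrite the table.
  printf 'ALTER TABLE %s ADD COLUMN %s %s NULL;\n' "$table" "$column" "$coltype"
  # CONCURRENTLY builds the index without blocking writes. It cannot run
  # inside a transaction block.
  printf 'CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_%s_%s ON %s (%s);\n' \
    "$table" "$column" "$table" "$column"
}
```

For example, print_additive_migration workflows archived_at timestamptz prints the two statements for a made-up archived_at column on a made-up workflows table.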
Verifying a migration ran
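# List the five most recently applied migrations for the Platform service.
# sh -c makes $DATABASE_URL expand inside the pod rather than in your local shell.
kubectl exec -it deploy/api-platform -n scrydon-platform -- \
  sh -c 'psql "$DATABASE_URL" -c "SELECT version, applied_at FROM platform_migrations ORDER BY applied_at DESC LIMIT 5;"'
Each service has its own migrations table.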
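Because every service keeps its own table, the same check can be wrapped in a small helper that takes the deployment, namespace, and table as arguments. This is a sketch under assumptions: only the api-platform deployment, the scrydon-platform namespace, and the platform_migrations table are named in this runbook. The function name is hypothetical, and it assumes the container image provides sh and psql and sets DATABASE_URL.

```shell
# Hypothetical helper: show the five most recent migrations for one service.
# Arguments: deployment name, namespace, migrations table name.
recent_migrations() {
  deploy="$1"; ns="$2"; table="$3"
  # \$DATABASE_URL is escaped so it is expanded by the shell inside the pod.
  # The table name is interpolated as-is, so pass only trusted values.
  kubectl exec "deploy/$deploy" -n "$ns" -- sh -c \
    "psql \"\$DATABASE_URL\" -c \"SELECT version, applied_at FROM $table ORDER BY applied_at DESC LIMIT 5;\""
}
```

The one combination this runbook confirms is recent_migrations api-platform scrydon-platform platform_migrations. Look up the deployment and table names for the other services in your installation before using it for them.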
Handling a migration failure
If a pod fails to apply a migration, the pod stays in CrashLoopBackOff. The migration error appears in the pod logs.
Do not run kubectl rollout restart blindly. Retrying a failing migration can leave the schema in a partially applied state on some databases. Investigate the error first.
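One concrete form of partial state in PostgreSQL: a CREATE INDEX CONCURRENTLY that fails partway leaves behind an index marked invalid, and a retry does not clean it up. The sketch below lists such indexes. The function name is hypothetical, and it assumes the target container provides sh and psql and sets DATABASE_URL. The target must be a pod that is still running, for example an older replica that is still serving, because you cannot exec into a container that is crash-looping.

```shell
# Hypothetical helper: list indexes that PostgreSQL has marked invalid.
# Arguments: exec target (for example deploy/api-platform or pod/<name>),
# namespace.
invalid_indexes() {
  target="$1"; ns="$2"
  # -A (unaligned) and -t (tuples only) give one plain row per index.
  kubectl exec "$target" -n "$ns" -- sh -c \
    "psql \"\$DATABASE_URL\" -At -c \"SELECT indexrelid::regclass AS index_name, indrelid::regclass AS table_name FROM pg_index WHERE NOT indisvalid;\""
}
```

An empty result means no invalid indexes exist. Any row that does appear names an index that must be dropped or rebuilt before the index migration can succeed.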
Recovery procedure:
- Read the failing pod's logs.
- Identify the migration version and the SQL statement that failed.
- Decide between three paths:
  - Fix forward: apply a hot-patch to the failing migration. Restart the pod.
  - Skip the migration (if you've decided it's safe): mark the migration applied in the migrations table manually and restart.
  - Restore from backup: roll back to the pre-upgrade state. Restore PostgreSQL from the snapshot you took before the upgrade.
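The first two steps can be sketched as a helper that reads the logs of the crashed container and keeps only the lines that look like errors. kubectl logs with --previous returns output from the last terminated container, which is the one that hit the migration error when a pod is in CrashLoopBackOff. The function name and the grep pattern are assumptions; Scrydon's log format is not described in this runbook, so adjust the pattern to what you actually see.

```shell
# Hypothetical helper: show error-looking lines from the previous (crashed)
# container of a pod.
# Arguments: pod name (from kubectl get pods), namespace.
migration_errors() {
  pod="$1"; ns="$2"
  kubectl logs "$pod" -n "$ns" --previous --tail=200 |
    grep -i -E 'migration|error|failed'
}
```

If the filter returns nothing, read the unfiltered output of the same kubectl logs command before choosing a recovery path.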
Always take a PostgreSQL snapshot before starting an upgrade. The upgrade runbook calls this out explicitly.
Long-running backfills
Some upgrades introduce a backfill (populating a new column from existing data). For large datasets, the backfill runs as a separate background job, not blocking the rolling update.
Progress is visible in the audit log as MIGRATION_BACKFILL_* events. The job can be paused, resumed, or restarted from where it left off.
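Resuming from where the job left off usually depends on a checkpoint that records the last batch completed. The toy model below shows the idea with a file as the checkpoint and a printed line in place of the real UPDATE. It is not Scrydon's backfill job; the function name, arguments, and batch scheme are all hypothetical.

```shell
# Toy model of a resumable backfill (hypothetical, for illustration).
# Processes ids 1..max in batches and records the last finished id.
# Arguments: highest id, batch size, checkpoint file path.
backfill() {
  max="$1"; batch="$2"; checkpoint="$3"
  last=0
  # Resume from the checkpoint if a previous run left one behind.
  if [ -s "$checkpoint" ]; then
    last=$(cat "$checkpoint")
  fi
  while [ "$last" -lt "$max" ]; do
    upto=$((last + batch))
    if [ "$upto" -gt "$max" ]; then
      upto="$max"
    fi
    # A real job would update rows with ids in (last, upto] here.
    echo "backfilled ids $((last + 1))..$upto"
    echo "$upto" > "$checkpoint"
    last="$upto"
  done
}
```

Stopping the loop between batches loses no work: the next run reads the checkpoint and continues with the following batch. Small batches also keep each write short, which is what lets the backfill run alongside normal traffic.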
What helm upgrade doesn't touch
helm upgrade orchestrates the rollout but doesn't itself touch the database. The services apply their own migrations on startup. This separation means:
- A Helm rollback doesn't un-apply migrations.
- A failed Helm upgrade where pods refuse to start typically leaves the schema unchanged.
- The migration history is owned by the service, not by Helm.
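To see this separation in practice, compare the release history that Helm keeps with the migration history that the service keeps. The sketch below prints both. helm history is a standard Helm command; the function name is hypothetical, the release name is an argument because this runbook does not name the Helm release, and the sketch assumes the release and the service share a namespace.

```shell
# Hypothetical helper: print Helm's view and the service's view together.
# Arguments: Helm release name, namespace, exec target, migrations table.
compare_histories() {
  release="$1"; ns="$2"; target="$3"; table="$4"
  echo "== Helm release history =="
  helm history "$release" -n "$ns" --max 5
  echo "== Service migration history =="
  # \$DATABASE_URL is escaped so it is expanded by the shell inside the pod.
  kubectl exec "$target" -n "$ns" -- sh -c \
    "psql \"\$DATABASE_URL\" -c \"SELECT version, applied_at FROM $table ORDER BY applied_at DESC LIMIT 5;\""
}
```

After a Helm rollback, the first section gains a new revision while the second section is unchanged, because the rollback never touches the migrations table.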
Related
- Upgrade runbook — full upgrade procedure.
- Backup & restore — the rollback path.