Operations

Day-2 runbooks for operating Scrydon — backup, restore, migrations, license rotation, observability, SIEM forwarding, supply-chain verification, and upgrades.

Day-1 covers getting Scrydon installed. Day-2 covers operating it: the runbooks below are the ones your on-call team needs.

Account recovery & re-running setup

Regain admin access after a lost password, and re-open the first-run setup wizard.

Backup & restore

What to back up, where the canonical state lives, and how to restore it.

Database migrations

How schema migrations are applied on upgrade.

License rotation

Apply a renewed license without downtime.

Observability

Platform metrics, dashboards, and the SLOs to track.

SIEM forwarding

Wire the audit log into Splunk, Datadog, Elastic, Sumo Logic, or Microsoft Sentinel.

Supply-chain verification

Verify signed Helm charts, signed images, and the SBOM.

Upgrade runbook

The order of operations for an in-place upgrade.

Day-2 expectations

For a steady-state Scrydon deployment, plan on:

Activity	Cadence
License heartbeat health check	Daily (automated)
Audit log review (focus events)	Weekly
Backup verification (restore-from-yesterday drill)	Monthly
Disaster recovery exercise	Quarterly
Minor version upgrade	Quarterly
Major version upgrade	Annually
Vulnerability scan + patch cycle	Continuous
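
One way to keep this schedule honest is to record when each activity last ran and flag anything that has slipped past its cadence. The sketch below is illustrative only: the day counts are approximations chosen for the example (for instance, "Quarterly" is treated as 92 days), the `overdue` helper and the `last_run` mapping are hypothetical, and nothing here is part of Scrydon itself. The continuous vulnerability scan is left out because it has no discrete interval.

```python
from datetime import date, timedelta

# Maximum gap between runs, in days, for each activity in the cadence table.
# The numbers are assumptions made for this sketch, not Scrydon settings.
CADENCE_DAYS = {
    "License heartbeat health check": 1,
    "Audit log review (focus events)": 7,
    "Backup verification (restore-from-yesterday drill)": 31,
    "Disaster recovery exercise": 92,
    "Minor version upgrade": 92,
    "Major version upgrade": 366,
}


def overdue(last_run: dict[str, date], today: date) -> list[str]:
    """Return activities that were never recorded or are past their cadence."""
    late = []
    for activity, max_days in CADENCE_DAYS.items():
        last = last_run.get(activity)
        if last is None or (today - last) > timedelta(days=max_days):
            late.append(activity)
    return late


if __name__ == "__main__":
    # Example dates are made up for illustration.
    today = date(2025, 6, 30)
    last_run = {name: today for name in CADENCE_DAYS}
    last_run["Audit log review (focus events)"] = date(2025, 6, 20)
    for activity in overdue(last_run, today):
        print(f"OVERDUE: {activity}")
```

An activity with no recorded run is treated as overdue, so a missing entry surfaces as a finding rather than passing silently.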

What requires planned downtime

Most operations are non-disruptive. The exceptions:

Major version upgrades that touch the OLAP warehouse (StarRocks) typically require a brief read-only window while indexes rebuild.
Encryption-strategy changes (LOCAL → BYOK → HYOK) require re-encryption of secrets in place — typically a few minutes for normal-sized vaults.
Replacing PostgreSQL with a managed instance requires a one-time migration window.

Each is covered in the relevant runbook with the expected duration.

Where to look if something's wrong

Check the audit log first — most failures show up there with a structured *_FAILED event.
Check the platform metrics dashboard for the affected subsystem.
Check the relevant subsystem's logs (workflow runtime, analytics, ontology, copilot).
If none of the above reveals the issue, contact Scrydon support and include the audit log and dashboard screenshots.
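
The first step above, scanning the audit log for `*_FAILED` events, might look like the following. This is a minimal sketch under stated assumptions: it assumes the audit log has been exported as one JSON object per line and that the event name lives in a field called `event`. Both the export format and the field name are hypothetical, as is the `failed_events` helper, so adjust them to match what your deployment actually emits.

```python
import json
from typing import Iterable, Iterator


def failed_events(lines: Iterable[str], field: str = "event") -> Iterator[dict]:
    """Yield audit records whose event name ends in _FAILED.

    Assumes one JSON object per line; `field` is a hypothetical key name.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        if not isinstance(record, dict):
            continue  # ignore JSON values that are not objects
        name = record.get(field)
        if isinstance(name, str) and name.endswith("_FAILED"):
            yield record


if __name__ == "__main__":
    # Event names below are invented for illustration.
    sample = [
        '{"event": "BACKUP_FAILED", "subsystem": "analytics"}',
        '{"event": "LOGIN_SUCCEEDED"}',
    ]
    for record in failed_events(sample):
        print(record)
```

Because the function accepts any iterable of strings, an open file handle can be passed directly, which keeps memory use flat on large exports.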