
Analytics stack

Managed tables, schema inference, classification, and the OLAP warehouse that backs them.

The analytics stack is what makes "drop a CSV, get a typed governed table" work end-to-end. It combines a managed-table service, a column-aware ingest pipeline, a classification + masking layer, and an OLAP query engine.

What it does

  • Ingest — upload a CSV / JSON / JSONL via the Analytics UI or a workflow, and the platform infers a schema on first data arrival.
  • Govern — every table has a data classification and a policy bundle controlling which columns are masked or denied per user / per workspace.
  • Query — read paths are exposed both as typed APIs (used by workflows and the ontology layer) and as raw SQL (authorisation-gated, governance-masked).
  • Profile — managed tables carry profile snapshots (row counts, null rates, distincts) that the UI surfaces.
  • Notebook — Python notebooks run side-by-side with the warehouse for ad-hoc analysis. See Marimo notebooks.
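The Ingest and Profile steps above can be illustrated with a minimal sketch. It is not the platform's actual inference logic: the type names (`int`, `float`, `string`), the function names, and the profile layout are assumptions made for illustration, using only the Python standard library.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest of int / float / string that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
        except ValueError:
            continue  # this type does not fit, try the next wider one
        return name
    return "string"


def infer_schema_and_profile(csv_text):
    """Infer a column -> type mapping and a simple profile snapshot from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    schema = {}
    profile = {"row_count": len(rows), "columns": {}}
    for col in reader.fieldnames or []:
        # Treat missing trailing fields (None) the same as empty strings.
        values = [(row[col] or "") for row in rows]
        schema[col] = infer_type(values)
        nulls = sum(1 for v in values if v == "")
        profile["columns"][col] = {
            "null_rate": nulls / len(rows) if rows else 0.0,
            "distinct": len({v for v in values if v != ""}),
        }
    return schema, profile


sample = "id,name,score\n1,ada,9.5\n2,,7\n3,bob,\n"
schema, profile = infer_schema_and_profile(sample)
print(schema)
print(profile)
```

In this sketch an empty cell counts as a null, which is why the inferred columns would be created as nullable in a real table.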

Components

Component | Role
Analytics service | The UI host and ingest orchestrator. Lives in the scrydon-analytics namespace.
Managed-table service | The HTTP API that owns table lifecycle, schema inference, classification, masking, and SQL execution. Service-to-service only; not customer-routable.
StarRocks | OLAP engine that stores managed-table rows. Default deployment is a single-pod bundled image; production HA uses the StarRocks operator.
SeaweedFS / S3 | Blob storage for staged uploads (CSV/JSON/JSONL) before they're materialised.
OPA (Open Policy Agent) | Per-tenant policy decision point. Every read evaluates a tenant-specific Rego bundle.
Marimo sidecar | Python notebook runtime, scoped to your workspace and your tables.
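To make the policy decision point more concrete, here is a hedged sketch of how a service might build a query against OPA's REST Data API, which accepts a POST to `/v1/data/<policy path>` with an `input` document. The policy path (`tenants/<tenant>/tables/read`), the shape of the `input` document, and the function name are hypothetical; the platform's real bundle layout is not described here.

```python
import json
import urllib.request


def build_opa_request(opa_url, tenant_id, user, table, columns):
    """Build (but do not send) a read-authorisation query for OPA's Data API."""
    # Hypothetical per-tenant policy path; the real bundle layout may differ.
    path = f"/v1/data/tenants/{tenant_id}/tables/read"
    # Hypothetical input document describing who reads what.
    body = {"input": {"user": user, "table": table, "columns": columns}}
    return urllib.request.Request(
        opa_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_opa_request(
    "http://localhost:8181", "acme", "alice", "orders", ["id", "email"]
)
print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would return a JSON body whose `result` field holds whatever the Rego policy evaluates to.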

Read paths

Three entry points, all of which go through the same authorisation and masking layer:

  1. Workflow tools — the scrydon:tables product exposes typed get-schema, query, write, and delete tools an Agent can call.
  2. Ontology projection — when the ontology layer reads a typed Object whose binding is a managed table, the projection goes through the same masking pipeline.
  3. Notebooks — Marimo notebooks call the analytics SQL surface; the same Rego policies apply.

No path bypasses governance. Column masking, row filtering, and data classification are evaluated at every read, regardless of caller.
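As an illustration of what evaluating governance at every read can mean, the sketch below applies a policy decision to a result set before it is returned. The decision format (`mask`, `deny`, and `row_filter` keys), the mask token, and the function name are assumptions for this example, not the platform's actual contract.

```python
MASK = "***"  # hypothetical mask token


def apply_decision(rows, decision):
    """Filter rows, drop denied columns, and mask masked columns."""
    denied = set(decision.get("deny", []))
    masked = set(decision.get("mask", []))
    row_filter = decision.get("row_filter", {})  # column -> required value
    visible = []
    for row in rows:
        # Row filtering runs first, on the unmasked values.
        if any(row.get(col) != want for col, want in row_filter.items()):
            continue
        visible.append(
            {
                col: (MASK if col in masked else val)
                for col, val in row.items()
                if col not in denied
            }
        )
    return visible


rows = [
    {"id": 1, "email": "a@example.com", "ssn": "123", "region": "eu"},
    {"id": 2, "email": "b@example.com", "ssn": "456", "region": "us"},
]
decision = {"mask": ["email"], "deny": ["ssn"], "row_filter": {"region": "eu"}}
result = apply_decision(rows, decision)
print(result)
```

Because every caller would pass through the same function, a workflow tool, an ontology projection, and a notebook would all see the same masked view for the same user.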

Schema evolution

Schema bootstrap is additive:

  • First data arrival creates the table.
  • An upload that contains a new column adds that column to the table as nullable.
  • Renaming or dropping columns is never automatic: any schema change that isn't additive requires an explicit migration through the UI.

This guarantee is binding: production tables never silently narrow, so an upload cannot remove or rename an existing column.
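The additive rule can be sketched as a small merge function. This is an illustration under assumptions: the schema representation, the `MigrationRequired` exception, and the choice to treat a type change as non-additive are hypothetical and are not taken from the platform.

```python
class MigrationRequired(Exception):
    """Raised when an upload implies a non-additive schema change."""


def evolve_schema(current, incoming):
    """Merge an upload's inferred column types into the current table schema.

    current:  column -> {"type": str, "nullable": bool}
    incoming: column -> type name inferred from the upload
    """
    evolved = dict(current)  # existing columns are always kept, never dropped
    for name, col_type in incoming.items():
        if name not in current:
            # New columns are added as nullable so existing rows stay valid.
            evolved[name] = {"type": col_type, "nullable": True}
        elif current[name]["type"] != col_type:
            # Assumption: a type change counts as non-additive.
            raise MigrationRequired(f"column {name!r} changes type")
    return evolved


current = {
    "id": {"type": "int", "nullable": False},
    "name": {"type": "string", "nullable": True},
}
evolved = evolve_schema(current, {"id": "int", "email": "string"})
print(evolved)
```

Note that `name` is absent from the upload but survives in the evolved schema, which is the "never silently narrow" behaviour in miniature.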

Agent-created tables

Workflows can create managed tables themselves, which is useful for an agent that wants to persist a deduplicated entity list, a scoring matrix, or a sample from a long-running run. These tables follow a separate naming convention, always carry a confidential classification, and record explicit actor metadata on every row.

See Analytics → Agent-created tables for the full policy.
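A rough sketch of what "confidential classification plus actor metadata on every row" could look like when a workflow prepares a table for persistence. The metadata column names (`_actor_id`, `_run_id`, `_written_at`), the payload shape, and the function name are hypothetical; the naming convention for agent-created tables is deliberately left out because it is not specified on this page.

```python
from datetime import datetime, timezone


def prepare_agent_table(table_name, rows, actor_id, run_id, written_at=None):
    """Attach a confidential classification and per-row actor metadata."""
    written_at = written_at or datetime.now(timezone.utc).isoformat()
    stamped = [
        # Hypothetical metadata columns recorded on every row.
        {**row, "_actor_id": actor_id, "_run_id": run_id, "_written_at": written_at}
        for row in rows
    ]
    return {
        "table": table_name,
        "classification": "confidential",  # always confidential for agent tables
        "rows": stamped,
    }


payload = prepare_agent_table(
    "dedup_entities",
    [{"entity": "Acme Ltd"}, {"entity": "Globex"}],
    actor_id="agent-42",
    run_id="run-001",
)
print(payload["classification"], len(payload["rows"]))
```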
