Analytics stack

Managed tables, schema inference, classification, and the OLAP warehouse that backs them.

The analytics stack is what makes "drop a CSV, get a typed, governed table" work end-to-end. It combines a managed-table service, a column-aware ingest pipeline, a classification and masking layer, and an OLAP query engine.

What it does

Ingest — upload a CSV / JSON / JSONL via the Analytics UI or a workflow, and the platform infers a schema on first data arrival.
Govern — every table has a data classification and a policy bundle controlling which columns are masked or denied per user / per workspace.
Query — read paths are exposed both as typed APIs (used by workflows and the ontology layer) and as raw SQL (authorisation-gated, governance-masked).
Profile — managed tables carry profile snapshots (row counts, null rates, distincts) that the UI surfaces.
Notebook — Python notebooks run side-by-side with the warehouse for ad-hoc analysis. See Marimo notebooks.
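As a rough illustration of the Ingest and Profile steps above, the sketch below infers a column type per CSV column and computes the profile figures mentioned (row count, null rate, distinct count). It is not the platform's inference code: the function names `infer_type` and `infer_and_profile`, the three-type lattice (`int`, `float`, `string`), and the rule that an empty cell counts as null are all assumptions made for the example.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest of int/float/string that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"  # assumption: an all-empty column falls back to string
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"


def infer_and_profile(csv_text):
    """Hypothetical helper: infer a schema and a profile snapshot from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    schema = {}
    profile = {"row_count": len(rows), "columns": {}}
    for col in columns:
        # Short rows yield None for missing cells; treat them as empty.
        values = [(r[col] or "") for r in rows]
        schema[col] = infer_type(values)
        nulls = sum(1 for v in values if v == "")
        profile["columns"][col] = {
            "null_rate": nulls / len(rows) if rows else 0.0,
            "distinct": len({v for v in values if v != ""}),
        }
    return schema, profile
```

Calling `infer_and_profile` on a small upload returns a `(schema, profile)` pair; a real pipeline would likely also handle dates, booleans, and JSON/JSONL inputs, which this sketch leaves out.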

Components

Component	Role
Analytics service	The UI host and ingest orchestrator. Lives in the `scrydon-analytics` namespace.
Managed-table service	The HTTP API that owns table lifecycle, schema inference, classification, masking, and SQL execution. Service-to-service only — not customer-routable.
StarRocks	OLAP engine that stores managed-table rows. Default deployment is a single-pod bundled image; production HA uses the StarRocks operator.
SeaweedFS / S3	Blob storage for staged uploads (CSV/JSON/JSONL) before they're materialised.
OPA (Open Policy Agent)	Per-tenant policy decision point. Every read evaluates a tenant-specific Rego bundle.
Marimo sidecar	Python notebook runtime, scoped to your workspace and your tables.

Read paths

Three entry points, all of which go through the same authorisation and masking layer:

Workflow tools — the scrydon:tables product exposes typed get-schema, query, write, and delete tools an Agent can call.
Ontology projection — when the ontology layer reads a typed Object whose binding is a managed table, the projection goes through the same masking pipeline.
Notebooks — Marimo notebooks call the analytics SQL surface; the same Rego policies apply.

No path bypasses governance. Column masking, row filtering, and data classification are evaluated at every read, regardless of caller.
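The "single choke point" idea can be sketched in a few lines. The example below is written in Python for illustration only; the platform evaluates Rego bundles through OPA, and none of the names here (`apply_policy`, `read_table`, the `allow` / `mask` / `deny` actions, the default-deny rule for unlisted columns) come from its actual API. They are assumptions chosen to show how one function can apply row filtering and column masking for every caller.

```python
MASK = "***"


def apply_policy(rows, column_rules, row_filter=None):
    """Apply a hypothetical policy to rows.

    column_rules maps column name -> "allow" | "mask" | "deny".
    Assumption: columns without a rule are denied.
    """
    out = []
    for row in rows:
        # Row filtering: drop rows the caller may not see at all.
        if row_filter is not None and not row_filter(row):
            continue
        visible = {}
        for col, value in row.items():
            action = column_rules.get(col, "deny")
            if action == "allow":
                visible[col] = value
            elif action == "mask":
                visible[col] = MASK
            # "deny": the column is omitted from the result.
        out.append(visible)
    return out


def read_table(rows, caller, policies):
    """Every entry point (workflow tool, ontology projection, notebook) calls this."""
    policy = policies[caller["workspace"]]
    return apply_policy(rows, policy["columns"], policy.get("row_filter"))
```

Because workflow tools, ontology projections, and notebooks would all call the same `read_table` in this sketch, there is no code path that returns unmasked rows.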

Schema evolution

Schema bootstrap is additive:

First data arrival creates the table.
An upload that introduces a new column adds it to the table as a nullable column.
Renaming or dropping columns is never automatic — schema changes that aren't additive require an explicit migration through the UI.

This guarantee is binding: a production table never loses a column or narrows its schema without an explicit migration.
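The additive rules above can be expressed as a small merge function. This is a minimal sketch under stated assumptions, not the managed-table service's implementation: `evolve_schema` and `NonAdditiveChange` are hypothetical names, schemas are modelled as plain dicts, columns created on first arrival are assumed nullable, and a type change on an existing column is treated as a non-additive change that needs a migration.

```python
class NonAdditiveChange(Exception):
    """Raised when an upload would require a non-additive schema change."""


def evolve_schema(current, incoming):
    """Merge an upload's inferred columns into the table schema.

    current:  None (table does not exist yet) or
              dict of column -> {"type": str, "nullable": bool}
    incoming: dict of column -> type, as inferred from the upload
    """
    if current is None:
        # First data arrival creates the table.
        # Assumption: bootstrap columns are nullable.
        return {c: {"type": t, "nullable": True} for c, t in incoming.items()}

    # Copy so the existing schema is never mutated in place.
    merged = {c: dict(spec) for c, spec in current.items()}
    for col, col_type in incoming.items():
        if col not in merged:
            # New column in the upload: add it as nullable.
            merged[col] = {"type": col_type, "nullable": True}
        elif merged[col]["type"] != col_type:
            raise NonAdditiveChange(
                f"column {col!r}: {merged[col]['type']} -> {col_type} "
                "requires an explicit migration"
            )
    # Columns missing from the upload are kept; nothing is dropped.
    return merged
```

In this sketch a renamed column in an upload simply shows up as a new nullable column while the old one stays in place, which matches the rule that renames and drops are never applied automatically.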

Agent-created tables

Workflows can create managed tables themselves, which is useful for an agent that needs to persist a deduplicated entity list, a scoring matrix, or a sample from a long-running run. These tables follow a separate naming convention and always carry a confidential classification, with explicit actor metadata recorded on every row.

See Analytics → Agent-created tables for the full policy.

Related

Analytics — product-level overview of managed tables.
Ontology — typed layer on top of managed tables.
Security → Authorization — how Rego policies are enforced.