Analytics stack
Managed tables, schema inference, classification, and the OLAP warehouse that backs them.
The analytics stack is what makes "drop a CSV, get a typed governed table" work end-to-end. It combines a managed-table service, a column-aware ingest pipeline, a classification + masking layer, and an OLAP query engine.
What it does
- Ingest — upload a CSV / JSON / JSONL via the Analytics UI or a workflow, and the platform infers a schema on first data arrival.
- Govern — every table has a data classification and a policy bundle controlling which columns are masked or denied per user / per workspace.
- Query — read paths are exposed both as typed APIs (used by workflows and the ontology layer) and as raw SQL (authorisation-gated, governance-masked).
- Profile — managed tables carry profile snapshots (row counts, null rates, distinct counts) that the UI surfaces.
- Notebook — Python notebooks run side-by-side with the warehouse for ad-hoc analysis. See Marimo notebooks.
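As a rough illustration of what "infers a schema on first data arrival" and a profile snapshot involve, the sketch below scans a CSV once, widens each column to the narrowest type that fits every non-empty cell, and records the null rate and distinct count. The StarRocks-style type names (`BIGINT`, `DOUBLE`, `BOOLEAN`, `VARCHAR`), the widening rules, and `infer_and_profile` are assumptions for illustration, not the platform's actual inference logic.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ColumnProfile:
    name: str
    inferred_type: str  # assumed StarRocks-style type name
    null_rate: float
    distinct_count: int


def _cell_type(value: str) -> str:
    """Return the narrowest assumed type that can represent one non-empty cell."""
    if value.strip().lower() in ("true", "false"):
        return "BOOLEAN"
    try:
        int(value)
        return "BIGINT"
    except ValueError:
        pass
    try:
        float(value)
        return "DOUBLE"
    except ValueError:
        return "VARCHAR"


def _widen(current: Optional[str], seen: str) -> str:
    """Widen the running column type so it also fits the newly seen cell type."""
    if current is None or current == seen:
        return seen
    if {current, seen} == {"BIGINT", "DOUBLE"}:
        return "DOUBLE"  # integers fit in a floating-point column
    return "VARCHAR"  # any other mix falls back to text


def infer_and_profile(csv_text: str) -> List[ColumnProfile]:
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    profiles = []
    for name in reader.fieldnames or []:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v not in (None, "")]  # empty cells count as null
        col_type: Optional[str] = None
        for v in present:
            col_type = _widen(col_type, _cell_type(v))
        profiles.append(ColumnProfile(
            name=name,
            inferred_type=col_type or "VARCHAR",  # an all-null column defaults to text
            null_rate=(len(values) - len(present)) / len(values) if values else 0.0,
            distinct_count=len(set(present)),
        ))
    return profiles
```

For example, a `score` column holding `2.5`, an empty cell, and `4` is widened to `DOUBLE` with a null rate of one third.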
Components
| Component | Role |
|---|---|
| Analytics service | The UI host and ingest orchestrator. Lives in the scrydon-analytics namespace. |
| Managed-table service | The HTTP API that owns table lifecycle, schema inference, classification, masking, and SQL execution. Service-to-service only — not customer-routable. |
| StarRocks | OLAP engine that stores managed-table rows. Default deployment is a single-pod bundled image; production HA uses the StarRocks operator. |
| SeaweedFS / S3 | Blob storage for staged uploads (CSV/JSON/JSONL) before they're materialised. |
| OPA (Open Policy Agent) | Per-tenant policy decision point. Every read evaluates a tenant-specific Rego bundle. |
| Marimo sidecar | Python notebook runtime, scoped to your workspace and your tables. |
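OPA exposes policy decisions over its REST Data API: a `POST` to `/v1/data/<package path>` with an `{"input": ...}` body returns `{"result": ...}`. A minimal sketch of how a read might ask OPA for a decision follows. The package name `scrydon.tables.read`, the input fields, and the OPA address are hypothetical, since the real tenant bundle layout isn't documented here.

```python
import json
import urllib.request
from typing import List


def build_opa_request(
    opa_url: str, package: str, tenant: str, user: str, table: str, columns: List[str]
) -> urllib.request.Request:
    """Build, but do not send, a POST against OPA's REST Data API.

    OPA serves a Rego package `a.b.c` at /v1/data/a/b/c and wraps the
    decision in {"result": ...}. The input fields below are hypothetical.
    """
    path = package.replace(".", "/")
    body = {"input": {"tenant": tenant, "user": user, "table": table, "columns": columns}}
    return urllib.request.Request(
        url=f"{opa_url.rstrip('/')}/v1/data/{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_opa_request(
    "http://opa:8181", "scrydon.tables.read",  # hypothetical address and package
    tenant="acme", user="alice", table="orders", columns=["id", "email"],
)
```

Sending it with `urllib.request.urlopen(req)` and reading `json.load(response)["result"]` yields the decision document; what that document contains depends entirely on the tenant's Rego bundle.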
Read paths
Three entrypoints, all going through the same authorisation + masking layer:
- Workflow tools — the `scrydon:tables` product exposes typed `get-schema`, `query`, `write`, and `delete` tools an Agent can call.
- Ontology projection — when the ontology layer reads a typed `Object` whose binding is a managed table, the projection goes through the same masking pipeline.
- Notebooks — Marimo notebooks call the analytics SQL surface; the same Rego policies apply.
No path bypasses governance. Column masking, row filtering, and data classification are evaluated at every read, regardless of caller.
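Because every path shares one masking layer, the enforcement step can be pictured as a single function that all callers' rows pass through. The sketch below assumes a hypothetical decision shape (`deny` and `mask` column lists plus an equality `row_filter`); the real Rego output is not specified on this page, and `apply_decision` is an illustrative name.

```python
from typing import Any, Dict, List

MASK = "****"  # placeholder written in place of masked values (assumed)


def apply_decision(rows: List[Dict[str, Any]], decision: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Apply a hypothetical policy decision to rows before they reach any caller."""
    denied = set(decision.get("deny", []))
    masked = set(decision.get("mask", []))
    row_filter = decision.get("row_filter", {})
    visible = []
    for row in rows:
        # Row filtering: keep only rows whose values match every required column.
        if any(row.get(col) != want for col, want in row_filter.items()):
            continue
        out = {}
        for col, val in row.items():
            if col in denied:
                continue  # denied columns disappear from the result entirely
            out[col] = MASK if col in masked else val
        visible.append(out)
    return visible
```

Keeping this logic in one shared function, rather than duplicating it in the workflow, ontology, and notebook paths, is one straightforward way to make "regardless of caller" hold by construction.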
Schema evolution
Schema bootstrap is additive:
- First data arrival creates the table.
- A column that appears in an upload but not yet in the table is added as a nullable column.
- Renaming or dropping columns is never automatic — schema changes that aren't additive require an explicit migration through the UI.
This guarantee is binding: production tables never silently narrow.
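The additive rule can be expressed as a small planning step: new columns become nullable additions, columns absent from an upload stay on the table, and anything else is rejected in favour of an explicit migration. This sketch assumes a schema is a simple name-to-type mapping and treats any type change as non-additive; `plan_additive_change` and `NonAdditiveChange` are illustrative names, not a platform API.

```python
from typing import Dict, List, Tuple


class NonAdditiveChange(Exception):
    """Raised when an upload implies a change that needs an explicit migration."""


def plan_additive_change(existing: Dict[str, str], incoming: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (column, type) pairs to add as nullable columns.

    Columns missing from the upload are kept, never dropped. A renamed column
    looks like a new one here, so the old column also stays in place.
    """
    conflicts = sorted(c for c in incoming if c in existing and incoming[c] != existing[c])
    if conflicts:
        # Changing a column's type is not additive in this sketch.
        raise NonAdditiveChange(f"type change on {conflicts}: run an explicit migration")
    return [(col, typ) for col, typ in incoming.items() if col not in existing]
```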
Agent-created tables
Workflows can create managed tables themselves. This is useful for an agent that wants to persist a deduplicated entity list, a scoring matrix, or a sample of the output of a long-running run. These tables follow a separate naming convention, always carry a confidential classification, and record explicit actor metadata on every row.
See Analytics → Agent-created tables for the full policy.
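The naming convention for agent-created tables is defined on the linked page and is not reproduced here. The sketch below only illustrates the other two rules: the classification is pinned to confidential, and every row is stamped with actor metadata. The `Actor` fields, the `_actor_*` and `_written_at` column names, and `stamp_rows` are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Actor:
    """Hypothetical actor identity attached to every agent-written row."""
    workflow_id: str
    run_id: str
    agent: str


@dataclass
class AgentTableWrite:
    table: str
    rows: List[Dict[str, Any]]
    # Not an __init__ parameter, so callers cannot pass a different classification.
    classification: str = field(default="confidential", init=False)


def stamp_rows(table: str, rows: List[Dict[str, Any]], actor: Actor,
               now: Optional[datetime] = None) -> AgentTableWrite:
    """Copy each row and add hypothetical actor columns plus a UTC write timestamp."""
    written_at = (now or datetime.now(timezone.utc)).isoformat()
    stamped = [
        {**row, "_actor_workflow": actor.workflow_id, "_actor_run": actor.run_id,
         "_actor_agent": actor.agent, "_written_at": written_at}
        for row in rows
    ]
    return AgentTableWrite(table=table, rows=stamped)
```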
Related
- Analytics — product-level overview of managed tables.
- Ontology — typed layer on top of managed tables.
- Security → Authorization — how Rego policies are enforced.