Schema inference

How Scrydon picks column types from a file on first data arrival — sampling, type rules, overrides.

When you upload a file, Scrydon infers a schema by sampling the data and classifying each column's type. This page documents the rules so you can predict what'll happen with your data — and override the result when the heuristic is wrong.

Sampling

The inference runs on a sample of the file:

CSV / JSONL — first N rows (default 10,000). Configurable per upload.
JSON — the whole document if it's an array; otherwise the top-level keys.

Files larger than the sample are still ingested in full once the schema is committed — the sample is used only to decide the schema.
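
As a rough illustration of the sampling step for CSV, the sketch below reads the header plus the first N data rows from a stream and leaves the rest unread. The function name `sample_rows` and its `sample_size` parameter are hypothetical, not Scrydon's API; only the default of 10,000 rows comes from this page.

```python
import csv
import io
from itertools import islice


def sample_rows(text_stream, sample_size=10_000):
    """Return (header, first `sample_size` data rows) from a CSV text stream.

    Hypothetical helper: only the sample is materialised, so the rest of the
    file can still be ingested in full later.
    """
    reader = csv.reader(text_stream)
    header = next(reader, [])          # empty file -> empty header
    rows = list(islice(reader, sample_size))
    return header, rows


# Usage: a 3-row file sampled with N = 2
data = io.StringIO("id,name\n1,a\n2,b\n3,c\n")
header, rows = sample_rows(data, sample_size=2)
```

Here `rows` holds only the first two data rows; the third row is not consulted when the schema is decided.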

Type rules

For each column, inference walks the priority list below from top to bottom and picks the first type that every sampled value matches:

Priority	Type	Match rule
1	`BOOLEAN`	All values are `true`/`false`/`yes`/`no`/`0`/`1` (case-insensitive)
2	`BIGINT`	All values match `^-?\d+$` and fit in 64 bits
3	`DOUBLE`	All values are valid IEEE-754 doubles
4	`DATE`	All values match a recognised date format (`YYYY-MM-DD`, `DD/MM/YYYY`, …)
5	`DATETIME`	All values match a recognised datetime / ISO 8601 format
6	`JSON`	All values parse as JSON objects or arrays
7	`STRING`	Fallback — anything else, including columns that mix types

Empty / null / NA values are ignored when inferring the type — they don't force a column to STRING. They do mark the column as nullable.
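
The priority walk can be sketched in a few lines of Python. This is a minimal approximation under stated assumptions, not Scrydon's implementation: the null spellings, the two date formats (the table's list is elided with "…"), and the use of Python's own `float`, `datetime.fromisoformat`, and `json.loads` parsers as stand-ins for the real match rules are all assumptions.

```python
import json
import re
from datetime import datetime

NULL_TOKENS = {"", "null", "na"}          # assumed spellings of empty / null / NA
BOOL_TOKENS = {"true", "false", "yes", "no", "0", "1"}
INT_RE = re.compile(r"-?\d+", re.ASCII)
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")   # only the two formats the table names


def _parses(parse, value):
    """True if parse(value) succeeds without raising ValueError."""
    try:
        parse(value)
        return True
    except ValueError:
        return False


def is_boolean(v):
    return v.lower() in BOOL_TOKENS


def is_bigint(v):
    # Integer syntax, and the value fits in a signed 64-bit range
    return INT_RE.fullmatch(v) is not None and -2**63 <= int(v) < 2**63


def is_double(v):
    return _parses(float, v)              # stand-in for "valid IEEE-754 double"


def is_date(v):
    return any(_parses(lambda s: datetime.strptime(s, fmt), v) for fmt in DATE_FORMATS)


def is_datetime(v):
    return _parses(datetime.fromisoformat, v)


def is_json(v):
    try:
        return isinstance(json.loads(v), (dict, list))
    except ValueError:
        return False


# Same order as the priority table; STRING is the fallback.
RULES = [
    ("BOOLEAN", is_boolean),
    ("BIGINT", is_bigint),
    ("DOUBLE", is_double),
    ("DATE", is_date),
    ("DATETIME", is_datetime),
    ("JSON", is_json),
]


def infer_type(values):
    """Return (type_name, nullable) for one column of sampled string values."""
    present = [v.strip() for v in values if v.strip().lower() not in NULL_TOKENS]
    nullable = len(present) < len(values)
    if present:                           # an all-null column falls back to STRING (assumption)
        for name, rule in RULES:
            if all(rule(v) for v in present):
                return name, nullable
    return "STRING", nullable
```

For example, `infer_type(["1", "2", ""])` yields `("BIGINT", True)`: the empty value is skipped for typing but marks the column nullable, and `2` rules out `BOOLEAN`.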

Nullability

A column is nullable if any sampled value is empty, null, NA, or NaN. Otherwise it's marked non-null. You can flip this manually in the schema confirmation step.
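
The nullability rule amounts to an "any value is a null marker" check over the sample. A minimal sketch, assuming the markers are compared case-insensitively after trimming whitespace (that detail is an assumption, as is the helper name `is_nullable`):

```python
# Markers listed on this page; case-insensitive matching is an assumption.
NULL_MARKERS = {"", "null", "na", "nan"}


def is_nullable(sampled_values):
    """True if any sampled value is None or one of the null markers."""
    return any(
        v is None or str(v).strip().lower() in NULL_MARKERS
        for v in sampled_values
    )
```

A column sampled as `["1", "2", "NaN"]` would be marked nullable, while `["1", "2", "3"]` would be marked non-null.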

Column-name handling

Headers are preserved losslessly. The physical column name is sanitised to satisfy SQL identifier rules; the original header is stored as a display name. See Column names for the full sanitisation table.

Overrides

After inference runs you get a schema confirmation step. From there you can:

Change a column's type (e.g. force a STRING to BIGINT if the sample happened to have one rogue value).
Toggle a column's nullability.
Mark a column as the primary key for upsert support.
Set a column's classification — public / internal / confidential / restricted. See Classification & masking.
Rename a column's display name without changing the physical name.

Overrides apply to subsequent re-uploads of the same file structure.
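
One way to picture the confirmation step is as a set of per-column patches applied to the inferred schema. The sketch below is illustrative only: the `Column` record, its field names, and `apply_overrides` are hypothetical, and the rule that the physical name cannot be patched is an assumption drawn from the list above, which only offers renaming the display name.

```python
from dataclasses import dataclass, replace
from typing import Optional

CLASSIFICATIONS = {"public", "internal", "confidential", "restricted"}


@dataclass(frozen=True)
class Column:
    """Hypothetical record for one inferred column."""
    physical_name: str
    display_name: str
    type: str
    nullable: bool
    primary_key: bool = False
    classification: Optional[str] = None


def apply_overrides(schema, overrides):
    """Return a new schema with per-column overrides applied.

    schema:    list of Column
    overrides: {physical_name: {field_name: new_value}}
    """
    result = []
    for col in schema:
        changes = overrides.get(col.physical_name, {})
        if "physical_name" in changes:
            raise ValueError("the physical name is not overridable")
        cls = changes.get("classification")
        if cls is not None and cls not in CLASSIFICATIONS:
            raise ValueError(f"unknown classification: {cls!r}")
        result.append(replace(col, **changes))  # copies; the input is untouched
    return result


# Usage: force a STRING column to BIGINT and mark it as the primary key
inferred = [Column("order_id", "Order ID", "STRING", True)]
confirmed = apply_overrides(
    inferred,
    {"order_id": {"type": "BIGINT", "nullable": False, "primary_key": True}},
)
```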

Re-uploading a file

Re-uploading the same file structure to an existing table is the standard pattern. The platform translates display names → physical names automatically, so you keep the original CSV headers in your source file and never have to rename.

If the new file adds columns, they're appended as nullable. If it drops columns, those columns stay in the table. If it renames a column, the renamed column is treated as new and the old one stays in place: the platform won't guess that "OldName" became "NewName".
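
The re-upload rules can be sketched as a small planning function. Everything named here is hypothetical: `plan_reupload` is not a Scrydon API, and `toy_sanitise` is only a placeholder for the real sanitisation rules documented under Column names.

```python
import re


def toy_sanitise(header):
    """Placeholder only; the real rules are in the Column names page."""
    return re.sub(r"\W+", "_", header).strip("_").lower()


def plan_reupload(existing, headers, sanitise=toy_sanitise):
    """Plan how an uploaded file's headers map onto an existing table.

    existing: dict of display name -> physical name for the current table
    headers:  list of headers in the newly uploaded file
    Returns (mapping, added, kept):
      mapping - header -> physical name for every uploaded column
      added   - physical names appended to the table as nullable columns
      kept    - existing physical columns absent from the upload (they remain)
    """
    mapping, added = {}, []
    for header in headers:
        if header in existing:
            mapping[header] = existing[header]   # display name -> physical name
        else:
            physical = sanitise(header)          # unknown header: treated as new
            mapping[header] = physical
            added.append(physical)
    kept = [phys for disp, phys in existing.items() if disp not in headers]
    return mapping, added, kept


# Usage: "OldName" was renamed to "NewName" in the source file
existing = {"Order ID": "order_id", "OldName": "oldname"}
mapping, added, kept = plan_reupload(existing, ["Order ID", "NewName"])
```

In this example the rename produces a new `newname` column while `oldname` stays in the table, which matches the "treated as new" behaviour described above.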

When inference is wrong

The most common failure modes:

Symptom	What to check
All-numeric column inferred as `STRING`	At least one non-numeric value in the sample. Clean the data, or override the type in the confirmation step.
Date column inferred as `STRING`	The date format isn't in the recognised list. Override the type to `DATE` and provide a format string.
Mixed-currency column inferred as `STRING`	Decimals carry a currency prefix. Either strip the prefix in the source or accept the `STRING` type.
Column you wanted as `BIGINT` came in as `BOOLEAN`	The sampled values are all `0` and `1`, so `BOOLEAN` won the priority. Override to `BIGINT` in the confirmation step.
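
For the currency case, stripping the prefix in the source can be done with a small pre-processing step before upload. The sketch below is one possible approach, not part of Scrydon: the helper name `split_currency`, the accepted prefixes (a three-letter uppercase code or one of a few symbols), and the plain decimal format without thousands separators are all assumptions.

```python
import re

# Optional prefix (three-letter code or a symbol), then a plain decimal number.
CURRENCY_VALUE = re.compile(r"\s*([A-Z]{3}|[$€£¥])?\s*(-?\d+(?:\.\d+)?)\s*")


def split_currency(value):
    """Split '$12.50' into ('$', '12.50') so the amount can load as DOUBLE.

    Returns (prefix_or_None, amount_string). Raises ValueError on anything
    that does not look like an optionally prefixed decimal.
    """
    match = CURRENCY_VALUE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a currency amount: {value!r}")
    return match.group(1), match.group(2)
```

Writing the prefix to its own column keeps the currency information that would otherwise be lost, while the amount column contains only numeric values.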

Managed tables — lifecycle and write modes.
Column names — header sanitisation.
Classification & masking — per-column governance.
