Data Sources
Author declarative poll data sources and ship them inside a Scrydon Pack — no runtime code required
This artifact ships inside a Pack. For the shared lifecycle — install, pack build, upload — see Packs & Authoring SDK.
A Declarative Data Source is a poll-mode table source encoded entirely as JSON-serializable configuration — an HTTP request spec, an itemsPath selector, a field-mapping table, and a column list. You author it once with defineDataSource; the platform's generic poll runtime drives it on every tick without executing any customer-supplied code.
Data sources shipped in a Pack are pure JSON: the archive carries the request spec, mapping, and column definitions, never code. The generic poll runtime (fetch → select → filter → map → validate) lives in the platform and runs against the manifest on each tick.
When to use this SDK
Use the Data Source Authoring SDK when:
- You want to pull rows from a public REST API on a schedule and make them available as a typed table inside Scrydon.
- You need the source definition to live in a Pack so re-uploading the pack also refreshes the source (same idempotency contract as ontologies and process flows).
- You are building a demo or starter Pack and want the data tables to come pre-wired with no manual configuration.
Do not use this SDK if:
- Your ingest logic is conditional, stateful, or requires code (pagination cursors, OAuth token refresh, custom payload signing). For those, author a code-shipped source in the platform monorepo; it uses the same defineDataSource call but with a produce() function.
- The destination is not a table. Non-tabular ingestion paths are out of scope for this SDK today.
Key constraint: packs carry pure JSON, never functions. The bundler strips any produce() function during serialization. A packable source must therefore be fully declarative: every field in the source definition must survive a JSON.parse(JSON.stringify(...)) round trip intact.
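One way to check the round-trip rule locally is to compare a definition with its JSON.parse(JSON.stringify(...)) copy. The sketch below is an illustration, not part of @scrydon/sdk-authoring: survivesJsonRoundTrip and deepEqual are hypothetical helper names. It flags the usual offenders (functions such as produce(), undefined values, Date objects, NaN and Infinity), all of which JSON silently drops or rewrites.

```typescript
// Recursively compare an original value with its JSON round-tripped copy.
function deepEqual(a: unknown, b: unknown): boolean {
  if (Object.is(a, b)) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) {
    return false;
  }
  // Different prototypes (Date vs. plain object, class instance vs. object) fail.
  if (Object.getPrototypeOf(a) !== Object.getPrototypeOf(b)) return false;
  const ra = a as Record<string, unknown>;
  const rb = b as Record<string, unknown>;
  const keysA = Object.keys(ra);
  const keysB = Object.keys(rb);
  // A key dropped by JSON (function or undefined value) changes the key count.
  if (keysA.length !== keysB.length) return false;
  return keysA.every(
    (k) => Object.prototype.hasOwnProperty.call(rb, k) && deepEqual(ra[k], rb[k]),
  );
}

// Hypothetical helper: true when `value` survives JSON.parse(JSON.stringify(value)) unchanged.
function survivesJsonRoundTrip(value: unknown): boolean {
  let copy: unknown;
  try {
    copy = JSON.parse(JSON.stringify(value));
  } catch {
    // BigInt, circular references, or a top-level function/undefined end up here.
    return false;
  }
  return deepEqual(value, copy);
}
```

A definition that still carries a produce() property fails the check, which surfaces the same information the bundler would otherwise discard silently when it strips the function.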
Install
bun add -d @scrydon/sdk-authoring zod
# or, with npm:
npm install --save-dev @scrydon/sdk-authoring zod
Then import the authoring helper:
import { defineDataSource } from '@scrydon/sdk-authoring/integrations'
Anatomy of a declarative data source
A declarative data source is made up of five blocks:
| Block | Field(s) | Purpose |
|---|---|---|
| request | url, method, headers, query, authRef | Describes the HTTP call the runtime makes on each tick. url must be https://. Credentials are referenced by authRef (a credential connection id), never inline. |
| response.itemsPath | itemsPath | A dot/bracket path (e.g. $.ac, data.items) from the JSON response envelope to the array of row objects. Leading $. is optional. |
| mapping | Record<columnName, FieldMapping> | Maps each output column from a source field, optionally applying one of the four bounded transforms. |
| filter | requireNonNull | Drops candidate rows before mapping if any of the listed source field paths is null or undefined. |
| table.columns | TableColumnDef[] | Declares the output schema. Column names and types drive the row validator; no separate Zod table.schema is needed for declarative sources. |
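To make the response.itemsPath and filter rows concrete, here is a minimal sketch of how a dot/bracket path could be resolved against the response envelope and how requireNonNull could drop rows. The helper names (parsePath, resolvePath, selectItems, passesFilter) are hypothetical, and the exact path grammar the platform accepts may differ. This version handles dot segments, numeric [n] indices, and quoted ["key"] segments, and it treats a non-array at itemsPath as an error, which is an assumption.

```typescript
// Split "$.data.items[0]" into ["data", "items", "0"]. A leading "$." (or bare "$") is optional.
function parsePath(path: string): string[] {
  const body = path.replace(/^\$\.?/, "");
  const segments: string[] = [];
  // A bare segment, a numeric [n] index, or a quoted ["key"] / ['key'] segment.
  const re = /([^.[\]]+)|\[(\d+)\]|\[["']([^"']*)["']\]/g;
  for (const m of body.matchAll(re)) {
    segments.push(m[1] ?? m[2] ?? m[3] ?? "");
  }
  return segments;
}

// Walk the parsed segments; any missing step yields undefined.
function resolvePath(root: unknown, path: string): unknown {
  let cur: unknown = root;
  for (const seg of parsePath(path)) {
    if (cur === null || typeof cur !== "object") return undefined;
    cur = (cur as Record<string, unknown>)[seg];
  }
  return cur;
}

// Select the row array from the envelope (assumption: a non-array is an error).
function selectItems(envelope: unknown, itemsPath: string): unknown[] {
  const items = resolvePath(envelope, itemsPath);
  if (!Array.isArray(items)) {
    throw new Error(`itemsPath ${itemsPath} did not resolve to an array`);
  }
  return items;
}

// requireNonNull: keep a row only if every listed path is neither null nor undefined.
function passesFilter(row: unknown, requireNonNull: string[]): boolean {
  return requireNonNull.every((p) => {
    const v = resolvePath(row, p);
    return v !== null && v !== undefined;
  });
}
```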
The mapping DSL
Each mapping entry either copies a path directly or applies a named transform. The transform set is bounded and auditable — adding a new transform requires a reviewed code change to the platform, not pack data. Arbitrary expressions, eval, and sandboxed code are intentionally not supported.
| Transform | What it does | Example args |
|---|---|---|
| trim_to_null | Trims leading/trailing whitespace; coerces empty string or non-string to null. "RCH123 " → "RCH123", " " → null, undefined → null. | (none required) |
| number_or_null | Passes finite numbers unchanged; coerces strings, NaN, Infinity, and undefined to null. | (none required) |
| value_map | Maps literal string keys via a map dictionary; passthrough: "number" passes numeric values unchanged; anything else falls back to default. | { map: { ground: 0 }, passthrough: "number", default: null } |
| iso_from_epoch_offset | Combines an envelope-level base timestamp (basePath resolves against the response root) with a per-row signed offset, and returns an ISO 8601 string. Useful for APIs that report an absolute now clock plus per-aircraft seen seconds-ago values. | { basePath: "$.now", baseUnit: "ms", offsetUnit: "s", direction: "subtract" } |
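The three single-value transforms are small enough to sketch directly from the semantics in the table. This is an illustrative re-implementation, not the platform's code, and the function names are hypothetical. One assumption worth flagging: the table says passthrough: "number" passes numeric values unchanged, so this sketch passes any value whose typeof is "number", including non-finite ones.

```typescript
// trim_to_null: trim strings; empty or whitespace-only strings and non-strings become null.
function trimToNull(v: unknown): string | null {
  if (typeof v !== "string") return null;
  const trimmed = v.trim();
  return trimmed === "" ? null : trimmed;
}

// number_or_null: finite numbers pass through; strings, NaN, Infinity, undefined become null.
function numberOrNull(v: unknown): number | null {
  return typeof v === "number" && Number.isFinite(v) ? v : null;
}

interface ValueMapArgs {
  map: Record<string, unknown>;
  passthrough?: "number";
  default?: unknown;
}

// value_map: literal string keys are looked up in `map`; with passthrough "number",
// numeric inputs pass unchanged (assumption: no finiteness check); anything else gets `default`.
function valueMap(v: unknown, args: ValueMapArgs): unknown {
  if (typeof v === "string" && Object.prototype.hasOwnProperty.call(args.map, v)) {
    return args.map[v];
  }
  if (args.passthrough === "number" && typeof v === "number") return v;
  return args.default ?? null;
}
```

With the altitudeFeet arguments from the example below, "ground" maps to 0, 35000 passes through, and any other string falls back to null.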
The iso_from_epoch_offset hard case
Some REST APIs — the ADS-B family is the canonical example — do not return per-row absolute timestamps. Instead, the envelope carries a single now field (epoch milliseconds) and each row carries a seen field (seconds ago). The iso_from_epoch_offset transform bridges the two:
seenAt = new Date(envelope.now - row.seen * 1000).toISOString()
Configure it as:
seenAt: {
path: "seen", // per-row offset field
transform: "iso_from_epoch_offset",
args: {
basePath: "$.now", // envelope field — resolved against the response root
baseUnit: "ms", // $.now is epoch milliseconds
offsetUnit: "s", // row.seen is seconds
direction: "subtract", // now − seen → absolute time
},
},
If row.seen is missing or non-numeric, the runtime defaults the offset to 0 (base time exactly), matching the code-source convention seenSecondsAgo = typeof raw.seen === "number" ? raw.seen : 0.
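The same arithmetic, written as a standalone function, shows how basePath, the two units, and direction combine. It is a sketch under assumptions: isoFromEpochOffset and resolveRootPath are hypothetical names, only "s" and "ms" units are modeled, "add" is assumed to be the counterpart of the documented "subtract", and a missing base timestamp is assumed to yield null. The default offset of 0 follows the documented convention.

```typescript
type TimeUnit = "s" | "ms";

interface EpochOffsetArgs {
  basePath: string; // resolved against the response root, e.g. "$.now"
  baseUnit: TimeUnit;
  offsetUnit: TimeUnit;
  direction: "subtract" | "add"; // "add" is an assumption; the docs only show "subtract"
}

const toMs = (value: number, unit: TimeUnit): number => (unit === "s" ? value * 1000 : value);

// Minimal resolver for dot-only paths such as "$.now" (hypothetical helper).
function resolveRootPath(root: unknown, path: string): unknown {
  let cur: unknown = root;
  for (const seg of path.replace(/^\$\.?/, "").split(".").filter(Boolean)) {
    if (cur === null || typeof cur !== "object") return undefined;
    cur = (cur as Record<string, unknown>)[seg];
  }
  return cur;
}

function isoFromEpochOffset(
  envelope: unknown,
  rowOffset: unknown,
  args: EpochOffsetArgs,
): string | null {
  const base = resolveRootPath(envelope, args.basePath);
  // Assumption: a missing or non-numeric base timestamp yields null.
  if (typeof base !== "number" || !Number.isFinite(base)) return null;
  // Documented: a missing or non-numeric offset defaults to 0 (base time exactly).
  const offset = typeof rowOffset === "number" ? rowOffset : 0;
  const sign = args.direction === "subtract" ? -1 : 1;
  return new Date(toMs(base, args.baseUnit) + sign * toMs(offset, args.offsetUnit)).toISOString();
}
```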
A complete example
The following is the real adsb-lol-military-declarative source shipped in @scrydon/sdk-authoring. It pulls military aircraft positions from https://api.adsb.lol/v2/mil and maps them to a typed aircraft_position table. The golden parity test in the monorepo asserts this produces byte-for-byte identical rows to the equivalent code source.
import { defineDataSource } from '@scrydon/sdk-authoring/integrations'
export const adsbLolMilitaryDeclarative = defineDataSource({
kind: "table",
id: "adsb-lol-military-declarative",
vendor: "adsb-lol",
displayName: "ADS-B Lol — Military Aircraft (declarative)",
scope: "global",
table: {
name: "aircraft_position",
primaryKey: ["icao24", "seenAt"],
timestampColumn: "seenAt",
columns: [
{ name: "icao24", dataType: "string", isPrimaryKey: true },
{ name: "callsign", dataType: "string", nullable: true },
{ name: "registration", dataType: "string", nullable: true },
{ name: "aircraftType", dataType: "string", nullable: true },
{ name: "category", dataType: "string", nullable: true },
{ name: "latitude", dataType: "decimal" },
{ name: "longitude", dataType: "decimal" },
{ name: "altitudeFeet", dataType: "int", nullable: true },
{ name: "groundSpeedKnots", dataType: "double", nullable: true },
{ name: "heading", dataType: "double", nullable: true },
{ name: "seenAt", dataType: "timestamp", isPrimaryKey: true },
],
},
ingest: {
mode: "poll",
intervalSec: 60,
minIntervalSec: 30,
request: {
url: "https://api.adsb.lol/v2/mil",
method: "GET",
headers: { accept: "application/json" },
},
response: { itemsPath: "$.ac" },
filter: {
// Drop rows missing hex (ICAO), lat, or lon — mirrors the code source guard.
requireNonNull: ["hex", "lat", "lon"],
},
mapping: {
icao24: { path: "hex" },
// "RCH123 " → "RCH123", " " → null. Mirrors raw.flight?.trim() || null
callsign: { path: "flight", transform: "trim_to_null" },
// undefined → null. Mirrors raw.r ?? null
registration: { path: "r" },
// undefined → null. Mirrors raw.t ?? null
aircraftType: { path: "t" },
// undefined → null. Mirrors raw.category ?? null
category: { path: "category" },
latitude: { path: "lat" },
longitude: { path: "lon" },
// "ground" → 0, number → passthrough, else → null. Mirrors parseAltitude()
altitudeFeet: {
path: "alt_baro",
transform: "value_map",
args: { map: { ground: 0 }, passthrough: "number", default: null },
},
// finite number → itself, string/NaN/undefined → null
groundSpeedKnots: { path: "gs", transform: "number_or_null" },
heading: { path: "track", transform: "number_or_null" },
// new Date(envelope.now - row.seen * 1000).toISOString()
seenAt: {
path: "seen",
transform: "iso_from_epoch_offset",
args: {
basePath: "$.now",
baseUnit: "ms",
offsetUnit: "s",
direction: "subtract",
},
},
},
},
})
At runtime, defineDataSource behaves as an identity function: it validates the manifest against the DataSourceManifestSchema Zod schema, derives a row validator from table.columns, and returns the same definition. The emitted definition is pure data (no produce() function, no closures) and serializes as-is into data-source-<slug>/manifest.json.
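The exact checks the platform derives from table.columns are not spelled out here, but the idea can be sketched: walk the column list, reject nulls in non-nullable columns, and type-check everything else. deriveRowValidator and the per-type rules below (for example, that timestamp columns hold parseable ISO strings, as iso_from_epoch_offset produces) are assumptions for illustration only.

```typescript
type DataType = "string" | "int" | "double" | "decimal" | "timestamp";

interface ColumnDef {
  name: string;
  dataType: DataType;
  nullable?: boolean;
  isPrimaryKey?: boolean;
}

type Row = Record<string, unknown>;

// Assumed per-type checks; the platform's real validator may be stricter or looser.
const typeChecks: Record<DataType, (v: unknown) => boolean> = {
  string: (v) => typeof v === "string",
  int: (v) => typeof v === "number" && Number.isInteger(v),
  double: (v) => typeof v === "number" && Number.isFinite(v),
  decimal: (v) => typeof v === "number" && Number.isFinite(v),
  timestamp: (v) => typeof v === "string" && !Number.isNaN(Date.parse(v)),
};

// Hypothetical: build a validator that returns a list of problems (empty list = valid row).
function deriveRowValidator(columns: ColumnDef[]): (row: Row) => string[] {
  return (row) => {
    const problems: string[] = [];
    for (const col of columns) {
      const v = row[col.name];
      if (v === null || v === undefined) {
        if (!col.nullable) problems.push(`${col.name}: required`);
        continue;
      }
      if (!typeChecks[col.dataType](v)) problems.push(`${col.name}: expected ${col.dataType}`);
    }
    return problems;
  };
}
```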
Write modes
Every tick produces a batch of rows; ingest.writeMode controls how that batch lands in the table. It is install-only — it provisions the StarRocks key model at table-create time and cannot be changed afterwards (changing it requires re-installing the pack with an updated manifest). The default is upsert.
| writeMode | Dedup key | Rows kept per entity | Use it for |
|---|---|---|---|
| upsert (default) | table.primaryKey (identity) | 1 (latest only) | "Current state" feeds. Each poll overwrites the entity's row in place; no history. |
| changed-only | primaryKey + a synthesized content hash | N (one per distinct state) | Slowly-changing data where you want history but not a row per poll. Identical consecutive polls collapse onto the same key; a row is written only when a value actually changes. |
| append | none | every poll | Raw event streams where every observation matters, including exact repeats. |
| replace | primaryKey | latest full snapshot | Small reference tables re-published wholesale each tick (truncate-then-load). |
changed-only vs upsert. Both upsert under the hood, but they dedup on different keys. upsert keeps one always-current row per entity (no history). changed-only keeps one row per distinct state the entity passed through, skipping polls where nothing changed. For continuously changing telemetry (e.g. an aircraft's live position, where latitude/longitude change every poll), changed-only behaves almost identically to append, because the content hash differs every tick. There, the growth lever is retention, not the write mode.
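The difference between the four modes comes down to which key each row is written under. The simulation below is a hypothetical in-memory model, not StarRocks behavior: FNV-1a stands in for whatever content hash the platform actually synthesizes, and hashing every column (including the key columns) is an assumption.

```typescript
type Row = Record<string, string | number | boolean | null>;
type WriteMode = "upsert" | "changed-only" | "append" | "replace";

// FNV-1a (32-bit): a small, dependency-free stand-in for the platform's content hash.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Hash columns in sorted key order so column order does not affect the hash.
function contentHash(row: Row): string {
  return fnv1a(JSON.stringify(row, Object.keys(row).sort()));
}

function identityKey(row: Row, primaryKey: string[]): string {
  return JSON.stringify(primaryKey.map((col) => row[col]));
}

class InMemoryTable {
  private readonly keyed = new Map<string, Row>();
  private readonly log: Row[] = [];
  private readonly mode: WriteMode;
  private readonly primaryKey: string[];

  constructor(mode: WriteMode, primaryKey: string[]) {
    this.mode = mode;
    this.primaryKey = primaryKey;
  }

  write(batch: Row[]): void {
    if (this.mode === "append") {
      this.log.push(...batch); // every observation kept, exact repeats included
      return;
    }
    if (this.mode === "replace") this.keyed.clear(); // truncate-then-load
    for (const row of batch) {
      let key = identityKey(row, this.primaryKey);
      // changed-only: the content hash joins the key, so each distinct state gets its own row.
      if (this.mode === "changed-only") key += `#${contentHash(row)}`;
      this.keyed.set(key, row); // same key overwrites in place (upsert semantics)
    }
  }

  get rowCount(): number {
    return this.mode === "append" ? this.log.length : this.keyed.size;
  }
}
```

Feeding three polls for one entity (open, open, closed) leaves 1 row under upsert, 2 under changed-only, 3 under append, and 1 under replace.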
ingest: {
mode: "poll",
intervalSec: 60,
writeMode: "changed-only", // omit for the "upsert" default
// ...request, response, mapping
}
Retention
For time-series tables that grow continuously, declare table.ttlSec alongside table.timestampColumn. The platform partitions the StarRocks table by day on the timestamp column and keeps only the most recent ceil(ttlSec / 86400) daily partitions — older partitions are dropped automatically.
table: {
name: "aircraft_position",
primaryKey: ["icao24", "seenAt"],
timestampColumn: "seenAt", // the column partitions are cut on
ttlSec: 604800, // keep ~7 days of partitions
columns: [ /* ... */ ],
}
ttlSec is the authoring default. An organization admin can override the effective retention per source from Settings → Platform → Data Sources (open a source's detail panel and edit "Retention"); the override is applied live, with no table rebuild. The detail panel also shows the source's write mode (read-only) and its recent sync history, and links straight to the backing table.
The detail panel's Danger zone also has Remove data source: it drops the backing table and all of its rows, deletes the source, and — when the originating pack contributed nothing else — removes the pack too. If the pack ships other content, the source is removed but the pack is kept (uninstall it from Settings → Packs). Removal is irreversible; re-installing the pack brings the source back.
Retention bounds table growth for any write mode, and it is the right tool when changed-only can't help (continuously changing telemetry). It is independent of the write mode: you can combine append or changed-only with ttlSec to keep history but cap it to a rolling window.
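The retention rule is plain arithmetic: the example's ttlSec of 604800 gives ceil(604800 / 86400) = 7 daily partitions, while a ttlSec of 90000 rounds up to 2. The sketch below also lists which daily partitions would survive on a given day. retainedPartitionCount and keptPartitionDates are hypothetical helpers, and treating today's partition as one of the retained days, with day boundaries in UTC, is an assumption.

```typescript
const SECONDS_PER_DAY = 86_400;

// Number of most-recent daily partitions kept for a given ttlSec.
function retainedPartitionCount(ttlSec: number): number {
  return Math.ceil(ttlSec / SECONDS_PER_DAY);
}

// Assumption: partitions are cut on UTC days and today's partition counts toward the total.
function keptPartitionDates(ttlSec: number, today: Date): string[] {
  const n = retainedPartitionCount(ttlSec);
  const dates: string[] = [];
  for (let i = 0; i < n; i++) {
    // Date.UTC rolls negative day offsets back into the previous month correctly.
    const d = new Date(
      Date.UTC(today.getUTCFullYear(), today.getUTCMonth(), today.getUTCDate() - i),
    );
    dates.push(d.toISOString().slice(0, 10)); // "YYYY-MM-DD"
  }
  return dates;
}
```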
Bundle layout
A Pack that ships a data source adds one data-source-<slug>/ subdir per source, each containing a single manifest.json. Multiple data sources may coexist in the same Pack — data-source is a repeatable content kind, like workflow.
<bundle>.scrydon-pack.tar.gz
├── pack.json                          # PackBundleManifestSchema
└── data-source-adsb-lol-military/
    └── manifest.json                  # DataSourceManifestSchema (pure JSON, no code)
The top-level pack.json lists the data source as a contents[] entry and includes "data-source" in installOrder:
import { defineScrydonPack } from '@scrydon/sdk-authoring/packs'
import { adsbLolMilitaryDeclarative } from './data-source-adsb-lol-military'
export default defineScrydonPack({
manifestVersion: 1,
package: {
id: 'adsb-lol-military',
name: 'ADS-B Lol Military Aircraft',
version: '1.0.0',
},
contents: [
{
kind: 'data-source',
path: 'data-source-adsb-lol-military',
version: '1.0.0',
required: true,
},
],
installOrder: ['data-source'],
metadata: { isSystemPack: false, isDemoPack: false, tags: ['adsb', 'aircraft'] },
})
Build, inspect, upload
Author each data source with defineDataSource. Compose the Pack with defineScrydonPack and add a contents[] entry per source with kind: "data-source".
bunx @scrydon/sdk-authoring pack build src/pack.ts --outDir dist
# → dist/<package.id>-<package.version>.scrydon-pack.tar.gz
bunx @scrydon/sdk-authoring pack inspect dist/adsb-lol-military-1.0.0.scrydon-pack.tar.gz
The inspector lists every subdir, validates each manifest against DataSourceManifestSchema, and surfaces any mapping or column errors before upload.
Data source packs upload to Settings → Packs in the platform app. Uploading admits the pack to the org's pack catalog. No data_source row is created yet: data sources are deferred to Stage 2 (install into an environment), so the workspace user can pick which environment each source materializes into.
Programmatic equivalent:
curl -X POST "$AGENTIC_URL/api/packs/import?organizationId=$ORG_ID" \
-H "Cookie: $SESSION_COOKIE" \
-F "file=@dist/adsb-lol-military-1.0.0.scrydon-pack.tar.gz"
The route returns the catalog entry id; dataSources.installedIds is empty at this stage by design.
In apps/analytics → Marketplace, while in a workspace + environment, pick the data source from a catalogued pack and click Install in this environment. The platform materializes the data_source row scoped to your env and starts polling on the configured intervalSec cadence — typically within about a minute. Operate the source from Analytics → Data Sources.
Data sources that need a credential
Some APIs require an account or API key. For those, use the request.authRef field — a named reference to a credential connection configured in your org's settings. The manifest never carries an inline secret; authRef is a pointer, not a value. (The DataSourceManifestSchema rejects any attempt to embed an Authorization header or API key directly in the manifest.)
ingest: {
mode: "poll",
intervalSec: 300,
request: {
url: "https://api.example.com/v1/records",
method: "GET",
authRef: "example-api-key", // named reference — NOT an inline token
},
// ...
}
How the connection flows:
- You (or your org admin) connect the account in org settings, creating an enabled credential connection named "example-api-key".
- On each tick, the platform resolves that connection server-side and attaches the credential to the outbound request; your manifest never sees the secret.
- If no enabled connection exists yet, the tick returns HTTP 412 data_source_connection_required, a clear signal to connect the account first. No data is fetched, and no error is silently swallowed.
- Once a connection is enabled, subsequent ticks resolve it automatically and run.
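The connection flow can be summarized as a small decision function. This is a hypothetical model for illustration: resolveTickAuth and CredentialConnection are invented names, and attaching the secret as a Bearer authorization header is an assumption, since the real header shape depends on the connection type.

```typescript
interface CredentialConnection {
  name: string; // must match the manifest's authRef, e.g. "example-api-key"
  enabled: boolean;
  secret: string; // held server-side; never written into the manifest
}

type TickAuth =
  | { ok: true; headers: Record<string, string> }
  | { ok: false; status: 412; code: "data_source_connection_required" };

function resolveTickAuth(
  authRef: string | undefined,
  connections: CredentialConnection[],
): TickAuth {
  // No authRef: public API, the tick runs immediately with no credential.
  if (authRef === undefined) return { ok: true, headers: {} };
  const conn = connections.find((c) => c.name === authRef && c.enabled);
  // No enabled connection under that name: refuse the tick and fetch nothing.
  if (!conn) return { ok: false, status: 412, code: "data_source_connection_required" };
  // Hypothetical header shape; the secret is attached server-side only.
  return { ok: true, headers: { authorization: `Bearer ${conn.secret}` } };
}
```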
No authRef? A source with no authRef (public API, no authentication needed) ticks immediately — no credential setup required.
The value you set for authRef is the connection name your org admin creates in settings. Coordinate the name between the pack author and the org admin — if the names don't match, ticks return 412 until the connection is created with the expected name.
Security
Egress guard: the generic poll runtime enforces an SSRF egress guard before every fetch. Request URLs must use https:// — plain http:// is rejected. Requests to loopback addresses (127.x.x.x, ::1), link-local addresses (169.254.x.x), private ranges (10.x, 172.16–31.x, 192.168.x), and IPv6 ULA prefixes (fd00::/8) are blocked at the URL-shape level before a connection is opened.
Note: the v1 guard is host/scheme-shape based. A hostname that DNS-resolves to a private IP at fetch time is not blocked by the v1 guard — resolved-IP pinning is planned for a future release.
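A URL-shape guard of this kind can be sketched with the standard WHATWG URL parser, which also normalizes shorthand IPv4 literals such as 0x7f.0.0.1 to 127.0.0.1 before the check runs. This illustrates the documented rules and is not the platform's guard: checkEgress and its helpers are hypothetical. Like the v1 guard, it does not resolve DNS, and it also does not handle IPv4-mapped IPv6 literals.

```typescript
type EgressResult = { ok: true; url: URL } | { ok: false; reason: string };

function isBlockedIPv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 127 || // loopback 127.x.x.x
    a === 10 || // private 10.x
    (a === 169 && b === 254) || // link-local 169.254.x.x
    (a === 172 && b >= 16 && b <= 31) || // private 172.16-31.x
    (a === 192 && b === 168) // private 192.168.x
  );
}

function isBlockedIPv6(hostname: string): boolean {
  // WHATWG URL keeps IPv6 literals in brackets, lower-cased and compressed.
  if (!hostname.startsWith("[")) return false;
  const addr = hostname.slice(1, -1);
  // ::1 loopback, or a first hextet of fdXX (the fd00::/8 ULA prefix).
  return addr === "::1" || /^fd[0-9a-f]{2}:/.test(addr);
}

function checkEgress(rawUrl: string): EgressResult {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return { ok: false, reason: "invalid URL" };
  }
  if (url.protocol !== "https:") return { ok: false, reason: "only https:// is allowed" };
  if (isBlockedIPv4(url.hostname) || isBlockedIPv6(url.hostname)) {
    return { ok: false, reason: `blocked host ${url.hostname}` };
  }
  return { ok: true, url };
}
```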
Credentials by reference only: the request.authRef field accepts a credential connection id (a pointer to a Tier-1 connection stored in the platform's secret store). Never embed an API key, Bearer token, or Authorization header value directly in the manifest. The DataSourceManifestSchema uses .strict() on the request block and rejects any field beyond url, method, headers, query, and authRef — stray fields like apiKey or secret cause a validation error at build time.
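Zod objects strip unknown keys by default; .strict() turns them into validation errors instead. The dependency-free sketch below mirrors that effect for the request block using the five documented keys. unknownRequestKeys is a hypothetical helper, not part of the SDK.

```typescript
// The only keys the request block accepts, per the schema description above.
const ALLOWED_REQUEST_KEYS = new Set(["url", "method", "headers", "query", "authRef"]);

// Mirrors the effect of .strict(): report unknown keys instead of silently dropping them.
function unknownRequestKeys(request: Record<string, unknown>): string[] {
  return Object.keys(request).filter((key) => !ALLOWED_REQUEST_KEYS.has(key));
}
```

A non-empty result corresponds to the build-time validation error described above, for example when someone pastes an apiKey field next to authRef.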
Resource limits enforced per tick:
| Limit | Default |
|---|---|
| Response body | 8 MB |
| Row count | 10,000 rows |
| Fetch timeout | 15 seconds |
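Per-tick limits like these can be enforced while the response streams in, rather than after it is fully buffered. This sketch uses the standard fetch, AbortSignal.timeout, and ReadableStream APIs. fetchWithLimits, readCapped, and enforceRowLimit are hypothetical names, and two assumptions apply: 8 MB is read as 8 MiB, and exceeding a limit fails the tick (the table does not say whether the platform fails or truncates).

```typescript
// Values from the table above (assumption: 8 MB = 8 MiB).
const LIMITS = { maxBodyBytes: 8 * 1024 * 1024, maxRows: 10_000, timeoutMs: 15_000 };

// Read a response body, cancelling the stream as soon as it grows past maxBytes.
async function readCapped(res: Response, maxBytes: number): Promise<string> {
  if (!res.body) return "";
  const reader = res.body.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  for (;;) {
    const result = await reader.read();
    if (result.done) break;
    const chunk = result.value;
    total += chunk.byteLength;
    if (total > maxBytes) {
      await reader.cancel();
      throw new Error(`response body exceeds ${maxBytes} bytes`);
    }
    chunks.push(chunk);
  }
  const body = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    body.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return new TextDecoder().decode(body);
}

// Hypothetical helper: fetch with the timeout and body cap applied.
async function fetchWithLimits(url: string): Promise<string> {
  const res = await fetch(url, { signal: AbortSignal.timeout(LIMITS.timeoutMs) });
  return readCapped(res, LIMITS.maxBodyBytes);
}

// Reject a batch that exceeds the row limit (assumption: fail rather than truncate).
function enforceRowLimit<T>(items: T[], maxRows: number = LIMITS.maxRows): T[] {
  if (items.length > maxRows) {
    throw new Error(`${items.length} rows exceeds the ${maxRows}-row limit`);
  }
  return items;
}
```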
Where to next
Process Flows
Ship data sources alongside process templates — the platform can drive them from the same Pack.
Ontologies
Define the typed Object Types and Link Types that reference the data your sources ingest.
Workflows
HITL gates, approval routes, and automations that can act on data sourced from declarative poll sources.