Importing Existing Data
Bulk-import a pre-existing document estate into an organization knowledgebase domain using the Import wizard.
The Import feature lets organization admins migrate pre-existing data into an org-KB domain without writing any code. There are two kinds of import, and the wizard picks the right one from the file you upload:
| You upload | Best for | How rows are produced | LLM cost |
|---|---|---|---|
| ZIP of documents (PDF, DOCX, MD, TXT, HTML, PPTX, …) | An unstructured document estate | The platform ingests each file and an LLM condenses them into deduplicated learnings | Metered as LLM usage |
| CSV of tabular rows | Structured data you already have in a spreadsheet/export | Each row is written directly to the domain table — you map columns → fields, no LLM | None |
Which should I use? If your data is already tabular (a CRM export, a contract register, an asset inventory), prefer CSV — it is exact, deterministic, free of LLM cost, and re-importing updates rows in place. Use ZIP when the knowledge lives inside documents and needs to be distilled into learnings.
ZIP imports are metered as LLM usage — each condensed batch consumes tokens billed to your organization (Settings → Usage). CSV imports incur no LLM cost.
Example files: download the sample CSV and documents from the Org-KB bulk import example — a contracts.csv for the CSV import, plus two markdown documents you can zip together to try a ZIP import.
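If you want to script the "zip together" step, the following is a minimal sketch using Python's standard library. The function name `build_import_zip` and the folder layout are illustrative assumptions, not part of the platform. It stores each file under its path relative to the source folder, so the archive contains no absolute paths, and it skips symlinks.

```python
import zipfile
from pathlib import Path


def build_import_zip(source_dir: str, zip_path: str) -> list[str]:
    """Zip every regular file under source_dir using relative archive paths.

    Hypothetical helper for preparing an upload; returns the archive names written.
    """
    root = Path(source_dir)
    target = Path(zip_path).resolve()
    written = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            # Skip directories, symlinks, and the output archive itself.
            if not path.is_file() or path.is_symlink() or path.resolve() == target:
                continue
            arcname = path.relative_to(root).as_posix()
            archive.write(path, arcname)
            written.append(arcname)
    return written
```

The relative folder structure is preserved inside the archive, which matters because folder paths inside the ZIP end up as reference labels on the extracted learnings.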
What import does
A ZIP import does not create one row per document. It runs the same LLM-based extraction that the workspace-KB finalize triggers use:
- The uploaded ZIP is unpacked and every document is ingested into a temporary workspace KB scoped to your organization.
- The ingested documents go through DLP secret scanning — any secrets found are redacted per your organization's DLP policy before the LLM ever sees them.
- The platform runs a batched LLM extraction over the ingested content, producing a content-determined set of discrete learning records — one per durable insight, with duplicates merged.
- Each learning is written to the target domain using the same stable row-identity and upsert semantics as other learnings: re-importing the same content updates existing learnings rather than duplicating them.
- After the import completes, the temporary ingested documents are deleted. The learnings themselves remain in the domain and each one retains path labels that reference the original file paths inside your ZIP — but the source documents are not retained.
Source documents are deleted after a successful import. If you need to retain the originals, keep your own copy before importing.
CSV imports: direct row writes
A CSV import is deterministic and skips the LLM entirely:
- The uploaded CSV is parsed into rows.
- Each row is mapped column → object-type field using the mapping you set in the wizard, and written straight to the domain's table.
- The row id is derived from the identity column(s) you marked, so re-importing the same CSV (or a corrected one) updates matching rows rather than duplicating them.
- Rows whose identity cells are all empty are skipped and reported in the detail panel.
There is no temporary KB, no DLP-then-LLM condensation, and no LLM cost. The uploaded CSV file is deleted when the import finishes.
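The identity-based update behaviour can be illustrated with a small sketch. It is not the platform's implementation: the hash function (SHA-256), the separator, the whitespace trimming, and the in-memory `table` dictionary are all assumptions chosen to show the idea of a stable row id derived from identity columns.

```python
import csv
import hashlib
import io
from typing import Optional


def row_id(row: dict, identity_columns: list) -> Optional[str]:
    """Derive a stable id from the identity cells, or None if all are empty."""
    values = [(row.get(col) or "").strip() for col in identity_columns]
    if not any(values):
        return None  # rows with no identity are skipped, not written
    # A separator keeps ("ab", "c") distinct from ("a", "bc").
    key = "\x1f".join(values)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def import_csv(text: str, identity_columns: list, table: dict) -> int:
    """Upsert CSV rows into `table` (a stand-in for the domain table).

    Returns the number of rows skipped for lacking an identity.
    """
    skipped = 0
    for row in csv.DictReader(io.StringIO(text)):
        rid = row_id(row, identity_columns)
        if rid is None:
            skipped += 1
            continue
        table[rid] = row  # same identity -> same id -> row updated in place
    return skipped
```

Importing a corrected CSV with the same identity values produces the same ids, so the second run overwrites the earlier rows instead of adding new ones.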
Before you start
- A domain must already exist in your organization. Domains are created when a pack that declares knowledgebases[] is installed. If you do not have a domain yet, install a pack first, for example the Org KB Starter pack, which creates the contracts, learnings, and incidents domains used by the sample files below.
- You must have the Organization Admin role. Workspace members cannot start imports.
- Your organization must have an LLM integration configured. ZIP imports use the default LLM for the condensation step.
Step-by-step walkthrough
Open the Imports tab
Go to Settings → Organization → Knowledgebase and click the Imports tab. Any previous or in-progress imports for your organization are listed here.
Click Import data… to open the wizard.
Pick a domain and object type
The first step of the wizard shows:
- Domain — a dropdown of all org-KB domains installed in your organization.
- Object type — the learning type that extracted records will be written as (each domain declares one or more object types via its pack manifest).
Select the destination and click Next.
Upload a ZIP or CSV
Drag your file onto the upload area or click to browse. Both .zip and .csv are accepted — the wizard detects which kind you uploaded.
| Limit | Value |
|---|---|
| Maximum file size (ZIP or CSV) | 256 MB |
| Maximum files inside a ZIP | 20,000 |
| Maximum size per individual file | 32 MB |
| Maximum total uncompressed size | 1 GB |
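A local pre-flight check can catch oversize archives before you upload them. The sketch below is an assumption-laden illustration: it treats MB and GB as binary units (1 MB = 1024 × 1024 bytes), it does not count directory entries as files, and the function name `check_zip_limits` is hypothetical. The platform's own validation remains authoritative.

```python
import os
import zipfile

# Limits from the table above; the binary-unit interpretation is an assumption.
MAX_UPLOAD_BYTES = 256 * 1024 * 1024
MAX_FILES = 20_000
MAX_FILE_BYTES = 32 * 1024 * 1024
MAX_UNCOMPRESSED_BYTES = 1024 * 1024 * 1024


def check_zip_limits(
    zip_path: str,
    *,
    max_upload_bytes: int = MAX_UPLOAD_BYTES,
    max_files: int = MAX_FILES,
    max_file_bytes: int = MAX_FILE_BYTES,
    max_uncompressed_bytes: int = MAX_UNCOMPRESSED_BYTES,
) -> list:
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if os.path.getsize(zip_path) > max_upload_bytes:
        problems.append("archive exceeds the maximum upload size")
    with zipfile.ZipFile(zip_path) as archive:
        files = [info for info in archive.infolist() if not info.is_dir()]
    if len(files) > max_files:
        problems.append(f"archive contains {len(files)} files")
    for info in files:
        # file_size is the uncompressed size recorded in the archive.
        if info.file_size > max_file_bytes:
            problems.append(f"{info.filename} exceeds the per-file size limit")
    if sum(info.file_size for info in files) > max_uncompressed_bytes:
        problems.append("total uncompressed size exceeds the limit")
    return problems
```

The keyword arguments exist only so the limits can be lowered in tests; calling `check_zip_limits("estate.zip")` uses the documented values.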
ZIP — supported document formats (anything the knowledge-base ingest pipeline accepts): PDF, Word (.docx), Markdown, plain text, HTML, PPTX, and other common document types. Folder structure inside the ZIP becomes the reference labels on each extracted learning.
CSV — shape: a header row of column names, then one record per row. Quoted fields and embedded commas are handled. The header names drive the next step.
Click Next once the file is selected.
Map columns (CSV only)
When you upload a CSV, the wizard adds a Map columns step. It reads your CSV's header row and the target object type's fields, and shows a row per CSV column:
- Field — choose which object-type field this column writes to (or Ignore to drop the column). Columns whose name matches a field are mapped automatically.
- Identity — tick the column(s) that uniquely identify a row (its natural key). The identity is hashed into a stable row id, so re-importing the same identity updates the existing row instead of creating a duplicate. At least one identity column is required.
For the contracts.csv example against the starter pack's Contract object type you would map counterparty → Counterparty (identity), title → Title, effectiveDate → Effective Date, summary → Summary.
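Conceptually, the mapping step is a lookup from CSV header to field, with unmapped columns dropped. The sketch below illustrates that idea only. The matching rule used for automatic mapping (case-insensitive, ignoring spaces and punctuation) is an assumption, as are the function names and the extra `notes` column in the usage example.

```python
from typing import Optional


def normalize(name: str) -> str:
    """Lower-case a name and drop everything except letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def suggest_mapping(headers: list, fields: list) -> dict:
    """Map each CSV header to a matching field name, or None (Ignore).

    The normalization-based match is an assumed rule for illustration.
    """
    by_normalized = {normalize(field): field for field in fields}
    return {header: by_normalized.get(normalize(header)) for header in headers}


def apply_mapping(row: dict, mapping: dict) -> dict:
    """Rename mapped columns to their fields and drop ignored columns."""
    return {
        field: row[column]
        for column, field in mapping.items()
        if field is not None and column in row
    }
```

For example, with the headers `counterparty`, `title`, `effectiveDate`, `summary` plus a hypothetical `notes` column, `suggest_mapping` pairs the first four with Counterparty, Title, Effective Date, and Summary, and leaves `notes` as `None`, which `apply_mapping` then drops.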
ZIP imports skip this step entirely.
Confirm and run
Review the domain, object type, and file. Click Start import to begin.
The wizard closes and the import appears on the Imports tab. A CSV import goes straight to writing rows; a ZIP import starts at Unpacking.
Monitoring progress
The Imports tab shows live progress for every import:
| Column | What it shows |
|---|---|
| Domain | The target domain |
| Status | Current stage (Unpacking → Ingesting → Condensing → Completed / Failed / Cancelled) |
| Documents | Ingested / total (a failed document counts against the total but does not stop the import) |
| Batches condensed | How many LLM batches have finished |
| Learnings written | Learnings successfully written to the domain |
Click a row to open the detail panel. The detail panel shows a per-document failure drill-down: the file path inside the ZIP and the reason the document could not be ingested (format not supported, file too large, DLP rejection, etc.).
Individual document failures do not stop an import. The platform continues processing the remaining documents and condensing whatever was successfully ingested.
Retry and Cancel
Retry
If an import ends in Failed status (for example, because the LLM was unavailable when condensing), the uploaded documents are kept. Click Retry to resume condensation from where it left off — you do not need to re-upload the ZIP.
Condensation is idempotent: learnings already written on a previous attempt are not duplicated.
Cancel
Click Cancel on any active or failed import to abandon it. Cancellation deletes the uploaded documents and any partially-ingested content. Learnings that were already written to the domain before cancellation remain there.
Only one active import is allowed per domain at a time. If you try to start a second import while one is running for the same domain, you will see: "An import for this domain is already running." Wait for the current import to complete (or cancel it) before starting another.
What happens to your documents
| Stage | Document status |
|---|---|
| Uploading | Stored in temporary object storage under your organization's namespace |
| Unpacking / Ingesting | Documents extracted and ingested into a hidden temporary workspace KB |
| Condensing | LLM extraction runs over ingested content |
| Completed | Temporary documents deleted. Learnings retained in the domain with path labels from the original ZIP. |
| Cancelled | Temporary documents deleted. |
| Failed | Documents kept so Retry can resume without re-uploading. |
Learnings written to the domain keep a references field that includes the original file path from inside the ZIP (e.g. contracts/2024/acme-msa.pdf). The path is informational — it is a label, not a live link. After the import completes, the document itself no longer exists on the platform.
ZIP requirements at a glance
- Must be a .zip file (no .tar.gz, .7z, etc.)
- No symlinks inside the archive
- No absolute paths (e.g. /etc/passwd) or path-traversal entries (e.g. ../../secret)
- Archives that violate these rules are rejected before any content is ingested
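To check an archive against these rules before uploading, a sketch along the following lines may help. It is a local approximation, not the platform's validator: symlink detection relies on the Unix mode bits stored in each entry's `external_attr`, which archives created on other systems may not set, and the function name `find_unsafe_entries` is hypothetical.

```python
import stat
import zipfile
from pathlib import PurePosixPath


def find_unsafe_entries(zip_path: str) -> list:
    """Return (entry name, reason) pairs for entries that break the rules above."""
    problems = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            name = info.filename.replace("\\", "/")
            # The upper 16 bits of external_attr hold the Unix file mode, if recorded.
            mode = info.external_attr >> 16
            if stat.S_ISLNK(mode):
                problems.append((info.filename, "symlink"))
            elif name.startswith("/") or (len(name) > 1 and name[1] == ":"):
                # Leading slash or a drive letter such as C: marks an absolute path.
                problems.append((info.filename, "absolute path"))
            elif ".." in PurePosixPath(name).parts:
                problems.append((info.filename, "path traversal"))
    return problems
```

An empty result means none of the three patterns were found locally; the archive must still satisfy the size limits and be a genuine .zip file.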