Knowledge Base Ingestion & Content Scanning

How documents enter a knowledge base — server-routed uploads, asynchronous processing, and the per-KB secrets policy (block, redact, or flag).

Both knowledge-base engines (RAG and Memex) ingest content through the same pipeline: files are uploaded through the platform server, scanned for secrets, and processed asynchronously. The upload dialog returns as soon as every file is queued, and per-document progress is shown on the knowledge-base page.

upload → server-side secrets scan → background processing → searchable

Server-routed uploads

Knowledge-base files are always uploaded via the platform server — never directly from the browser to object storage — so every byte can be scanned before it becomes part of your knowledge base. Uploads are capped at 100 MB per file.
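A minimal Python sketch of the server side of this rule. It assumes the server reads each knowledge-base upload in chunks so the secrets scan can see every byte, and it reads "100 MB" as 100 × 1024 × 1024 bytes (the exact unit is an assumption). `read_upload`, `UploadTooLarge`, and `MAX_UPLOAD_BYTES` are hypothetical names, not the platform's API.

```python
import io

# Assumption: "100 MB" interpreted as 100 MiB; the platform may use 100 * 10**6.
MAX_UPLOAD_BYTES = 100 * 1024 * 1024


class UploadTooLarge(Exception):
    """Raised when a single file exceeds the per-file cap (hypothetical)."""


def read_upload(stream, max_bytes=MAX_UPLOAD_BYTES, chunk_size=1024 * 1024):
    """Read an upload on the server in chunks so it can be scanned before storage.

    Stops as soon as the running total passes max_bytes, so an oversized
    file is rejected without reading the rest of it.
    """
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf.extend(chunk)
        if len(buf) > max_bytes:
            raise UploadTooLarge(f"file exceeds the {max_bytes}-byte per-file limit")
    return bytes(buf)


# Usage: any file-like object works, e.g. an in-memory request body.
data = read_upload(io.BytesIO(b"quarterly report text"))
```

Because the bytes already pass through the server in this design, the returned buffer can be handed directly to the secrets scan.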

Other file surfaces (chat attachments, avatars) may still upload directly to storage when your storage endpoint is publicly reachable — see Storage Configuration. Knowledge-base content deliberately never does.

The secrets policy (per knowledge base)

Every knowledge base has an ingest secrets policy that controls what happens when uploaded or pasted content contains detectable secrets (API keys, JWTs, email addresses):

Policy	Behavior
Redact (default)	Matches are replaced with `⟨REDACTED:kind⟩` tokens before the content is processed. The document ingests normally; the secret never becomes searchable.
Block	Ingestion fails. The document shows a `dlp_blocked` error with the number and kinds of secrets found, and the uploaded file is deleted from storage (quarantine). Remove the secrets and re-upload.
Flag	Content ingests unchanged; the findings are logged for review.

The policy applies to every ingestion path: file uploads, pasted/inline text, and documents created by workflows.
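The three policies can be sketched as one function that every ingestion path (file upload, pasted text, workflow-created document) calls before processing. This is a minimal Python illustration, not the platform's scanner: the regular expressions are simplified, hypothetical detectors for the three kinds named above, and `apply_secrets_policy` and `DlpBlocked` are invented names. Deleting a blocked file from storage is not shown.

```python
import re
from collections import Counter

# Hypothetical, simplified detectors; the real scanner's rules are not documented here.
DETECTORS = {
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    "api_key": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}


class DlpBlocked(Exception):
    """Ingestion refused under the "block" policy; carries counts per kind."""

    def __init__(self, counts):
        self.counts = counts
        summary = ", ".join(f"{n} {kind}" for kind, n in sorted(counts.items()))
        super().__init__(f"dlp_blocked: {summary}")


def apply_secrets_policy(text, policy="redact", log=print):
    """Return the text to ingest, or raise DlpBlocked, according to the KB policy."""
    if policy not in ("redact", "block", "flag"):
        raise ValueError(f"unknown policy {policy!r}")
    findings = Counter({kind: len(p.findall(text)) for kind, p in DETECTORS.items()})
    findings = +findings  # unary plus drops kinds with zero matches
    if not findings:
        return text
    if policy == "block":
        raise DlpBlocked(dict(findings))
    if policy == "flag":
        log(f"secrets flagged for review: {dict(findings)}")
        return text  # ingested unchanged
    # Default "redact": replace each match with a ⟨REDACTED:kind⟩ token.
    for kind, pattern in DETECTORS.items():
        text = pattern.sub(f"⟨REDACTED:{kind}⟩", text)
    return text
```

For example, under the default policy `"Contact alice@example.com"` becomes `"Contact ⟨REDACTED:email⟩"`, while the same input under `"block"` raises `DlpBlocked` with the count and kind of the finding.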

Asynchronous processing

Processing (parsing, scanning, chunking/embedding for RAG; page derivation for Memex) runs in durable background workflows. A document moves through pending → processing → completed (or failed with a reason); progress and failures are visible per file on the knowledge-base page.
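The status lifecycle can be modelled as a small state machine. The Python sketch below is hypothetical: `DocStatus`, `Document`, and `process` are illustrative names, each processing step is a plain callable, and the durability of the real background workflows (surviving restarts, retries) is not modelled. It covers only the four states named in the paragraph above.

```python
from enum import Enum


class DocStatus(Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


# Allowed transitions between the documented states (hypothetical encoding).
TRANSITIONS = {
    DocStatus.PENDING: {DocStatus.PROCESSING},
    DocStatus.PROCESSING: {DocStatus.COMPLETED, DocStatus.FAILED},
    DocStatus.COMPLETED: set(),
    DocStatus.FAILED: set(),
}


class Document:
    def __init__(self, name):
        self.name = name
        self.status = DocStatus.PENDING
        self.failure_reason = None

    def advance(self, new_status, reason=None):
        """Move to new_status, refusing transitions the lifecycle does not allow."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} not allowed")
        if new_status is DocStatus.FAILED and not reason:
            raise ValueError("a failed document needs a reason")
        self.status = new_status
        self.failure_reason = reason


def process(doc, steps):
    """Run the processing steps in order; record completion or the failure reason."""
    doc.advance(DocStatus.PROCESSING)
    try:
        for step in steps:
            step()
    except Exception as exc:
        doc.advance(DocStatus.FAILED, reason=str(exc))
    else:
        doc.advance(DocStatus.COMPLETED)
```

Storing the exception text as `failure_reason` is one way to surface "failed with a reason" per file.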

Both engines share one folder/file explorer on the knowledge-base page: files are browsed by folder, and a failed file can be reprocessed directly from its row — for RAG this retries indexing, for Memex it re-runs the ingest pipeline from the originally uploaded file.
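Reprocessing is engine-specific. A hypothetical Python dispatch, assuming the caller supplies the two engine operations as callables (`retry_indexing` for RAG, `run_ingest` for Memex), that each file record keeps the storage key of its original upload, and that the row action is offered for failed files as described above; all names here are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class KbFile:
    name: str
    engine: str        # "rag" or "memex"
    status: str        # e.g. "failed"
    original_key: str  # hypothetical storage key of the originally uploaded file


def reprocess(f: KbFile, retry_indexing: Callable[[KbFile], str],
              run_ingest: Callable[[str], str]) -> str:
    """Re-run work for a failed file from its row, depending on the engine."""
    if f.status != "failed":
        raise ValueError(f"{f.name} is {f.status}; only failed files are reprocessed here")
    if f.engine == "rag":
        # RAG: retry indexing for this file.
        return retry_indexing(f)
    if f.engine == "memex":
        # Memex: restart the whole ingest pipeline from the original upload.
        return run_ingest(f.original_key)
    raise ValueError(f"unknown engine {f.engine!r}")
```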

For Memex knowledge bases, a file whose content already exists in the knowledge base ends in a terminal skipped state rather than completed, and the existing pages are left untouched. The duplicate can only be detected during background processing, because the content hash is computed after parsing and secret redaction, so the upload itself still returns immediately. By contrast, unsupported file formats, and audio files when no speech-to-text integration is configured, are rejected synchronously at upload time.
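The split between synchronous rejection and background de-duplication can be sketched in Python. Everything here is an assumption for illustration: the extension lists are hypothetical, SHA-256 stands in for whatever content hash the platform uses, and `validate_at_upload`, `memex_ingest`, and `UnsupportedUpload` are invented names.

```python
import hashlib
import os

SUPPORTED_EXTENSIONS = {".pdf", ".md", ".txt", ".docx"}  # hypothetical list
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}              # hypothetical list


class UnsupportedUpload(Exception):
    """Rejected before the upload returns (hypothetical)."""


def validate_at_upload(filename, stt_configured):
    """Synchronous checks: only the format and STT availability, never content."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in AUDIO_EXTENSIONS:
        if not stt_configured:
            raise UnsupportedUpload(f"{filename}: no speech-to-text integration configured")
        return
    if ext not in SUPPORTED_EXTENSIONS:
        raise UnsupportedUpload(f"{filename}: unsupported format")


def content_hash(redacted_text):
    # Assumption: SHA-256 over the parsed, already-redacted text.
    return hashlib.sha256(redacted_text.encode("utf-8")).hexdigest()


def memex_ingest(redacted_text, known_hashes):
    """Background step: 'skipped' for duplicate content, else 'completed'."""
    h = content_hash(redacted_text)
    if h in known_hashes:
        return "skipped"  # existing pages are left untouched
    known_hashes.add(h)
    # ... page derivation would happen here ...
    return "completed"
```

Because this sketch hashes text after redaction, two files that differ only in a secret of the same kind would produce the same hash and the second would be skipped.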
