Knowledge Base Ingestion & Content Scanning
How documents enter a knowledge base — server-routed uploads, asynchronous processing, and the per-KB secrets policy (block, redact, or flag).
Both knowledge-base engines (RAG and Memex) ingest content through the same pipeline: files are uploaded through the platform server, scanned for secrets, and processed asynchronously — the upload dialog returns as soon as every file is queued, and per-document progress shows on the knowledge-base page.
upload → server-side secrets scan → background processing → searchable
Server-routed uploads
Knowledge-base files are always uploaded via the platform server — never directly from the browser to object storage — so every byte can be scanned before it becomes part of your knowledge base. Uploads are capped at 100 MB per file.
Other file surfaces (chat attachments, avatars) may still upload directly to storage when your storage endpoint is publicly reachable — see Storage Configuration. Knowledge-base content deliberately never does.
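A client can avoid a wasted round trip by checking file sizes against the cap before uploading. The sketch below is illustrative only: the names `MAX_UPLOAD_BYTES` and `split_by_cap` are hypothetical, and it assumes "100 MB" means 100 × 1024 × 1024 bytes, which this page does not specify. The server remains the authority on what it accepts.

```python
# Hypothetical pre-upload check; not part of the platform API.
# Assumption: the 100 MB cap is interpreted as 100 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 100 * 1024 * 1024


def split_by_cap(sizes: dict[str, int]) -> tuple[list[str], list[str]]:
    """Split file names into (within cap, over cap) based on size in bytes."""
    within: list[str] = []
    over: list[str] = []
    for name, size in sizes.items():
        # A file exactly at the cap is treated as allowed in this sketch.
        (within if size <= MAX_UPLOAD_BYTES else over).append(name)
    return within, over
```

Files in the second list can be reported to the user immediately instead of being sent to the server.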
The secrets policy (per knowledge base)
Every knowledge base has an ingest secrets policy that controls what happens when uploaded or pasted content contains detectable secrets (API keys, JWTs, email addresses):
| Policy | Behavior |
|---|---|
| Redact (default) | Matches are replaced with ⟨REDACTED:kind⟩ tokens before the content is processed. The document ingests normally; the secret never becomes searchable. |
| Block | Ingestion fails. The document shows a dlp_blocked error with the number and kinds of secrets found, and the uploaded file is deleted from storage (quarantine). Remove the secrets and re-upload. |
| Flag | Content ingests unchanged; the findings are logged for review. |
The policy applies to every ingestion path: file uploads, pasted/inline text, and documents created by workflows.
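The three policies in the table can be sketched as a single function that scans text and then branches on the configured policy. This is a minimal illustration, not the platform's scanner: the regular expressions, the `sk_` key prefix, and the names `Policy`, `DlpBlocked`, `ScanResult`, and `apply_policy` are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    REDACT = "redact"
    BLOCK = "block"
    FLAG = "flag"


# Illustrative detectors only; the real detection rules are not documented here.
PATTERNS = {
    "api_key": re.compile(r"\bsk_[A-Za-z0-9]{16,}\b"),
    "jwt": re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


class DlpBlocked(Exception):
    """Raised under the Block policy; carries the count of findings per kind."""

    def __init__(self, findings: dict[str, int]):
        super().__init__(f"dlp_blocked: {findings}")
        self.findings = findings


@dataclass
class ScanResult:
    text: str                 # content that continues through ingestion
    findings: dict[str, int]  # number of matches per secret kind


def apply_policy(text: str, policy: Policy) -> ScanResult:
    # Count matches per kind on the original text.
    findings = {}
    for kind, pattern in PATTERNS.items():
        count = len(pattern.findall(text))
        if count:
            findings[kind] = count

    if not findings:
        return ScanResult(text, {})
    if policy is Policy.BLOCK:
        # Ingestion fails; the caller would also delete the stored file.
        raise DlpBlocked(findings)
    if policy is Policy.REDACT:
        for kind, pattern in PATTERNS.items():
            text = pattern.sub(f"⟨REDACTED:{kind}⟩", text)
    # Under FLAG the text is returned unchanged; findings are kept for review.
    return ScanResult(text, findings)
```

Because every ingestion path calls the same function in this sketch, uploads, pasted text, and workflow-created documents would all receive identical treatment.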
Asynchronous processing
Processing (parsing, scanning, chunking/embedding for RAG; page derivation for
Memex) runs in durable background workflows. A document moves through
pending → processing → completed (or failed with a reason); progress and
failures are visible per file on the knowledge-base page.
Both engines share one folder/file explorer on the knowledge-base page: files are browsed by folder, and a failed file can be reprocessed directly from its row — for RAG this retries indexing, for Memex it re-runs the ingest pipeline from the originally uploaded file.
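The document lifecycle can be modeled as a small state machine. The sketch below is an assumption-laden illustration: the `Status` enum, the `ALLOWED` table, and `advance` are hypothetical names, and the choice to return a reprocessed file to pending is a guess, since this page only says a failed file can be reprocessed. The skipped state applies to Memex duplicates only.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"
    SKIPPED = "skipped"  # Memex only: duplicate content


# Allowed transitions. FAILED -> PENDING models reprocessing from the file row
# (assumption: reprocessing re-queues the document).
ALLOWED: dict[Status, set[Status]] = {
    Status.PENDING: {Status.PROCESSING},
    Status.PROCESSING: {Status.COMPLETED, Status.FAILED, Status.SKIPPED},
    Status.FAILED: {Status.PENDING},
    Status.COMPLETED: set(),
    Status.SKIPPED: set(),
}


def advance(current: Status, target: Status) -> Status:
    """Return the target status, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"invalid transition: {current.value} -> {target.value}")
    return target
```

Keeping the transitions in one table makes terminal states explicit: completed and skipped have no outgoing edges, while failed has exactly one.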
For Memex knowledge bases, uploading a file whose content already exists in
the knowledge base ends in a terminal skipped state instead of completed. The
duplicate is detected during background processing, because the content hash
is computed after parsing and secret redaction; the upload itself therefore
always returns immediately, and the existing pages are left untouched.
Unsupported file formats and audio files without a configured speech-to-text
integration are still rejected synchronously at upload time.
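The duplicate check can be sketched as a hash lookup that runs on the parsed, redacted text rather than on the raw upload. This is a simplified illustration under stated assumptions: SHA-256 is a stand-in for whatever hash the platform uses, an in-memory set stands in for the knowledge base's stored hashes, and `content_hash` and `finish_ingest` are hypothetical names.

```python
import hashlib


def content_hash(redacted_text: str) -> str:
    # Assumption: SHA-256 over UTF-8 text; the actual algorithm is not documented.
    return hashlib.sha256(redacted_text.encode("utf-8")).hexdigest()


def finish_ingest(known_hashes: set[str], redacted_text: str) -> str:
    """Return the terminal state for a parsed, redacted document.

    known_hashes stands in for the hashes already stored in the knowledge base.
    """
    digest = content_hash(redacted_text)
    if digest in known_hashes:
        # Duplicate content: leave existing pages untouched.
        return "skipped"
    known_hashes.add(digest)
    return "completed"
```

Hashing after redaction means two files that differ only in their redacted secrets would count as duplicates in this sketch, since both reduce to the same text.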