ADR-015: CPU-First Docker Packaging

Note: Partially superseded by ADR-025. PyTorch and Docling are no longer in the main image; they now run in the docling-serve sidecar.

Status: Accepted
Date: 2026-04-12
Issue: #140

Context

WikiMind uses Docling for structured PDF-to-markdown extraction. Docling depends on PyTorch for its internal layout-detection models, but WikiMind only uses CPU inference — no GPU configuration exists in the codebase.

By default, pip install docling pulls the full CUDA PyTorch distribution. This inflates the Docker image from ~1.7 GB to ~9.7 GB, lengthens CI builds from ~3 min to ~15 min, and introduces nvidia/CUDA CVEs that do not apply to our deployment.

Decision

CPU-only PyTorch by default. The Dockerfile uses ARG TORCH_INDEX=https://download.pytorch.org/whl/cpu and passes --extra-index-url ${TORCH_INDEX} to pip. This installs CPU-only torch wheels, eliminating ~8 GB of CUDA libraries.
GPU opt-in via build arg. Rebuild with --build-arg TORCH_INDEX=https://download.pytorch.org/whl/cu121 when GPU inference is needed (high-volume PDF ingestion, VLM extras, Whisper transcription).
Docling is an optional extra [pdf]. It moved from core dependencies to the pdf entry under [project.optional-dependencies]. Users who only need URL/text/YouTube ingestion skip the entire torch stack. A pymupdf (fitz) fallback provides basic PDF text extraction when docling is absent.
Bloat guard prevents regression. A static check (scripts/check_docker_bloat.py) runs in pre-commit and CI to ensure GPU-heavy packages (torch, nvidia, docling, sentence-transformers, etc.) do not re-enter core dependencies.

Consequences

Positive

Docker prod image: ~9.7 GB → ~1.7 GB (82% reduction)
CI build time: ~15 min → ~3-5 min
Trivy CVE scan passes (no nvidia/CUDA vulnerabilities)
pip install wikimind no longer forces ~4 GB of ML dependencies
GPU path preserved for future use via build arg

Negative

Users who want structured PDF extraction must install with pip install "wikimind[pdf]" instead of a bare pip install wikimind
Without [pdf], PDF uploads fall back to pymupdf plain-text extraction (no heading hierarchy or layout awareness)
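The fallback behaviour described above can be sketched as follows, assuming the backend is chosen at call time by checking which package is importable. The function names and the error raised when neither package is present are hypothetical; the Docling and PyMuPDF calls are the libraries' public entry points, imported lazily so the module loads without either installed.

```python
import importlib.util
from typing import Callable


def module_available(name: str) -> bool:
    """Return True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(name) is not None


def choose_pdf_backend(is_available: Callable[[str], bool] = module_available) -> str:
    """Prefer Docling; fall back to PyMuPDF; fail loudly if neither is installed."""
    if is_available("docling"):
        return "docling"
    if is_available("fitz"):  # PyMuPDF's long-standing import name
        return "pymupdf"
    raise RuntimeError('No PDF backend found; install "wikimind[pdf]" or pymupdf.')


def extract_pdf(path: str) -> str:
    """Return structured markdown via Docling, or plain text via PyMuPDF."""
    if choose_pdf_backend() == "docling":
        from docling.document_converter import DocumentConverter

        return DocumentConverter().convert(path).document.export_to_markdown()

    import fitz

    # Plain text only: no heading hierarchy or layout awareness.
    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)
```

Passing the availability check in as a parameter keeps the selection logic testable on machines that have neither library.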

When to switch to GPU

High-volume PDF ingestion (>100 PDFs/day) where CPU is a bottleneck
Adding Docling's [vlm] extra for vision-language document understanding
Adding the Whisper [transcribe] extra for audio transcription
Any future feature requiring CUDA-accelerated inference
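As an illustration of the criteria above, the sketch below turns them into a small helper that decides whether to override TORCH_INDEX and assembles the matching docker build command. The function names and image tags are hypothetical, and treating the >100 PDFs/day figure as a hard threshold is a simplification; the ADR also requires that CPU actually be the bottleneck.

```python
# Opt-in GPU wheel index named in the ADR; the Dockerfile default is the CPU index.
GPU_INDEX = "https://download.pytorch.org/whl/cu121"


def needs_gpu(pdfs_per_day: int, vlm: bool = False, transcribe: bool = False) -> bool:
    """Apply the ADR's triggers: >100 PDFs/day, the [vlm] extra, or [transcribe]."""
    return pdfs_per_day > 100 or vlm or transcribe


def docker_build_command(tag: str, gpu: bool, context: str = ".") -> list[str]:
    """Build the argv for docker build; only the GPU variant overrides TORCH_INDEX."""
    command = ["docker", "build", "-t", tag]
    if gpu:
        command += ["--build-arg", f"TORCH_INDEX={GPU_INDEX}"]
    return command + [context]
```

For example, docker_build_command("wikimind:gpu", gpu=needs_gpu(250)) yields a command that passes the cu121 index as a build arg, while a CPU build leaves the Dockerfile default untouched.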