# ADR-015: CPU-First Docker Packaging
> **Note:** Partially superseded by ADR-025. PyTorch and Docling are no longer in the main image; they run in the `docling-serve` sidecar.
**Status:** Accepted | **Date:** 2026-04-12 | **Issue:** #140
## Context
WikiMind uses Docling for structured PDF-to-markdown extraction. Docling depends on PyTorch for its internal layout-detection models, but WikiMind only runs CPU inference; no GPU configuration exists in the codebase.

By default, `pip install docling` pulls the full CUDA PyTorch distribution, inflating the Docker image from ~1.7 GB to ~9.7 GB, stretching CI builds from ~3 min to ~15 min, and introducing nvidia/CUDA CVEs that do not apply to our deployment.
## Decision
- **CPU-only PyTorch by default.** The Dockerfile uses `ARG TORCH_INDEX=https://download.pytorch.org/whl/cpu` and passes `--extra-index-url ${TORCH_INDEX}` to pip. This installs CPU-only torch wheels, eliminating ~8 GB of CUDA libraries.
- **GPU opt-in via build arg.** Rebuild with `--build-arg TORCH_INDEX=https://download.pytorch.org/whl/cu121` when GPU inference is needed (high-volume PDF ingestion, VLM extras, Whisper transcription).
- **Docling is an optional extra `[pdf]`.** Moved from core dependencies to `[project.optional-dependencies] pdf`. Users who only need URL/text/YouTube ingestion skip the entire torch stack. A `pymupdf` (fitz) fallback provides basic PDF text extraction when Docling is absent.
- **Bloat guard prevents regression.** A static check (`scripts/check_docker_bloat.py`) runs in pre-commit and CI to ensure GPU-heavy packages (torch, nvidia, docling, sentence-transformers, etc.) do not re-enter core dependencies.
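The first two decisions can be sketched as a Dockerfile fragment. This is a minimal illustration, not the project's actual Dockerfile: the base image and the bare `torch` install line are assumptions, while the `TORCH_INDEX` mechanism comes from this ADR.

```dockerfile
FROM python:3.12-slim

# CPU wheels by default; override at build time with
#   --build-arg TORCH_INDEX=https://download.pytorch.org/whl/cu121
ARG TORCH_INDEX=https://download.pytorch.org/whl/cpu

# --extra-index-url keeps PyPI as the primary index, so only the
# torch-family packages resolve from the PyTorch wheel index.
RUN pip install --no-cache-dir --extra-index-url ${TORCH_INDEX} torch
```

Using `--extra-index-url` rather than `--index-url` matters here: everything else still installs from PyPI, and only packages published on the PyTorch index (the CPU or CUDA torch wheels) are affected by the build arg.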
## Consequences

### Positive
- Docker prod image: ~9.7 GB → ~1.7 GB (82% reduction)
- CI build time: ~15 min → ~3-5 min
- Trivy CVE scan passes (no nvidia/CUDA vulnerabilities)
- `pip install wikimind` no longer forces ~4 GB of ML dependencies
- GPU path preserved for future use via build arg
### Negative
- Users who want structured PDF extraction must install with `pip install "wikimind[pdf]"` instead of bare `pip install wikimind`
- Without `[pdf]`, PDF uploads fall back to pymupdf plain-text extraction (no heading hierarchy or layout awareness)
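The fallback behavior above implies a small backend-selection step at upload time. A sketch of that dispatch, assuming only what the ADR states (Docling preferred, pymupdf as plain-text fallback); the helper name and the injectable `has` parameter are ours, added to make the logic testable:

```python
import importlib.util
from typing import Callable


def pick_pdf_backend(
    has: Callable[[str], bool] = lambda n: importlib.util.find_spec(n) is not None,
) -> str:
    """Pick a PDF extraction backend based on which extras are installed.

    Hypothetical helper: the ADR only specifies the fallback order,
    docling (structured) first, pymupdf (plain text) second.
    """
    if has("docling"):
        return "docling"  # layout-aware PDF-to-markdown extraction
    if has("fitz"):       # pymupdf imports under the name `fitz`
        return "pymupdf"  # plain-text fallback, no heading hierarchy
    raise RuntimeError('PDF support requires `pip install "wikimind[pdf]"` or pymupdf')
```

With the default `has`, the function probes the live environment; passing a predicate makes the ordering easy to unit-test without installing either package.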
## When to switch to GPU
- High-volume PDF ingestion (>100 PDFs/day) where CPU is a bottleneck
- Adding Docling's `[vlm]` extra for vision-language document understanding
- Adding Whisper via the `[transcribe]` extra for audio transcription
- Any future feature requiring CUDA-accelerated inference
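When evaluating a switch, it helps to confirm which wheel flavor is actually in the image. A small diagnostic sketch; the helper name is ours, not part of WikiMind, and it relies on the fact that CPU wheels report `torch.version.cuda` as `None` while CUDA wheels report the toolkit version (e.g. `"12.1"`):

```python
import importlib.util


def torch_variant() -> str:
    """Report which torch wheel flavor is installed (illustrative helper)."""
    if importlib.util.find_spec("torch") is None:
        return "not installed"
    import torch

    # CPU-only wheels from /whl/cpu leave torch.version.cuda as None.
    return "cuda" if torch.version.cuda else "cpu"
```

Running this inside the production container should print `cpu`; seeing `cuda` there would mean the bloat guard and build arg default have been bypassed.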