Problem
Python ML containers can become huge without anyone making an explicit decision. The default PyTorch install pulls CUDA libraries even for CPU-only workloads, which slows installs, bloats images, and can make serverless or autoscaled deploys slower.
Mechanism
The default package path optimizes for broad hardware support. If your workload does not need a GPU, the CUDA baggage is pure deployment weight.
```bash
# Before
pip install torch torchvision
# Can pull roughly 2.7 GB of CUDA-related packages

# After
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
# CPU-only path, roughly 200 MB
```
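A quick way to confirm which build actually landed in the environment (a check added here for illustration, not from the source): CPU-only wheels report a `+cpu` version suffix and no CUDA runtime.
```bash
# Verify the installed build: CPU-only wheels show a +cpu version suffix,
# torch.version.cuda is None, and no CUDA device is available
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```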
Fix
Install CPU-only PyTorch before the rest of your requirements, and pin versions so a future upstream change does not alter your image unexpectedly.
```dockerfile
RUN pip install --no-cache-dir torch==2.6.0+cpu --extra-index-url https://download.pytorch.org/whl/cpu \
    && pip install --no-cache-dir -r requirements.txt
```
Combine that with normal Docker hygiene: slim base images, fewer layers, same-layer cleanup for apt caches, --no-install-recommends, and pip --no-cache-dir.
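Putting those pieces together, here is a minimal Dockerfile sketch; the base image tag, the example OS package, the requirements file name, and the entrypoint are assumptions for illustration, not taken from the source.
```dockerfile
# Sketch under assumptions: base tag, OS packages, and entrypoint are placeholders
FROM python:3.11-slim

WORKDIR /app

# OS packages (if any) in one layer, no recommends, apt cache cleaned in the same layer
RUN apt-get update \
    && apt-get install --no-install-recommends -y curl \
    && rm -rf /var/lib/apt/lists/*

# CPU-only PyTorch first and pinned, then the rest of the requirements
COPY requirements.txt .
RUN pip install --no-cache-dir torch==2.6.0+cpu --extra-index-url https://download.pytorch.org/whl/cpu \
    && pip install --no-cache-dir -r requirements.txt

COPY . .
CMD ["python", "app.py"]
```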
What changed in practice
| Install path | Approx package weight | Deploy effect |
|---|---|---|
| Default PyTorch | ~2.7 GB CUDA libraries | Large image, slower install |
| CPU-only PyTorch | ~200 MB | Smaller image, faster deploy |
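To see where the weight actually sits in a built image, `docker history` breaks it down by layer; the image tag below is a placeholder.
```bash
# Per-layer size breakdown of a built image (tag is a placeholder)
docker history --format "{{.Size}}\t{{.CreatedBy}}" my-ml-service:latest
```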
Production lesson
Container size is often dependency selection disguised as infrastructure. If a workload is CPU-only, make that choice explicit in the Dockerfile.