
Ollama Docker Setup for Local LLM Deployment 2026

Prashant Lalwani
April 16, 2026 · 12 min read
Docker · DevOps · LLM
Architecture diagram — Local LLM deployment stack: a Docker container running ollama/ollama exposes port 11434, persists models through the ~/.ollama:/root/.ollama volume, and gets GPU passthrough via --gpus=all (NVIDIA CUDA). Client applications (Python, Next.js, LangChain, raw REST) talk to the server, with Open WebUI optionally providing a ChatGPT-style interface at localhost:3000.

Deploying Ollama in Docker gives you a portable, reproducible, production-ready local LLM environment. This step-by-step guide covers GPU passthrough, Docker Compose with Open WebUI, health checks, and production hardening.

1 command: deploy with Compose · GPU: full NVIDIA passthrough · Portable: same config everywhere

Step 1 — Prerequisites

1. Install Docker Desktop

Download from docker.com. Ensure Docker Compose v2 is included (it is by default in modern versions).

2. NVIDIA: Install Container Toolkit

Required for GPU passthrough inside containers. Skip this step entirely if running CPU-only.

3. Allocate Docker Resources

In Docker Desktop settings, allocate at least 16 GB of RAM and 6+ CPU cores for good performance.

Shell — NVIDIA Container Toolkit (Ubuntu)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-ct.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-ct.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Verify the GPU is visible from inside a container
docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi

Step 2 — Quick Docker Run

Shell — Basic Start
# CPU only
docker run -d -p 11434:11434 -v ollama:/root/.ollama --name ollama ollama/ollama

# With NVIDIA GPU (recommended)
docker run -d -p 11434:11434 --gpus=all \
  -v ollama:/root/.ollama --name ollama ollama/ollama

# Pull a model inside the running container
docker exec -it ollama ollama pull llama3.1
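With the container running, any HTTP client can reach the API on port 11434. Here is a minimal client sketch using only the Python standard library (the endpoint and payload fields follow Ollama's /api/generate contract; the model name assumes the llama3.1 pull above):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # port published by the docker run above

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def generate(model: str, prompt: str) -> str:
    """Send the request and return the full response text."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Calling generate("llama3.1", "Why is the sky blue?") returns the completion once the container is up and the model is pulled.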

Step 3 — Full Docker Compose Stack

docker-compose.yml
services:
  ollama:
    image: ollama/ollama:latest
    restart: unless-stopped
    ports: ["11434:11434"]
    volumes: [ollama_data:/root/.ollama]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    environment:
      - OLLAMA_MAX_LOADED_MODELS=2
      - OLLAMA_NUM_PARALLEL=4
    healthcheck:
      # the ollama/ollama image ships without curl, so probe with the bundled CLI
      test: ["CMD", "ollama", "list"]
      interval: 30s
      timeout: 10s
      retries: 3

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports: ["3000:8080"]
    environment: [OLLAMA_BASE_URL=http://ollama:11434]
    depends_on: [ollama]
    restart: unless-stopped

volumes:
  ollama_data:
Shell — Start Stack
docker compose up -d
docker compose exec ollama ollama pull llama3.1
# Open WebUI at http://localhost:3000
docker compose logs -f ollama
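Scripted readiness checks can hit the same /api/tags endpoint the health check uses. A small Python helper to interpret its response body (the sample payload below is illustrative, not live output):

```python
import json

def installed_models(tags_body: str) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in json.loads(tags_body).get("models", [])]

# Illustrative shape of /api/tags output (abbreviated)
sample = '{"models": [{"name": "llama3.1:latest"}]}'
```

An empty `models` list means the server is up but no models have been pulled yet.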
Production Security

Add an nginx reverse proxy as a third Compose service to handle HTTPS and authentication. Never expose port 11434 directly on public or shared networks — Ollama has no built-in authentication.
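A sketch of that third service for the docker-compose.yml above, assuming a local ./nginx.conf (holding the TLS and basic-auth directives) and a ./certs directory — both paths are illustrative:

```yaml
  nginx:
    image: nginx:alpine
    restart: unless-stopped
    ports: ["443:443"]
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro   # reverse proxy + auth config
      - ./certs:/etc/nginx/certs:ro             # TLS certificates
    depends_on: [ollama]
```

With this in place, drop the `ports` mapping from the ollama service so the API is reachable only through the proxy on the internal Compose network.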