Updating NVIDIA Drivers for vLLM and llama.cpp¶

Why Update NVIDIA Drivers?¶

Cortex runs the vLLM and llama.cpp inference engines in Docker containers. The containers bundle their own CUDA user-space libraries but rely on the NVIDIA driver installed on the host, so the host driver must be new enough for the CUDA version inside the image.

Common Issues Without Updated Drivers¶

Container Startup Failures: Containers fail to start with errors like:

nvidia-container-cli: requirement error: unsatisfied condition: cuda>=12.9

GPU Access Denied: Containers cannot access GPUs even though the GPUs are present on the host.
Performance Degradation: Older drivers may not support newer CUDA features used by vLLM/llama.cpp.
Compatibility Issues: vLLM and llama.cpp Docker images are built against a specific CUDA version, and the host driver must be at least as new as that version requires.

Understanding CUDA and Driver Compatibility¶

How It Works¶

CUDA Version in Container: vLLM/llama.cpp Docker images are built with specific CUDA versions (e.g., CUDA 12.9)
Host Driver Requirement: The host NVIDIA driver must support the CUDA version used in the container
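Backward Compatibility: Newer drivers support older CUDA versions, but an older driver generally cannot run a container built for a newer CUDA version (the exception is NVIDIA's forward-compatibility packages, which target data center GPUs)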
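The compatibility rule above comes down to a single version comparison. The following is a minimal sketch, assuming GNU `sort -V` is available; `version_ge` is an illustrative helper, and the two version values are placeholders rather than values read from a real system.

```shell
# version_ge A B: succeed when dotted version A is >= B.
# Relies on `sort -V` (version sort from GNU coreutils).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="12.9"   # placeholder: CUDA version the container image demands
reported="12.8"   # placeholder: "CUDA Version" reported by the host driver

if version_ge "$reported" "$required"; then
    echo "driver is new enough for CUDA $required"
else
    echo "driver too old: supports CUDA $reported, container needs $required"
fi
```

Version sort is used instead of plain string or floating-point comparison so that a value such as 12.10 would rank above 12.9.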

Current Requirements¶

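For CUDA 12.9+ (used by latest vLLM/llama.cpp images):
- Linux: NVIDIA driver 575.51.03 or newer
- Windows: NVIDIA driver 576.02 or newer

For CUDA 12.8:
- Linux: NVIDIA driver 570.26 or newer
- Windows: NVIDIA driver 570.65 or newer

The frequently quoted 525.60.13 (Linux) and 528.33 (Windows) figures are the floor for CUDA 12.x minor-version compatibility. They are not enough here, because the container runtime compares a condition such as cuda>=12.8 against the CUDA version the driver itself reports. Confirm the exact minimums in the CUDA release notes for your toolkit version.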

Note: Always check the specific CUDA version required by your vLLM/llama.cpp Docker images. Newer images may require CUDA 12.9+, while older images may work with CUDA 12.8.
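One way to find out which CUDA version an image demands is to read its environment. Images derived from NVIDIA's CUDA base images typically carry an `NVIDIA_REQUIRE_CUDA` variable, which is where the `cuda>=12.9` condition in the startup error comes from. The sketch below assumes that variable is present; `required_cuda` is an illustrative helper, and `<image>` is a placeholder for whatever image your Cortex deployment pulls.

```shell
# required_cuda: read KEY=VALUE environment lines on stdin and print the
# version from the "cuda>=X.Y" condition in NVIDIA_REQUIRE_CUDA, if any.
required_cuda() {
    sed -n 's/^NVIDIA_REQUIRE_CUDA=.*cuda>=\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Usage (substitute the real image name for <image>):
#   docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' \
#       <image> | required_cuda
```

If the helper prints nothing, the image does not declare a requirement this way, and the image's own documentation is the place to look.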

Checking Your Current Driver Version¶

On Linux¶

# Check driver version
nvidia-smi

# Or get just the version number
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# Check maximum CUDA version supported (read from the nvidia-smi banner;
# not every nvidia-smi release accepts a cuda_version query field)
nvidia-smi | grep "CUDA Version"

Understanding the Output¶

Driver Version: The installed NVIDIA driver version (e.g., 570.195.03)
CUDA Version: The maximum CUDA version your driver supports (e.g., 12.8)

If your driver supports CUDA 12.8 but the container requires CUDA 12.9+, you need to update.
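Both values can be pulled out of the `nvidia-smi` banner in one step. This is a minimal sketch that assumes the usual banner layout, with `Driver Version:` and `CUDA Version:` on the same line; `parse_smi_header` is an illustrative name.

```shell
# parse_smi_header: read `nvidia-smi` output on stdin and print
# "<driver version> <max CUDA version>" taken from the banner line.
parse_smi_header() {
    sed -n 's/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/\1 \2/p' | head -n 1
}

# Only query the GPU when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi | parse_smi_header
fi
```

The second field is the number to compare against the CUDA version your container image requires.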

Updating NVIDIA Drivers on Linux¶

Method 1: Package Manager (Recommended)¶

Ubuntu/Debian¶

# 1. Check available driver versions
ubuntu-drivers list

# 2. Add NVIDIA PPA for latest drivers (optional, for newer versions)
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update

# 3. Check what versions are available
apt-cache search nvidia-driver | grep "^nvidia-driver-[0-9]"

# 4. Install driver 575 or newer (replace with your preferred version)
sudo apt install nvidia-driver-575

# Or install the "open" version (for open-source kernel modules)
sudo apt install nvidia-driver-575-open

# 5. Reboot system
sudo reboot

RHEL/CentOS/Rocky Linux¶

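# 1. Enable EPEL repository (provides dkms)
sudo dnf install epel-release

# 2. Add NVIDIA's CUDA repository (EPEL itself does not ship the driver;
#    replace rhel9 with your major release)
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo

# 3. Install NVIDIA driver (version 575 or newer)
#    Stream names vary by release; list them with: dnf module list nvidia-driver
sudo dnf module install nvidia-driver:latest-dkms

# 4. Reboot system
sudo reboot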

Fedora¶

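# 1. Enable the RPM Fusion repositories
sudo dnf install https://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm
sudo dnf install https://download1.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm

# 2. Install NVIDIA driver (RPM Fusion ships its current driver branch, so check
#    that it is 575 or newer; xorg-x11-drv-nvidia-cuda provides nvidia-smi and the CUDA driver libraries)
sudo dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda

# 3. Reboot system (give akmods a few minutes to build the kernel module first)
sudo reboot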

Arch Linux¶

# 1. Install NVIDIA driver
sudo pacman -S nvidia

# Or for open-source kernel modules
sudo pacman -S nvidia-open

# 2. Reboot system
sudo reboot

Method 2: Direct Download from NVIDIA¶

If your distribution doesn't have driver 575+ in repositories:

# 1. Download driver from NVIDIA website
# Visit: https://www.nvidia.com/Download/index.aspx
# Select: Your GPU model, Linux 64-bit, Latest version

# 2. Stop services using GPU (the installer cannot replace a kernel module that is in use)
sudo systemctl stop docker  # If Docker is running

# 3. Install prerequisites
sudo apt install build-essential dkms  # Ubuntu/Debian
# OR
sudo dnf install gcc kernel-devel kernel-headers dkms  # RHEL/Fedora

# 4. Make installer executable
chmod +x NVIDIA-Linux-x86_64-*.run

# 5. Run installer
sudo ./NVIDIA-Linux-x86_64-*.run

# Follow prompts:
# - Accept license
# - Install 32-bit compatibility libraries? (Yes)
# - Run nvidia-xconfig? (Yes only if this machine runs an X server; No on headless inference hosts)

# 6. Reboot
sudo reboot

Verifying Driver Installation¶

After rebooting:

# Check driver version (should show 575.x or higher)
nvidia-smi

# Verify CUDA version support (banner should show 12.9 or higher;
# not every nvidia-smi release accepts a cuda_version query field)
nvidia-smi | grep "CUDA Version"

# Test GPU access in Docker
docker run --rm --gpus all nvidia/cuda:12.9.0-base-ubuntu22.04 nvidia-smi

After Driver Update¶

1. Restart Docker (if using GPU)¶

sudo systemctl restart docker

2. Restart Cortex¶

cd /path/to/Cortex
make restart
# OR
docker compose restart

3. Test Model Startup¶

# Use the test script to verify models can start
python3 scripts/test_offline_models.py <model_id>

Troubleshooting¶

Driver Installation Fails¶

Symptoms: Installation errors, system won't boot

Solutions:

1. Boot into recovery mode:
- Hold Shift during boot (Ubuntu/Debian)
- Select "Advanced options" → "Recovery mode"
- Drop to root shell
- Remove problematic driver: apt remove 'nvidia-driver-*' or dnf remove 'nvidia-driver-*'
- Reboot normally

2. Check kernel compatibility:

uname -r
# The driver module is built against this kernel, so the matching headers
# (linux-headers-$(uname -r) on Ubuntu/Debian, kernel-devel on RHEL/Fedora) must be installed

3. Remove conflicting packages:

# Quote the patterns so the shell does not expand them against local file names

# Ubuntu/Debian
dpkg -l | grep nvidia
sudo apt remove --purge 'nvidia-*'
sudo apt autoremove

# RHEL/Fedora
rpm -qa | grep nvidia
sudo dnf remove 'nvidia-*'

CUDA Version Still Shows Old Version¶

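Symptom: nvidia-smi shows CUDA 12.8 after updating to driver 575+

Explanation: nvidia-smi shows the maximum CUDA version supported by the driver that is currently loaded, not the CUDA toolkit installed on the host. A 575+ driver reports 12.9 or higher, so a banner that still says 12.8 means the old driver is still in use: reboot, then confirm that the Driver Version field shows 575.x. A host-side nvcc --version that reports an older toolkit is normal and harmless, because Docker containers use the CUDA libraries from the container image.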

Verification: Test with Docker:

docker run --rm --gpus all nvidia/cuda:12.9.0-base-ubuntu22.04 nvidia-smi

Container Still Fails After Driver Update¶

Check:
1. Driver version: nvidia-smi should show 575.x or higher
2. Docker GPU access: docker run --rm --gpus all nvidia/cuda:12.9.0-base-ubuntu22.04 nvidia-smi
3. Container logs: docker logs <container-name>

Common causes:
- Docker not restarted after driver update
- NVIDIA Container Toolkit not installed/updated
- Driver not properly loaded (check lsmod | grep nvidia)
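The common causes above can be checked in one pass. The following is a rough sketch, not an official diagnostic: `check` is an illustrative helper, and the script assumes `nvidia-container-cli` is how the NVIDIA Container Toolkit is exposed on your host.

```shell
# check LABEL CMD...: run CMD quietly and print PASS or FAIL with the label.
# A missing command counts as FAIL.
check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS  $label"
    else
        echo "FAIL  $label"
    fi
}

# /proc/modules is the file that lsmod reads.
check "nvidia kernel module loaded" grep -q '^nvidia ' /proc/modules
check "nvidia-smi can reach the driver" nvidia-smi
check "NVIDIA Container Toolkit CLI present" nvidia-container-cli --version
check "Docker daemon reachable" docker info
```

A FAIL on the first two lines points at the driver itself. A FAIL only on the last two points at the container tooling or the Docker service.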

Rollback to Previous Driver¶

If the new driver causes issues:

# Ubuntu/Debian
sudo apt remove nvidia-driver-575
sudo apt install nvidia-driver-570  # or your previous version
sudo reboot

# RHEL/Fedora (package names depend on the repository the driver was installed from)
sudo dnf remove nvidia-driver-575
sudo dnf install nvidia-driver-570  # or your previous version
sudo reboot

GPU Compatibility Notes¶

vLLM Requirements¶

Compute Capability: 7.0 or higher (V100, T4, RTX 20xx/30xx/40xx, A100, L4, H100, etc.)
CUDA: 11.8+ (latest images use CUDA 12.9+)
Driver: Must support the CUDA version in the vLLM Docker image
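To check the compute capability requirement on a given host, recent `nvidia-smi` releases expose a `compute_cap` query field. The sketch below assumes that field is available (older drivers may reject it); `cc_supported` is an illustrative helper, and its 7.0 default is the minimum quoted above.

```shell
# cc_supported CAP [MIN]: succeed when compute capability CAP (e.g. "7.5")
# is at least MIN (default 7.0). Major and minor parts are compared as numbers.
cc_supported() {
    awk -v cap="$1" -v min="${2:-7.0}" 'BEGIN {
        split(cap, c, "."); split(min, m, ".")
        ok = (c[1] + 0 > m[1] + 0) || (c[1] + 0 == m[1] + 0 && c[2] + 0 >= m[2] + 0)
        exit !ok
    }'
}

# Report every GPU on the host, if nvidia-smi is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader |
    while IFS=, read -r name cap; do
        cap=$(echo "$cap" | tr -d ' ')
        if cc_supported "$cap"; then
            echo "$name: compute capability $cap is sufficient"
        else
            echo "$name: compute capability $cap is below 7.0"
        fi
    done
fi
```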

llama.cpp Requirements¶

CUDA Support: Requires CUDA-enabled build (server-cuda image)
Driver: Must support CUDA version in llama.cpp Docker image
GPU Layers: Can run on CPU (ngl=0) but GPU acceleration requires compatible driver

Best Practices¶

Check Before Updating: Always verify current driver version first
Back Up Important Data: Driver updates are generally safe, but back up critical data first
Update During Maintenance: Driver updates require a reboot, so plan accordingly
Keep Old Installer: Save the old driver installer in case rollback is needed
Test After Update: Always test container startup after driver updates
Monitor Logs: Check container logs if startup fails after driver update
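For the "Check Before Updating" step, it can help to record what was running before the change, so that a rollback has a known target. This is a minimal sketch; `snapshot_driver_state` and its default file name are made up for illustration.

```shell
# snapshot_driver_state [FILE]: write the kernel release and the current
# driver version to FILE (default: nvidia-driver-state.txt).
snapshot_driver_state() {
    out=${1:-nvidia-driver-state.txt}
    if command -v nvidia-smi >/dev/null 2>&1; then
        driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)
    else
        driver=""
    fi
    {
        echo "kernel: $(uname -r)"
        echo "driver: ${driver:-unknown}"
    } > "$out"
}

# Usage, before starting the update:
#   snapshot_driver_state "$HOME/pre-update-driver.txt"
```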

Additional Resources¶

NVIDIA Driver Downloads: https://www.nvidia.com/Download/index.aspx
CUDA Toolkit Archive: https://developer.nvidia.com/cuda-toolkit-archive
CUDA Release Notes: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/
vLLM GPU Installation: https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html
llama.cpp CUDA Support: https://github.com/ggerganov/llama.cpp

Quick Reference¶

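CUDA Version	Minimum Driver (Linux)	Minimum Driver (Windows)
12.9+	575.51.03	576.02
12.8	570.26	570.65
12.7	n/a (no toolkit release; version reported by 565-series drivers)	n/a
12.6	560.28.03	560.76

Drivers from 525.60.13 (Linux) or 528.33 (Windows) onward satisfy only CUDA 12.x minor-version compatibility. They do not report the CUDA versions listed above, so they fail a container requirement such as cuda>=12.9.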
