Model Container Lifecycle Management

Date: October 5, 2025
Status: Production Implementation


Overview

Cortex manages Docker containers for each model (vLLM and llama.cpp). This guide explains the container lifecycle, automatic cleanup, and troubleshooting orphaned containers.


Container Naming Convention

vLLM Containers

vllm-model-{id}
- Example: vllm-model-2 for model with database ID 2
- Image: vllm/vllm-openai:latest
- Network: cortex_default

llama.cpp Containers

llamacpp-model-{id}
- Example: llamacpp-model-4 for model with database ID 4
- Image: ghcr.io/ggml-org/llama.cpp:server-cuda
- Network: cortex_default
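The naming scheme can be captured in a small helper. The function below is a hypothetical illustration of the convention, not Cortex's actual code:

```python
def container_name(engine: str, model_id: int) -> str:
    """Map an engine type and database ID to the container name convention."""
    prefixes = {"vllm": "vllm-model", "llamacpp": "llamacpp-model"}
    if engine not in prefixes:
        raise ValueError(f"unknown engine: {engine}")
    return f"{prefixes[engine]}-{model_id}"

print(container_name("vllm", 2))      # vllm-model-2
print(container_name("llamacpp", 4))  # llamacpp-model-4
```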


Automatic Container Lifecycle

On Model Start (Admin UI → Start Button)

Sequence:

1. Admin clicks "Start" on a model
2. Pre-start validation (dry-run) checks VRAM and configuration
3. If warnings exist, the user confirms to proceed
4. Backend creates/recreates the container
5. State set to starting, then loading
6. Container starts and loads the model
7. Health check begins polling (frontend polls readiness)
8. Model registered in gateway registry
9. State updated to running in database
10. Frontend shows success toast: "Model is now running!"

State Transitions:

stopped → starting → loading → running
                          ↓
                       failed (on error)
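The transition diagram can be expressed as a lookup table and checked programmatically. This is a sketch; the transitions out of failed and running are assumptions inferred from the stop sequence and restart behavior described in this guide, not confirmed Cortex logic:

```python
# Allowed state transitions, inferred from the diagram above (assumed, not
# taken from Cortex source): failed -> starting models a manual restart.
TRANSITIONS = {
    "stopped": {"starting"},
    "starting": {"loading", "failed"},
    "loading": {"running", "failed"},
    "running": {"stopped", "failed"},
    "failed": {"starting"},
}

def can_transition(current: str, target: str) -> bool:
    """True when the state machine permits moving from current to target."""
    return target in TRANSITIONS.get(current, set())

print(can_transition("stopped", "starting"))  # True
print(can_transition("stopped", "running"))   # False -- must pass through starting/loading
```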

Container Configuration:

- Restart policy: no (manual start only)
- Network: cortex_default (service-to-service communication)
- Volumes: models directory mounted read-only
- GPU: allocated via NVIDIA runtime (if ngl > 0)
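Those four settings can be assembled into the keyword arguments that docker-py's containers.run() accepts. This is a hypothetical sketch of how such a config might be built; the mount path, function name, and field choices are assumptions, not Cortex's actual implementation:

```python
def build_container_config(engine: str, model_id: int,
                           models_dir: str, use_gpu: bool) -> dict:
    """Assemble container settings in the shape of docker-py's containers.run() kwargs."""
    cfg = {
        "name": f"{engine}-model-{model_id}",
        "network": "cortex_default",
        # Manual start only -- never auto-restart after a daemon restart or reboot
        "restart_policy": {"Name": "no", "MaximumRetryCount": 0},
        # Models directory mounted read-only (bind target /models is assumed)
        "volumes": {models_dir: {"bind": "/models", "mode": "ro"}},
        "detach": True,
    }
    if use_gpu:
        # NVIDIA runtime only when GPU layers are requested (ngl > 0)
        cfg["runtime"] = "nvidia"
    return cfg

cfg = build_container_config("llamacpp", 4, "/srv/models", use_gpu=True)
```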

On Model Stop (Admin UI → Stop Button)

Sequence:

1. Admin clicks "Stop" on a model
2. Backend stops the container (graceful shutdown)
3. Container removed
4. Model unregistered from gateway registry
5. State updated to "stopped" in database
6. Registry persisted to database

Timeouts:

- vLLM: 5 seconds
- llama.cpp: 10 seconds (larger models need more time)
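The per-engine grace periods map onto docker stop's -t flag. The helper below is an illustrative sketch of that mapping, not Cortex's actual stop code:

```python
# Per-engine graceful-stop timeouts in seconds, from the table above
STOP_TIMEOUTS = {"vllm": 5, "llamacpp": 10}

def stop_command(container: str, engine: str) -> list[str]:
    """Build a `docker stop` invocation with the engine-appropriate grace period."""
    timeout = STOP_TIMEOUTS.get(engine, 10)  # default to the longer timeout
    return ["docker", "stop", "-t", str(timeout), container]

print(stop_command("llamacpp-model-4", "llamacpp"))
# ['docker', 'stop', '-t', '10', 'llamacpp-model-4']
```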

On Gateway Shutdown (make down / docker compose down)

NEW: Automatic Model Container Cleanup

Sequence:

1. Gateway receives shutdown signal
2. Queries database for all running models
3. Stops each model container
4. Updates all models to "stopped" state
5. Clears container_name and port fields
6. Gateway shuts down

Log Output:

[shutdown] Stopping all managed model containers...
[shutdown] Stopping container for model 4 (huihui-ai 120B)...
[shutdown] Stopped 1 model container(s)

Benefit: Model containers don't persist after Cortex shutdown


Orphaned Container Detection

What is an Orphaned Container?

Definition: A model container that is running but:

- Not in the database (deleted model)
- Database shows "stopped" but the container is still running
- Left over from a previous Cortex instance (before the shutdown hook existed)

Automatic Detection

On make down:

$ make down

Stopping Cortex services...
Note: Model containers will be stopped by gateway shutdown hook
✓ Services stopped

Checking for orphaned model containers...
Found 2 orphaned model container(s)
Run 'make clean-models' to remove them

Detection Logic:

- Scans for containers matching vllm-model-* or llamacpp-model-*
- Counts running containers
- Alerts if any are found after gateway shutdown
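The core of that detection is comparing running container names against the database's running-model IDs. A minimal sketch of the comparison, assuming the container list and DB IDs have already been fetched (the function name is hypothetical):

```python
import re

# Matches the two naming conventions and captures the database ID
NAME_RE = re.compile(r"^(?:vllm|llamacpp)-model-(\d+)$")

def find_orphans(running_containers: list[str], db_running_ids: set[int]) -> list[str]:
    """Return model containers whose database record is missing or not 'running'."""
    orphans = []
    for name in running_containers:
        match = NAME_RE.match(name)
        if match and int(match.group(1)) not in db_running_ids:
            orphans.append(name)
    return orphans

# Container for model 3 is up, but only model 4 is 'running' in the DB -> orphan
print(find_orphans(["vllm-model-3", "llamacpp-model-4"], {4}))  # ['vllm-model-3']
```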


Manual Cleanup

Option 1: Makefile Command

# Clean up all model containers
make clean-models

What it does:

- Runs scripts/cleanup-orphaned-containers.sh
- Lists all model containers
- Asks for confirmation
- Stops and removes all containers
- Shows a summary

Option 2: Cleanup Script Directly

# Interactive cleanup
bash scripts/cleanup-orphaned-containers.sh

# Output:
# Found model containers:
#   - vllm-model-3
#   - llamacpp-model-4
# Total: 2 container(s)
# 
# Stop and remove all these containers? (yes/no): yes
# 
# Processing vllm-model-3... ✓ Removed
# Processing llamacpp-model-4... ✓ Removed
# 
# Cleanup Complete
# Stopped and removed: 2

Option 3: Docker Commands

# List model containers
docker ps -a --filter "name=vllm-model-" --filter "name=llamacpp-model-"

# Stop all vLLM containers
docker ps -q --filter "name=vllm-model-" | xargs -r docker stop
docker ps -a -q --filter "name=vllm-model-" | xargs -r docker rm

# Stop all llama.cpp containers
docker ps -q --filter "name=llamacpp-model-" | xargs -r docker stop
docker ps -a -q --filter "name=llamacpp-model-" | xargs -r docker rm

Common Scenarios

Scenario 1: Gateway Restart (Normal Operation)

Before Shutdown Hook (Old Behavior):

1. make down
2. Gateway stops
3. Model containers keep running ❌
4. make up
5. Gateway starts
6. Old containers still running (orphaned)
7. New models can't use same ports

After Shutdown Hook (New Behavior):

1. make down
2. Gateway receives shutdown signal
3. Gateway stops all model containers ✅
4. Gateway shuts down
5. make up
6. Gateway starts fresh
7. No orphaned containers ✓

Scenario 2: Gateway Crash (Unexpected)

What Happens:

- Gateway crashes without the shutdown hook running
- Model containers keep running
- On restart, those containers are orphaned

Recovery:

# Check for orphans
make down  # Will detect orphans

# Clean up
make clean-models

# Restart fresh
make up

Scenario 3: Model Deleted But Container Running

Cause: Model deleted from database but container not stopped

Detection:

# List containers
docker ps --filter "name=vllm-model-" --filter "name=llamacpp-model-"

# Check database
curl -b cookies.txt http://localhost:8084/admin/models | jq '.[] | .id'

# If container ID doesn't match any database ID → orphaned

Fix:

make clean-models


Troubleshooting

"Network cortex_default is in use" Error

Symptom: make down fails with "Resource is still in use"

Cause: Model containers are still attached to the network

Solution:

# Stop model containers first
make clean-models

# Then stop services
make down

Model Containers Restart After System Reboot

Cause: Docker daemon restart policy

Check:

docker inspect vllm-model-3 | jq '.[0].HostConfig.RestartPolicy'

Should show:

{
  "Name": "no",
  "MaximumRetryCount": 0
}

If it shows "unless-stopped" or "always":

- This is a bug in container creation
- Containers will auto-restart after a daemon restart or reboot
- Report it as an issue
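The check can be scripted over the `docker inspect` JSON. This is a sketch with a hypothetical helper; the example input mimics the inspect output shape rather than calling Docker:

```python
import json

def restart_policy_ok(policy: dict) -> bool:
    """True when the container will never auto-restart on its own."""
    return policy.get("Name") in ("no", "") and policy.get("MaximumRetryCount", 0) == 0

# Example: JSON captured from `docker inspect <container>`
inspect_json = '[{"HostConfig": {"RestartPolicy": {"Name": "no", "MaximumRetryCount": 0}}}]'
policy = json.loads(inspect_json)[0]["HostConfig"]["RestartPolicy"]
print(restart_policy_ok(policy))  # True
```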

Health Page Shows Model But Container Doesn't Exist

Symptom: Model appears in health page but docker ps doesn't show container

Cause: Stale registry entry (container was removed but registry not updated)

Solution:

# Option 1: Restart the model (will re-create container)
# In UI: Stop → Start

# Option 2: Clear stale registry
# Restart gateway (will reload registry from database)
make restart


Best Practices

For Administrators

1. Always Use make down (Not docker compose down directly):

# Good:
make down  # Runs shutdown hook + detects orphans

# Avoid:
docker compose down  # Bypasses Makefile checks

2. Clean Up Orphans Regularly:

# Weekly maintenance:
make down
make clean-models  # If orphans detected
make up

3. Monitor Model Containers:

# Check running containers:
docker ps --filter "name=model-"

# Should match models shown in UI

For Developers

1. Always Stop Containers on Model Deletion:

@router.delete("/models/{model_id}")
async def delete_model(model_id: int):
    # ALWAYS call stop_container_for_model
    try:
        stop_container_for_model(m)
    except Exception:
        pass
    # Then delete from database

2. Update State on Container Failure:

except Exception as e:
    # Clean up failed container
    try:
        stop_container_for_model(m)
    except Exception:
        pass
    # Update state
    await session.execute(
        update(Model)
        .where(Model.id == model_id)
        .values(state="failed", container_name=None, port=None)
    )

3. Test Shutdown Hook:

# Start a model
# Stop gateway
# Verify container stopped
docker ps --filter "name=model-"  # Should be empty


Makefile Commands

Container Management

# Stop and remove all model containers
make clean-models

# Stop services (will auto-stop models via shutdown hook)
make down

# Stop services and remove all containers
make clean-all

# Check for orphans after shutdown
make down  # Will show count if any found

Monitoring

# List all containers
make ps

# Check model container status
docker ps --filter "name=model-"

# View model logs
docker logs llamacpp-model-4

Implementation Details

Shutdown Hook

Location: backend/src/main.py:243-293

Logic:

@app.on_event("shutdown")
async def on_shutdown():
    # 1. Query database for running models
    result = await session.execute(
        select(Model).where(Model.state == "running")
    )
    running_models = result.scalars().all()

    # 2. Stop each container
    for m in running_models:
        stop_container_for_model(m)

    # 3. Update database (all to stopped)
    await session.execute(
        update(Model)
        .where(Model.state == "running")
        .values(state="stopped", container_name=None, port=None)
    )

    # 4. Continue with normal shutdown
    # (close http_client, redis, database engine)

Triggers:

- make down
- docker compose down
- Ctrl+C on a foreground process
- SIGTERM signal
- Container stop

Limitations:

- Only runs on graceful shutdown
- If the gateway crashes (SIGKILL), the hook doesn't run
- Use make clean-models to clean up after crashes

Cleanup Script

Location: scripts/cleanup-orphaned-containers.sh

Features:

- Scans for all vllm-model-* and llamacpp-model-* containers
- Shows the list before acting
- Asks for confirmation
- Stops and removes each container
- Reports success/failure counts
- Safe to run anytime


Migration Notes

Upgrading from Previous Versions

If you have orphaned containers from before this feature:

# 1. Check for orphans
docker ps --filter "name=model-"

# 2. Clean them up
make clean-models

# 3. Restart Cortex
make up

# 4. Verify no orphans
make down  # Should show "✓ No orphaned containers"

Testing the Shutdown Hook

# 1. Start a model via UI
# 2. Verify container running:
docker ps --filter "name=model-"

# 3. Stop gateway:
make down

# 4. Check containers (should be stopped):
docker ps --filter "name=model-"  # Should be empty

# 5. Check database:
curl -b cookies.txt http://localhost:8084/admin/models | jq '.[] | {id, state}'
# All should show state="stopped"

Summary

Cortex now automatically manages model container lifecycle:

On Model Start: Container created and registered
On Model Stop: Container stopped and removed
On Gateway Shutdown: All model containers stopped
Orphan Detection: Automatic on make down
Easy Cleanup: make clean-models command

Result: No more orphaned containers! Clean, predictable lifecycle management. 🎉