Skip to content

Quickstart (Docker Compose)

This is the fastest way to run CORTEX locally.

Prerequisites

  • Docker and Docker Compose
  • Optional: NVIDIA GPU + drivers for running vLLM with CUDA

The Makefile provides automatic configuration:

make quick-start
# - Auto-detects your IP
# - Auto-enables monitoring on Linux
# - Creates admin user
# - Shows access URLs

Alternative: Direct Docker Compose

You can also use Docker Compose directly:

# From repo root
docker compose -f docker.compose.dev.yaml up --build

# Cortex will:
# ✓ Auto-detect host IP in the gateway container (fallback)
# ✓ Configure CORS automatically
# ✓ Work from your network

# Note: Monitoring profiles (linux, gpu) need manual enabling:
export COMPOSE_PROFILES=linux,gpu
docker compose -f docker.compose.dev.yaml up -d --build

Recommendation: Use make up instead - it auto-enables monitoring on Linux!

Services exposed: - Admin UI (Next.js): http://localhost:3001 - Gateway (FastAPI): http://localhost:8084 - Prometheus: http://localhost:9090 - PgAdmin: http://localhost:5050 (admin@local / admin)

Health check:

curl http://localhost:8084/health

Network Access (Serving to LAN)

Cortex runs with host network mode for the gateway, making it accessible from any device on your network.

Access URLs

From Admin UI API Gateway
Same machine http://localhost:3001 http://localhost:8084
LAN devices http://<HOST_IP>:3001 http://<HOST_IP>:8084
Docker containers http://host.docker.internal:8084 (with extra_hosts)

Get your host IP:

make info    # Shows detected IP and access URLs

Firewall Configuration (Linux/UFW)

If you have UFW firewall enabled, allow Cortex ports:

# Allow Cortex ports
sudo ufw allow 3001/tcp comment 'Cortex Admin UI'
sudo ufw allow 8084/tcp comment 'Cortex API Gateway'
sudo ufw reload

# Or allow your entire local network
sudo ufw allow from 192.168.0.0/16 comment 'Local network'
sudo ufw reload

# Verify
sudo ufw status

Troubleshoot firewall issues:

# Check for blocked connections
sudo tail -20 /var/log/ufw.log | grep BLOCK

Docker Container Access

For applications running in Docker containers to reach Cortex:

# docker-compose.yaml
services:
  your-app:
    extra_hosts:
      - "host.docker.internal:host-gateway"
    environment:
      OPENAI_API_BASE: "http://host.docker.internal:8084/v1"
      OPENAI_API_KEY: "your-cortex-api-key"

On Linux with UFW, also run:

make setup-firewall  # Allows Docker container traffic to host

curl -X POST http://localhost:8084/admin/bootstrap-owner \
  -H 'Content-Type: application/json' \
  -d '{"username":"admin","password":"admin","org_name":"Default"}'

Login via UI at http://localhost:3001/login or via API:

curl -X POST http://localhost:8084/auth/login \
  -H 'Content-Type: application/json' \
  -d '{"username":"admin","password":"admin"}' -i
(The response sets a cortex_session httpOnly cookie for dev.)

If the UI reports a CORS error, ensure the gateway allows your UI origin. In docker.compose.dev.yaml, the gateway sets:

CORS_ALLOW_ORIGINS: http://localhost:3001,http://127.0.0.1:3001
Recreate the gateway after edits:

docker compose -f docker.compose.dev.yaml up -d --build gateway

Create an API key

curl -X POST http://localhost:8084/admin/keys \
  -H 'Content-Type: application/json' \
  -d '{"scopes":"chat,completions,embeddings"}'
Copy the returned token immediately; it is shown once.

Make an API call

curl -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  http://localhost:8084/v1/chat/completions \
  -d '{"model":"meta-llama/Llama-3-8B-Instruct","messages":[{"role":"user","content":"Hello!"}]}'

Monitoring

With Makefile (automatic on Linux):

make up                    # Auto-enables monitoring
make monitoring-status     # Check monitoring health

With Docker Compose (manual):

export COMPOSE_PROFILES=linux,gpu
docker compose -f docker.compose.dev.yaml up -d

Verify monitoring:

# Check Prometheus targets
curl http://localhost:9090/targets

# Check GPU metrics
curl http://localhost:8084/admin/system/gpus

# Check host metrics
curl http://localhost:8084/admin/system/summary

All metrics are visible in the Admin UI → System Monitor page, including: - Per-model inference metrics (requests, tokens, latency) - GPU utilization and memory - Host CPU, memory, disk, network

Smoke test script

You can also run scripts/smoke.sh after bringing the stack up.

Stopping

Ctrl+C then docker compose -f docker.compose.dev.yaml down.

Reset to a fresh state (wipe dev databases)

docker compose -f docker.compose.dev.yaml down -v
docker ps -a --filter "name=vllm-model-" -q | xargs -r docker rm -f
docker compose -f docker.compose.dev.yaml up -d --build