# Configuration
CORTEX is configured primarily via environment variables. Defaults are defined in backend/src/config.py.
Tip: create a `.env` file in `backend/` for local overrides.
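Here is a minimal sketch of what "configured via environment variables with defaults" can look like in practice. The helpers read a variable, fall back to a default when it is unset or empty, and parse booleans and comma-separated lists such as `VLLM_GEN_URLS`. The helper names (`env_bool`, `env_int`, `env_list`) and the accepted boolean spellings are assumptions. `backend/src/config.py` may use a settings library and different parsing rules.

```python
import os

# Hypothetical helpers illustrating env-var parsing with defaults;
# they are not taken from backend/src/config.py.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def env_bool(name: str, default: bool) -> bool:
    """Parse a boolean env var case-insensitively; unset or empty means default."""
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name}={raw!r} is not a boolean")


def env_int(name: str, default: int) -> int:
    """Parse an integer env var; unset or empty means default."""
    raw = os.environ.get(name)
    return default if raw is None or raw.strip() == "" else int(raw)


def env_list(name: str, default: str) -> list[str]:
    """Split a comma-separated env var into stripped, non-empty items."""
    raw = os.environ.get(name, default)
    return [item.strip() for item in raw.split(",") if item.strip()]


if __name__ == "__main__":
    os.environ["VLLM_GEN_URLS"] = "http://gen-a:8001, http://gen-b:8001"
    os.environ["GATEWAY_DEV_ALLOW_ALL_KEYS"] = "false"
    print(env_list("VLLM_GEN_URLS", "http://localhost:8001"))
    print(env_bool("GATEWAY_DEV_ALLOW_ALL_KEYS", True))
    print(env_int("REQUEST_MAX_BODY_BYTES", 1048576))
```

Raising on an unrecognized boolean, rather than silently using the default, surfaces typos such as `GATEWAY_DEV_ALLOW_ALL_KEYS=flase` at startup.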
## Core settings
| Variable | Default | Description |
|---|---|---|
| `VLLM_GEN_URLS` | `http://localhost:8001` | Comma-separated base URLs for the generation pool |
| `VLLM_EMB_URLS` | `http://localhost:8002` | Comma-separated base URLs for the embeddings pool |
| `INTERNAL_VLLM_API_KEY` | (empty) | Token the gateway uses to call private vLLM upstreams |
| `GATEWAY_DEV_ALLOW_ALL_KEYS` | `True` | Dev bypass for API key auth (set to `False` in production) |
| `REQUEST_MAX_BODY_BYTES` | `1048576` | Maximum request body size in bytes (1 MiB); larger requests are rejected with HTTP 413 |
| `RATE_LIMIT_ENABLED` | `False` | Enable rate-limit checks (requires Redis) |
| `RATE_LIMIT_RPS` | `10` | Requests per second allowed per identifier |
| `RATE_LIMIT_BURST` | `20` | Additional burst allowance per second, on top of `RATE_LIMIT_RPS` |
| `RATE_LIMIT_WINDOW_SEC` | `0` | Sliding-window length in seconds (`0` disables the window) |
| `RATE_LIMIT_MAX_REQUESTS` | `0` | Maximum requests allowed within the sliding window |
| `REDIS_URL` | `redis://redis:6379/0` | Redis connection URL |
| `CONCURRENCY_LIMIT_ENABLED` | `False` | Limit concurrent streams per identifier |
| `MAX_CONCURRENT_STREAMS_PER_ID` | `5` | Maximum concurrent streaming requests per identifier |
| `CB_ENABLED` | `False` | Enable the circuit breaker |
| `CB_FAILURE_THRESHOLD` | `5` | Failures before the breaker opens |
| `CB_COOLDOWN_SEC` | `30` | Cooldown in seconds after the breaker trips |
| `HEALTH_CHECK_TTL_SEC` | `10` | TTL in seconds of the health snapshot used for routing |
| `HEALTH_CHECK_PATH` | `/health` | Upstream health-check path |
| `HEALTH_POLL_SEC` | `15` | Interval in seconds between background health polls |
| `OTEL_ENABLED` | `False` | Enable OpenTelemetry tracing |
| `OTEL_SERVICE_NAME` | `cortex-gateway` | OTel `service.name` |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | (empty) | OTLP HTTP endpoint |
| `TOKEN_ESTIMATION_ENABLED` | `True` | Estimate token counts when the upstream doesn't return usage |
| `PROMETHEUS_URL` | `http://prometheus:9090` | Prometheus base URL |
| `CORS_ENABLED` | `True` | Enable CORS middleware |
| `CORS_ALLOW_ORIGINS` | `http://localhost:3001` | Allowed origins (comma-separated, or `*`) |
| `SECURITY_HEADERS_ENABLED` | `True` | Add security headers to responses |
| `DATABASE_URL` | `postgresql+asyncpg://cortex:cortex@postgres:5432/cortex` | Async SQLAlchemy URL |
| `ADMIN_BOOTSTRAP_USERNAME` | (empty) | Optional username for the owner account created at bootstrap |
| `ADMIN_BOOTSTRAP_PASSWORD` | (empty) | Optional password for the owner account created at bootstrap |
| `ADMIN_BOOTSTRAP_ORG` | (empty) | Optional organization name created at bootstrap |
| `CORTEX_MODELS_DIR` | `/var/cortex/models` | Models directory as seen inside the container |
| `HF_CACHE_DIR` | `/var/cortex/hf-cache` | Hugging Face cache directory as seen inside the container |
| `CORTEX_MODELS_DIR_HOST` | same as `CORTEX_MODELS_DIR` | Host path for models (Docker bind mount) |
| `HF_CACHE_DIR_HOST` | same as `HF_CACHE_DIR` | Host path for the HF cache (Docker bind mount) |
| `VLLM_IMAGE` | `vllm/vllm-openai:latest` | Image used for managed model containers; for offline reproducibility, pin a tested tag and cache it with `make prepare-offline` |
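`RATE_LIMIT_RPS` and `RATE_LIMIT_BURST` read like token-bucket parameters. This sketch assumes that interpretation: each identifier's bucket refills at `RATE_LIMIT_RPS` tokens per second, holds at most `RATE_LIMIT_RPS + RATE_LIMIT_BURST` tokens, and every request spends one token. The algorithm, the capacity formula, and the `TokenBucketLimiter` name are all assumptions for illustration. The gateway keeps rate-limit state in Redis (which is why `RATE_LIMIT_ENABLED` requires it), whereas this sketch keeps it in process memory.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Hypothetical in-memory, per-identifier token bucket."""

    def __init__(self, rps: float = 10, burst: float = 20,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = float(rps)
        # Assumed: burst is extra capacity on top of the steady rate.
        self.capacity = float(rps + burst)
        self._clock = clock
        # identifier -> (tokens remaining, time of last refill)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, identifier: str) -> bool:
        """Spend one token if available; new identifiers start with a full bucket."""
        now = self._clock()
        tokens, last = self._buckets.get(identifier, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[identifier] = (tokens - 1.0, now)
            return True
        self._buckets[identifier] = (tokens, now)
        return False


if __name__ == "__main__":
    limiter = TokenBucketLimiter(rps=10, burst=20)
    accepted = sum(limiter.allow("api-key-1") for _ in range(50))
    print(f"accepted {accepted} of 50 back-to-back requests")
```

Injecting `clock` lets the refill behavior be tested deterministically instead of with `time.sleep`.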
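The circuit-breaker settings describe a common pattern: after `CB_FAILURE_THRESHOLD` failures the breaker opens, and requests to that upstream are refused until `CB_COOLDOWN_SEC` has elapsed. The in-memory sketch below illustrates the pattern. It assumes that failures are counted consecutively, that a success resets the count, and that one trial request is allowed after the cooldown, with a failed trial re-opening the breaker. The `CircuitBreaker` class is hypothetical, and the gateway's breaker may count or reset differently.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Hypothetical per-upstream breaker mirroring CB_FAILURE_THRESHOLD / CB_COOLDOWN_SEC."""

    def __init__(self, failure_threshold: int = 5, cooldown_sec: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_sec = cooldown_sec
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        """Closed: allow. Open: refuse until the cooldown has elapsed, then allow a trial."""
        if self._opened_at is None:
            return True
        return self._clock() - self._opened_at >= self.cooldown_sec

    def record_success(self) -> None:
        # A success closes the breaker and clears the consecutive-failure count.
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        # Trip at the threshold; a failed trial after cooldown is still at or
        # above the threshold, so it re-opens the breaker with a fresh timestamp.
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


if __name__ == "__main__":
    breaker = CircuitBreaker(failure_threshold=5, cooldown_sec=30)
    for _ in range(5):
        breaker.record_failure()
    print("upstream allowed:", breaker.allow_request())
```

A router would call `allow_request()` before choosing an upstream and report the outcome with `record_success()` or `record_failure()`.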
## vLLM container environment variables

Cortex sets these automatically when it starts a vLLM container, based on the model's configuration:

| Variable | Description |
|---|---|
| `CUDA_VISIBLE_DEVICES` | GPU selection (set from the model's `selected_gpus`) |
| `HF_HUB_OFFLINE` | Set to `1` when `offline_mode` is enabled |
| `VLLM_USE_V1` | Set to `1` when the V1 engine is enabled |
| `VLLM_LOGGING_LEVEL` | Set to `DEBUG` when debug logging is enabled |
| `VLLM_TRACE_FUNCTION` | Set to `1` when trace mode is enabled |
| `VLLM_ENGINE_ITERATION_TIMEOUT_S` | Request timeout in seconds (only if configured) |
| `NCCL_TIMEOUT` | Multi-GPU communication timeout (default: `1800`) |
| `NCCL_DEBUG` | Set to `WARN` for multi-GPU setups |
| `NCCL_BLOCKING_WAIT` | Set to `1` for blocking NCCL operations |
| `NCCL_LAUNCH_MODE` | Set to `PARALLEL` for optimal multi-GPU performance |
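This sketch shows how a model's configuration could map onto the container environment listed above. Only the variable names and values come from the table. The `ModelConfig` dataclass and its fields (`use_v1`, `debug_logging`, `trace_mode`, `request_timeout_sec`, `nccl_timeout_sec`) are hypothetical stand-ins for Cortex's model settings. Treating "multi-GPU" as "more than one selected GPU", and applying every `NCCL_*` variable only in that case, are also assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelConfig:
    # Hypothetical fields; Cortex's real model schema may differ.
    selected_gpus: List[int] = field(default_factory=list)
    offline_mode: bool = False
    use_v1: bool = False
    debug_logging: bool = False
    trace_mode: bool = False
    request_timeout_sec: Optional[int] = None
    nccl_timeout_sec: int = 1800


def build_vllm_env(cfg: ModelConfig) -> Dict[str, str]:
    """Map model settings to vLLM container env vars (all values are strings)."""
    env: Dict[str, str] = {}
    if cfg.selected_gpus:
        env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in cfg.selected_gpus)
    if cfg.offline_mode:
        env["HF_HUB_OFFLINE"] = "1"
    if cfg.use_v1:
        env["VLLM_USE_V1"] = "1"
    if cfg.debug_logging:
        env["VLLM_LOGGING_LEVEL"] = "DEBUG"
    if cfg.trace_mode:
        env["VLLM_TRACE_FUNCTION"] = "1"
    if cfg.request_timeout_sec is not None:
        env["VLLM_ENGINE_ITERATION_TIMEOUT_S"] = str(cfg.request_timeout_sec)
    if len(cfg.selected_gpus) > 1:
        # Assumed: NCCL tuning is only applied when more than one GPU is used.
        env.update({
            "NCCL_TIMEOUT": str(cfg.nccl_timeout_sec),
            "NCCL_DEBUG": "WARN",
            "NCCL_BLOCKING_WAIT": "1",
            "NCCL_LAUNCH_MODE": "PARALLEL",
        })
    return env


if __name__ == "__main__":
    print(build_vllm_env(ModelConfig(selected_gpus=[0, 1], offline_mode=True)))
```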
## Security guidance

- In production, set `GATEWAY_DEV_ALLOW_ALL_KEYS=false` and configure API keys.
- Restrict `CORS_ALLOW_ORIGINS` to the actual frontend origins; avoid `*` with credentials.
- Use a strong `INTERNAL_VLLM_API_KEY` when upstreams are network-reachable.
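The guidance above can be turned into a startup check. The sketch below defines a hypothetical `security_warnings` function that inspects a mapping of environment values and returns one message per violation. The function name, the `upstreams_network_reachable` flag, and the 32-character minimum for a "strong" key are assumptions, not Cortex behavior.

```python
from typing import List, Mapping

MIN_KEY_LENGTH = 32  # assumed threshold for a "strong" INTERNAL_VLLM_API_KEY
_TRUTHY = {"1", "true", "yes", "on"}


def security_warnings(env: Mapping[str, str],
                      upstreams_network_reachable: bool = True) -> List[str]:
    """Return human-readable warnings for production-unsafe settings (hypothetical check)."""
    warnings: List[str] = []
    # The documented default is True, so an unset value counts as unsafe.
    if env.get("GATEWAY_DEV_ALLOW_ALL_KEYS", "True").strip().lower() in _TRUTHY:
        warnings.append("GATEWAY_DEV_ALLOW_ALL_KEYS is enabled; API key auth is bypassed")
    origins = [o.strip() for o in env.get("CORS_ALLOW_ORIGINS", "").split(",")]
    if "*" in origins:
        warnings.append("CORS_ALLOW_ORIGINS contains '*'; list the actual frontend origins")
    key = env.get("INTERNAL_VLLM_API_KEY", "")
    if upstreams_network_reachable and len(key) < MIN_KEY_LENGTH:
        warnings.append(
            f"INTERNAL_VLLM_API_KEY is empty or shorter than {MIN_KEY_LENGTH} characters"
        )
    return warnings


if __name__ == "__main__":
    import os

    for warning in security_warnings(os.environ):
        print("WARNING:", warning)
```

A random key of sufficient length can be generated with Python's standard library, for example `python -c "import secrets; print(secrets.token_urlsafe(32))"`.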
## Compose profiles

- `linux` enables the node-exporter; `gpu` enables the DCGM exporter and requests GPU access for containers that need it.
- Enable profiles per command or via the environment:

    ```bash
    # one-off
    docker compose -f docker.compose.dev.yaml --profile linux --profile gpu up -d

    # persistent for the shell
    export COMPOSE_PROFILES=linux,gpu
    docker compose -f docker.compose.dev.yaml up -d
    ```
## CORS notes (dev)

- When the UI runs at `http://localhost:3001`, make sure the gateway allows that origin:

    ```bash
    CORS_ALLOW_ORIGINS=http://localhost:3001,http://127.0.0.1:3001
    ```

- Preflight check (the response should include an `Access-Control-Allow-Origin` header):

    ```bash
    curl -i -X OPTIONS http://localhost:8084/auth/login \
      -H 'Origin: http://localhost:3001' \
      -H 'Access-Control-Request-Method: POST'
    ```
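Both `localhost` and `127.0.0.1` are listed because browsers send the origin as an exact `scheme://host:port` string, and the two hosts are different origins. A simplified sketch of the comparison a CORS layer makes is shown below. The function name `allowed_origin` is hypothetical, and real CORS middleware also validates the requested method and headers and may echo the request origin instead of `*` when credentials are allowed.

```python
from typing import Optional


def allowed_origin(origin: str, cors_allow_origins: str) -> Optional[str]:
    """Return the Access-Control-Allow-Origin value to send, or None to omit the header."""
    allowed = [o.strip() for o in cors_allow_origins.split(",") if o.strip()]
    if "*" in allowed:
        return "*"
    # Exact string match: http://localhost:3001 and http://127.0.0.1:3001
    # are distinct origins, as are different ports or schemes.
    return origin if origin in allowed else None


if __name__ == "__main__":
    cfg = "http://localhost:3001,http://127.0.0.1:3001"
    for candidate in ("http://localhost:3001", "http://localhost:3000"):
        print(candidate, "->", allowed_origin(candidate, cfg))
```

If the preflight `curl` above shows no `Access-Control-Allow-Origin` header, the `Origin` value most likely does not match any entry in this list exactly.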