Observability
CORTEX exposes Prometheus metrics and optionally OpenTelemetry traces.
Prometheus metrics
Defined in `backend/src/metrics.py`:
- `gateway_requests_total{route,status}`: request count per route and response status
- `gateway_request_latency_seconds{route}`: gateway request latency per route (histogram)
- `gateway_upstream_latency_seconds{path}`: latency of calls to upstream servers, per path
- `gateway_upstream_latency_by_upstream_seconds{path,base_url}`: upstream latency broken down by upstream `base_url`
- `gateway_stream_ttft_seconds{path}`: time to first token for streaming responses
- `gateway_upstream_selected_total{path,base_url}`: number of times each upstream was selected to serve a request
- `gateway_key_auth_allowed_total{reason}` / `gateway_key_auth_blocked_total{reason}`: key-based authentication decisions (allowed or blocked), by reason
- `gateway_upstream_health{base_url}`: upstream health as reported by the health poller (gauge)
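A minimal sketch of how a TTFT value such as `gateway_stream_ttft_seconds` could be measured, assuming the gateway relays upstream chunks through an async iterator. The names `measure_ttft` and `fake_stream` are hypothetical and not taken from `backend/src/metrics.py`; `observe` stands in for whatever records the sample (with `prometheus_client` that would typically be `histogram.labels(path=...).observe`).

```python
import asyncio
import time
from typing import AsyncIterator, Callable, List, TypeVar

T = TypeVar("T")


async def measure_ttft(
    stream: AsyncIterator[T],
    started_at: float,
    observe: Callable[[float], None],
) -> AsyncIterator[T]:
    """Relay chunks from `stream`, reporting seconds from `started_at` to the first chunk.

    Hypothetical helper: take `started_at` with time.perf_counter() just before
    the upstream request is sent, so connection and queueing time are included.
    """
    first = True
    async for chunk in stream:
        if first:
            # Record exactly once, when the first chunk (token) arrives.
            observe(time.perf_counter() - started_at)
            first = False
        yield chunk


async def fake_stream() -> AsyncIterator[str]:
    """Simulated upstream: waits 50 ms, then yields three tokens."""
    await asyncio.sleep(0.05)
    for token in ("Hello", ",", " world"):
        yield token


async def main() -> None:
    samples: List[float] = []
    started_at = time.perf_counter()
    tokens = [t async for t in measure_ttft(fake_stream(), started_at, samples.append)]
    print("".join(tokens), f"ttft={samples[0]:.3f}s")


if __name__ == "__main__":
    asyncio.run(main())
```

Wrapping the iterator keeps timing out of the relay logic. Note that a stream which yields nothing records no sample; whether that case should be counted separately is a design choice this sketch leaves open.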
Beyond the gateway's own metrics, scrape the vLLM exporters for inference-server metrics and the node and DCGM exporters for host and GPU metrics.
Dashboards
- Provide Grafana dashboards for gateway KPIs (request latency, error rate, upstream selection, TTFT) and for system metrics (host and GPU).
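Latency panels usually plot percentiles estimated from histogram buckets, for example `histogram_quantile(0.95, sum by (le, route) (rate(gateway_request_latency_seconds_bucket[5m])))` in PromQL, assuming the histogram is exported with the default `_bucket` series and `le` label. The sketch below reproduces that estimate in plain Python to show what such a panel's number means: find the bucket containing the target rank, then interpolate linearly inside it. It is a simplified reimplementation, not Prometheus's own code, and the bucket counts are made up.

```python
import math
from typing import List, Sequence, Tuple


def histogram_quantile(q: float, buckets: Sequence[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with +Inf, mirroring the
    `le` series of a Prometheus histogram.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("the last bucket must have upper bound +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations
    rank = q * total
    # Index of the first bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # Rank lies in the +Inf bucket: report the largest finite bound.
        return buckets[-2][0] if len(buckets) > 1 else math.nan
    upper, count = buckets[idx]
    if idx == 0 and upper <= 0:
        return upper
    lower, prev_count = buckets[idx - 1] if idx > 0 else (0.0, 0.0)
    in_bucket = count - prev_count
    if in_bucket == 0:
        return lower  # only reachable when rank is 0 and the first bucket is empty
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - prev_count) / in_bucket


# Hypothetical cumulative counts for gateway_request_latency_seconds_bucket.
latency_buckets: List[Tuple[float, float]] = [
    (0.1, 50), (0.25, 80), (0.5, 95), (1.0, 99), (math.inf, 100),
]

if __name__ == "__main__":
    for q in (0.5, 0.9, 0.99):
        print(f"p{int(q * 100)}: {histogram_quantile(q, latency_buckets):.3f}s")
```

Because the estimate assumes an even spread within each bucket, its accuracy depends on how closely the configured bucket bounds bracket the latencies of interest; a percentile landing in the `+Inf` bucket is simply reported as the largest finite bound.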
Tracing (optional)
- Enable OpenTelemetry tracing through environment variables. When configured, spans are propagated through FastAPI (incoming requests) and httpx (outgoing upstream calls).
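With OpenTelemetry's default propagators (the OTel specification sets `OTEL_PROPAGATORS` to `tracecontext,baggage` by default), trace context travels between services in the W3C `traceparent` header, so an upstream call made through instrumented httpx would be expected to carry one. The stdlib sketch below builds and validates such a header to show its `version-trace_id-parent_id-flags` layout. It assumes the gateway keeps that default propagator, handles only version `00`, and uses hypothetical helper names; in practice the OTel SDK generates and injects the header itself.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version 00: "00" - 32 hex trace-id - 16 hex parent-id - 2 hex flags (lowercase only).
_TRACEPARENT_V00 = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    flags: int

    @property
    def sampled(self) -> bool:
        # Bit 0 of trace-flags is the "sampled" flag.
        return (self.flags & 0x01) == 0x01


def _nonzero_hex(nbytes: int) -> str:
    """Random lowercase hex id; all-zero ids are invalid, so retry on that."""
    while True:
        value = secrets.token_hex(nbytes)
        if int(value, 16) != 0:
            return value


def new_traceparent(sampled: bool = True) -> str:
    """Build a fresh version-00 traceparent header value."""
    flags = "01" if sampled else "00"
    return f"00-{_nonzero_hex(16)}-{_nonzero_hex(8)}-{flags}"


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is not a valid version-00 value."""
    match = _TRACEPARENT_V00.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    if int(trace_id, 16) == 0 or int(parent_id, 16) == 0:
        return None
    return TraceParent(trace_id, parent_id, int(flags, 16))


if __name__ == "__main__":
    # Example value from the W3C Trace Context specification.
    tp = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
    print(tp, tp.sampled if tp else None)
    print(new_traceparent())
```

Logging the `trace_id` from this header alongside gateway request logs is one way to correlate a slow request in the latency histograms with its trace.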