CORTEX¶
CORTEX is an OpenAI-compatible gateway and admin UI for running vLLM and llama.cpp inference engines on your own infrastructure. It provides secure access control, health‑aware routing, usage metering, and a modern admin interface.
- OpenAI-compatible endpoints: `/v1/chat/completions`, `/v1/completions`, `/v1/embeddings`
- Health checks, circuit breaking, retries, and metrics via Prometheus
- Admin APIs and UI for organizations, users, API keys, models, and usage
- Optional Redis for rate limiting and concurrency caps; optional OpenTelemetry tracing
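Because the endpoints follow the OpenAI wire format, any OpenAI-compatible client can talk to the gateway. A minimal sketch of the shared request conventions, assuming the gateway listens on `http://localhost:8000` (the host, port, model name, and API key below are placeholders, not values defined by CORTEX):

```python
import json

# Placeholder values -- substitute your gateway address and a key
# generated in the admin UI.
BASE_URL = "http://localhost:8000"
API_KEY = "sk-example"

def build_request(endpoint: str, body: dict) -> tuple[str, dict, bytes]:
    """Build the URL, headers, and JSON body for an OpenAI-compatible call."""
    url = f"{BASE_URL}{endpoint}"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    return url, headers, json.dumps(body).encode()

# All three endpoints share the same auth header and JSON payload style:
chat = build_request("/v1/chat/completions", {
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello"}],
})
completion = build_request("/v1/completions", {"model": "my-model", "prompt": "Hello"})
embedding = build_request("/v1/embeddings", {"model": "my-model", "input": "Hello"})
```

Because the format matches OpenAI's, official SDKs generally work by pointing their base URL at the gateway instead of api.openai.com.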
Get started in minutes:
1) Read the Quickstart (Docker) to run the stack locally.
2) Explore the Health and Keys pages in the admin UI.
3) Call the API via curl or SDKs using your generated API key.
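The final step can be sketched with only the Python standard library; the base URL, model name, and key below are placeholders for your own deployment:

```python
import json
import urllib.request

# Placeholder values for illustration -- use your gateway's address and a
# key generated on the Keys page of the admin UI.
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps({
        "model": "my-model",
        "messages": [{"role": "user", "content": "Say hello."}],
    }).encode(),
    headers={
        "Authorization": "Bearer sk-example",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Uncomment once the stack is running locally to send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```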
Quick links¶
- Getting Started → Quickstart (Docker)
- API → OpenAI-compatible
- Operations → Deployments
- Contributing → How to Contribute
Screenshots¶
(coming soon) Health dashboard and API Keys management.
License and ownership¶
Copyright © {{CURRENT_YEAR}} Aulendur LLC. Licensed under the terms in LICENSE.txt. See NOTICE.txt for attributions.