CORTEX¶
CORTEX is an OpenAI-compatible gateway and admin UI for running vLLM and llama.cpp inference engines on your own infrastructure. It provides secure access control, health‑aware routing, usage metering, and a modern admin interface.
- OpenAI-compatible endpoints: `/v1/chat/completions`, `/v1/completions`, `/v1/embeddings`
- Health checks, circuit breaking, retries, and metrics via Prometheus
- Admin APIs and UI for organizations, users, API keys, models, and usage
- Optional Redis for rate limiting and concurrency caps; optional OpenTelemetry tracing
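Because the endpoints follow the OpenAI wire format, any OpenAI-compatible client can talk to the gateway. A minimal sketch of the shared request conventions, assuming the gateway listens on `http://localhost:8000` (the host, port, model name, and API key below are placeholders, not values defined by CORTEX):

```python
import json

# Placeholder values -- substitute your gateway address and a key
# generated in the admin UI.
BASE_URL = "http://localhost:8000"
API_KEY = "sk-example"

def build_request(endpoint: str, body: dict) -> tuple[str, dict, bytes]:
    """Build the URL, headers, and JSON body for an OpenAI-compatible call."""
    url = f"{BASE_URL}{endpoint}"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    return url, headers, json.dumps(body).encode()

# All three endpoints share the same auth header and JSON payload style:
chat = build_request("/v1/chat/completions", {
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello"}],
})
completion = build_request("/v1/completions", {"model": "my-model", "prompt": "Hello"})
embedding = build_request("/v1/embeddings", {"model": "my-model", "input": "Hello"})
```

Because the format matches OpenAI's, official SDKs generally work by pointing their base URL at the gateway instead of api.openai.com.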
Get started in minutes:
1) Read the Quickstart (Docker) to run the stack locally.
2) Explore the Health and Keys pages in the admin UI.
3) Call the API via curl or SDKs using your generated API key.
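The final step can be sketched with only the Python standard library; the base URL, model name, and key below are placeholders for your own deployment:

```python
import json
import urllib.request

# Placeholder values for illustration -- use your gateway's address and a
# key generated on the Keys page of the admin UI.
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps({
        "model": "my-model",
        "messages": [{"role": "user", "content": "Say hello."}],
    }).encode(),
    headers={
        "Authorization": "Bearer sk-example",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Uncomment once the stack is running locally to send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```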
Quick links¶
- Getting Started → Quickstart (Docker)
- API → OpenAI-compatible
- Operations → Deployments
- Contributing → How to Contribute
Screenshots¶
(coming soon) Health dashboard and API Keys management.
License and ownership¶
Copyright © {{CURRENT_YEAR}} Aulendur LLC. Licensed under the terms in LICENSE.txt. See NOTICE.txt for attributions.