Integrating External Applications with Cortex¶
This guide explains how to connect external applications (like MAGE, LangChain, or any OpenAI-compatible client) to Cortex for LLM inference.
Quick Reference¶
| Item | Value |
|---|---|
| API Base URL | http://<HOST_IP>:8084/v1 |
| Authentication | Bearer token (API key) |
| Protocol | HTTP (HTTPS via reverse proxy) |
| Port | 8084 (configurable via GATEWAY_PORT) |
Architecture Overview¶
Cortex gateway runs with host network mode, meaning it binds directly to the host machine's network interfaces. This allows any Docker container or external application to reach Cortex without special network configuration.
┌───────────────────────────────────────────────────────────────────┐
│ Host Machine (e.g., 192.168.1.11) │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Cortex Gateway (host network mode) │ │
│ │ Listening on 0.0.0.0:8084 │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ ↑ │
│ ┌────────────────────┼────────────────────┐ │
│ │ │ │ │
│ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ │
│ │ MAGE │ │LangChain│ │ Any │ │
│ │Container│ │ App │ │ Client │ │
│ └─────────┘ └─────────┘ └─────────┘ │
│ (Docker) (Docker) (LAN/localhost) │
│ │
└───────────────────────────────────────────────────────────────────┘
Access points:
- Docker containers: http://host.docker.internal:8084 or http://<HOST_IP>:8084
- Same machine: http://localhost:8084 or http://127.0.0.1:8084
- LAN devices: http://<HOST_IP>:8084
First-Time Setup (Linux Only)¶
1. Allow Network Access (Required for LAN access)¶
If UFW firewall is enabled, allow Cortex ports:
# Allow Cortex ports from your network
sudo ufw allow 3001/tcp comment 'Cortex Admin UI'
sudo ufw allow 8084/tcp comment 'Cortex API Gateway'
sudo ufw reload
Or allow your entire local network:
sudo ufw allow from 192.168.0.0/16 comment 'Local network'
sudo ufw reload
2. Allow Docker Container Access (Required for Docker apps)¶
If external applications running in Docker containers need to reach Cortex:
cd /path/to/Cortex
make setup-firewall
This adds a UFW rule to allow traffic from Docker networks (172.16.0.0/12).
Verify Setup¶
# Check UFW rules
sudo ufw status
# Test connectivity from another machine
curl http://<HOST_IP>:8084/health
Without this setup, connections from LAN devices or Docker containers will timeout.
API Endpoints¶
Cortex implements the OpenAI-compatible API specification:
| Endpoint | Method | Description |
|---|---|---|
/v1/models |
GET | List available models |
/v1/chat/completions |
POST | Chat completions (streaming supported) |
/v1/completions |
POST | Text completions |
/v1/embeddings |
POST | Generate embeddings |
/health |
GET | Health check (no auth required) |
Authentication¶
All API requests (except /health) require an API key passed via the Authorization header:
Authorization: Bearer <YOUR_API_KEY>
Creating an API Key¶
- Login to Cortex Admin UI at
http://<HOST_IP>:3001 - Navigate to API Keys page
- Click Create Key
- Copy the generated token (shown only once)
Connecting from Docker Containers¶
Method 1: Using host.docker.internal (Recommended)¶
Add extra_hosts to enable the host.docker.internal hostname:
Docker Compose:
services:
your-app:
image: your-app-image
extra_hosts:
- "host.docker.internal:host-gateway"
environment:
OPENAI_API_BASE: "http://host.docker.internal:8084/v1"
OPENAI_API_KEY: "your-cortex-api-key"
Docker run:
docker run --add-host=host.docker.internal:host-gateway \
-e OPENAI_API_BASE="http://host.docker.internal:8084/v1" \
-e OPENAI_API_KEY="your-cortex-api-key" \
your-app-image
Method 2: Using Host LAN IP¶
If host.docker.internal doesn't work, use the host's LAN IP directly:
# Find your host IP
ip route get 1.1.1.1 | grep -oP 'src \K\S+'
# Example: 192.168.1.11
# Use in your app
OPENAI_API_BASE="http://192.168.1.11:8084/v1"
API Examples¶
List Available Models¶
curl -X GET "http://localhost:8084/v1/models" \
-H "Authorization: Bearer YOUR_API_KEY"
Chat Completions¶
curl -X POST "http://localhost:8084/v1/chat/completions" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "nemotron30b",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello, how are you?"}
],
"max_tokens": 100,
"temperature": 0.7
}'
Streaming Chat Completions¶
curl -X POST "http://localhost:8084/v1/chat/completions" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "nemotron30b",
"messages": [{"role": "user", "content": "Tell me a story"}],
"max_tokens": 500,
"stream": true
}'
Embeddings¶
curl -X POST "http://localhost:8084/v1/embeddings" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "e5-mistral-7b-instruct",
"input": "The quick brown fox jumps over the lazy dog"
}'
Python Integration (OpenAI SDK Compatible)¶
from openai import OpenAI
# Point to Cortex instead of OpenAI
client = OpenAI(
base_url="http://localhost:8084/v1",
api_key="your-cortex-api-key"
)
# List models
models = client.models.list()
print("Available models:", [m.id for m in models.data])
# Chat completion
response = client.chat.completions.create(
model="nemotron30b",
messages=[{"role": "user", "content": "Hello!"}],
max_tokens=100
)
print(response.choices[0].message.content)
# Streaming
stream = client.chat.completions.create(
model="nemotron30b",
messages=[{"role": "user", "content": "Tell me a story"}],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")
LangChain Integration¶
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
base_url="http://localhost:8084/v1",
api_key="your-cortex-api-key",
model="nemotron30b",
temperature=0.7
)
response = llm.invoke("What is the capital of France?")
print(response.content)
Troubleshooting¶
Connection Timeout from Docker Container¶
Symptom: Requests from your Docker container to Cortex timeout.
Diagnosis:
# From Cortex directory
make test-external-access
Solutions:
-
Run firewall setup (Linux with UFW):
make setup-firewall -
Ensure
extra_hostsis configured:extra_hosts: - "host.docker.internal:host-gateway" -
Test from your container:
docker exec -it your-container curl http://host.docker.internal:8084/health
401 Unauthorized¶
Cause: Missing or invalid API key.
Solution:
1. Create a new API key in Cortex Admin UI
2. Ensure the header format is exactly: Authorization: Bearer <key>
3. Check key hasn't expired
503 Service Unavailable / No Upstreams¶
Cause: No models are running in Cortex.
Solution: 1. Login to Cortex Admin UI 2. Go to Models page 3. Start a model by clicking the play button
CORS Errors (Browser Only)¶
Note: CORS only affects browser-based requests. Server-to-server requests (like backend services → Cortex) are NOT affected by CORS.
Verification Checklist¶
- Firewall setup completed (
make setup-firewall) - Cortex gateway is running (
docker ps | grep cortex-gateway) - At least one model is running in Cortex
- API key has been created
- Health endpoint responds:
curl http://localhost:8084/health - Models endpoint responds:
curl http://localhost:8084/v1/models -H "Authorization: Bearer KEY"
Environment Variables Reference¶
For applications using standard OpenAI SDK environment variables:
export OPENAI_API_BASE="http://host.docker.internal:8084/v1"
export OPENAI_API_KEY="your-cortex-api-key"