OpenAI-compatible API¶
The gateway implements the following endpoints under /v1:
- POST /v1/chat/completions
- POST /v1/completions
- POST /v1/embeddings
Authenticate every request with an API key in the Authorization header: Authorization: Bearer <token>.
Chat completions (example)¶
curl -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  "$GATEWAY/v1/chat/completions" \
  -d '{
    "model": "meta-llama/Llama-3-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'
Streaming is supported: set "stream": true to receive the response incrementally as server-sent events.
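A minimal sketch of consuming a stream, assuming the standard OpenAI-style format (each event is a data: line carrying a JSON chunk, and the stream ends with data: [DONE]); the sample lines below are illustrative, not captured gateway output:

```python
import json

def parse_sse_chunks(lines):
    """Yield content deltas from OpenAI-style SSE lines.

    Each event line looks like 'data: {...}'; the stream ends with
    'data: [DONE]'. Non-data lines (e.g. keep-alives) are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Example stream as it might arrive over HTTP:
stream = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    'data: [DONE]',
]
print("".join(parse_sse_chunks(stream)))  # prints: Hello!
```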
Token usage¶
- If the upstream reports usage, the gateway forwards it unchanged.
- If not, the gateway estimates usage (configurable) from prompt length and message content.
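The estimator is configuration-dependent; as an illustration only (this is an assumed heuristic, not the gateway's actual formula), a common rough rule of thumb is about four characters per token:

```python
def estimate_usage(messages, completion_text):
    """Rough token-usage estimate at ~4 characters per token.

    Illustrative heuristic only; real tokenizers vary by model and
    the gateway's configured estimator may differ.
    """
    CHARS_PER_TOKEN = 4
    prompt_chars = sum(len(m.get("content", "")) for m in messages)
    prompt_tokens = max(1, prompt_chars // CHARS_PER_TOKEN)
    completion_tokens = max(1, len(completion_text) // CHARS_PER_TOKEN)
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }

usage = estimate_usage(
    [{"role": "user", "content": "Hello!"}],
    "Hi there, how can I help?",
)
```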
Scopes¶
Each endpoint requires a matching scope on the API key:

- chat for /chat/completions
- completions for /completions
- embeddings for /embeddings
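The scope mapping above can be sketched as a lookup; the scope names mirror the docs, but the function and its signature are hypothetical, not the gateway's actual authorization code:

```python
# Illustrative scope check; scope names follow the docs above.
SCOPE_FOR_PATH = {
    "/v1/chat/completions": "chat",
    "/v1/completions": "completions",
    "/v1/embeddings": "embeddings",
}

def is_authorized(path, token_scopes):
    """Return True if the token's scopes cover the requested path."""
    required = SCOPE_FOR_PATH.get(path)
    return required is not None and required in token_scopes

is_authorized("/v1/embeddings", {"chat", "embeddings"})  # True
is_authorized("/v1/completions", {"chat"})               # False
```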
Errors¶
Errors use a consistent envelope:
{
  "error": {"code": 401, "message": "invalid_credentials"},
  "request_id": "..."
}
Include the request_id when reporting issues.
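A client can surface the envelope's fields when a request fails; a minimal sketch (the helper name and the sample request_id value are illustrative):

```python
import json

def describe_error(body):
    """Extract code, message, and request_id from the error envelope."""
    payload = json.loads(body)
    err = payload["error"]
    return f'{err["code"]} {err["message"]} (request_id={payload["request_id"]})'

# Example body; "req-123" is a made-up request_id for illustration.
body = '{"error": {"code": 401, "message": "invalid_credentials"}, "request_id": "req-123"}'
print(describe_error(body))  # prints: 401 invalid_credentials (request_id=req-123)
```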