Cerebrium
Cerebrium is a serverless GPU infrastructure platform for real-time AI and ML workloads. Developers package code with the Cortex framework and Cerebrium CLI, then deploy each function as an authenticated REST endpoint on autoscaling GPU/CPU compute billed per second, with streaming, WebSocket, async, and OpenAI-compatible invocation patterns.
APIs
Cerebrium Inference / Run Endpoints API
Calls each deployed function as an authenticated POST endpoint at /v4/{project}/{app}/{function}, billed per second of GPU/CPU compute. Supports synchronous JSON, Server-Sent Ev...
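As a rough illustration of the path shape described above, the sketch below builds (but does not send) an authenticated POST request using only the Python standard library. The host in `BASE_URL`, the bearer-token header scheme, the helper name `build_run_request`, and the sample project, app, and function names are all assumptions made for this example; only the /v4/{project}/{app}/{function} path and the POST method come from the description.

```python
import json
import urllib.request

# Hypothetical placeholder host; substitute the base URL shown for your project.
BASE_URL = "https://api.example.invalid"


def build_run_request(project, app, function, payload, token):
    """Build (without sending) an authenticated POST for one deployed function."""
    url = f"{BASE_URL}/v4/{project}/{app}/{function}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumed bearer-token scheme; check the platform docs for the real one.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical identifiers, used only to show the resulting URL.
req = build_run_request("p-demo", "my-app", "predict", {"prompt": "hello"}, "TOKEN")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request; that step is left out so the sketch runs without network access.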
Cerebrium Streaming Endpoints API
Streams live model output over a Server-Sent Events (text/event-stream) response from a Python generator that yields data, invoked on the same /run endpoint with an Accept of te...
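The generator-to-event-stream idea can be sketched locally without any network call. Below, `generate_tokens` stands in for a deployed function that yields data, `to_sse` imitates the framing a server would apply to produce a text/event-stream body, and `parse_sse` shows how a client might read the `data:` lines back. All three names are hypothetical, and the framing is the generic Server-Sent Events format rather than a confirmed detail of the platform.

```python
def generate_tokens(prompt):
    """Hypothetical deployed function: a generator that yields partial output."""
    for word in prompt.split():
        yield word


def to_sse(chunks):
    """Frame each yielded chunk as one Server-Sent Events message."""
    for chunk in chunks:
        # An SSE message is a `data:` line terminated by a blank line.
        yield f"data: {chunk}\n\n"


def parse_sse(lines):
    """Client side: extract the payload of every `data:` line."""
    for line in lines:
        if line.startswith("data:"):
            yield line[len("data:"):].strip()


# Round trip: generator -> event-stream text -> parsed chunks.
stream = "".join(to_sse(generate_tokens("live model output")))
received = list(parse_sse(stream.splitlines()))
print(received)
```

In a real deployment only the generator would live in application code; the framing happens on the server side and the parsing in whatever client consumes the stream.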
Cerebrium Async Requests API
Submits long-running inference asynchronously with the async=true query parameter, returning 202 Accepted with a run_id; results are forwarded to a configured webhookEndpoint ra...
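A minimal sketch of the submit step, assuming the 202 response carries the run_id as a top-level JSON field. The helper names `async_url` and `extract_run_id`, the example endpoint URL, and the sample response body are hypothetical; only the async=true query parameter, the 202 Accepted status, and the run_id name come from the description.

```python
import json
from urllib.parse import urlencode


def async_url(endpoint_url):
    """Append the async=true query parameter to a function endpoint URL."""
    return f"{endpoint_url}?{urlencode({'async': 'true'})}"


def extract_run_id(status, body):
    """Return the run_id from a 202 Accepted response, or raise otherwise."""
    if status != 202:
        raise ValueError(f"expected 202 Accepted, got {status}")
    # Assumption: run_id is a top-level key in the JSON response body.
    return json.loads(body)["run_id"]


# Hypothetical endpoint and response body, for illustration only.
url = async_url("https://api.example.invalid/v4/p-demo/my-app/predict")
run_id = extract_run_id(202, '{"run_id": "abc-123"}')
print(url, run_id)
```

Keeping the returned run_id lets the caller match a later webhook delivery to the request that triggered it.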
Cerebrium App Deployment / Management API
Packages a Cortex project and deploys each function as a persistent REST endpoint via the Cerebrium CLI (init, login, run, deploy), with apps, deployments, scaling, and configur...
Cerebrium Logs / Status API
Surfaces app logs, metrics, and platform status through the CLI (cerebrium logs, cerebrium status), the app dashboard, and the public status page.