Cerebrium Inference / Run Endpoints API
Calls each deployed function as an authenticated POST endpoint at /v4/{project}/{app}/{function}, billed per second of GPU/CPU compute. Supports synchronous JSON, Server-Sent Events streaming, async submission via async=true, and OpenAI-compatible chat/embedding requests using the standard OpenAI client with a Cerebrium JWT.
Documentation
Documentation
https://www.cerebrium.ai/docs/cerebrium/endpoints/inference-api
APIReference
https://www.cerebrium.ai/docs/cerebrium/endpoints/inference-api