Hugging Face Deployment and Operations
Unified workflow for deploying, scaling, and operating ML model inference endpoints on dedicated infrastructure. Combines Hugging Face Inference Endpoints management with Text Generation Inference (TGI) server monitoring. Intended for ML platform engineers and DevOps teams.
What You Can Do
MCP Tools
list-endpoints
List all dedicated inference endpoints for a namespace.
create-endpoint
Create a new dedicated inference endpoint.
get-endpoint
Get details of a specific endpoint.
update-endpoint
Update an existing endpoint configuration.
delete-endpoint
Delete a dedicated inference endpoint.
pause-endpoint
Pause a running endpoint to stop billing; it stays paused until explicitly resumed.
resume-endpoint
Resume a paused endpoint.
scale-to-zero
Scale an endpoint to zero replicas; unlike pausing, a scaled-to-zero endpoint wakes automatically (after a cold start) when the next request arrives.
get-endpoint-logs
Get logs for an endpoint.
get-endpoint-metrics
Get metrics for an endpoint.
list-providers
List available cloud providers and hardware options.
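The lifecycle tools above mirror operations available in the official `huggingface_hub` Python client. A minimal sketch under that assumption follows; the endpoint name, model repository, and hardware values are illustrative placeholders, and the client is imported lazily so the pure helper can be read (and tested) without the library installed:

```python
def pick_idle_action(manual_resume: bool) -> str:
    """Choose between the two cost-saving tools: pausing stops billing
    until you explicitly resume, while scale-to-zero also stops compute
    billing but wakes automatically (after a cold start) on the next
    request."""
    return "pause" if manual_resume else "scale_to_zero"


def endpoint_lifecycle_demo(token: str) -> None:
    """Walk the create / pause / resume / scale-to-zero / delete cycle.

    Requires a Hugging Face token with Inference Endpoints permissions;
    all names and hardware choices below are placeholders.
    """
    # Imported here so the module loads without huggingface_hub installed.
    from huggingface_hub import create_inference_endpoint, get_inference_endpoint

    # create-endpoint: provision dedicated infrastructure for a model repo.
    endpoint = create_inference_endpoint(
        "demo-gpt2",                 # placeholder endpoint name
        repository="gpt2",
        framework="pytorch",
        task="text-generation",
        accelerator="cpu",
        vendor="aws",                # see list-providers for valid options
        region="us-east-1",
        instance_size="x2",
        instance_type="intel-icl",
        token=token,
    )
    endpoint.wait()                  # block until the endpoint is running

    # pause-endpoint / resume-endpoint
    endpoint.pause()
    endpoint.resume()

    # scale-to-zero: replicas drop to zero until the next request
    endpoint.scale_to_zero()

    # get-endpoint / delete-endpoint
    print(get_inference_endpoint("demo-gpt2", token=token).status)
    endpoint.delete()
```

The `pick_idle_action` helper just encodes the operational rule of thumb: pause when a human will decide when to bring the endpoint back, scale to zero when sporadic traffic should wake it on its own.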
tgi-health-check
Check if the TGI server is healthy and responding.
tgi-server-info
Get information about the deployed model and TGI server.
tgi-metrics
Get Prometheus metrics from the TGI server.
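The three TGI tools correspond to plain HTTP routes that a TGI server exposes alongside generation: `/health`, `/info`, and `/metrics`. A standard-library sketch of probing them, where `base_url` is a placeholder for your endpoint URL; the Prometheus parser is deliberately simple and keeps only unlabelled `name value` samples:

```python
import json
import urllib.request


def tgi_health(base_url: str) -> bool:
    """tgi-health-check: /health answers 200 when the server is ready."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, DNS failure, timeout, non-2xx, etc.
        return False


def tgi_info(base_url: str) -> dict:
    """tgi-server-info: /info returns JSON describing the loaded model."""
    with urllib.request.urlopen(f"{base_url}/info", timeout=5) as resp:
        return json.load(resp)


def parse_prometheus(text: str) -> dict:
    """tgi-metrics: /metrics is Prometheus text exposition format.

    Keep simple `name value` samples; skip comments, blank lines, and
    labelled series (those with `{...}` selectors).
    """
    metrics = {}
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue
        parts = line.split()
        if len(parts) == 2 and "{" not in parts[0]:
            try:
                metrics[parts[0]] = float(parts[1])
            except ValueError:
                pass  # non-numeric sample; ignore in this sketch
    return metrics
```

TGI's exporter publishes metric names prefixed with `tgi_` (for example queue and batch gauges), so filtering the parsed dict by that prefix is a quick way to separate model-server metrics from generic process metrics.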