Lamini Rate Limits

Name: Lamini Rate Limits
Creator: Lamini
Keywords: AI, LLM, Fine-Tuning, Memory Tuning, Inference, Rate Limiting, Quotas, Throttling

The Lamini Platform meters inference and tuning usage per account. Usage is governed primarily by available credit or spend on the On-Demand tier and by reserved GPU capacity on the Enterprise tier. Concurrent inference requests and tuning jobs are bounded by account capacity rather than by fixed, published per-minute request quotas. Specific numeric limits are not publicly documented and are not reconciled in this artifact.

Lamini Rate Limits is the machine-readable rate-limit profile for Lamini on the APIs.io network, conforming to the API Commons Rate Limits specification.

It captures 4 rate-limit definitions, measuring requests, jobs, steps, and usd.

The profile also defines 2 backoff/retry policies and documents the response codes returned for throttled requests.

Tagged areas include AI, LLM, Fine-Tuning, Memory Tuning, and Inference.
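As a rough illustration, the four limit definitions summarized above could be held in memory as shown here. This is a minimal sketch and not the API Commons Rate Limits schema: the `RateLimit` class and its field names (`name`, `scope`, `unit`, `value`, `description`) are assumptions made for this example. The values are copied from the Limits section of this profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateLimit:
    """One rate-limit definition. Field names are hypothetical, not the spec's."""
    name: str
    scope: str
    unit: str
    value: str
    description: str


# The profile publishes no numeric values, only this placeholder.
UNDOCUMENTED = "see provider documentation"

LIMITS = [
    RateLimit("Concurrent Inference Requests", "account", "requests", UNDOCUMENTED,
              "Inference concurrency bounded by account capacity and credit/tier."),
    RateLimit("Concurrent Tuning Jobs", "account", "jobs", UNDOCUMENTED,
              "Number of simultaneous tuning jobs bounded by GPU capacity / tier."),
    RateLimit("Tuning Throughput", "account", "steps", UNDOCUMENTED,
              "Burst tuning scales linearly across multiple GPUs/nodes on On-Demand."),
    RateLimit("Credit / Spend Ceiling", "account", "usd", UNDOCUMENTED,
              "On-Demand usage is bounded by available prepaid or free credit."),
]

# HTTP status code the profile lists for throttled requests.
THROTTLE_STATUS = 429


def units(limits):
    """Return the unit of each limit, in profile order."""
    return [limit.unit for limit in limits]
```

Keeping the placeholder value as a string, rather than inventing a number, mirrors the profile: a consumer has to treat every limit as unknown until the provider documents it.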

Limits: 4, Throttle response code: 429

Limits

Concurrent Inference Requests
Scope: account
Unit: requests
Limit: see provider documentation
Inference concurrency is bounded by account capacity and credit/tier.

Concurrent Tuning Jobs
Scope: account
Unit: jobs
Limit: see provider documentation
The number of simultaneous tuning jobs is bounded by GPU capacity and tier.

Tuning Throughput
Scope: account
Unit: steps
Limit: see provider documentation
Burst tuning scales linearly across multiple GPUs/nodes on the On-Demand tier.

Credit / Spend Ceiling
Scope: account
Unit: usd
Limit: see provider documentation
On-Demand usage is bounded by available prepaid or free credit.
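Because On-Demand usage stops when credit runs out, a client may want to track its own estimated spend before submitting work. The following is a minimal sketch under stated assumptions: `CreditBudget` and `try_spend` are hypothetical names for this example, the caller is assumed to supply its own per-call cost estimate, and nothing here queries Lamini for an actual balance.

```python
from decimal import Decimal


class CreditBudget:
    """Client-side spend tracker. Hypothetical helper, not part of any Lamini SDK."""

    def __init__(self, ceiling_usd):
        # Decimal avoids binary floating-point drift when summing currency.
        self.ceiling = Decimal(str(ceiling_usd))
        self.spent = Decimal("0")

    @property
    def remaining(self):
        """Credit still available under the ceiling."""
        return self.ceiling - self.spent

    def try_spend(self, estimated_cost_usd):
        """Reserve the estimated cost; return False if it would exceed the ceiling."""
        cost = Decimal(str(estimated_cost_usd))
        if cost < 0:
            raise ValueError("cost must be non-negative")
        if cost > self.remaining:
            return False
        self.spent += cost
        return True
```

A caller would check `try_spend(...)` before each job and skip or defer the job when it returns `False`. The estimate is only as good as the caller's cost model, so the provider's own billing remains the source of truth.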

Policies

Capacity-Based Limits

Throughput scales with On-Demand credit and Enterprise reserved GPU capacity rather than fixed per-minute quotas.
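Since the bound is on concurrency rather than on requests per minute, one client-side approach is to cap the number of requests in flight. The sketch below uses `asyncio.Semaphore` from the Python standard library. The `run_bounded` and `demo` functions are hypothetical, the request is simulated with a short sleep, and `max_in_flight=3` is an arbitrary value, since the profile publishes no actual concurrency figure.

```python
import asyncio


async def run_bounded(factories, max_in_flight):
    """Run coroutine factories with at most max_in_flight executing at once."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(factory):
        # Each call waits here until a slot is free.
        async with semaphore:
            return await factory()

    # gather preserves the input order of results.
    return await asyncio.gather(*(guarded(f) for f in factories))


async def demo(n_requests=10, max_in_flight=3):
    """Simulate n_requests calls and record the peak number in flight."""
    in_flight = 0
    peak = 0

    async def fake_request():
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stands in for a real inference call
        in_flight -= 1
        return "ok"

    results = await run_bounded([fake_request] * n_requests, max_in_flight)
    return results, peak
```

Running `asyncio.run(demo())` returns the results along with the observed peak, which never exceeds `max_in_flight`. In practice the cap would be tuned to whatever the account's tier sustains without triggering throttling.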

Backoff Strategy

Clients should implement exponential backoff with jitter and honor Retry-After on 429 responses.
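The policy above can be sketched as follows, using only the Python standard library. This is an illustration under stated assumptions and not a Lamini client: `send` is a hypothetical callable returning `(status_code, headers)`, the header dictionary is assumed to use the exact key `Retry-After`, and the `base`, `cap`, and `max_attempts` defaults are arbitrary. The jitter variant shown is "full jitter", where the delay is drawn uniformly between zero and the exponential bound.

```python
import random
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None):
    """Return Retry-After as seconds, or None if absent or unparseable."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():  # delay-seconds form, e.g. "120"
        return float(value)
    try:  # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random.random):
    """Full jitter: uniform in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_backoff(send, max_attempts=5, sleep=time.sleep, rng=random.random):
    """Call send() until it returns a non-429 status or attempts run out.

    `send` is a hypothetical callable returning (status_code, headers_dict).
    Returns the last status code seen.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            return status
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        # Prefer the server's hint; fall back to jittered exponential backoff.
        delay = parse_retry_after(headers.get("Retry-After"))
        if delay is None:
            delay = backoff_delay(attempt, rng=rng)
        sleep(delay)
    return status
```

The `sleep` and `rng` parameters exist so the retry schedule can be exercised without real waiting or randomness. A production client would typically also bound total elapsed time and retry only idempotent requests.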
