Chunkr
Chunkr is an open-source document intelligence platform that turns complex documents (PDFs, Office files, and images) into RAG- and LLM-ready data. The Chunkr Cloud API at api.chunkr.ai performs layout analysis, OCR, segmentation, and chunking, and runs proprietary in-house vision models; the AGPL-3.0 open-source release (lumina-ai-inc/chunkr) can be self-hosted via Docker.
APIs
Chunkr Parse Task API
Creates a parse task that runs layout analysis, OCR, segmentation, and chunking over an uploaded document, returning structured chunks, pages, and segment metadata with configur...
Chunkr Extract Task API
Creates an extract task that pulls schema-driven structured data from a document, returning JSON output with citations and metrics against a caller-supplied extraction schema an...
Chunkr Task Management API
Retrieves, lists, cancels, and deletes parse and extract tasks: get a task by ID, get its parse or extract output, list tasks with pagination, cancel a running task, and delete...
Chunkr Files API
Uploads, lists, retrieves, downloads, and deletes files that can be referenced by parse and extract tasks via ch://files/{file_id} references.
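The ch://files/{file_id} reference form mentioned above can be illustrated with a minimal sketch. The helper names make_file_ref and parse_file_ref are hypothetical and not part of any Chunkr SDK, and the rule that a file id is a non-empty string without slashes or whitespace is an assumption made for this example, since the listing does not specify the id format.

```python
import re

# Assumption: a file id is a non-empty string with no "/" or whitespace.
# The listing above does not define the real id format.
_FILE_REF = re.compile(r"ch://files/([^/\s]+)")


def make_file_ref(file_id: str) -> str:
    """Build a ch://files/{file_id} reference from a file id (hypothetical helper)."""
    ref = f"ch://files/{file_id}"
    if _FILE_REF.fullmatch(ref) is None:
        raise ValueError(f"invalid file id: {file_id!r}")
    return ref


def parse_file_ref(ref: str) -> str:
    """Extract the file id from a ch://files/{file_id} reference (hypothetical helper)."""
    match = _FILE_REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a ch://files reference: {ref!r}")
    return match.group(1)
```

A caller might build the reference once after an upload and pass it wherever a parse or extract task expects a file; validating on both sides keeps malformed references from reaching the API.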
Chunkr Health and Extras API
Provides a liveness health check and metadata helpers, including a listing of all file types accepted by the parsing and extraction pipelines.
