Docling Python Library
The core Docling Python library and `docling` CLI. It parses PDFs, DOCX, PPTX, XLSX, HTML, images (PNG/TIFF/JPEG), audio (WAV/MP3), WebVTT, LaTeX, and plain text into a unified `DoclingDocument` representation that can be exported to Markdown, HTML, lossless JSON, DocTags, and WebVTT. Its advanced PDF understanding covers page layout, reading order, table structure (TableFormer), code and formula recognition, and picture classification. It also provides OCR (EasyOCR, Tesseract, RapidOCR, Mac OCR) and a GraniteDocling visual language model (VLM) pipeline. Everything runs locally, which makes it suitable for air-gapped environments and sensitive data.
Documentation
https://docling-project.github.io/docling/
Getting Started
https://docling-project.github.io/docling/getting_started/quickstart/