Docling Synthetic Data Generation

Tools for synthesizing labeled document data from real corpora — useful for fine-tuning layout, table, and reading-order models, and for stress-testing downstream RAG pipelines.

API entry from apis.yml

aid: docling:docling-sdg
name: Docling Synthetic Data Generation
tags:
- Synthetic Data
- Training
- Documents
humanURL: https://github.com/docling-project/docling-sdg
properties:
- url: https://github.com/docling-project/docling-sdg
  type: Documentation
- url: https://github.com/docling-project/docling-sdg
  type: SourceCode
description: Tools for synthesizing labeled document data from real corpora — useful for fine-tuning layout,
  table, and reading-order models, and for stress-testing downstream RAG pipelines.