Agent Skill · MotherDuck
motherduck-load-data
Load data into MotherDuck from local files, object storage, HTTPS, dataframes, or external databases. Use when choosing a MotherDuck-specific ingestion path, especially CTAS and INSERT...SELECT, bulk loading, secrets, and Postgres-endpoint versus DuckDB-client tradeoffs.
Skill body
Load Data into MotherDuck
Use this skill when the job is getting data into MotherDuck correctly and efficiently, not just writing one ad hoc import query.
Source Of Truth
- Prefer current MotherDuck loading, cloud-storage, and Postgres-endpoint loading docs first.
- Use
CREATE SECRETand cloud-storage docs for protected-object-store workflows. - Use the DuckDB database upload docs when the source is an existing local
.duckdb,.ddb, or attached DuckDB database. - Keep the loading advice aligned with MotherDuck’s documented posture:
- batch over streaming
- Parquet over CSV when you control the format
- dataframe,
COPY, CTAS, orINSERT ... SELECTover row-by-row inserts - native MotherDuck storage first unless DuckLake is explicitly required
Default Posture
- Start by classifying the source: object storage or HTTPS, local file or local DuckDB, in-memory rows, or an external database.
- Prefer
CREATE TABLE AS SELECTfor first loads andINSERT INTO ... SELECTfor appends. - For whole local DuckDB databases, use
CREATE OR REPLACE DATABASE remote_name FROM CURRENT_DATABASE(), an attached local database, or a file path from a native DuckDB client after attachingmd:. - Use Parquet for durable bulk movement whenever you control the source format.
- Treat the Postgres endpoint as a thin-client path for server-side remote reads, not for local-file or extension-driven ingestion.
- Bootstrap the target MotherDuck database first when the ingestion tool does not create it automatically.
- Keep raw landing minimally transformed; do typing, deduplication, and business logic in staging or modeling steps.
- Keep source storage close to the MotherDuck region when you control placement.
Workflow
- Identify where the source data actually lives.
- Choose the loading path:
- object storage or HTTPS: remote read into MotherDuck
- local file or local DuckDB: use a DuckDB client path
- in-memory rows: Arrow or dataframe bulk load first, batched inserts only as a fallback
- external database: use the appropriate scan or replication path from a DuckDB-capable environment
- Land the data into a raw or staging table with minimal transformation.
- Validate row counts, types, and a few business aggregates immediately after the load.
- Promote into modeled tables only after the landing step is correct.
Open Next
references/INGESTION_PATTERNS.mdfor format-specific options, cloud-storage secrets, Postgres-endpoint loading tradeoffs, Python dataframe paths, and advanced ingestion patterns
Related Skills
motherduck-connectfor choosing between the Postgres endpoint and a DuckDB client pathmotherduck-explorefor inspecting destination databases and validating landed tablesmotherduck-queryfor writing CTAS, append, and validation SQLmotherduck-model-datafor promoting landed data into staging and analytics tablesmotherduck-ducklakeonly when object-storage-backed lakehouse storage is an explicit requirement
Skill frontmatter
license: MIT