Agent Skill · MotherDuck

motherduck-load-data

Load data into MotherDuck from local files, object storage, HTTPS, dataframes, or external databases. Use when choosing a MotherDuck-specific ingestion path, especially CTAS and INSERT...SELECT, bulk loading, secrets, and Postgres-endpoint versus DuckDB-client tradeoffs.

View SKILL.md on GitHub → Source repository Provider profile

Provider: MotherDuck Path in repo: plugins/motherduck-skills/skills/motherduck-load-data/SKILL.md

Skill body

Load Data into MotherDuck

Use this skill when the job is getting data into MotherDuck correctly and efficiently, not just writing one ad hoc import query.

Source Of Truth

Prefer current MotherDuck loading, cloud-storage, and Postgres-endpoint loading docs first.
Use CREATE SECRET and cloud-storage docs for protected-object-store workflows.
Use the DuckDB database upload docs when the source is an existing local .duckdb, .ddb, or attached DuckDB database.
Keep the loading advice aligned with MotherDuck’s documented posture:
- batch over streaming
- Parquet over CSV when you control the format
- dataframe, COPY, CTAS, or INSERT ... SELECT over row-by-row inserts
- native MotherDuck storage first unless DuckLake is explicitly required

Default Posture

Start by classifying the source: object storage or HTTPS, local file or local DuckDB, in-memory rows, or an external database.
Prefer CREATE TABLE AS SELECT for first loads and INSERT INTO ... SELECT for appends.
For whole local DuckDB databases, use CREATE OR REPLACE DATABASE remote_name FROM CURRENT_DATABASE(), an attached local database, or a file path from a native DuckDB client after attaching md:.
Use Parquet for durable bulk movement whenever you control the source format.
Treat the Postgres endpoint as a thin-client path for server-side remote reads, not for local-file or extension-driven ingestion.
Bootstrap the target MotherDuck database first when the ingestion tool does not create it automatically.
Keep raw landing minimally transformed; do typing, deduplication, and business logic in staging or modeling steps.
Keep source storage close to the MotherDuck region when you control placement.

Workflow

Identify where the source data actually lives.
Choose the loading path:
- object storage or HTTPS: remote read into MotherDuck
- local file or local DuckDB: use a DuckDB client path
- in-memory rows: Arrow or dataframe bulk load first, batched inserts only as a fallback
- external database: use the appropriate scan or replication path from a DuckDB-capable environment
Land the data into a raw or staging table with minimal transformation.
Validate row counts, types, and a few business aggregates immediately after the load.
Promote into modeled tables only after the landing step is correct.

Open Next

references/INGESTION_PATTERNS.md for format-specific options, cloud-storage secrets, Postgres-endpoint loading tradeoffs, Python dataframe paths, and advanced ingestion patterns

motherduck-connect for choosing between the Postgres endpoint and a DuckDB client path
motherduck-explore for inspecting destination databases and validating landed tables
motherduck-query for writing CTAS, append, and validation SQL
motherduck-model-data for promoting landed data into staging and analytics tables
motherduck-ducklake only when object-storage-backed lakehouse storage is an explicit requirement

Skill frontmatter

license: MIT