Windmill Labs

Windmill vs dbt

dbt is a SQL transformation framework that runs in your warehouse and needs an external scheduler. Windmill pipelines fold transform, storage, orchestration and compute into one platform. Eight questions to tell them apart, answered honestly.

Windmill in one sentence

An open-source developer platform whose pipelines (alpha) fold transform (DuckDB), storage and versioning (DuckLake), orchestration and compute (your own workers) into one data-pipeline platform, alongside workflows, AI agents and apps. Any language.

dbt in one sentence

The de-facto standard for SQL transformation in the warehouse: models with a ref() / source() lineage DAG, tests, snapshots, macros, a semantic layer and a generated docs site. Transform-only by design, it runs against a warehouse you provide and needs an external scheduler (cron, Airflow, Dagster or dbt Cloud jobs) to run on a cadence.

01 · What you can build

Which data work and software can you do on each?

dbt is focused and deep: SQL modeling, tests, snapshots, macros, a semantic layer and a docs site, all pushed down to a warehouse you already run. Windmill covers the same core transform-and-lineage loop with DuckDB and DuckLake (in alpha), and adds the orchestration, compute and the wider software (workflows, apps, agents) around it.

Capability: Windmill vs dbt
ref() / source() style dependencies inferred from the code (Windmill: alpha)
unique, not_null, accepted_values, relationships that fail the run
Keep every version of every row for effective-dated as-of joins
dbt Jinja macros vs Windmill engine-native DuckDB macro libraries
Column-to-column lineage inferred from the SQL (dbt: dbt Cloud)
dbt enforces declared contracts at build time; Windmill checks consumers against captured schemas at save time (warn-only)
Fresh/stale verdict per asset; automatic re-runs of stale assets are Enterprise on Windmill, source freshness is dbt Cloud (dbt: dbt Cloud)
Seeds (CSV loading): load small static CSVs into the warehouse as tables
Docs-site generation: a generated, browsable documentation website of the project
Semantic layer / metrics: central metric definitions queried by BI tools (dbt: dbt Cloud)
Run on a cron, react to events, retry; dbt Core needs an external scheduler (dbt: dbt Cloud)
dbt pushes SQL to your warehouse; Windmill runs DuckDB on your own workers
Python, TypeScript, Go, Bash next to SQL (dbt: partial, Python models on some adapters)
General flows, HTTP endpoints, AI agents and internal apps on the same runtime

Windmill pipeline features (materialize, DuckLake, data tests, macros, column lineage) are in alpha. Range backfill, the freshness watchdog and write-audit-publish (a failing data test rolls back the write instead of leaving the failed table live) are Enterprise Edition features; single-partition runs, the fresh/stale badge and commit-then-test data tests are available in all editions.
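
The difference between commit-then-test and write-audit-publish can be sketched with a plain database transaction. The example below is a minimal illustration that uses Python's built-in sqlite3 module as a stand-in; it is not Windmill's implementation or a DuckLake API, and the table, the publish_with_audit helper and the sample rows are all hypothetical.

```python
import sqlite3


def publish_with_audit(conn, rows):
    """Replace dim_customer with `rows`; roll back if a data test fails."""
    conn.execute("BEGIN")
    conn.execute("DELETE FROM dim_customer")
    conn.executemany(
        "INSERT INTO dim_customer (customer_id, name) VALUES (?, ?)", rows
    )
    # Audit step: run the data tests inside the still-open transaction.
    nulls = conn.execute(
        "SELECT COUNT(*) FROM dim_customer WHERE customer_id IS NULL"
    ).fetchone()[0]
    dupes = conn.execute(
        "SELECT COUNT(*) FROM (SELECT customer_id FROM dim_customer "
        "GROUP BY customer_id HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    if nulls or dupes:
        conn.execute("ROLLBACK")  # the previous version stays live
        return False
    conn.execute("COMMIT")  # publish only after the tests pass
    return True


# isolation_level=None lets us control BEGIN/COMMIT/ROLLBACK explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE dim_customer (customer_id INTEGER, name TEXT)")

ok_first = publish_with_audit(conn, [(1, "Ada"), (2, "Grace")])
ok_second = publish_with_audit(conn, [(3, "Linus"), (3, "Linus again")])
live_ids = [
    row[0]
    for row in conn.execute(
        "SELECT customer_id FROM dim_customer ORDER BY customer_id"
    )
]
```

In this sketch the second write contains a duplicate key, so it is rolled back and readers still see the rows from the first write. A commit-then-test flow would instead commit first and report the failure afterwards, leaving the failing rows visible.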

02 · Target

Who is each platform built for?

dbt is built for analytics engineers modeling SQL inside a warehouse, and it set the standard for that discipline. Windmill is built for developer-led teams that want the transform layer and the orchestration, compute and surrounding software in one place, in any language.

Primary audience

Windmill

Developer-led teams building across data and software. Engineers own the platform end-to-end in any language, from DuckDB transforms to workflows, APIs, agents and apps, with Git, local dev and CI/CD.

dbt

Analytics engineers and data teams who model in SQL against a cloud warehouse. dbt brought software-engineering practices (version control, testing, CI, modularity) to the analytics layer, and that is squarely who it serves.

Where it fits in the stack

Windmill

A single runtime for the whole job: ingest, transform, materialize, orchestrate, serve and alert. The data pipeline is one part of a broader internal-software platform, not a separate tool bolted to a scheduler and a warehouse.

dbt

The transformation layer, sitting between ingestion (Fivetran, Airbyte) and BI (Looker, Tableau). It assumes you already run a warehouse and something to schedule it, and does the T of ELT extremely well within that boundary.

Language

Windmill

SQL (DuckDB) for transforms, plus Python, TypeScript, Go and Bash steps in the same pipeline. Any package is a first-class import.

dbt

SQL first, with Jinja templating for logic and macros. Python models are supported on some warehouse adapters (Snowflake, BigQuery, Databricks), but SQL is the center of gravity.

03 · Build experience

How do you build on each platform?

dbt splits a model across a SQL file and YAML config, with a mature --select grammar and best-in-class Git-native CI. Windmill keeps the write, tests and lineage as inline annotations in one DuckDB file, and runs it on its own compute with a local wmill pipeline dev loop.

Windmill

One DuckDB file is the whole step. Comment annotations declare the managed materialization, the inputs, the tests and the lineage. There is no separate config file, no connection profile and no external scheduler: Windmill owns the write, runs it on your workers and triggers it natively.

-- pipeline
-- on ducklake://analytics/raw_customers
-- materialize ducklake://analytics/dim_customer key=customer_id
-- data_test unique customer_id
-- data_test not_null customer_id

ATTACH 'ducklake://analytics' AS dl;
SELECT customer_id, name, tier FROM dl.raw_customers;

dbt

A model is a SQL select with a Jinja config() block. Tests, sources and column metadata live in a separate schema.yml. Project settings live in dbt_project.yml, the warehouse connection in profiles.yml, and an external scheduler (cron, Airflow, Dagster or dbt Cloud jobs) runs it against your warehouse.

-- models/dim_customer.sql
{{ config(
    materialized = 'incremental',
    unique_key = 'customer_id'
) }}

select
    customer_id,
    name,
    tier
from {{ source('raw', 'customers') }}
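
Conceptually, an incremental model with a unique_key updates rows whose key already exists in the target and inserts the rest. The sketch below illustrates that upsert behavior with Python's built-in sqlite3 module (SQLite 3.24 or newer); it is not what dbt generates, since the actual statements depend on the adapter and incremental strategy, and the merge_batch helper and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer ("
    "customer_id INTEGER PRIMARY KEY, name TEXT, tier TEXT)"
)


def merge_batch(conn, rows):
    """Upsert a batch: update rows with a matching key, insert the others."""
    conn.executemany(
        """
        INSERT INTO dim_customer (customer_id, name, tier)
        VALUES (?, ?, ?)
        ON CONFLICT(customer_id) DO UPDATE
        SET name = excluded.name, tier = excluded.tier
        """,
        rows,
    )
    conn.commit()


# First run loads two customers; the second run changes one and adds one.
merge_batch(conn, [(1, "Ada", "free"), (2, "Grace", "pro")])
merge_batch(conn, [(2, "Grace", "enterprise"), (3, "Linus", "free")])

result = conn.execute(
    "SELECT customer_id, tier FROM dim_customer ORDER BY customer_id"
).fetchall()
```

After the second batch the table holds three rows, and customer 2 carries the updated tier rather than a duplicate row.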

Authoring

Windmill

A pipeline step is a DuckDB script whose comment annotations declare everything: -- materialize for the managed write, -- on for inputs, -- data_test for checks, -- partitioned for partitions. Config lives inline in the one file. Non-SQL steps (Python, TypeScript, Bash) use the same annotations.

dbt

A model is a SQL select with a Jinja config() block. Tests, sources and descriptions live in schema.yml; project settings in dbt_project.yml; the warehouse connection in profiles.yml. The split across files is a convention thousands of teams already know well.
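
Whichever file they are declared in, the four generic checks named earlier (unique, not_null, accepted_values, relationships) can each be read as a query that returns the offending rows, with the test passing when the query returns nothing. The sketch below shows that idea with Python's built-in sqlite3 module; the tables, sample data and the accepted tier values are hypothetical, and the SQL is an illustration rather than what either tool compiles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_customer (customer_id INTEGER, tier TEXT);
    CREATE TABLE fct_orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO dim_customer VALUES (1, 'free'), (2, 'pro'), (2, 'gold');
    INSERT INTO fct_orders VALUES (10, 1), (11, 9);
    """
)

# Each test is a query that selects failing rows.
TESTS = {
    "unique customer_id": (
        "SELECT customer_id FROM dim_customer "
        "GROUP BY customer_id HAVING COUNT(*) > 1"
    ),
    "not_null customer_id": (
        "SELECT * FROM dim_customer WHERE customer_id IS NULL"
    ),
    "accepted_values tier": (
        "SELECT * FROM dim_customer WHERE tier NOT IN ('free', 'pro')"
    ),
    "relationships fct_orders.customer_id": (
        "SELECT o.* FROM fct_orders o "
        "LEFT JOIN dim_customer c ON c.customer_id = o.customer_id "
        "WHERE c.customer_id IS NULL"
    ),
}

failures = {name: conn.execute(sql).fetchall() for name, sql in TESTS.items()}
failed = sorted(name for name, rows in failures.items() if rows)
```

With this sample data three of the four checks return rows: customer_id 2 is duplicated, 'gold' is not an accepted tier, and order 11 points at a customer that does not exist.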

Lineage & selection

Windmill

Lineage is inferred from the assets each script reads and writes, including column-level lineage from the SQL. "Run up to here" bounds a run to a chosen node without a compile-time select grammar.

dbt

Lineage comes from ref() and source(). The --select / --exclude grammar (with graph operators like +model and tag/path selectors) is mature and expressive, and column-level lineage is available in dbt Explorer.
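
To make the idea of lineage from ref() concrete, here is a minimal sketch that extracts ref() calls with a regular expression, builds the dependency graph, orders it, and selects a model together with its ancestors in the spirit of the +model operator. It is an illustration only, not dbt's parser or selector; the model names, SQL bodies and helper functions are hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical model bodies keyed by model name.
MODELS = {
    "stg_customers": "select * from {{ source('raw', 'customers') }}",
    "stg_orders": "select * from {{ source('raw', 'orders') }}",
    "dim_customer": "select * from {{ ref('stg_customers') }}",
    "fct_orders": (
        "select * from {{ ref('stg_orders') }} o "
        "join {{ ref('dim_customer') }} c using (customer_id)"
    ),
}

REF_PATTERN = re.compile(r"ref\(\s*['\"]([^'\"]+)['\"]\s*\)")


def build_graph(models):
    """Map each model to the set of models it references via ref()."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def run_order(graph):
    """Return one valid execution order, dependencies first."""
    return list(TopologicalSorter(graph).static_order())


def with_ancestors(graph, target):
    """Select a model plus everything upstream of it."""
    selected, stack = set(), [target]
    while stack:
        node = stack.pop()
        if node not in selected:
            selected.add(node)
            stack.extend(graph.get(node, ()))
    return selected


graph = build_graph(MODELS)
order = run_order(graph)
upstream_of_dim = with_ancestors(graph, "dim_customer")
```

Here with_ancestors(graph, "dim_customer") selects dim_customer and stg_customers, and any order returned by run_order places stg_customers before dim_customer and dim_customer before fct_orders.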

Local dev & IDE

Windmill

CLI loop: wmill pipeline dev watches the folder and live-previews the graph in the browser, while wmill pipeline run --local runs the pipeline from working-tree files. VS Code and AI coding tools work against the real source.

dbt

dbt run / dbt build from the CLI, or the browser IDE and Studio in dbt Cloud. A large, well-documented local workflow with strong editor and adapter tooling.

Compute & warehouse

Windmill

Transforms run on DuckDB on your own workers, and results land in managed DuckLake tables. No external warehouse required, though you can still query Postgres, Snowflake or BigQuery from a step.

dbt

dbt compiles SQL and runs it in your warehouse: Snowflake, BigQuery, Redshift, Databricks, Postgres and others via adapters. Compute and cost are the warehouse's; dbt provides none of its own.

Resources & secrets

Windmill

Resources and variables are first-class, encrypted at rest and shared across scripts, flows and apps. External secret backends (Vault, AWS Secrets Manager) are Enterprise only.

dbt

Warehouse credentials live in profiles.yml or environment variables for dbt Core; dbt Cloud manages connections and secrets in the UI. Governed connections and fine-grained access are Cloud (paid) features.

Git & CI

Windmill

Scripts, pipelines, resources, schedules and permissions are all files deployed via the CLI and Git sync. Git sync is free for up to 2 users; beyond that is Enterprise only. Workspace forks add per-branch dev environments with forked DuckLake data, the analog of developing against dev schemas with --defer in dbt.

dbt

dbt is Git-native by design: a project is a repo, and dbt Cloud has built-in CI jobs that run and test models on pull requests. This is one of dbt's strongest, most mature areas.

04 · Integrations & compute

Where does the work run, and how does it connect?

dbt connects to your warehouse through adapters and draws on a large, mature package ecosystem, but brings no compute or scheduler of its own. Windmill runs transforms on its own workers and schedules and triggers them natively, so the pipeline does not depend on a separate warehouse or orchestrator.

Where transforms run

Windmill

On your own workers via DuckDB, writing to managed DuckLake tables on object storage. You bring the compute; there is no separate warehouse in the loop unless you want one.

dbt

Inside your cloud warehouse. dbt generates SQL and hands it to Snowflake, BigQuery, Redshift, Databricks or Postgres through an adapter. Warehouse choice, performance and bill are all yours.

Packages & ecosystem

Windmill

Any npm, PyPI, Go or Maven package is a first-class import with automatic dependency resolution, plus workspace DuckDB macro libraries for shared SQL logic and a community Hub of scripts.

dbt

dbt Hub has hundreds of community packages (dbt-utils, dbt_expectations, codegen, audit_helper) that are widely adopted and mature. This is a genuine advantage: the package ecosystem is large and battle-tested.

Scheduling & triggers

Windmill

Native. A step fires on a schedule, on an upstream asset write (the cascade), or from a trigger (HTTP, Kafka, Postgres CDC, SQS, MQTT and more). No external orchestrator to stand up.
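
The cascade on an upstream asset write can be pictured as a graph walk: find every step that reads the written asset, directly or transitively, and run those steps in dependency order. The sketch below is a hypothetical illustration of that idea, not Windmill's scheduler; the READS mapping, the step names and the assumption that each step writes an asset named after itself are all invented for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical mapping: step -> assets it reads.
# Assumption: each step writes an asset with the same name as the step.
READS = {
    "dim_customer": {"raw_customers"},
    "fct_orders": {"raw_orders", "dim_customer"},
    "daily_revenue": {"fct_orders"},
}


def cascade(reads, written_asset):
    """Return the downstream steps to re-run, in dependency order."""
    affected = set()
    frontier = [written_asset]
    while frontier:
        asset = frontier.pop()
        for step, inputs in reads.items():
            if asset in inputs and step not in affected:
                affected.add(step)
                frontier.append(step)  # the step's own output may feed others
    # Keep only edges between affected steps, then order them.
    subgraph = {step: reads[step] & affected for step in affected}
    return list(TopologicalSorter(subgraph).static_order())


after_customers = cascade(READS, "raw_customers")
after_orders = cascade(READS, "raw_orders")
```

A write to raw_customers touches all three steps, while a write to raw_orders skips dim_customer and only re-runs fct_orders and daily_revenue.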

dbt

None in dbt Core: you supply the scheduler (cron, Airflow, Dagster, or dbt Cloud jobs). dbt Cloud adds hosted job scheduling and event triggers, but that is the paid product, not the open-source engine.

05 · Migration & lock-in

How hard to get in, and how hard to get out?

dbt is easy to adopt if you already write warehouse SQL, and your data stays in your warehouse, though model code is dbt-flavored (Jinja, macros, packages). Windmill keeps transforms as standard SQL over open DuckLake Parquet you own, so leaving costs you the orchestration layer, not the data.

Getting in

Windmill

A dbt model's SQL body ports closely to a DuckDB materialize script: the config() and schema.yml become comment annotations, ref() / source() become asset URIs, and Jinja macros become DuckDB macro libraries. There is no automated converter, so it is a manual rewrite.

dbt

Very low friction if you already write SQL against a supported warehouse. Init a project, point profiles.yml at your warehouse, and existing SQL becomes models with small edits. This ease of onboarding is a big part of dbt's adoption.

Getting out

Windmill

Transform logic is standard SQL and the DuckLake tables are open Parquet on object storage you own, readable by any DuckDB client. The CLI exports the workspace as plain files. What you lose leaving is the orchestration layer, not the data or the SQL.

dbt

Model SQL is portable, but it is dbt-flavored: Jinja templating, ref()/source(), macros and package dependencies need reworking on any other tool. Because dbt writes to your own warehouse, the data itself never leaves your control, which lowers lock-in on the storage side.

06 · Enterprise requirements

Audit logs, observability, security, performance

Both bring software-engineering rigor to data work, and both gate governance behind paid tiers. dbt's performance is your warehouse's, tuned by incremental, state-aware rebuilds. Windmill runs transforms on its own workers, so throughput scales with the compute you allocate rather than a warehouse bill.

Observability

Windmill

Real-time streaming logs, per-run inputs / outputs / duration, the pipeline graph with live run status, and a Prometheus exporter. Every materialized asset records its snapshot id and row count.

dbt

Run results and timings from the CLI and artifacts; dbt Cloud adds run history, alerting, dbt Explorer and dbt Insights. Rich run metadata, focused on the transformation graph.

Audit logs & security

Windmill

SOC 2 Type II. RBAC, SSO (up to 10 users), encrypted secrets and sandboxed execution in open source. Uncapped SSO, extended audit-log retention, SCIM and SAML are Enterprise only.

dbt

SOC 2. RBAC, SSO (SAML), audit logging, PrivateLink and IP restrictions are dbt Cloud Enterprise / Enterprise+ features. dbt Core self-hosted has no built-in RBAC or SSO.

Multi-tenancy

Windmill

Multiple isolated workspaces on one instance, each with its own users, resources and secrets. Free tier is capped at 3 workspaces; unlimited is Enterprise only.

dbt

Projects organize models; the free Developer tier allows 1 project, Starter 1, Enterprise up to 30 and Enterprise+ unlimited. dbt Mesh (cross-project references) is an Enterprise feature.

Performance model

Windmill

Transforms run on DuckDB on your workers, so throughput scales with the compute you allocate, independent of any warehouse. Single-node DuckDB and Polars cover the vast majority of ETL workloads without a warehouse or cluster (see ETL & data processing). Steps can be pinned to a worker tag (for example a high-memory or GPU pool) with a -- tag annotation.

dbt

Performance is the warehouse's: dbt's job is to generate efficient SQL and rebuild only what changed (state-aware, incremental models). Cost and speed are governed by your warehouse configuration and spend.

07 · Licensing & pricing

Open source, pricing, and self-hosting?

dbt Core's Apache 2.0 license is more permissive than Windmill's AGPLv3, though its Fusion engine (ELv2) and dbt Cloud are proprietary. Windmill publishes per-seat and per-worker pricing upfront, while dbt's Enterprise tiers require a sales call.

Open-source license

Windmill

AGPLv3 core, free and unlimited to self-host. Enterprise features (SSO, dedicated workers, audit logs, external secret backends, range backfill) ship in a separate proprietary codebase. Managed cloud available.

dbt

dbt Core is Apache 2.0, more permissive for modify-and-redistribute. The newer Fusion engine ships under the Elastic License 2.0 (source-available, not OSI-approved), and dbt Cloud is fully proprietary.

Pricing

Windmill

Public per-seat and per-worker pricing on the pricing page: developers around $20/month, operators $10/month, standard workers $50/month. The open-source core is free.

dbt

Free Developer tier (1 seat, 3,000 models/month). Starter is $100 per developer/month (up to 5 seats). Enterprise and Enterprise+ pricing is not public and requires a sales conversation.
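
As a rough worked example of the list prices quoted above, the sketch below adds up a monthly bill for a hypothetical team. It uses only the figures stated in this section (treating "around $20" as exactly $20), the team size is invented, and neither total includes the underlying compute: the warehouse bill for dbt, or whatever it costs to run your own workers.

```python
# List prices as quoted in this section, in USD per month.
WINDMILL_DEVELOPER = 20   # stated as "around $20"
WINDMILL_OPERATOR = 10
WINDMILL_WORKER = 50
DBT_STARTER_SEAT = 100
DBT_STARTER_MAX_SEATS = 5


def windmill_monthly(developers, operators, workers):
    """Sum the per-seat and per-worker list prices."""
    return (
        developers * WINDMILL_DEVELOPER
        + operators * WINDMILL_OPERATOR
        + workers * WINDMILL_WORKER
    )


def dbt_starter_monthly(developers):
    """Starter tier cost; the tier is capped at 5 seats."""
    if developers > DBT_STARTER_MAX_SEATS:
        raise ValueError("Starter is capped at 5 seats; larger teams need a quote")
    return developers * DBT_STARTER_SEAT


# Hypothetical team: 4 developers, 2 operators, 2 standard workers.
windmill_total = windmill_monthly(developers=4, operators=2, workers=2)
dbt_total = dbt_starter_monthly(developers=4)
```

For that hypothetical team the list prices come to $200 per month on Windmill and $400 per month on dbt Starter, before any compute costs on either side.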

08 · Verdict

The verdict

dbt and Windmill are not the same category of tool. dbt is a transform-only SQL framework: it compiles models, tracks lineage through ref() and source(), and runs the SQL in a warehouse you provide, on a schedule something else supplies. Windmill pipelines cover that same transform-test-materialize-lineage loop with DuckDB and DuckLake, and also own the parts dbt deliberately leaves out: the compute, the storage, the scheduling and the triggers.

dbt is the right call if your data already lives in a cloud warehouse you are committed to, your team models in SQL, and you want the most mature analytics-engineering tool available: a large package ecosystem on dbt Hub, docs-site generation, a semantic layer, snapshots and Git-native CI, all production-hardened and widely understood. Windmill does not match dbt feature-for-feature here, and its pipelines are in alpha.

Windmill is the stronger fit if you would rather not run a separate warehouse and a separate orchestrator just to transform data. One platform runs the DuckDB transforms on your own workers, versions the results in DuckLake, and schedules and triggers everything natively, with data tests, SCD2 history, macros, freshness SLOs and column-level lineage as inline annotations. In one respect it goes past dbt's defaults: on Enterprise Edition a failing data test rolls back the write (write-audit-publish) instead of leaving the failed model live. The same runtime also carries the surrounding software: workflows, APIs, AI agents and internal apps, in any language, with shared auth and observability.

If you are choosing between them, the honest framing is scope: dbt for deep, warehouse-native SQL modeling; Windmill for folding transform, storage, orchestration and compute into one platform. The fastest way to judge is to spend an afternoon in each.
