Is Windmill a drop-in replacement for Dagster?

Not feature-for-feature. Both model data work as assets with lineage, so the mental model transfers. Windmill pipelines (DuckDB + DuckLake with // materialize, partitions, data tests and backfill) are in alpha, while Dagster's software-defined assets, asset checks, partitions and sensors are mature. Windmill is a fit if you also want scripts, flows, apps and AI agents on the same runtime with the transforms run in-platform by DuckDB. Dagster is a fit if you want a mature asset catalog and are happy pairing it with dbt and a separate warehouse.

Does Windmill run the transforms itself, or does it need dbt and a warehouse like Dagster?

Windmill runs them itself. A pipeline step is a DuckDB query that materializes into a managed DuckLake table on your own workers, so transform, storage/versioning, orchestration and compute are one platform. Dagster orchestrates and tracks asset metadata but relies on dbt or your own code for transforms and a separate warehouse or compute engine for the heavy lifting. Windmill also has a dbt integration if you already have dbt models.

Where does Dagster genuinely win today?

Dagster has the more mature asset model: asset checks, partition and backfill depth, sensors, a rich lineage catalog with column-level lineage (Dagster+ paid), declarative automation, strong dbt integration and an established data-engineering community. Windmill pipelines are alpha. They now cover freshness SLOs (fresh/stale on the graph, with an Enterprise watchdog that re-runs stale scripts), column-level lineage inferred from the SQL, and schema contracts, but the asset-check, partition and sensor surface is younger and Dagster's declarative automation is more configurable.
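To make the fresh/stale verdict concrete, here is a minimal sketch of how a freshness SLO could be evaluated. It is not either product's implementation: the function name `freshness_verdict` and the rule "fresh if the last materialization falls within the SLO window" are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def freshness_verdict(last_materialized: datetime, slo: timedelta, now: datetime) -> str:
    """Return "fresh" if the asset was materialized within the SLO window, else "stale".

    Hypothetical helper: the names and the rule are illustrative assumptions.
    """
    return "fresh" if now - last_materialized <= slo else "stale"


now = datetime(2026, 1, 2, 12, 0, tzinfo=timezone.utc)
slo = timedelta(hours=24)

# Materialized 6 hours ago: inside the 24-hour window.
print(freshness_verdict(datetime(2026, 1, 2, 6, 0, tzinfo=timezone.utc), slo, now))  # fresh

# Materialized more than two days ago: outside the window.
print(freshness_verdict(datetime(2025, 12, 31, 6, 0, tzinfo=timezone.utc), slo, now))  # stale
```

A watchdog of the kind described above would, in this simplified picture, periodically compute the verdict and re-run whatever produces a stale asset.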

How does pricing compare?

Dagster+ starts at a $10/month Solo plan and a $100/month Starter plan (pay-as-you-go credits on top), with Pro and Enterprise priced by contacting sales. RBAC, SSO, audit logs and column-level lineage are gated to the higher tiers. Windmill publishes per-seat and per-worker Enterprise pricing on its pricing page (around $20/mo per developer, $10/mo per operator), and the open-source core is free and unlimited to self-host.

Windmill vs Dagster

Q: Can I migrate Dagster assets to Windmill automatically?

No, there is no automated converter. The migration is a manual rewrite: each Dagster @asset becomes a Windmill script (a DuckDB // materialize step or an any-language // pipeline script), asset checks become // data_test lines, and partitions/backfill map to Windmill partitions and backfill. The transform SQL usually ports as-is; the Definitions wiring is replaced by comment annotations, and Windmill infers the graph from asset lineage.

Q: Is Dagster open source?

Dagster's OSS core is Apache 2.0, which is permissive if you plan to modify and redistribute. Dagster+ (the managed cloud with the catalog, insights, RBAC, SSO and branch deployments) is proprietary. Windmill's OSS core is AGPLv3 and its Enterprise features ship in a separate proprietary codebase. Both are free to self-host the core.

Q: Can Windmill do things Dagster does not?

Yes. Windmill is a general platform, not only a data orchestrator: the same runtime powers standalone scripts exposed as APIs and webhooks, general (non-data) workflows with approval steps, low-code and full-code (React/Svelte) internal apps, and AI agents with tool calling and sandboxes. Dagster is focused on data orchestration and the asset catalog and does not ship low-code apps or an agent framework.

Eight questions teams ask during a bake-off, with an honest answer on each one for where Windmill or Dagster is the better pick.

The 8 questions

01 · What you can build: Which solution fits your specific use case
02 · Target: Who the platform is built for
03 · Build experience: How you build on each platform
04 · Integrations: How the platform integrates with your existing stack
05 · Migration & lock-in: How hard to get in, how hard to get out
06 · Enterprise requirements: Audit logs, observability, security, performance
07 · Licensing & pricing: Open source, pricing, self-hosting
08 · Verdict: The verdict

Windmill in one sentence

An open-source developer platform to build and orchestrate all your internal software on one runtime: scripts, workflows, data pipelines, AI agents and internal apps. Its pipelines feature (alpha) does the transforms in-platform with DuckDB and managed DuckLake tables, and runs on your own workers.

Dagster in one sentence

An open-source data orchestrator built around Python software-defined assets, with asset checks, partitions, backfills, sensors and a rich lineage catalog. It orchestrates and tracks assets but relies on dbt or your own code for transforms and a separate warehouse or engine for compute. Best for data-engineering teams standardizing on an asset-based catalog.

01 · What you can build

Which internal software can you build and orchestrate?

Windmill is built to centralize and orchestrate all your internal software in one place: data pipelines, scripts, general workflows, full-code apps, AI agents and operator UIs run on a single runtime, with the transforms run in-platform by DuckDB and DuckLake. Dagster focuses on data orchestration and the asset catalog, and pairs with dbt and a separate warehouse for transforms and compute.

Primitive	Windmill	Dagster
Data pipelines & ETL: Scheduled and event-driven data jobs with branches, retries and parallelism
Asset lineage & catalog: Declare assets and let the platform infer the graph from what each step reads and writes	Alpha
Asset checks / data tests: unique, not_null, accepted_values and relationship checks that fail the run	Alpha
Partitions & backfill: Time and key partitions with range backfill (one-click range backfill is Enterprise on Windmill)
Freshness policies: Declared SLOs with a fresh/stale verdict; automatic re-runs of stale assets (Enterprise on Windmill, declarative automation on Dagster)
Column-level lineage: Column-to-column lineage inferred from the SQL (Dagster: Dagster+ paid)	Alpha	Dagster+
Built-in transform engine: DuckDB + managed DuckLake tables do the transforms in-platform (Dagster orchestrates external compute / dbt)
General workflows (DAG): Chain any steps into flows with branching, loops and approval steps beyond data assets		Partial
Scripts as standalone APIs & jobs: Standalone functions exposed as APIs, webhooks or cron without a wrapping asset
Scheduled tasks: Cron jobs with retries, error handling and alerting
AI agent workflows: Agents that call tools, branch on outputs and run as workflows
Low-code internal apps: Drag-and-drop dashboards and admin tools with built-in components
Full-code internal apps: Custom dashboards and admin tools built in React or Svelte
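The "Partitions & backfill" row is easier to picture with a small example. The sketch below enumerates the daily partition keys a range backfill would need to cover. It is an illustration only: the helper name `daily_partition_keys` and the ISO-date key format are assumptions, not the API of either platform.

```python
from datetime import date, timedelta


def daily_partition_keys(start: date, end: date) -> list[str]:
    """Return one ISO-formatted key per day in the inclusive range [start, end].

    Hypothetical helper for illustration; not a Windmill or Dagster function.
    """
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days
    return [(start + timedelta(days=i)).isoformat() for i in range(days + 1)]


# A backfill over a month boundary touches four daily partitions.
keys = daily_partition_keys(date(2026, 1, 30), date(2026, 2, 2))
print(keys)  # ['2026-01-30', '2026-01-31', '2026-02-01', '2026-02-02']
```

A range backfill then amounts to running the same step once per key, which is why idempotent per-partition writes matter.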

02 · Target

Who is each platform built for?

Windmill is built for developer-led teams that want data pipelines and the rest of their internal software (apps, agents, general workflows) on one runtime. Dagster is built for data-engineering teams that want a mature asset catalog over dbt and a warehouse.

Primary audience

Windmill

Developer-led teams that want one platform for data and non-data work. Engineers own it end-to-end: code, Git, local dev, code review, CI/CD, AI coding tools, infrastructure as code. Data engineers get asset-based pipelines; the same team also ships scripts, apps and agents.

Dagster

Data-engineering teams standardizing on an asset-based catalog. Python-first, and a strong fit for shops already invested in dbt and a cloud warehouse who want lineage, checks, partitions and backfills over that stack.

Where non-data work lives

Windmill

On the same runtime. General workflows, approval steps, operator UIs and both low-code and full-code internal apps are first-class, so a data pipeline and the dashboard that reads its output live in one place.

Dagster

Outside Dagster. It focuses on orchestrating data assets, so dashboards, internal tools and operator front-ends come from other tools in your stack.

03 · Build experience

How do you build on each platform?

Windmill declares assets, transforms and checks with comment annotations on ordinary scripts in any language, and runs the transforms in-platform with DuckDB. Dagster declares assets as Python functions and a Definitions object, and sends the transform out to dbt or a warehouse.

Windmill

A DuckDB script declares the asset it produces with // materialize, and the checks it must pass with // data_test. Windmill infers the graph from asset lineage. Any language works as a step, so a Python script reading the same DuckLake table is a first-class part of the same pipeline.

orders_daily.sql

-- pipeline
-- materialize ducklake://main/orders_daily key=order_id
-- partitioned daily
-- data_test not_null order_id
-- data_test unique order_id

ATTACH 'ducklake://main' AS dl;

SELECT order_id, amount, created_at
FROM dl.orders
WHERE created_at::date = '{partition}'

enrich.py

# pipeline
# on ducklake://main/orders_daily
import wmill

def main(partition: str):
    # any-language step, same managed DuckLake read
    df = wmill.ducklake("main").read("orders_daily", partition=partition)
    # ship_to_warehouse stands in for your own export function (not shown here)
    ship_to_warehouse(df)
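As a rough illustration of what "inferring the graph from asset lineage" can mean, the sketch below reads the `materialize` and `on` annotations from the two scripts above and derives a run order. It is a simplified stand-in, not Windmill's implementation: the regular expression, the helper names `parse_assets` and `run_order`, and the assumption that `on <uri>` marks an upstream asset are all illustrative, and the real feature is described as also reading lineage from the SQL itself.

```python
import re
from graphlib import TopologicalSorter

# Matches annotation lines such as "-- materialize <uri>" or "# on <uri>".
# The pattern is an assumption made for this sketch.
ANNOTATION = re.compile(r"^[ \t]*(?:--|#|//)[ \t]*(materialize|on)[ \t]+(\S+)", re.MULTILINE)


def parse_assets(source: str) -> tuple[set[str], set[str]]:
    """Return (assets produced, assets read) declared in a script's annotations."""
    produces: set[str] = set()
    reads: set[str] = set()
    for kind, uri in ANNOTATION.findall(source):
        (produces if kind == "materialize" else reads).add(uri)
    return produces, reads


def run_order(scripts: dict[str, str]) -> list[str]:
    """Order scripts so that each one runs after the producers of the assets it reads."""
    parsed = {name: parse_assets(src) for name, src in scripts.items()}
    producer = {uri: name for name, (produces, _) in parsed.items() for uri in produces}
    # Map each script to the set of scripts it depends on.
    graph = {
        name: {producer[uri] for uri in reads if uri in producer}
        for name, (_, reads) in parsed.items()
    }
    return list(TopologicalSorter(graph).static_order())


scripts = {
    "enrich.py": "# pipeline\n# on ducklake://main/orders_daily\nimport wmill\n",
    "orders_daily.sql": (
        "-- pipeline\n"
        "-- materialize ducklake://main/orders_daily key=order_id\n"
        "-- data_test not_null order_id\n"
    ),
}
print(run_order(scripts))  # ['orders_daily.sql', 'enrich.py']
```

The point of the design is that no separate wiring file exists: the dependency edge comes from the two scripts naming the same asset URI.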

Dagster

Assets are Python functions decorated with @dg.asset. Dependencies, partitions and checks are declared as arguments and separate decorated functions, then registered in a Definitions object. The transform itself is SQL you send to DuckDB, dbt or another engine from inside the asset body.

assets.py

import dagster as dg
from dagster_duckdb import DuckDBResource

@dg.asset
def orders(duckdb: DuckDBResource): ...

daily = dg.DailyPartitionsDefinition(start_date="2026-01-01")

@dg.asset(deps=[orders], partitions_def=daily)
def orders_daily(context: dg.AssetExecutionContext, duckdb: DuckDBResource):
    # one run per daily partition; the key is the partition's date string
    part = context.partition_key
    with duckdb.get_connection() as conn:
        conn.execute(
            f"CREATE OR REPLACE TABLE orders_daily AS "
            f"SELECT order_id, amount, created_at FROM orders "
            f"WHERE created_at::date = '{part}'"
        )

definitions.py

import dagster as dg
from dagster_duckdb import DuckDBResource
from .assets import orders, orders_daily

@dg.asset_check(asset=orders_daily)
def order_id_not_null(duckdb: DuckDBResource) -> dg.AssetCheckResult: ...

defs = dg.Definitions(
    assets=[orders, orders_daily],
    asset_checks=[order_id_not_null],
    # the database path is a placeholder; point it at your own DuckDB file
    resources={"duckdb": DuckDBResource(database="analytics.duckdb")},
)

Authoring

Windmill

A pipeline step is a script in a folder. A DuckDB script declares the asset it produces with // materialize and the checks it must pass with // data_test; Windmill infers the graph from asset lineage and owns the write (idempotent, snapshotted DuckLake tables). Any language works as a step: a Python or TypeScript // pipeline script that reads and writes assets is first-class. This is alpha.

Dagster

Assets are Python functions decorated with @dg.asset, wired through function arguments and a Definitions object. Checks are @dg.asset_check functions; partitions and schedules are declared objects. Authoring is Python-only; the transform SQL is code you send to DuckDB, dbt or a warehouse from inside the asset body.
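Both authoring styles end up expressing the same kind of rule, such as not_null and unique on order_id. The sketch below shows what those two checks compute over a batch of rows. It is a plain-Python illustration, not either platform's check engine: the `CHECKS` registry and the `failed_tests` helper are hypothetical names.

```python
from typing import Any, Callable

Row = dict[str, Any]


def not_null(rows: list[Row], column: str) -> bool:
    """Pass when no row has a missing or None value in the column."""
    return all(row.get(column) is not None for row in rows)


def unique(rows: list[Row], column: str) -> bool:
    """Pass when no value appears more than once in the column."""
    values = [row.get(column) for row in rows]
    return len(values) == len(set(values))


# Hypothetical registry mapping a test name to its implementation.
CHECKS: dict[str, Callable[[list[Row], str], bool]] = {
    "not_null": not_null,
    "unique": unique,
}


def failed_tests(rows: list[Row], tests: list[tuple[str, str]]) -> list[str]:
    """Return the "<test> <column>" labels of every check that did not pass."""
    return [f"{name} {column}" for name, column in tests if not CHECKS[name](rows, column)]


rows = [
    {"order_id": 1, "amount": 9.5},
    {"order_id": 1, "amount": 3.0},     # duplicate key
    {"order_id": None, "amount": 1.0},  # missing key
]
print(failed_tests(rows, [("not_null", "order_id"), ("unique", "order_id")]))
# ['not_null order_id', 'unique order_id']
```

In a pipeline, a non-empty result would fail the run rather than just print, which is the behavior the comparison table describes for both tools.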

Transforms & compute

Windmill

Step logic is standard code (SQL, Python, TypeScript, Go, Bash) that runs anywhere, and the CLI exports the whole workspace as plain files. DuckLake tables are open Parquet + catalog, so the data is not locked in. What you lose leaving Windmill is the orchestration layer, not the transforms or the data.

Dagster

The transform SQL and dbt models port cleanly, since they are standard. What needs rewriting is the Dagster-specific layer: @asset / @asset_check decorators, the Definitions object, sensors and partition definitions all use the Dagster API and have to be reimplemented on another platform.

06 · Enterprise requirements

Windmill

~10ms cold starts, with dedicated-worker mode (Enterprise only) keeping runtimes pre-warmed. DuckDB does the transform in-process on the worker, so a pipeline step needs no external cluster: single-node DuckDB and Polars cover the vast majority of ETL workloads (see ETL & data processing). We do not publish a head-to-head Windmill-vs-Dagster benchmark.

Dagster

Scales by running assets on external compute you provision (Kubernetes, Databricks, a warehouse), so throughput is largely a property of that backend rather than Dagster itself. Per-run process/pod startup adds overhead that is negligible for multi-minute data jobs.

07 · Licensing & pricing

How do open source, pricing and self-hosting compare?

Dagster's OSS core is Apache 2.0, more permissive than Windmill's AGPLv3 for redistribution. Windmill publishes full per-seat and per-worker Enterprise pricing, while Dagster+ Pro and Enterprise require a sales conversation and gate RBAC, SSO and audit logs to Pro. Both are free to self-host the core.

Open-source license

Windmill

AGPLv3 core, free and unlimited to self-host. Enterprise features (SSO, dedicated workers, audit logs, external secret backends, one-click range backfill) ship in a separate proprietary codebase. Managed cloud available.

Dagster

Apache 2.0 core, free and unlimited to self-host, and more permissive than AGPLv3 for redistribution. Dagster+ (the managed cloud with the catalog, insights, RBAC, SSO and branch deployments) is proprietary.

Pricing

Windmill

Public per-seat and per-worker pricing on the pricing page (around $20/mo per developer, $10/mo per operator). Enterprise adds SSO, audit logs, dedicated workers and advanced worker groups.

Dagster

Dagster+ starts at a $10/mo Solo plan and a $100/mo Starter plan, each with pay-as-you-go credits on top. Pro and Enterprise are priced by contacting sales. RBAC, SSO, audit logs and column-level lineage require the Pro tier. There is no permanently free managed tier (Solo is the entry price).
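For a quick budget estimate, the approximate Windmill seat prices quoted above can be turned into simple arithmetic. This is a back-of-the-envelope sketch only: it uses the rounded figures from this page (about $20 per developer and $10 per operator per month), leaves out per-worker charges because no figure is given here, and the function name `monthly_seat_cost` is hypothetical. Check the pricing page for actual numbers.

```python
# Approximate per-seat prices quoted in the text above (USD per month).
DEVELOPER_SEAT_USD = 20
OPERATOR_SEAT_USD = 10


def monthly_seat_cost(developers: int, operators: int) -> int:
    """Estimate the monthly seat cost in USD, excluding per-worker charges."""
    if developers < 0 or operators < 0:
        raise ValueError("seat counts must be non-negative")
    return developers * DEVELOPER_SEAT_USD + operators * OPERATOR_SEAT_USD


# 5 developers and 10 operators: 5 * 20 + 10 * 10
print(monthly_seat_cost(5, 10))  # 200
```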

08 · Verdict

The verdict

Windmill and Dagster model data work the same way, as assets with lineage, but draw the platform boundary in different places. Dagster is a focused data orchestrator: assets are Python functions, and it tracks and schedules them while dbt and a warehouse do the transforms and compute. Windmill puts the transform inside the platform: a pipeline step is a DuckDB query that materializes into a managed DuckLake table on your own workers, so transform, storage, orchestration and compute are one system, in any language.

Dagster is the safer pick today if data orchestration is the whole job and you want maturity: software-defined assets, asset checks, deep partitions and backfills, sensors, a rich lineage catalog with column-level lineage, freshness policies and a strong dbt integration, all battle-tested by a large data-engineering community. Windmill's pipelines are alpha by comparison: freshness SLOs, column-level lineage, schema contracts, SCD2 history and range backfill all shipped recently, and the asset-check, partition and sensor surface is younger.

Windmill is the better pick if you want more than a data orchestrator. The same runtime that runs your pipelines also runs standalone scripts exposed as APIs, general workflows with approval steps, low-code and full-code (React/Svelte) internal apps, and AI agents, with shared auth, secrets and observability across all of them. Its pipelines also do the transform in-platform with DuckDB and DuckLake, so a small-to-mid data pipeline needs no separate warehouse, and more of the enterprise foundation (RBAC, SSO, audit logs) ships in open source with published pricing rather than a sales call.

The switching cost is low on both: Dagster's dbt models and transform SQL port cleanly, and Windmill's step code plus open Parquet DuckLake data runs anywhere after you leave. If you're deciding between the two, the fastest way to judge is to spend an afternoon in each.

Frequently asked questions

Data pipelines on Windmill

The product overview: the DuckDB and DuckLake data-pipeline feature set, integrations and benchmarks at a glance.

Pipelines documentation

Every annotation and option, with the full "how it compares to dbt and Dagster" section.

Windmill vs dbt

The transform-layer comparison: how Windmill's DuckDB + DuckLake model lines up against dbt.
