Windmill Labs

Windmill vs Dagster

Eight questions teams ask during a bake-off, each with an honest answer on where Windmill or Dagster is the better pick.

Windmill in one sentence

An open-source developer platform to build and orchestrate all your internal software on one runtime: scripts, workflows, data pipelines, AI agents and internal apps. Its pipelines feature (alpha) does the transforms in-platform with DuckDB and managed DuckLake tables, and runs on your own workers.

Dagster in one sentence

An open-source data orchestrator built around Python software-defined assets, with asset checks, partitions, backfills, sensors and a rich lineage catalog. It orchestrates and tracks assets but relies on dbt or your own code for transforms and a separate warehouse or engine for compute. Best for data-engineering teams standardizing on an asset-based catalog.

01 · What you can build

Which internal software can you build and orchestrate?

Windmill is built to centralize and orchestrate all your internal software in one place: data pipelines, scripts, general workflows, full-code apps, AI agents and operator UIs run on a single runtime, with the transforms run in-platform by DuckDB and DuckLake. Dagster focuses on data orchestration and the asset catalog, and pairs with dbt and a separate warehouse for transforms and compute.

Primitive | Windmill | Dagster
Scheduled and event-driven data jobs with branches, retries and parallelism
Declare assets and let the platform infer the graph from what each step reads and writes (Windmill: Alpha)
unique, not_null, accepted_values and relationship checks that fail the run (Windmill: Alpha)
Time and key partitions with range backfill (one-click range backfill is Enterprise on Windmill)
Declared SLOs with a fresh/stale verdict; automatic re-runs of stale assets (Enterprise on Windmill, declarative automation on Dagster)
Column-to-column lineage inferred from the SQL (Windmill: Alpha; Dagster: Dagster+, paid)
DuckDB + managed DuckLake tables do the transforms in-platform (Dagster orchestrates external compute / dbt)
Chain any steps into flows with branching, loops and approval steps beyond data assets (Dagster: Partial)
Standalone functions exposed as APIs, webhooks or cron without a wrapping asset
Cron jobs with retries, error handling and alerting
Agents that call tools, branch on outputs and run as workflows
Drag-and-drop dashboards and admin tools with built-in components
Custom dashboards and admin tools built in React or Svelte

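
The "fresh/stale verdict" row above comes down to comparing an asset's last write time against a declared maximum lag. The sketch below shows only that comparison. The function name, its parameters and the 24-hour SLO are hypothetical and are not taken from either platform's API.

```python
from datetime import datetime, timedelta, timezone

def freshness_verdict(last_materialized, max_lag, now=None):
    """Return "fresh" if the asset was written within max_lag of now, else "stale"."""
    now = now or datetime.now(timezone.utc)
    return "fresh" if now - last_materialized <= max_lag else "stale"

# Hypothetical SLO: the asset must be at most 24 hours old.
now = datetime(2026, 1, 2, 12, 0, tzinfo=timezone.utc)
slo = timedelta(hours=24)

recent = freshness_verdict(datetime(2026, 1, 2, 6, 0, tzinfo=timezone.utc), slo, now)
old = freshness_verdict(datetime(2025, 12, 31, 6, 0, tzinfo=timezone.utc), slo, now)
print(recent, old)
```

An automatic re-run of stale assets is then a scheduler loop that launches a run whenever the verdict flips to "stale".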
02 · Target

Who is each platform built for?

Windmill is built for developer-led teams that want data pipelines and the rest of their internal software (apps, agents, general workflows) on one runtime. Dagster is built for data-engineering teams that want a mature asset catalog over dbt and a warehouse.

Primary audience

Windmill

Developer-led teams that want one platform for data and non-data work. Engineers own it end-to-end: code, Git, local dev, code review, CI/CD, AI coding tools, infrastructure as code. Data engineers get asset-based pipelines; the same team also ships scripts, apps and agents.

Dagster

Data-engineering teams standardizing on an asset-based catalog. Python-first, and a strong fit for shops already invested in dbt and a cloud warehouse who want lineage, checks, partitions and backfills over that stack.

Where non-data work lives

Windmill

On the same runtime. General workflows, approval steps, operator UIs and both low-code and full-code internal apps are first-class, so a data pipeline and the dashboard that reads its output live in one place.

Dagster

Outside Dagster. It focuses on orchestrating data assets, so dashboards, internal tools and operator front-ends come from other tools in your stack.

03 · Build experience

How do you build on each platform?

Windmill declares assets, transforms and checks with comment annotations on ordinary scripts in any language, and runs the transforms in-platform with DuckDB. Dagster declares assets as Python functions and a Definitions object, and sends the transform out to dbt or a warehouse.

Windmill

A DuckDB script declares the asset it produces with // materialize, and the checks it must pass with // data_test (written as -- comment lines in SQL, as in the example below). Windmill infers the graph from asset lineage. Any language works as a step, so a Python script reading the same DuckLake table is a first-class part of the same pipeline.

-- pipeline
-- materialize ducklake://main/orders_daily key=order_id
-- partitioned daily
-- data_test not_null order_id
-- data_test unique order_id

ATTACH 'ducklake://main' AS dl;

SELECT order_id, amount, created_at
FROM dl.orders
WHERE created_at::date = '{partition}'

Dagster

Assets are Python functions decorated with @dg.asset. Dependencies, partitions and checks are declared as arguments and separate decorated functions, then registered in a Definitions object. The transform itself is SQL you send to DuckDB, dbt or another engine from inside the asset body.

import dagster as dg
from dagster_duckdb import DuckDBResource

@dg.asset
def orders(duckdb: DuckDBResource): ...

daily = dg.DailyPartitionsDefinition(start_date="2026-01-01")

@dg.asset(deps=[orders], partitions_def=daily)
def orders_daily(context: dg.AssetExecutionContext, duckdb: DuckDBResource):
    # Partition key for this run, e.g. "2026-01-01"
    part = context.partition_key
    with duckdb.get_connection() as conn:
        conn.execute(
            "CREATE OR REPLACE TABLE orders_daily AS "
            "SELECT order_id, amount, created_at FROM orders "
            f"WHERE created_at::date = '{part}'"
        )
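
Both examples end up as a dependency graph: Windmill infers it from what each step reads and writes, while Dagster takes it from the declared deps. The sketch below shows the general idea of deriving a run order from read/write declarations using only the Python standard library. The step names and the STEPS structure are hypothetical, and neither platform necessarily builds its graph this way.

```python
from graphlib import TopologicalSorter

# Hypothetical step declarations: each step lists the assets it reads and writes.
STEPS = {
    "revenue_report": {"reads": ["orders_daily"], "writes": ["revenue"]},
    "orders_daily": {"reads": ["orders"], "writes": ["orders_daily"]},
    "load_orders": {"reads": [], "writes": ["orders"]},
}

def infer_run_order(steps):
    """Return step names so that every producer runs before the steps reading its asset."""
    # Map each asset to the step that writes it.
    producer = {asset: name for name, s in steps.items() for asset in s["writes"]}
    # A step depends on the producers of every asset it reads.
    graph = {
        name: {producer[a] for a in s["reads"] if a in producer}
        for name, s in steps.items()
    }
    return list(TopologicalSorter(graph).static_order())

order = infer_run_order(STEPS)
print(order)
```

Because the edges come from the assets themselves, adding a new step that reads orders_daily needs no separate wiring. It appears downstream of orders_daily automatically.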

Authoring

Windmill

A pipeline step is a script in a folder. A DuckDB script declares the asset it produces with // materialize and the checks it must pass with // data_test; Windmill infers the graph from asset lineage and owns the write (idempotent, snapshotted DuckLake tables). Any language works as a step: a Python or TypeScript // pipeline script that reads and writes assets is first-class. This is alpha.

Dagster

Assets are Python functions decorated with @dg.asset, wired through function arguments and a Definitions object. Checks are @dg.asset_check functions; partitions and schedules are declared objects. Authoring is Python-only; the transform SQL is code you send to DuckDB, dbt or a warehouse from inside the asset body.
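
Whether a check is written as a data_test line or as an @dg.asset_check function, a not_null or unique check reduces to a query that counts offending rows and fails the run if any exist. The sketch below illustrates that logic, with SQLite from the standard library standing in for DuckDB. The run_data_tests helper and the sample rows are hypothetical.

```python
import sqlite3

def run_data_tests(conn, table, key):
    """Return a list of failure messages; an empty list means both checks passed."""
    failures = []
    # not_null: count rows where the key column is missing.
    nulls = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {key} IS NULL"
    ).fetchone()[0]
    if nulls:
        failures.append(f"not_null {key}: {nulls} null row(s)")
    # unique: count key values that occur more than once.
    dupes = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT {key} FROM {table} "
        f"WHERE {key} IS NOT NULL GROUP BY {key} HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    if dupes:
        failures.append(f"unique {key}: {dupes} duplicated value(s)")
    return failures

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_daily (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders_daily VALUES (?, ?)",
    [(1, 9.5), (1, 3.0), (None, 4.2)],
)
failures = run_data_tests(conn, "orders_daily", "order_id")
print(failures)
```

A platform that "fails the run" on a check would raise or mark the materialization as failed whenever this list is non-empty.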

Transforms & compute

Windmill

Done in-platform. DuckDB runs the transformation and materializes into a managed DuckLake table on your own workers, with versioning and time-travel on every write. No separate warehouse required for the pipeline itself.

Dagster

Delegated. Dagster orchestrates and records asset metadata but relies on dbt or your own code for transforms and a separate warehouse or engine (Snowflake, BigQuery, Spark, DuckDB) for compute. Dagster Pipes runs code on external compute (Kubernetes, Databricks, Lambda) and streams logs and metadata back.

Local dev & IDE

Windmill

Run scripts locally with the Windmill CLI, a VS Code extension, native language tooling (LSP, type-checking) and AI coding tools (Claude Code, Codex). For pipelines, wmill pipeline dev watches the folder and live-previews the graph in the browser, and wmill pipeline run --local runs it from working-tree files.

Dagster

dagster dev runs the full UI and daemon locally against your Python definitions, with the standard Python toolchain (pytest, ruff, your editor's LSP). A strong local story for Python.

Resources & secrets

Windmill

Resources (typed connection info) and variables (secrets) are first-class, encrypted at rest, scoped by folders and groups, and reusable across scripts, flows and apps. External secret backends (Vault, AWS Secrets Manager) are Enterprise only.

Dagster

Resources are typed Python objects (ConfigurableResource); secrets come from environment variables locally, and from the Dagster+ secrets manager or your cloud provider's manager in production. UI-managed secrets are available only in Dagster+.

Git & CI

Windmill

Full IaC: scripts, flows, apps, resources, variables, schedules, folders, groups and permissions are all files in Git, deployed via the CLI and Git sync. Git sync is free for up to 2 users; beyond that is Enterprise only. Workspace forks give each branch an isolated environment, including forked DuckLake data environments so a dev pipeline writes to dev tables instead of production.

Dagster

Definitions are Python in your repo, deployed by your own CI to OSS, or through Dagster+ code locations. Branch deployments (a preview environment per pull request) are a Dagster+ feature.

04 · Integrations

How does the platform integrate with your existing stack?

Windmill imports any package as the vendor's real SDK and runs every step as a native script on your own workers. Dagster ships mature integration libraries for the data stack (dbt, Airbyte, Fivetran, Snowflake, Databricks) and uses Pipes to run code on external compute.

Connecting out

Windmill

Any npm, PyPI, Go or Maven package is a first-class import with automatic per-script dependency resolution, so you call the vendor's real SDK directly. 50+ pre-built resource types cover databases (Postgres, Snowflake, BigQuery), SaaS (Slack, Stripe, GitHub, OpenAI) and infrastructure (S3, Redis, Kafka), shared across scripts, flows and apps and encrypted at rest.

Dagster

A catalog of integration libraries for the data stack: dbt, Airbyte, Fivetran, Snowflake, BigQuery, Databricks, Spark, Sling and more. Each is a Python package exposing resources and asset factories. Beyond those, any PyPI package is a normal import inside an asset.

Receiving events

Windmill

Native triggers for HTTP, cron, Kafka, NATS, Postgres CDC, SQS, MQTT, SMTP and WebSocket. Every script also gets an HTTP endpoint and webhook URL for free. Pipelines add asset triggers: writing an asset runs every step that reads it.

Dagster

Sensors poll external systems (S3, a table, an API) and launch runs when something changes, alongside cron schedules and asset/freshness-based declarative automation. Webhook-style triggering is typically built with sensors or the GraphQL API.
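
The sensor pattern described above is a poll loop that remembers what it has already seen and launches work only for what is new. The sketch below shows that pattern in plain Python. The PollingSensor class and the bucket listing are hypothetical stand-ins, not Dagster's sensor API.

```python
from dataclasses import dataclass, field

@dataclass
class PollingSensor:
    """Remembers which keys it has already seen and reports only new ones."""
    seen: set = field(default_factory=set)

    def tick(self, list_keys):
        """Poll once: call list_keys() and return keys not seen on earlier ticks."""
        current = set(list_keys())
        new_keys = sorted(current - self.seen)
        self.seen |= current
        return new_keys

# Hypothetical external system: a bucket listing that grows between polls.
bucket = ["2026-01-01.csv"]
sensor = PollingSensor()

first = sensor.tick(lambda: bucket)   # the first file is new
bucket.append("2026-01-02.csv")
second = sensor.tick(lambda: bucket)  # only the added file is new
third = sensor.tick(lambda: bucket)   # nothing changed, nothing to launch
print(first, second, third)
```

A push-style trigger (a webhook, or an asset trigger that fires on write) removes the polling interval from the latency, at the cost of needing an endpoint the source system can call.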

Running on external compute

Windmill

Steps run as native scripts on your own workers in any language, with worker groups to route heavy jobs to dedicated hardware. Heavy compute is the worker doing the work, not a separate cluster to wire up.

Dagster

Dagster Pipes runs your code on external compute (Kubernetes, Databricks, AWS Lambda, GCP Dataproc) in any language, streaming logs and metadata back to Dagster. This is a mature, well-documented pattern for offloading the heavy lifting to a cluster you already run.

05 · Migration & lock-in

How hard to get in, and how hard to get out?

Windmill keeps switching cost low: step code is standard SQL or code and DuckLake data is open Parquet, so both run anywhere after you leave. Dagster ports its dbt models and transform SQL cleanly, but the asset, check, sensor and Definitions layer is Dagster-specific and needs rewriting elsewhere.

Getting in

Windmill

Assets map across directly: a Dagster @asset becomes a DuckDB // materialize step or an any-language // pipeline script, asset checks become // data_test lines, and partitions/backfill map to Windmill partitions and backfill. The transform SQL usually ports as-is; the Definitions wiring is replaced by comment annotations. Pipelines are alpha, so expect rough edges.

Dagster

If you already think in Python assets and dbt models, onboarding is quick. Existing dbt projects load as assets through the dbt integration, and Python transforms wrap in @asset functions.
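
Partitions and backfill port between the two platforms because the underlying idea is the same: one run per partition key over a date range. The sketch below enumerates daily keys and substitutes them into the {partition} placeholder from the Windmill example earlier on this page. The helper names are hypothetical, and real backfills on either platform go through the platform's own scheduler rather than string formatting.

```python
from datetime import date, timedelta

# The partitioned query from the Windmill example, with its {partition} placeholder.
QUERY = (
    "SELECT order_id, amount, created_at FROM dl.orders "
    "WHERE created_at::date = '{partition}'"
)

def daily_partitions(start, end):
    """Yield one ISO date key per day, inclusive of both ends."""
    for offset in range((end - start).days + 1):
        yield (start + timedelta(days=offset)).isoformat()

def backfill_queries(start, end):
    """Render one query per partition key in the range."""
    return [QUERY.format(partition=key) for key in daily_partitions(start, end)]

queries = backfill_queries(date(2026, 1, 1), date(2026, 1, 3))
print(len(queries))
print(queries[-1])
```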

Getting out

Windmill

Step logic is standard code (SQL, Python, TypeScript, Go, Bash) that runs anywhere, and the CLI exports the whole workspace as plain files. DuckLake tables are open Parquet + catalog, so the data is not locked in. What you lose leaving Windmill is the orchestration layer, not the transforms or the data.

Dagster

The transform SQL and dbt models port cleanly, since they are standard. What needs rewriting is the Dagster-specific layer: @asset / @asset_check decorators, the Definitions object, sensors and partition definitions all use the Dagster API and have to be reimplemented on another platform.

06 · Enterprise requirements

Audit logs, observability, security, performance

Both cover the enterprise basics, though more of Dagster's are gated to Dagster+ (RBAC, SSO, audit logs, column-level lineage). Windmill ships RBAC, SSO (up to 10 users) and audit logs in open source and does the transform compute in-platform. Dagster leans on the external compute you provision, which is a strong fit for large warehouse and Spark workloads.

Observability

Windmill

Real-time streaming logs, per-run inputs / outputs / duration, worker-queue metrics and a Prometheus exporter. Trace ID on every job. Pipelines show live status and lineage on the asset graph.

Dagster

A strong observability story built around the asset catalog: per-asset run history, materialization metadata, asset checks and lineage in the UI. Column-level lineage and cost insights are Dagster+ features.

Audit logs

Windmill

Full trail of who ran / edited / deployed what. Extended retention is Enterprise only.

Dagster

Audit logs are a Dagster+ (Pro) feature.

Security & access

Windmill

SOC 2 Type II. RBAC, SSO (up to 10 users), encrypted secrets and sandboxed execution in open source. Uncapped SSO, audit logs and advanced access controls (SCIM, SAML) are Enterprise only.

Dagster

In OSS, access control is what you build around it. RBAC, SSO, SAML sync and teams are Dagster+ features, gated to the Pro tier (Starter has per-deployment RBAC only).

Multi-tenancy & isolation

Windmill

Multiple isolated workspaces on one instance, each with their own users, resources, secrets and access controls. Free tier is capped at 3 workspaces; unlimited is Enterprise only.

Dagster

Dagster+ organizes work into deployments and code locations. Solo and Starter cap deployments and code locations; unlimited deployments are a Pro-tier feature.

Performance & compute

Windmill

~10ms cold starts, with dedicated-worker mode (Enterprise only) keeping runtimes pre-warmed. DuckDB does the transform in-process on the worker, so a pipeline step needs no external cluster: single-node DuckDB and Polars cover the vast majority of ETL workloads (see ETL & data processing). We do not publish a head-to-head Windmill-vs-Dagster benchmark.

Dagster

Scales by running assets on external compute you provision (Kubernetes, Databricks, a warehouse), so throughput is largely a property of that backend rather than Dagster itself. Per-run process/pod startup adds overhead that is negligible for multi-minute data jobs.

07 · Licensing & pricing

Open source, pricing, and self-hosting?

Dagster's OSS core is Apache 2.0, more permissive than Windmill's AGPLv3 for redistribution. Windmill publishes full per-seat and per-worker Enterprise pricing, while Dagster+ Pro and Enterprise require a sales conversation and gate RBAC, SSO and audit logs to Pro. Both are free to self-host the core.

Open-source license

Windmill

AGPLv3 core, free and unlimited to self-host. Enterprise features (SSO, dedicated workers, audit logs, external secret backends, one-click range backfill) ship in a separate proprietary codebase. Managed cloud available.

Dagster

Apache 2.0 core, free and unlimited to self-host, and more permissive than AGPLv3 for redistribution. Dagster+ (the managed cloud with the catalog, insights, RBAC, SSO and branch deployments) is proprietary.

Pricing

Windmill

Public per-seat and per-worker pricing on the pricing page (around $20/mo per developer, $10/mo per operator). Enterprise adds SSO, audit logs, dedicated workers and advanced worker groups.

Dagster

Dagster+ starts at a $10/mo Solo plan and a $100/mo Starter plan, each with pay-as-you-go credits on top. Pro and Enterprise are priced by contacting sales. RBAC, SSO, audit logs and column-level lineage require the Pro tier. There is no permanently free managed tier (Solo is the entry price).
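
As a rough worked example of the seat arithmetic, the sketch below uses the approximate Windmill figures quoted above ($20/mo per developer, $10/mo per operator). The team size is hypothetical, per-worker charges are not modeled, and the pricing page remains the authoritative source.

```python
# Approximate seat prices quoted above, in USD per month.
DEVELOPER_USD = 20
OPERATOR_USD = 10

def monthly_seat_cost(developers, operators):
    """Seat cost only; per-worker pricing is deliberately left out of this sketch."""
    return developers * DEVELOPER_USD + operators * OPERATOR_USD

# Hypothetical team: 8 developers and 15 operators.
total = monthly_seat_cost(developers=8, operators=15)
print(total)  # 8 * 20 + 15 * 10 = 310
```

No equivalent calculation is possible for Dagster+ Pro or Enterprise from the figures here, since those tiers are priced through sales.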

08 · Verdict

The verdict

Windmill and Dagster model data work the same way, as assets with lineage, but draw the platform boundary in different places. Dagster is a focused data orchestrator: assets are Python functions, and it tracks and schedules them while dbt and a warehouse do the transforms and compute. Windmill puts the transform inside the platform: a pipeline step is a DuckDB query that materializes into a managed DuckLake table on your own workers, so transform, storage, orchestration and compute are one system, in any language.

Dagster is the safer pick today if data orchestration is the whole job and you want maturity: software-defined assets, asset checks, deep partitions and backfills, sensors, a rich lineage catalog with column-level lineage, freshness policies and a strong dbt integration, all battle-tested by a large data-engineering community. Windmill's pipelines are alpha by comparison: freshness SLOs, column-level lineage, schema contracts, SCD2 history and range backfill all shipped recently, and the asset-check, partition and sensor surface is younger.

Windmill is the better pick if you want more than a data orchestrator. The same runtime that runs your pipelines also runs standalone scripts exposed as APIs, general workflows with approval steps, low-code and full-code (React/Svelte) internal apps, and AI agents, with shared auth, secrets and observability across all of them. Its pipelines also do the transform in-platform with DuckDB and DuckLake, so a small-to-mid data pipeline needs no separate warehouse, and more of the enterprise foundation (RBAC, SSO, audit logs) ships in open source with published pricing rather than a sales call.

The switching cost is low on both: Dagster's dbt models and transform SQL port cleanly, and Windmill's step code plus open Parquet DuckLake data runs anywhere after you leave. If you're deciding between the two, the fastest way to judge is to spend an afternoon in each.
