Windmill Labs
Windmill

Build production-grade data pipelines

For data teams who want reliable ETL with native DuckDB and Ducklake integrations. No Airflow or Spark clusters to manage.

Write pipeline steps in Python, TypeScript, SQL, Go, Bash and 20+ other languages
Native integrations with DuckDB, Ducklake and Polars with zero config
Built-in S3 / Azure Blob / GCS workspace storage with dataset browsing
Single-node performance that outperforms Spark on most ETL workloads

Trusted by 4,000+ organizations, including 300+ Enterprise Edition customers at scale:

Zoom, Kahoot, Investing.com, CFA Institute, Axians, Photoroom, Pave, Panther Labs, Nocd

Build pipelines with full code flexibility and DAG visualization

Each pipeline is a flow — a directed acyclic graph where each step is a script in Python, TypeScript, SQL or any supported language.
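To make the DAG idea concrete, here is a minimal sketch using only the Python standard library. The step names are hypothetical; in Windmill each node would be a script in any supported language, and the platform resolves the execution order for you.

```python
from graphlib import TopologicalSorter

# Hypothetical flow: two extraction steps fan into a transform, then a load.
# Each key maps a step to the set of steps it depends on.
flow = {
    "extract_orders": set(),
    "extract_users": set(),
    "transform": {"extract_orders", "extract_users"},
    "load": {"transform"},
}

# A valid execution order always runs dependencies before dependents.
order = list(TopologicalSorter(flow).static_order())
print(order)
```

The same dependency structure is what the flow editor renders visually: independent steps can run in parallel, and downstream steps wait on their inputs.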

20+ languages

Write each pipeline step in the language that fits best. Python, TypeScript, SQL, Go, Bash, Rust, PHP, and 20+ more. Mix and match freely within a single flow.


Parallel branches

Fan out extraction steps across independent sources and collect results automatically.


Restart from any step

Fix a bug and re-run from the failing step. No need to replay the entire pipeline.


Retries & error handlers

Configurable retry count with exponential backoff. Custom error handler scripts per step.
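The retry pattern described above can be sketched in a few lines of standard-library Python. This is an illustration of exponential backoff, not Windmill's actual retry engine; the function and parameter names are invented for the example.

```python
import time

def run_with_retries(step, max_retries=3, base_delay=1.0, backoff=2.0):
    """Retry a step with exponential backoff between attempts."""
    for attempt in range(max_retries + 1):
        try:
            return step()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted; an error handler would take over here
            time.sleep(base_delay * backoff ** attempt)

# A flaky step that succeeds on its third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = run_with_retries(flaky, base_delay=0.01)
print(result)  # -> ok, after two retried failures
```

With `base_delay=1.0` and `backoff=2.0`, waits grow as 1s, 2s, 4s, ... which spreads load when an upstream source is briefly unavailable.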


Trigger from anywhere

Cron schedules, webhooks, Postgres CDC, Kafka topics, SQS queues — or just click "Run".


Data tables

Built-in relational storage with zero setup. Query from Python, TypeScript, SQL or DuckDB. Credentials are managed internally and never exposed.


The native DuckDB and Ducklake orchestrator

The only orchestrator with zero-config DuckDB, Ducklake and S3 support. Credentials and connections are handled automatically; just write your query.

DuckDB


Query S3 files with SQL. DuckDB scripts auto-connect to your workspace storage. No credentials to manage, no connection strings to configure.


Ducklake

Store massive datasets in S3 and query them with SQL. Full data lake with catalog support, versioning and ACID transactions.


Workspace S3

Link your workspace to S3, Azure Blob, GCS, R2 or MinIO. Browse and preview Parquet, CSV and JSON directly from the UI.


Polars

Lightning-fast DataFrames in Python. Read and write Parquet directly from your workspace S3 bucket with zero config.


Assets lineage

Pipeline steps pass datasets as lightweight JSON pointers to S3 objects. No serialization overhead, no memory limits.
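To see why this is cheap, consider what actually travels between steps. The exact key names below are hypothetical, but the idea is faithful to the description above: a tiny JSON reference to an object in workspace storage, not the data itself.

```python
import json

# Hypothetical shape of a dataset reference passed between pipeline steps.
asset = {
    "s3": "outputs/daily/orders.parquet",
    "storage": "workspace",
    "format": "parquet",
}

payload = json.dumps(asset)
# The payload stays this small whether the Parquet file is 1 MB or 1 TB.
print(len(payload), "bytes passed between steps")
```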

And 50+ more: PostgreSQL, MySQL, BigQuery, Snowflake, Redshift, ClickHouse, MongoDB, REST APIs. See all integrations.

Challenging the status quo of data warehouses

Stop paying per-query. Run DuckDB and Ducklake locally on your workers.

                    Windmill + DuckDB                     Snowflake / BigQuery
Compute             Local on your workers                 Remote warehouse
Cost model          Flat, pay for infra only              Per-query pricing
Data storage        Your S3 bucket, open formats          Vendor-managed, proprietary
Vendor lock-in      No                                    Yes
Orchestration       Built-in (flows, retries, schedules)  Separate tool needed
Setup               Zero config, auto-connected           Credentials, drivers, networking
Data egress fees    No                                    Yes

Windmill also orchestrates Snowflake, BigQuery and other warehouses. You can mix local DuckDB steps with remote warehouse queries in the same pipeline.

Production-grade performance that replaces Spark

Polars and DuckDB process data on a single node far faster than distributed frameworks for the vast majority of ETL workloads.

TPC-H benchmark, 8 queries on m4.xlarge (8 vCPUs, 32 GB RAM)


Start building data pipelines today

Get started for free on Windmill Cloud or self-host the open-source version.