Build production-grade data pipelines
For data teams who want reliable ETL with native DuckDB and Ducklake integrations. No Airflow or Spark clusters to manage.
Build pipelines with full code flexibility and DAG visualization
Each pipeline is a flow — a directed acyclic graph where each step is a script in Python, TypeScript, SQL or any supported language.
20+ languages
Write each pipeline step in the language that fits best: Python, TypeScript, SQL, Go, Bash, Rust, PHP and more. Mix and match freely within a single flow.
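A minimal sketch of what a single Python step can look like. By convention, a Windmill script exposes a main() function whose parameters are the step's inputs and whose return value feeds downstream steps; the parameter names and body here are illustrative, not taken from a real pipeline.

```python
from datetime import date

# Hypothetical extract step: the parameters become the step's inputs in the
# flow, and the returned list is available to the next step in the DAG.
def main(source: str, since: str) -> list[dict]:
    cutoff = date.fromisoformat(since)
    # A real step would pull rows from `source` here; this one echoes inputs.
    return [{"source": source, "extracted_since": cutoff.isoformat()}]
```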

Parallel branches
Fan out extraction steps across independent sources and collect results automatically.

Restart from any step
Fix a bug and re-run from the failing step. No need to replay the entire pipeline.

Retries & error handlers
Configurable retry count with exponential backoff. Custom error handler scripts per step.
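Since an error handler is itself a script, it can run arbitrary alerting logic. Here is a hypothetical sketch; the shape of the error payload (message, step_id) is an illustrative assumption, not the exact schema.

```python
# Hypothetical per-step error handler: receives the failed step's error
# payload and forwards it to whatever alerting you use.
def main(error: dict) -> None:
    message = error.get("message", "unknown error")
    step = error.get("step_id", "unknown step")
    # Replace the print with a Slack/PagerDuty/email call in a real pipeline.
    print(f"Pipeline step {step} failed: {message}")
```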

Trigger from anywhere
Cron schedules, webhooks, Postgres CDC, Kafka topics, SQS queues — or just click "Run".

Data tables
Built-in relational storage with zero setup. Query from Python, TypeScript, SQL or DuckDB. Credentials are managed internally and never exposed.

The native DuckDB and Ducklake orchestrator
The only orchestrator with zero-config DuckDB, Ducklake and S3 support. Credentials and connections are handled automatically; just write your query.
DuckDB
Query S3 files with SQL. DuckDB scripts auto-connect to your workspace storage. No credentials to manage, no connection strings to configure.
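For illustration, here is the same kind of query written against DuckDB's httpfs extension. Bucket, file and column names are placeholders, and the commented-out credential statements are exactly what the auto-connection described above removes.

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")
# Outside a Windmill workspace you would also have to configure credentials:
# con.execute("SET s3_access_key_id = '...'; SET s3_secret_access_key = '...';")

rows = con.execute("""
    SELECT category, SUM(amount) AS total
    FROM read_parquet('s3://my-bucket/sales/*.parquet')
    GROUP BY category
    ORDER BY total DESC
""").fetchall()
print(rows)
```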

Ducklake
Store massive datasets in S3 and query them with SQL. Full data lake with catalog support, versioning and ACID transactions.
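A hedged sketch of what working with a DuckLake catalog looks like from DuckDB. The metadata file, DATA_PATH bucket and table names are assumptions, and a standalone session would also need httpfs and S3 credentials configured.

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL ducklake; LOAD ducklake;")
# Attach a lake: table metadata lives in the catalog, data files live in S3.
con.execute("ATTACH 'ducklake:metadata.ducklake' AS lake (DATA_PATH 's3://my-bucket/lake/')")
con.execute("CREATE TABLE IF NOT EXISTS lake.events AS SELECT 1 AS id, 'signup' AS kind")
print(con.execute("SELECT kind, COUNT(*) FROM lake.events GROUP BY kind").fetchall())
```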

Workspace S3
Link your workspace to S3, Azure Blob, GCS, R2 or MinIO. Browse and preview Parquet, CSV and JSON directly from the UI.

Polars
Lightning-fast DataFrames in Python. Read and write Parquet directly from your workspace S3 bucket with zero config.
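A minimal sketch of that round trip. The bucket paths and storage_options values are placeholders for what a workspace-linked bucket would provide automatically.

```python
import polars as pl

# Placeholder credentials; with workspace S3 these would not appear in code.
opts = {
    "aws_access_key_id": "...",
    "aws_secret_access_key": "...",
    "aws_region": "us-east-1",
}

events = pl.read_parquet("s3://my-bucket/raw/events.parquet", storage_options=opts)
per_user = (
    events.filter(pl.col("status") == "ok")
    .group_by("user_id")
    .agg(pl.len().alias("events"))
)
per_user.write_parquet("s3://my-bucket/curated/events_by_user.parquet", storage_options=opts)
```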

Assets lineage
Pipeline steps pass datasets as lightweight JSON pointers to S3 objects. No serialization overhead, no memory limits.
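A hypothetical illustration of the pointer-passing idea: a step returns a small JSON object referencing an S3 object rather than the data itself, and the next step resolves it. The {"s3": ...} shape and the paths are assumptions, not Windmill's exact schema.

```python
# Upstream step (the main() of its own script): writes its output to S3
# (omitted here) and returns only a lightweight pointer.
def extract() -> dict:
    return {"s3": "staging/orders.parquet"}

# Downstream step (the main() of the next script): receives the pointer,
# reads the object it names, processes it, and passes a new pointer onward.
def transform(orders: dict) -> dict:
    print(f"reading s3 object {orders['s3']}")
    return {"s3": "curated/orders_by_user.parquet"}
```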

Challenging the status quo of data warehouses
Stop paying per query. Run DuckDB and Ducklake locally on your workers.
| | DuckDB and Ducklake on Windmill | Cloud data warehouse |
| --- | --- | --- |
| Compute | Local on your workers | Remote warehouse |
| Cost model | Flat, pay for infra only | Per-query pricing |
| Data storage | Your S3 bucket, open formats | Vendor-managed, proprietary |
| Vendor lock-in | No | Yes |
| Orchestration | Built-in (flows, retries, schedules) | Separate tool needed |
| Setup | Zero config, auto-connected | Credentials, drivers, networking |
| Data egress fees | No | Yes |
Windmill also orchestrates Snowflake, BigQuery and other warehouses. You can mix local DuckDB steps with remote warehouse queries in the same pipeline.
Production-grade performance that replaces Spark
Polars and DuckDB process data on a single node far faster than distributed frameworks for the vast majority of ETL workloads.
TPC-H benchmark, 8 queries on m4.xlarge (8 vCPUs, 32 GB RAM)
More you can build on Windmill
Data pipelines are just one use case. The same platform powers internal tools, AI agents, workflow automations and scheduled tasks.

Build production-grade internal tools with backend scripts, data tables and React, Vue or Svelte frontends.

Build AI agents with tool-calling, DAG orchestration, sandboxes and direct access to your scripts and resources.

Run scripts on cron schedules, webhooks or custom triggers with retries and error handlers built in.
Start building data pipelines today
Get started for free on Windmill Cloud or self-host the open-source version.
