Windmill on AWS ECS
Windmill can be deployed on an ECS (Elastic Container Service) cluster. Below are the detailed steps to get a Windmill stack up and running. The number of servers and workers, as well as the instance sizes, should be tuned to your own use cases.
To give a brief overview, we will first create a VPC and an associated security group. This VPC will contain the database, the instances powering the ECS cluster, and the Windmill containers. Then we will create a database on RDS, and finally the ECS cluster. Once the ECS cluster is running, we will define the task definitions and the associated services. Note again that the architecture of your Windmill stack depends heavily on how you're planning to use Windmill. For this tutorial, we will create a Windmill stack of this shape: 2 Windmill servers, 2 "multi-purpose" Windmill workers and 1 "native" worker. Windmill LSP and Multiplayer will also be deployed.
Familiar with Terraform? The Terraform files are available here in Windmill's GitHub repo to deploy this Windmill stack on ECS with just a few commands!
Create a VPC and a security group
- Go to AWS VPC and create a new one.
  - Name: `windmill-vpc`
  - Choose a CIDR block of your choice. We're going to use `10.0.0.0/16` here. Any value will work, just make sure you'll have enough IPs in the CIDR. IPv6 is not required
  - We recommend using at least 2 availability zones, with 2 public and 2 private subnets
  - NAT is required for the ECS containers to have access to the internet. We recommend creating one in each AZ
  - Enable both DNS options
- You will need a security group linked to this VPC. Go to the AWS Security Group menu and create a new one (a scripted alternative is sketched after this list).
  - Name: `windmill-sg`
  - Link it to the VPC created above
  - Add the following inbound rule: All traffic FROM this security group
  - If you want to SSH into the servers, you might want to add the inbound rule: SSH from MyIP/Anywhere. This is not required but might help with debugging
  - Add a rule: HTTP traffic FROM anywhere, to be able to hit the Windmill server from your workstation. You can refine security here by having 2 security groups, with only one of them allowing HTTP traffic; you would then place only the server in that group and all the other containers in the more restricted one. For simplicity, we will use a single security group here
  - The default outbound rule can be kept: All traffic TO 0.0.0.0/0
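If you prefer to script this step, here is a minimal boto3 sketch that creates the security group with the self-referencing and HTTP inbound rules described above. The VPC ID is a placeholder; substitute the ID of the `windmill-vpc` you just created.

```python
import boto3

ec2 = boto3.client("ec2")
VPC_ID = "vpc-0123456789abcdef0"  # assumption: replace with your windmill-vpc ID

# Create the security group inside the VPC
sg = ec2.create_security_group(
    GroupName="windmill-sg",
    Description="Security group for the Windmill ECS stack",
    VpcId=VPC_ID,
)
sg_id = sg["GroupId"]

ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[
        # All traffic FROM this security group (self-reference)
        {"IpProtocol": "-1", "UserIdGroupPairs": [{"GroupId": sg_id}]},
        # HTTP traffic FROM anywhere, to reach the Windmill server through the ALB
        {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ],
)
print(f"Created {sg_id}")
```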
Create an RDS database
- Go to AWS RDS
- Standard create
- Engine option
  - PostgreSQL
  - Any recent version of PostgreSQL will work. At the time of writing we're using 16.1-R1
- Template
  - Choose the template of your choice (Production vs. Dev/Test) depending on your needs. This will significantly impact costs
- Availability and durability
  - We recommend using a Multi-AZ DB instance for production use cases. A single DB instance can be a good fit for testing Windmill, as it is less expensive
- Settings
  - DB instance identifier: `windmill-db`
  - Leave `postgres` as the username, and choose a strong master password. Keep it in a safe place; you will need it in the following steps
- Instance configuration
  - We recommend using at least a `db.t4g.xlarge` or `db.m5d.xlarge`
  - The number of connections should be taken into account when sizing the instance. AWS has its own rule for the default maximum number of connections, based on the instance memory. Windmill requires 50 connections per server plus 5 additional connections per worker. For example, 2 servers plus 4 workers need at least 120 DB connections (see the sketch after this list)
- Storage
  - This is on a per-use-case basis. Just be aware that Windmill stores logs and job results in the database. Depending on the job retention period you later configure in Windmill, you might have different requirements
  - A good starting point is 400GiB / 12000 IOPS
  - It is recommended to turn on storage autoscaling
- Connectivity
  - Don't connect to an EC2 compute resource. The containers of the ECS cluster will connect to the database using its URL
  - Attach it to the VPC created above
  - The DB doesn't need public access
  - Link it to the security group created above
  - An RDS proxy can be a good option in certain cases. It is not required
  - We advise using a certificate authority
  - The port can be left at the default: `5432`
- Database authentication
  - Windmill uses password authentication
- Monitoring
  - Choose whatever you prefer to monitor your database
- Additional configuration
  - The initial database name should be set to `windmill`
  - It is advised to enable automated backups
  - Encryption can be set depending on your requirements; same for log export and maintenance
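To size the instance, you can compute the minimum number of DB connections from the rule above (50 per server, 5 per worker). A quick sketch:

```python
def min_db_connections(servers: int, workers: int) -> int:
    """Windmill needs 50 DB connections per server plus 5 per worker."""
    return servers * 50 + workers * 5

# The stack deployed in this tutorial: 2 servers, 2 multi-purpose + 1 native worker
print(min_db_connections(servers=2, workers=3))  # 115
# A slightly larger stack with 4 workers
print(min_db_connections(servers=2, workers=4))  # 120
```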
Create the ECS cluster
As said in the introduction, the architecture of your stack depends on your needs. The only required parts are at least one Windmill server and at least one multi-purpose worker.
- Go to AWS ECS and create a new cluster
- Cluster configuration
  - Name: `windmill-cluster`, and leave the namespace the same
- Infrastructure
  - Uncheck AWS Fargate and check Amazon EC2 instances
  - Create a new On-Demand Auto Scaling group
  - Choose Linux as the operating system. We recommend using either the default (x86_64) or the ARM64 architecture. WARNING: whichever you choose, make sure the EC2 instance type chosen below matches the OS architecture
  - Choose an EC2 instance type matching the OS above. Here we're using the default Amazon Linux 2 OS, so we will choose a `t3.medium`
  - This Auto Scaling group will host the 2 Windmill servers, the multi-purpose and native Windmill workers, LSP and Multiplayer. The maximum capacity should be at least 6 hosts (5 for the entire load and 1 for rolling updates)
  - Allowing SSH access is not required
  - We recommend allocating at least 100GiB of volume size
- Network settings for EC2 instances
  - Attach it to the VPC and security group created above
  - The instances can be placed in the private subnets; we will access them through a load balancer
Create a Load Balancer and Target Groups
We're going to create 3 target groups: one each for the Windmill server, LSP and Multiplayer.
- Go to Target Groups and create a new one
  - Target type: IP addresses
  - Target group name: `windmill-cluster-server-tg`
  - Protocol: HTTP / Port 8000
  - Attach it to the VPC created above
  - Protocol version: HTTP1
  - No need to add explicit IP targets right now. The ECS services will register themselves automatically
- Do the same for LSP
  - Same steps as above but with name `windmill-cluster-lsp-tg` and port 3001
- Do the same for Multiplayer
  - Same steps as above but with name `windmill-cluster-multip-tg` and port 3002
Now create a load balancer:
The load balancer plays the role of a single entry point for all Windmill endpoints. If you want to expose Windmill over HTTPS, only the load balancer listener created below needs to be changed, to listen on protocol HTTPS / port 443, and you can then customize the server certificate you want to use. No other action is required on the services, as they are placed in the private subnets and not exposed directly.
- Create a new Application Load Balancer
  - Name: `windmill-cluster-alb`
  - It must be internet-facing
  - IP address type: IPv4
  - Network mapping: select the VPC created above and map it to the 2 public subnets
  - Security group: select the security group created above
  - Listener: default listener on port 80 / forward to the target group `windmill-cluster-server-tg`
  - Click on Create
- Once the ALB is created, go to its page to add rules for LSP and Multiplayer (a scripted alternative is sketched after this list)
  - Select the listener `HTTP:80` and click on Manage rules > Add rule
- Add a rule for LSP
  - Name: `lsp`
  - Add a condition: Path is `/ws/*`
  - Click Next
  - Select target group `windmill-cluster-lsp-tg`
  - Give it a priority of `10`
  - Click on Create
- Add a rule for Multiplayer
  - Name: `multiplayer`
  - Add a condition: Path is `/ws_mp/*`
  - Click Next
  - Select target group `windmill-cluster-multip-tg`
  - Give it a priority of `20`
  - Click on Create
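The same two listener rules can be created programmatically. Below is a minimal boto3 sketch; the listener and target group ARNs are placeholders for the resources created above.

```python
import boto3

elbv2 = boto3.client("elbv2")

# Assumptions: replace with the ARNs of the HTTP:80 listener and the
# target groups created in the previous steps.
LISTENER_ARN = "arn:aws:elasticloadbalancing:...:listener/app/windmill-cluster-alb/..."
LSP_TG_ARN = "arn:aws:elasticloadbalancing:...:targetgroup/windmill-cluster-lsp-tg/..."
MULTIP_TG_ARN = "arn:aws:elasticloadbalancing:...:targetgroup/windmill-cluster-multip-tg/..."

# Route /ws/* to LSP with priority 10
elbv2.create_rule(
    ListenerArn=LISTENER_ARN,
    Priority=10,
    Conditions=[{"Field": "path-pattern", "Values": ["/ws/*"]}],
    Actions=[{"Type": "forward", "TargetGroupArn": LSP_TG_ARN}],
)

# Route /ws_mp/* to Multiplayer with priority 20
elbv2.create_rule(
    ListenerArn=LISTENER_ARN,
    Priority=20,
    Conditions=[{"Field": "path-pattern", "Values": ["/ws_mp/*"]}],
    Actions=[{"Type": "forward", "TargetGroupArn": MULTIP_TG_ARN}],
)
```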
Create the task definitions
We will create 5 task definitions here:
- REQUIRED - For the Windmill server
- REQUIRED - For multi-purpose Windmill workers
- OPTIONAL - For native Windmill workers
- OPTIONAL - For Windmill LSP
- OPTIONAL - For Windmill Multiplayer
Windmill server
- Name: `windmill-server`
- Launch type: AWS EC2 instances
- OS / Arch: Linux/x86_64 (to match the EC2 instance type and OS architecture)
- Network mode: `awsvpc`. This will virtually attach the containers to the VPC networks. The NAT of the VPC is required to give internet access to the containers
- Task size: 1 vCPU / 1.5GiB memory (this is for a cluster of 6 `t3.medium`; adapt it to the hardware you provisioned)
- Task role: None
- No task placement
- Container:
  - Name: `windmill-server`
  - Image: `ghcr.io/windmill-labs/windmill:<LATEST_RELEASE>` (or `ghcr.io/windmill-labs/windmill-ee:<LATEST_RELEASE>` for EE). The latest released version can be found here. To upgrade Windmill, all task definitions will need to be updated
  - Essential container: YES
  - Port mapping: 8000 / TCP / http / HTTP
  - Resource allocation: 1 CPU / 1.5 GiB memory
  - Environment variables: `JSON_FMT=true`, `MODE=server` and `DATABASE_URL=postgres://postgres:<DB_PASSWORD>@<DB_HOSTNAME>:5432/windmill?sslmode=disable`. Replace the hostname and password with the ones from the RDS database you created above
  - Turn on log collection for easy debugging
  - Add the following health check: `CMD-SHELL, curl -f http://localhost:8000/api/version || exit 1`
    - Interval: 10s
    - Timeout: 5s
    - Retries: 5
  - Leave the rest default
Task definition JSON
The following fields need to be set manually: `LATEST_RELEASE`, `DB_PASSWORD`, `DB_HOSTNAME` and `ECS_TASK_EXECUTION_ROLE_ARN`.
```json
{
  "containerDefinitions": [
    {
      "name": "windmill-server",
      "image": "ghcr.io/windmill-labs/windmill-ee:<LATEST_RELEASE>",
      "cpu": 1024,
      "memory": 1536,
      "portMappings": [
        {
          "name": "http",
          "containerPort": 8000,
          "hostPort": 8000,
          "protocol": "tcp",
          "appProtocol": "http"
        }
      ],
      "essential": true,
      "environment": [
        {
          "name": "JSON_FMT",
          "value": "true"
        },
        {
          "name": "DATABASE_URL",
          "value": "postgres://postgres:<DB_PASSWORD>@<DB_HOSTNAME>:5432/windmill?sslmode=disable"
        },
        {
          "name": "MODE",
          "value": "server"
        }
      ],
      "mountPoints": [],
      "volumesFrom": [],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/windmill-server",
          "awslogs-region": "us-east-2",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8000/api/version || exit 1"],
        "interval": 10,
        "timeout": 5,
        "retries": 5
      }
    }
  ],
  "family": "windmill-server",
  "executionRoleArn": "<ECS_TASK_EXECUTION_ROLE_ARN>",
  "networkMode": "awsvpc",
  "volumes": [],
  "placementConstraints": [],
  "requiresCompatibilities": ["EC2"],
  "cpu": "1024",
  "memory": "1536",
  "runtimePlatform": {
    "cpuArchitecture": "X86_64",
    "operatingSystemFamily": "LINUX"
  }
}
```
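If you save the JSON above to a file, you can register it without going through the console. A minimal boto3 sketch (the file name is an assumption):

```python
import json
import boto3

ecs = boto3.client("ecs")

# Assumption: the task definition JSON above is saved as windmill-server-task.json
with open("windmill-server-task.json") as f:
    task_def = json.load(f)

# The JSON uses the same field names as register_task_definition's
# keyword arguments, so the dict can be splatted directly.
ecs.register_task_definition(**task_def)
```

The same approach works for the other task definitions below.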
Windmill multi-purpose worker
- Name: `windmill-worker`
- Launch type: AWS EC2 instances
- OS / Arch: Linux/x86_64 (to match the EC2 instance type and OS architecture)
- Network mode: `awsvpc`
- Task size: 2 vCPU / 3.5GiB memory (this is for a cluster of 6 `t3.medium`; adapt it to the hardware you provisioned)
- Task role: None
- No task placement
- Container:
  - Name: `windmill-worker`
  - Image: `ghcr.io/windmill-labs/windmill:<LATEST_RELEASE>` (or `ghcr.io/windmill-labs/windmill-ee:<LATEST_RELEASE>` for EE)
  - Essential container: YES
  - Port mapping: no port mapping for workers
  - Resource allocation: 2 CPU / 3.5 GiB memory
  - Environment variables: `JSON_FMT=true`, `MODE=worker`, `WORKER_GROUP=default` and `DATABASE_URL=postgres://postgres:<DB_PASSWORD>@<DB_HOSTNAME>:5432/windmill?sslmode=disable`
  - Add a bind volume named `worker_dependency_cache` mapped to `/tmp/windmill/cache`
  - Turn on log collection for easy debugging
  - That's it, leave the rest default
Task definition JSON
The following fields need to be set manually: `LATEST_RELEASE`, `DB_PASSWORD`, `DB_HOSTNAME` and `ECS_TASK_EXECUTION_ROLE_ARN`.
```json
{
  "containerDefinitions": [
    {
      "name": "windmill-worker",
      "image": "ghcr.io/windmill-labs/windmill-ee:<LATEST_RELEASE>",
      "cpu": 2048,
      "memory": 3072,
      "portMappings": [],
      "essential": true,
      "environment": [
        {
          "name": "DATABASE_URL",
          "value": "postgres://postgres:<DB_PASSWORD>@<DB_HOSTNAME>:5432/windmill?sslmode=disable"
        },
        {
          "name": "JSON_FMT",
          "value": "true"
        },
        {
          "name": "WORKER_GROUP",
          "value": "default"
        },
        {
          "name": "MODE",
          "value": "worker"
        }
      ],
      "mountPoints": [
        {
          "sourceVolume": "worker_dependency_cache",
          "containerPath": "/tmp/windmill/cache"
        }
      ],
      "volumesFrom": [],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/windmill-worker",
          "awslogs-region": "us-east-2",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ],
  "family": "windmill-worker",
  "executionRoleArn": "<ECS_TASK_EXECUTION_ROLE_ARN>",
  "networkMode": "awsvpc",
  "volumes": [
    {
      "name": "worker_dependency_cache",
      "host": {
        "sourcePath": "/tmp/windmill/cache"
      }
    }
  ],
  "placementConstraints": [],
  "requiresCompatibilities": ["EC2"],
  "cpu": "2048",
  "memory": "3072",
  "runtimePlatform": {
    "cpuArchitecture": "X86_64",
    "operatingSystemFamily": "LINUX"
  }
}
```
Windmill native worker
- Name: `windmill-native-worker`
- Launch type: AWS EC2 instances
- OS / Arch: Linux/x86_64 (to match the EC2 instance type and OS architecture)
- Network mode: `awsvpc`
- Task size: 2 vCPU / 3.5GiB memory (this is for a cluster of 6 `t3.medium`; adapt it to the hardware you provisioned)
- Task role: None
- No task placement
- Container:
  - Name: `windmill-native-worker`
  - Image: `ghcr.io/windmill-labs/windmill:<LATEST_RELEASE>` (or `ghcr.io/windmill-labs/windmill-ee:<LATEST_RELEASE>` for EE)
  - Essential container: YES
  - Port mapping: no port mapping for workers
  - Resource allocation: 2 CPU / 3.5 GiB memory
  - Environment variables: `JSON_FMT=true`, `MODE=worker`, `WORKER_GROUP=native` and `DATABASE_URL=postgres://postgres:<DB_PASSWORD>@<DB_HOSTNAME>:5432/windmill?sslmode=disable`
  - Add a bind volume named `worker_dependency_cache` mapped to `/tmp/windmill/cache`
  - Turn on log collection for easy debugging
  - That's it, leave the rest default
Task definition JSON
The following fields need to be set manually: `LATEST_RELEASE`, `DB_PASSWORD`, `DB_HOSTNAME` and `ECS_TASK_EXECUTION_ROLE_ARN`.
```json
{
  "containerDefinitions": [
    {
      "name": "windmill-native-worker",
      "image": "ghcr.io/windmill-labs/windmill-ee:<LATEST_RELEASE>",
      "cpu": 2048,
      "memory": 3072,
      "portMappings": [],
      "essential": true,
      "environment": [
        {
          "name": "DATABASE_URL",
          "value": "postgres://postgres:<DB_PASSWORD>@<DB_HOSTNAME>:5432/windmill?sslmode=disable"
        },
        {
          "name": "JSON_FMT",
          "value": "true"
        },
        {
          "name": "WORKER_GROUP",
          "value": "native"
        },
        {
          "name": "MODE",
          "value": "worker"
        }
      ],
      "mountPoints": [
        {
          "sourceVolume": "worker_dependency_cache",
          "containerPath": "/tmp/windmill/cache"
        }
      ],
      "volumesFrom": [],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/windmill-native-worker",
          "awslogs-region": "us-east-2",
          "awslogs-stream-prefix": "ecs"
        },
        "secretOptions": []
      }
    }
  ],
  "family": "windmill-native-worker",
  "executionRoleArn": "<ECS_TASK_EXECUTION_ROLE_ARN>",
  "networkMode": "awsvpc",
  "volumes": [
    {
      "name": "worker_dependency_cache",
      "host": {
        "sourcePath": "/tmp/windmill/cache"
      }
    }
  ],
  "placementConstraints": [],
  "requiresCompatibilities": ["EC2"],
  "cpu": "2048",
  "memory": "3072",
  "runtimePlatform": {
    "cpuArchitecture": "X86_64",
    "operatingSystemFamily": "LINUX"
  }
}
```
Windmill LSP
- Name: `windmill-lsp`
- Launch type: AWS EC2 instances
- OS / Arch: Linux/x86_64 (to match the EC2 instance type and OS architecture)
- Network mode: `awsvpc`
- Task size: 1 vCPU / 1.5GiB memory (this is for a cluster of 6 `t3.medium`; adapt it to the hardware you provisioned)
- Task role: None
- No task placement
- Container:
  - Name: `windmill-lsp`
  - Image: `ghcr.io/windmill-labs/windmill-lsp:latest`
  - Essential container: YES
  - Port mapping: 3001 / TCP / http / HTTP
  - Resource allocation: 1 CPU / 1.5 GiB memory
  - Environment variable: `JSON_FMT=true`
  - Add a bind volume named `lsp_cache` mapped to `/root/.cache`
  - Turn on log collection for easy debugging
  - That's it, leave the rest default
Task definition JSON
The following field needs to be set manually: `ECS_TASK_EXECUTION_ROLE_ARN`.
```json
{
  "containerDefinitions": [
    {
      "name": "windmill-lsp",
      "image": "ghcr.io/windmill-labs/windmill-lsp:latest",
      "cpu": 1024,
      "memory": 1536,
      "portMappings": [
        {
          "name": "http",
          "containerPort": 3001,
          "hostPort": 3001,
          "protocol": "tcp",
          "appProtocol": "http"
        }
      ],
      "essential": true,
      "environment": [
        {
          "name": "JSON_FMT",
          "value": "true"
        }
      ],
      "mountPoints": [
        {
          "sourceVolume": "lsp_cache",
          "containerPath": "/root/.cache"
        }
      ],
      "volumesFrom": [],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/windmill-lsp",
          "awslogs-region": "us-east-2",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ],
  "family": "windmill-lsp",
  "executionRoleArn": "<ECS_TASK_EXECUTION_ROLE_ARN>",
  "networkMode": "awsvpc",
  "volumes": [
    {
      "name": "lsp_cache",
      "host": {
        "sourcePath": "/root/.cache"
      }
    }
  ],
  "placementConstraints": [],
  "requiresCompatibilities": ["EC2"],
  "cpu": "1024",
  "memory": "1536",
  "runtimePlatform": {
    "cpuArchitecture": "X86_64",
    "operatingSystemFamily": "LINUX"
  }
}
```
Windmill Multiplayer
- Name: `windmill-multiplayer`
- Launch type: AWS EC2 instances
- OS / Arch: Linux/x86_64 (to match the EC2 instance type and OS architecture)
- Network mode: `awsvpc`
- Task size: 1 vCPU / 1.5GiB memory (this is for a cluster of 6 `t3.medium`; adapt it to the hardware you provisioned)
- Task role: None
- No task placement
- Container:
  - Name: `windmill-multiplayer`
  - Image: `ghcr.io/windmill-labs/windmill-multiplayer:latest`
  - Essential container: YES
  - Port mapping: 3002 / TCP / http / HTTP
  - Resource allocation: 1 CPU / 1.5 GiB memory
  - Environment variable: `JSON_FMT=true`
  - Turn on log collection for easy debugging
  - That's it, leave the rest default
Task definition JSON
The following field needs to be set manually: `ECS_TASK_EXECUTION_ROLE_ARN`.
```json
{
  "containerDefinitions": [
    {
      "name": "windmill-multiplayer",
      "image": "ghcr.io/windmill-labs/windmill-multiplayer:latest",
      "cpu": 1024,
      "memory": 1536,
      "portMappings": [
        {
          "name": "http",
          "containerPort": 3002,
          "hostPort": 3002,
          "protocol": "tcp",
          "appProtocol": "http"
        }
      ],
      "essential": true,
      "environment": [
        {
          "name": "JSON_FMT",
          "value": "true"
        }
      ],
      "mountPoints": [],
      "volumesFrom": [],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/windmill-multiplayer",
          "awslogs-region": "us-east-2",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ],
  "family": "windmill-multiplayer",
  "executionRoleArn": "<ECS_TASK_EXECUTION_ROLE_ARN>",
  "networkMode": "awsvpc",
  "volumes": [],
  "placementConstraints": [],
  "requiresCompatibilities": ["EC2"],
  "cpu": "1024",
  "memory": "1536",
  "runtimePlatform": {
    "cpuArchitecture": "X86_64",
    "operatingSystemFamily": "LINUX"
  }
}
```
Create the services
We will now create 5 services, one for each task definition. If you skipped some of the optional task definitions above, don't create the corresponding services.
- Server
- Multi-purpose worker
- Native worker
- LSP
- Multiplayer
Windmill server
- Environment: leave everything default. It will use the default capacity provider
- Application type: Service
- Task definition: select the latest revision of the `windmill-server` task
- Service name: `windmill-server`
- Service replicas: 2 (to follow the architecture we presented above)
- Networking: select the VPC created above and place the service in the private subnets. Select the security group created above (or the one allowing traffic on port 80)
- Load balancer: link it to the load balancer created above with the target group `windmill-cluster-server-tg`. A scripted equivalent of this service creation is sketched below
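For scripted deployments, the equivalent boto3 call for this service looks roughly like the sketch below. The subnet IDs, security group ID and target group ARN are placeholders for the resources created earlier.

```python
import boto3

ecs = boto3.client("ecs")

# Assumptions: replace the placeholders with your private subnet IDs,
# security group ID and the ARN of windmill-cluster-server-tg.
ecs.create_service(
    cluster="windmill-cluster",
    serviceName="windmill-server",
    taskDefinition="windmill-server",  # latest ACTIVE revision is used by default
    desiredCount=2,
    launchType="EC2",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-PRIVATE_A", "subnet-PRIVATE_B"],
            "securityGroups": ["sg-WINDMILL"],
        }
    },
    loadBalancers=[
        {
            "targetGroupArn": "arn:aws:elasticloadbalancing:...:targetgroup/windmill-cluster-server-tg/...",
            "containerName": "windmill-server",
            "containerPort": 8000,
        }
    ],
)
```

The worker services are the same call without the `loadBalancers` argument, since they expose no port.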
Multi-purpose Windmill worker
- Environment: leave everything default. It will use the default capacity provider
- Application type: Service
- Task definition: select the latest revision of the `windmill-worker` task
- Service name: `windmill-worker`
- Service replicas: 2 (to follow the architecture we presented above)
- Networking: select the VPC created above and place the service in the private subnets. There's no need for workers to be in the public subnets, as the VPC has a NAT. Select the security group created above
- Load balancer: no load balancer, the container doesn't expose any port
Native Windmill worker
- Environment: leave everything default. It will use the default capacity provider
- Application type: Service
- Task definition: select the latest revision of the `windmill-native-worker` task
- Service name: `windmill-native-worker`
- Service replicas: 1 (to follow the architecture we presented above)
- Networking: select the VPC created above and place the service in the private subnets. There's no need for workers to be in the public subnets, as the VPC has a NAT. Select the security group created above
- Load balancer: no load balancer, the container doesn't expose any port
Windmill LSP
- Environment: leave everything default. It will use the default capacity provider
- Application type: Service
- Task definition: select the latest revision of the `windmill-lsp` task
- Service name: `windmill-lsp`
- Service replicas: 1
- Networking: select the VPC created above and place the service in the private subnets. Select the security group created above
- Load balancer: link it to the load balancer created above with the target group `windmill-cluster-lsp-tg`
Windmill Multiplayer
- Environment: leave everything default. It will use the default capacity provider
- Application type: Service
- Task definition: select the latest revision of the `windmill-multiplayer` task
- Service name: `windmill-multiplayer`
- Service replicas: 1
- Networking: select the VPC created above and place the service in the private subnets. Select the security group created above
- Load balancer: link it to the load balancer created above with the target group `windmill-cluster-multip-tg`
Open Windmill
Go back to the `windmill-cluster-alb` load balancer and copy its DNS name. Open it in a new tab. You should see the Windmill login interface. The default credentials are `admin@windmill.dev` / `changeme`. Follow the instructions to go through the initial Windmill setup, during which you will be invited to update your login email and password.
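Optionally, before opening the UI, you can verify the deployment with a quick call to the same `/api/version` endpoint the ECS health check uses. A minimal smoke test, assuming the ALB DNS name below is replaced with yours:

```python
import requests

ALB_DNS = "windmill-cluster-alb-1234567890.us-east-2.elb.amazonaws.com"  # placeholder

# The server answers with its version string when it is up and reachable
resp = requests.get(f"http://{ALB_DNS}/api/version", timeout=10)
resp.raise_for_status()
print("Windmill version:", resp.text.strip())
```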