Results: Go
To evaluate orchestrator performance in Go, we tested Hatchet, Temporal, and Windmill using a consistent set of benchmark scenarios: lightweight tasks (400) and long-running tasks (10 with a single worker, 100 with 10 workers), each run in single-worker and 10-worker configurations.
Go provides an ideal testbed for orchestration engine performance due to its low startup overhead and tight runtime, meaning most latency observed can be attributed to the orchestration engine itself.
The other orchestration engines either supported Go only indirectly (e.g., Kestra, only via Docker) or did not support it at all.
Single worker
Long-Running Tasks (10 tasks)
In the single-worker configuration for long-running tasks, total flow durations were: Hatchet: 30.715s, Temporal: 28.313s and Windmill: 27.648s.
As expected with heavier workloads, execution time dominated the run: Windmill: 93.82%, Temporal: 91.81%, and Hatchet: 88.69%.
This suggests that all three engines handle Go workloads efficiently once tasks are assigned. Assignment times were lowest for Windmill (2.46%) and Temporal (2.60%), with Hatchet slightly higher at 4.95%. Transition times were also well managed, with Windmill again leading at just 3.73% of total time.
The results confirm that in long-running tasks, all three engines effectively minimize orchestration overhead in Go, with Windmill slightly outperforming the others in total time and transition latency.
Metric | Hatchet | Temporal | Windmill |
---|---|---|---|
Total duration (in seconds) | 30.715 | 28.313 | 27.648 |
Assignment | 1.519 (4.95%) | 0.735 (2.60%) | 0.679 (2.46%) |
Execution | 27.242 (88.69%) | 25.995 (91.81%) | 25.938 (93.82%) |
Transition | 1.954 (6.36%) | 1.583 (5.59%) | 1.031 (3.73%) |
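The percentages in the table above are each phase's share of the total flow duration. A minimal Go sketch of that arithmetic follows; the `Phases` and `Run` types and the `Share` helper are hypothetical names introduced for illustration, not part of any benchmark harness.

```go
package main

import "fmt"

// Phases holds the time (in seconds) a flow spent in each phase.
// Hypothetical type, introduced only for this illustration.
type Phases struct {
	Assignment float64
	Execution  float64
	Transition float64
}

// Run pairs an engine name with its measured phases.
type Run struct {
	Engine string
	Phases Phases
}

// Total is the sum of the three phases.
func (p Phases) Total() float64 {
	return p.Assignment + p.Execution + p.Transition
}

// Share returns part as a percentage of the total duration.
func (p Phases) Share(part float64) float64 {
	return 100 * part / p.Total()
}

func main() {
	// Values copied from the single-worker, long-running table.
	runs := []Run{
		{"Hatchet", Phases{1.519, 27.242, 1.954}},
		{"Temporal", Phases{0.735, 25.995, 1.583}},
		{"Windmill", Phases{0.679, 25.938, 1.031}},
	}
	for _, r := range runs {
		p := r.Phases
		fmt.Printf("%-8s total=%.3fs assignment=%.2f%% execution=%.2f%% transition=%.2f%%\n",
			r.Engine, p.Total(),
			p.Share(p.Assignment), p.Share(p.Execution), p.Share(p.Transition))
	}
}
```

Summing the three phases reproduces the reported totals, which suggests that assignment, execution, and transition together account for the whole flow duration.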
Lightweight Tasks (400 tasks)
For lightweight tasks, engine overhead becomes critical. The results show: Hatchet: 35.845s, Temporal: 39.016s and Windmill: 19.702s.
Here, the execution of fibo(10) is negligible (~10ms), so orchestration mechanics dominate performance. Windmill stands out, finishing in roughly half the total time of either Temporal or Hatchet.
Breaking it down: Assignment: Windmill: 18.53%, Temporal: 41.24% and Hatchet: 68.87%. Transition: Windmill: 67.17%, Temporal: 52.62% and Hatchet: 26.75%.
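The high transition percentage for Windmill largely reflects its low assignment overhead: with only 3.651s spent on assignment, transitions make up most of a much shorter total run. Hatchet and Temporal, in contrast, spend far more time assigning tasks and maintaining queues (24.688s and 16.090s of assignment time, respectively), which stretches the total duration.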
Overall, Windmill clearly leads in lightweight throughput in Go with a single worker, suggesting minimal orchestration latency and efficient task sequencing.
Metric | Hatchet | Temporal | Windmill |
---|---|---|---|
Total duration (in seconds) | 35.845 | 39.016 | 19.702 |
Assignment | 24.688 (68.87%) | 16.090 (41.24%) | 3.651 (18.53%) |
Execution | 1.570 (4.38%) | 2.397 (6.14%) | 2.817 (14.30%) |
Transition | 9.587 (26.75%) | 20.529 (52.62%) | 13.234 (67.17%) |
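For reference, the lightweight task is described above as fibo(10). The benchmark's own implementation is not shown in this section, so the following is only a sketch that assumes a naive recursive Fibonacci; the function name `fibo` mirrors the text, and the timing code is illustrative.

```go
package main

import (
	"fmt"
	"time"
)

// fibo is a naive recursive Fibonacci. This is an assumed stand-in for
// the benchmark task, not the benchmark's actual code.
func fibo(n int) int {
	if n <= 1 {
		return n
	}
	return fibo(n-1) + fibo(n-2)
}

func main() {
	start := time.Now()
	result := fibo(10)
	elapsed := time.Since(start)

	// The raw computation is tiny, which is why orchestration overhead
	// dominates the lightweight scenario.
	fmt.Printf("fibo(10) = %d, computed in %s\n", result, elapsed)
}
```

Because the computation itself is so small, nearly all of the time measured in the lightweight scenario comes from assignment and transition rather than from the task body.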
Multi-worker
10 workers: 100 long-running tasks
All three engines performed well under this heavier workload, with durations as follows: Temporal: 11.152s, Windmill: 11.899s and Hatchet: 17.753s.
Temporal had slightly better worker utilization (94.86%) than Windmill (92.19%), and Windmill showed a slightly higher average wait time (5.478s vs. 5.012s). Execution time per worker was nearly identical across engines (roughly 11s), suggesting that the core compute workload was equally distributed.
Windmill's advantage, however, lies in its fast scheduling (0.024s) and low transition costs, which helped maintain competitive performance despite a slightly higher wait time.
Hatchet showed lower utilization (63.05%) and higher idle times, indicating room for improvement in orchestrating long-running parallel workloads in Go.
Note that for multi-worker scenarios, we tried both `RunBulkNoWait`, which runs each Fibonacci task as a separate workflow, and `PARALLEL=true`, which runs all Fibonacci tasks in parallel within a single workflow. We did not see a significant difference in the results.
Multi-Worker Comparison (10 workers)
Metric | Hatchet | Temporal | Windmill |
---|---|---|---|
Total Duration (s) | 17.753 | 11.152 | 11.899 |
First Task Scheduled (s) | 0.000 | 0.041 | 0.024 |
First Task Started (s) | 0.170 | 0.063 | 0.377 |
Worker Load Distribution
Metric per worker | Statistic | Hatchet | Temporal | Windmill |
---|---|---|---|---|
Tasks per Worker | Average (StdDev) | 10.0 (1.26) | 10.0 (0.45) | 10.0 (0.45) |
Tasks per Worker | Min / Max | 8 / 13 | 9 / 11 | 9 / 11 |
Active Time (s) | Average (StdDev) | 11.193 (0.640) | 10.578 (0.207) | 10.970 (0.191) |
Active Time (s) | Min / Max | 9.675 / 12.020 | 10.270 / 10.883 | 10.597 / 11.249 |
Idle Time (s) | Average (StdDev) | 6.560 (0.640) | 0.574 (0.207) | 0.929 (0.191) |
Idle Time (s) | Min / Max | 5.733 / 8.078 | 0.269 / 0.882 | 0.650 / 1.302 |
Avg Wait Time (s) | Average (StdDev) | 8.322 (0.372) | 5.012 (0.239) | 5.478 (0.171) |
Avg Wait Time (s) | Min / Max | 7.679 / 8.780 | 4.598 / 5.287 | 5.105 / 5.696 |
Avg Task Duration (s) | Average (StdDev) | 1.137 (0.154) | 1.059 (0.040) | 1.099 (0.050) |
Avg Task Duration (s) | Min / Max | 0.865 / 1.370 | 0.979 / 1.141 | 1.008 / 1.215 |
Worker Utilization (%) | Average (StdDev) | 63.05 (3.60) | 94.86 (1.86) | 92.19 (1.61) |
Worker Utilization (%) | Min / Max | 54.50 / 67.71 | 92.09 / 97.59 | 89.06 / 94.54 |
10 workers: 400 lightweight tasks
In the multi-worker configuration for lightweight tasks, Temporal was the fastest, completing all 400 tasks in just 4.270 seconds. Windmill followed at 7.224 seconds, while Hatchet lagged significantly with a total duration of 37.809 seconds.
At first glance, Windmill appears notably slower than Temporal, but a deeper look into worker load distribution helps explain the trade-offs in orchestration design.
- Worker Utilization: Temporal: 32.41%, Windmill: 11.75%, Hatchet: 3.16%
- Avg Wait Time: Windmill: 4.235s, Temporal: 2.115s, Hatchet: 19.187s
Although Hatchet had perfect task distribution (exactly 40 per worker), its wait and idle times were extremely high, indicating that tasks were not being effectively overlapped or queued. Its average task duration was very low (0.030s), but this speed was not leveraged because of orchestration bottlenecks.
Note that for multi-worker scenarios, we tried both `RunBulkNoWait`, which runs each Fibonacci task as a separate workflow, and `PARALLEL=true`, which runs all Fibonacci tasks in parallel within a single workflow. We did not see a significant difference in the results.
Temporal and Windmill both showed strong parallel execution, but with some differences. Windmill started scheduling tasks faster (first task scheduled at 0.025s) but took longer to start executing the first task (1.663s). This is because Windmill, when executing flow iterations in parallel, creates a sub-flow for each iteration containing the Fibonacci task, which adds a slight orchestration overhead compared to the other engines. Temporal, by contrast, started execution almost immediately at 0.108s and maintained lower wait and idle times.
Multi-Worker Comparison (10 workers)
Metric | Hatchet | Temporal | Windmill |
---|---|---|---|
Total Duration (s) | 37.809 | 4.270 | 7.224 |
First Task Scheduled (s) | 0.000 | 0.049 | 0.025 |
First Task Started (s) | 0.723 | 0.108 | 1.663 |
Worker Load Distribution
Metric per worker | Statistic | Hatchet | Temporal | Windmill |
---|---|---|---|---|
Tasks per Worker | Average (StdDev) | 40.0 (0.00) | 40.0 (1.95) | 40.0 (3.13) |
Tasks per Worker | Min / Max | 40 / 40 | 37 / 43 | 36 / 45 |
Active Time (s) | Average (StdDev) | 1.193 (0.178) | 1.384 (0.069) | 0.849 (0.064) |
Active Time (s) | Min / Max | 0.877 / 1.501 | 1.228 / 1.511 | 0.764 / 0.952 |
Idle Time (s) | Average (StdDev) | 36.616 (0.178) | 2.886 (0.069) | 6.375 (0.064) |
Idle Time (s) | Min / Max | 36.308 / 36.932 | 2.759 / 3.042 | 6.272 / 6.460 |
Avg Wait Time (s) | Average (StdDev) | 19.187 (0.014) | 2.115 (0.077) | 4.235 (0.130) |
Avg Wait Time (s) | Min / Max | 19.165 / 19.207 | 1.967 / 2.217 | 3.949 / 4.475 |
Avg Task Duration (s) | Average (StdDev) | 0.030 (0.004) | 0.035 (0.001) | 0.021 (0.001) |
Avg Task Duration (s) | Min / Max | 0.022 / 0.038 | 0.032 / 0.036 | 0.019 / 0.023 |
Worker Utilization (%) | Average (StdDev) | 3.16 (0.47) | 32.41 (1.61) | 11.75 (0.89) |
Worker Utilization (%) | Min / Max | 2.32 / 3.97 | 28.76 / 35.39 | 10.58 / 13.18 |
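To make the "Tasks per Worker" rows more concrete, here is a minimal Go sketch of 10 workers pulling 400 lightweight tasks from a shared queue and recording how many each one processed. It is a plain in-process worker pool built on goroutines and a channel, not the scheduler of Hatchet, Temporal, or Windmill; `runPool` and `fibo` are hypothetical names, and the recursive `fibo` is an assumed stand-in for the benchmark task.

```go
package main

import (
	"fmt"
	"sync"
)

// fibo is a naive recursive Fibonacci, assumed here as the task body.
func fibo(n int) int {
	if n <= 1 {
		return n
	}
	return fibo(n-1) + fibo(n-2)
}

// runPool fans numTasks fibo(10) calls out to numWorkers goroutines and
// returns how many tasks each worker processed.
func runPool(numWorkers, numTasks int) []int {
	tasks := make(chan int)
	counts := make([]int, numWorkers)

	var wg sync.WaitGroup
	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for n := range tasks {
				_ = fibo(n)
				counts[id]++ // each goroutine writes only its own slot
			}
		}(w)
	}

	// Enqueue every task, then close the channel so workers exit.
	for i := 0; i < numTasks; i++ {
		tasks <- 10
	}
	close(tasks)
	wg.Wait()
	return counts
}

func main() {
	counts := runPool(10, 400)
	total := 0
	for id, c := range counts {
		fmt.Printf("worker %d processed %d tasks\n", id, c)
		total += c
	}
	fmt.Printf("total tasks: %d\n", total)
}
```

With a shared queue like this, the per-worker counts vary from run to run because whichever worker is free takes the next task, which is one way a spread such as 36 to 45 tasks per worker can arise. A perfectly even 40 per worker, as reported for Hatchet, would instead be consistent with tasks being pre-assigned in fixed shares, though this section does not confirm how any engine distributes work.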