Results: Python
Long-Running Tasks (10 tasks)
At a macro level, the total time to complete the long-running task flow varied widely across orchestrators. Airflow was by far the slowest, taking 54.668 seconds, followed by Kestra at 15.786s and Prefect at 15.489s. The remaining engines finished within about a second of each other: Windmill in normal mode took 8.347s, Hatchet 7.793s, Windmill in dedicated worker mode 7.701s, and Temporal was the fastest at 7.247s.
Execution time—defined as the period during which tasks were actively being processed by workers—dominated the total runtime for all engines, especially for Hatchet (96.21%), Temporal (96.56%), and Windmill (93.83%). This is expected given the computationally intensive nature of fibo(33). The higher execution ratios suggest these engines introduce minimal orchestration overhead and keep workers consistently busy.
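The benchmark's task body is not reproduced in this section. As a minimal sketch, assuming `fibo` is the usual naive recursive Fibonacci, a CPU-bound task of this kind could look like the following; the exponential growth in recursive calls is what keeps a worker busy at `n = 33`.

```python
def fibo(n: int) -> int:
    """Naive recursive Fibonacci; assumed shape of the benchmark task, not its actual source."""
    if n <= 1:
        return n
    # Two recursive calls per level give exponential running time in n.
    return fibo(n - 1) + fibo(n - 2)


# A small input returns instantly; the long-running scenario calls fibo(33).
result = fibo(20)
print(result)  # 6765
```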
Airflow and Prefect, by contrast, spent significantly more time on assignment, consuming 40.35% and 9.77% of total runtime, respectively. This indicates slower dispatch of the initial tasks, especially noticeable before the parallelism benefits take effect. Despite this, Prefect still maintained decent performance compared to Airflow, though both trail behind more modern orchestrators.
Windmill in dedicated mode exhibited a slightly lower assignment share (4.80%) than in normal mode (5.13%), while its transition share was slightly higher (1.64% versus 1.04%). In both modes, Windmill's transition time, the delay between finishing one task and initiating the next, was the lowest of all engines tested (0.087s in normal mode and 0.126s in dedicated mode), demonstrating highly efficient task chaining.
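To make the three phases concrete, here is a rough sketch of how such a breakdown could be derived from per-task timestamps. It assumes the tasks run one after another and that each task records when it was created, started, and completed. The `TaskTiming` fields and the `breakdown` helper are hypothetical names chosen for illustration, and treating assignment as the gap between creation and start is an assumption, not a definition taken from the benchmark.

```python
from dataclasses import dataclass


@dataclass
class TaskTiming:
    # Hypothetical per-task timestamps, in seconds since the flow started.
    created_at: float    # orchestrator made the task available
    started_at: float    # a worker began running it
    completed_at: float  # the worker finished it


def breakdown(tasks: list[TaskTiming]) -> dict[str, float]:
    """Split a sequential flow's duration into assignment, execution, and transition."""
    assignment = sum(t.started_at - t.created_at for t in tasks)
    execution = sum(t.completed_at - t.started_at for t in tasks)
    # Transition: gap between one task finishing and the next one being created.
    transition = sum(nxt.created_at - prev.completed_at
                     for prev, nxt in zip(tasks, tasks[1:]))
    return {
        "assignment": assignment,
        "execution": execution,
        "transition": transition,
        "total": assignment + execution + transition,
    }


# Two made-up tasks: the second is created 1s after the first completes.
flow = [TaskTiming(0.0, 1.0, 3.0), TaskTiming(4.0, 5.0, 8.0)]
print(breakdown(flow))
```

With this decomposition the three phases always add up to the time between the first task's creation and the last task's completion, which matches how the phase rows in the results tables sum to the total duration.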
Overall, the engines that most closely aligned with Windmill's dedicated worker architecture—namely Hatchet and Temporal—showed similar performance characteristics: tight scheduling, consistent task execution throughput, and minimal orchestration noise.
| | Airflow | Kestra | Prefect | Hatchet | Temporal | Windmill | Windmill Dedicated |
|---|---|---|---|---|---|---|---|
| Total duration (in seconds) | 54.668 | 15.786 | 15.489 | 7.793 | 7.247 | 8.347 | 7.701 |
| Assignment | 22.058 (40.35%) | 1.365 (8.65%) | 1.513 (9.77%) | 0.123 (1.58%) | 0.099 (1.37%) | 0.428 (5.13%) | 0.370 (4.80%) |
| Execution | 28.272 (51.72%) | 14.029 (88.87%) | 13.658 (88.18%) | 7.498 (96.21%) | 6.998 (96.56%) | 7.832 (93.83%) | 7.205 (93.56%) |
| Transition | 4.338 (7.94%) | 0.392 (2.48%) | 0.318 (2.05%) | 0.172 (2.21%) | 0.150 (2.07%) | 0.087 (1.04%) | 0.126 (1.64%) |
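
The percentages in the table are each phase's share of the total duration. As a small sketch of that arithmetic, using a few columns copied from the table (the `shares` helper is an illustrative name, not part of any benchmark tooling):

```python
# Assignment, execution, transition in seconds, copied from the table above.
LONG_RUNNING = {
    "Airflow": (22.058, 28.272, 4.338),
    "Hatchet": (0.123, 7.498, 0.172),
    "Temporal": (0.099, 6.998, 0.150),
    "Windmill": (0.428, 7.832, 0.087),
}


def shares(assignment: float, execution: float, transition: float):
    """Return (total seconds, percentage share of each phase)."""
    total = assignment + execution + transition
    pct = tuple(round(100 * phase / total, 2)
                for phase in (assignment, execution, transition))
    return round(total, 3), pct


for engine, phases in LONG_RUNNING.items():
    total, pct = shares(*phases)
    print(f"{engine}: {total}s, assignment/execution/transition = {pct}")
```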
Lightweight Tasks (40 tasks)
Because Airflow performed much slower than the other orchestrators, it can be excluded from the chart to focus on the remaining engines.
The lightweight task scenario produced a far starker contrast in performance. As expected, Airflow underperformed dramatically, taking 116.221 seconds to complete the 40-task flow. The next slowest, Kestra, completed in 6.044s, followed by Prefect at 4.872s and Windmill in normal mode at 4.383s. Temporal (2.967s) and Windmill in dedicated mode (2.092s) were markedly faster, and Hatchet delivered the fastest run, completing the flow in just 1.222 seconds.
In lightweight scenarios, where each task executes in around 10ms, orchestration overhead becomes the dominant factor. Execution accounted for only a small portion of total time—just 11.19% for Temporal, 8.18% for Hatchet, and a mere 5.83% for Windmill in dedicated mode. The implication is that most of the runtime is now spent coordinating tasks, rather than executing them.
Windmill, in normal mode, spent a larger share of its runtime on task execution (50.54%) than any other engine. This is due to the way Windmill handles task startup: it uses isolated, "cold-started" task containers. As a result, each task includes some initialization cost, making Windmill slower than Hatchet and Temporal in this lightweight test. However, this changes when Windmill is run in dedicated worker mode. In this configuration, startup overhead is minimized: execution time drops from 2.215s to 0.122s (5.83% of the total), in line with the low execution shares seen in Temporal and Hatchet. Assignment time is nearly unchanged in absolute terms (1.795s versus 1.836s), so its share of the much shorter total jumps to 85.80%.
Transition times in this scenario further highlight orchestration differences. Windmill again stood out with the lowest transition overhead of any engine: only 7.57% (0.332s) in standard mode and 8.37% (0.175s) in dedicated mode. Temporal’s transition overhead was much higher (53.15%), which may point to internal mechanisms such as durable state recording. Hatchet also showed a relatively high transition share (54.91%), which is interesting given its otherwise strong performance.
| | Airflow | Kestra | Prefect | Hatchet | Temporal | Windmill | Windmill Dedicated |
|---|---|---|---|---|---|---|---|
| Total duration (in seconds) | 116.221 | 6.044 | 4.872 | 1.222 | 2.967 | 4.383 | 2.092 |
| Assignment | 75.112 (64.63%) | 2.679 (44.32%) | 2.174 (44.62%) | 0.451 (36.91%) | 1.058 (35.66%) | 1.836 (41.89%) | 1.795 (85.80%) |
| Execution | 12.516 (10.77%) | 1.499 (24.80%) | 1.546 (31.73%) | 0.100 (8.18%) | 0.332 (11.19%) | 2.215 (50.54%) | 0.122 (5.83%) |
| Transition | 28.593 (24.60%) | 1.866 (30.87%) | 1.152 (23.65%) | 0.671 (54.91%) | 1.577 (53.15%) | 0.332 (7.57%) | 0.175 (8.37%) |
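
Since orchestration overhead dominates this scenario, it can help to express it per task. The sketch below treats everything that is not execution (assignment plus transition) as overhead and averages it over the 40 tasks. Dividing by the task count is a simplifying choice made here for illustration, and `overhead_ms_per_task` is a hypothetical helper name.

```python
TASKS = 40

# (total, execution) in seconds, copied from the lightweight table above.
LIGHTWEIGHT = {
    "Hatchet": (1.222, 0.100),
    "Temporal": (2.967, 0.332),
    "Windmill": (4.383, 2.215),
    "Windmill Dedicated": (2.092, 0.122),
}


def overhead_ms_per_task(total: float, execution: float, tasks: int = TASKS) -> float:
    """Average non-execution time (assignment + transition) per task, in milliseconds."""
    return (total - execution) * 1000 / tasks


for engine, (total, execution) in LIGHTWEIGHT.items():
    print(f"{engine}: {overhead_ms_per_task(total, execution):.1f} ms of overhead per task")
```

Under these assumptions, Hatchet averages about 28 ms of orchestration overhead per task, against roughly 49 ms for Windmill in dedicated mode, 54 ms for Windmill in normal mode, and 66 ms for Temporal.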
These results confirm that Hatchet, Windmill in dedicated mode, and Temporal are the top performers in lightweight task orchestration, where internal engine latency dominates. Windmill, despite its cold-start architecture in normal mode, holds up well and excels in transition responsiveness. Prefect and Kestra show adequate performance in both cases, but they took roughly twice as long as the fastest engines in the long-running scenario and trailed the top three in the lightweight one. Airflow, though functional, is considerably slower in both test scenarios and appears less suitable for modern, latency-sensitive workflows.