Persistent storage refers to any method of storing data that remains intact and accessible even after a system is powered off, restarted, or experiences a crash.
In the context of Windmill, the question is where to effectively store and manage the data manipulated by Windmill (ETL, data ingestion and preprocessing, data migration and sync, etc.).
Windmill itself should only store Windmill-specific elements (resources, variables, etc.). To store the data your scripts and flows manipulate, it is recommended to use external storage service providers that can be accessed from Windmill.
This document lists trusted services to use alongside Windmill.
There are 4 kinds of persistent storage in Windmill:
- Small data that is relevant in between script/flow execution and can be persisted on Windmill itself.
- Big structured SQL data that is critical to your services and that is stored externally on an SQL Database or Data Warehouse.
- Object storage for large data such as S3.
- NoSQL and document database such as MongoDB and Key-Value stores.
You already have your own database
If you already have your own database provided by a supported integration, you can easily connect it to Windmill.
Within Windmill: not recommended
Windmill is not designed to store heavy data that extends beyond the execution of a script or flow. The worker executing a computation may not be the same as the one that ran the previous computation, so the data would have to be retrieved from another location anyway.
Instead, Windmill is very convenient to use alongside data storage providers to manipulate large amounts of data.
There are however internal methods to persist data between executions of jobs.
States and Resources
Within Windmill, you can use States and Resources as a way to store a transient state that can be represented as small JSON.
In Windmill, States are considered as resources, but they are excluded from the Workspace tab for clarity. They are displayed on the Resources menu, under a dedicated tab.
States are used by scripts to keep data persistent between runs of the same script by the same trigger (schedule or user).
A state is an object stored as a resource of the resource type state, which is meant to persist across distinct executions of the same script.
This is what enables Flows to watch for changes in most event watching scenarios. The pattern is as follows:
- Retrieve the last state or, if undefined, assume it is the first execution.
- Retrieve the current state in the external system you are watching, e.g. the list of users having starred your repo or the maximum ID of posts on Hacker News.
- Calculate the difference between the current state and the last internal state. This difference is what you will want to act upon.
- Set the new state as the current state so that you do not process the elements you just processed.
- Return the differences calculated previously so that you can process them in the next steps. You will likely want to for-loop over the items and trigger one Flow per item. This is exactly the pattern used when your Flow is in the "Watching changes regularly" mode.
The convenience functions to do this are:
In TypeScript (Deno/Bun):
- getState(), which retrieves an object of any type (internally a simple Resource) at a path determined by getStatePath, which is unique to the user currently executing the Script, the Flow in which it is currently getting called (if any), and the path of the Script.
- setState(value: any), which sets the new state.
Please note that these functions require importing the wmill client library from Deno/Bun.
In Python:
- get_state(), which retrieves an object of any type (internally a simple Resource) at a path determined by get_state_path, which is unique to the user currently executing the Script, the Flow in which it is currently getting called (if any), and the path of the Script.
- set_state(value: Any), which sets the new state.
Please note that these functions require importing the wmill client library from Python.
States are a specific type of resource in Windmill where the type is state and the path is automatically calculated for you based on the schedule path (if any) and the script path. In some cases, you want to set the path arbitrarily and/or use a different type than state. In this case, you can use the setResource and getResource functions. The same resource can be used across different scripts and flows.
- setResource(value: any, path?: string, initializeToTypeIfNotExist?: string): sets a resource at a given path. This is equivalent to setState but allows you to set an arbitrary path and choose a type other than state if wanted. See api.
- getResource(path: string): gets a resource at a given path. See api.
Variables are similar to resources but have no types, can be tagged as secret (in which case they are encrypted by the workspace key) and can only store strings. In some situations, you may prefer setVariable and getVariable to resources.
setState and setResource are convenient ways to persist JSON between multiple script executions.
For heavier ETL processes or sharing data between steps in a flow, Windmill provides a Shared Directory feature.
The Shared Directory allows steps within a flow to share data by storing it in a designated folder.
Although the Shared Directory is recommended for persisting states within a flow, it's important to note that all steps are executed on the same worker and that the data stored in the Shared Directory is strictly ephemeral to the flow execution.
To enable the Shared Directory, follow these steps:
- Open the Settings menu in the Windmill interface.
- Go to the
- Toggle on the option for Shared Directory on './shared'.
Once the Shared Directory is enabled, you can use it in your flow by referencing the ./shared folder. This folder is shared among the steps in the flow, allowing you to store and access data between them.
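As an illustration, two steps of a flow could exchange a file through the shared folder roughly as sketched below. The function names step_a and step_b and the file name rows.json are hypothetical; in a real flow each step would be its own script, and the shared_dir parameter exists only so the sketch can be run outside Windmill.

```python
import json
from pathlib import Path

# Folder shared between the steps of a flow once the option is enabled.
SHARED_DIR = Path("./shared")


def step_a(rows: list[dict], shared_dir: Path = SHARED_DIR) -> str:
    """First step: write intermediate data into the shared folder."""
    shared_dir.mkdir(parents=True, exist_ok=True)
    target = shared_dir / "rows.json"  # hypothetical file name
    target.write_text(json.dumps(rows))
    # Return only the file name; the data itself stays on disk.
    return target.name


def step_b(filename: str, shared_dir: Path = SHARED_DIR) -> int:
    """Later step: read the data written by the first step."""
    rows = json.loads((shared_dir / filename).read_text())
    return len(rows)
```

Passing the file name as the step result, rather than the data, keeps large payloads out of the flow's step outputs. Because the folder is ephemeral, anything that must outlive the flow run still needs to be written to external storage.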
Structured Databases: Postgres (Supabase, Neon.tech)
For Postgres databases (best for structured data storage and retrieval, where you can define schema and relationships between entities), we recommend using Supabase or Neon.tech.
Supabase is an open-source alternative to Firebase, providing a backend-as-a-service platform that offers a suite of tools, including real-time subscriptions, authentication, storage, and a PostgreSQL-based database.
Get a Connection string.
- Go to the
- Find your Connection Info and Connection String. Direct connections are on port 5432.
- Go to the
You can also integrate Supabase directly through its API.
You can find examples and premade Supabase scripts on Windmill Hub.
Neon.tech is an open-source cloud database platform that provides fully managed PostgreSQL databases with high availability and scalability.
Get a Connection string. You can obtain the connection string from the Connection Details widget on the Neon Dashboard: select a branch, a role, and the database you want to connect to, and a connection string will be constructed for you.
Adding the connection string as a Postgres resource requires parsing it.
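For example, psql postgres://daniel:<password>@ep-restless-rice.us-east-2.aws.neon.tech/neondb breaks down into the user daniel, the password <password>, the host ep-restless-rice.us-east-2.aws.neon.tech and the database name neondb.
The sslmode should be "require", and Neon uses the default PostgreSQL port, 5432.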
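The parsing step can be sketched with Python's standard library. The helper name connection_string_to_resource is hypothetical, the password s3cret is a placeholder, and the field names (host, port, user, password, dbname, sslmode) are an assumption about the shape of the Postgres resource; check them against the resource type in your workspace.

```python
from urllib.parse import unquote, urlparse


def connection_string_to_resource(conn: str, sslmode: str = "require") -> dict:
    """Split a postgres:// connection string into separate fields."""
    parsed = urlparse(conn)
    return {
        "host": parsed.hostname,
        # Fall back to the default PostgreSQL port when none is given.
        "port": parsed.port or 5432,
        # Credentials may be percent-encoded inside a URL.
        "user": unquote(parsed.username or ""),
        "password": unquote(parsed.password or ""),
        "dbname": parsed.path.lstrip("/"),
        "sslmode": sslmode,
    }


resource = connection_string_to_resource(
    "postgres://daniel:s3cret@ep-restless-rice.us-east-2.aws.neon.tech/neondb"
)
print(resource["host"], resource["port"], resource["dbname"])
# ep-restless-rice.us-east-2.aws.neon.tech 5432 neondb
```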
Large Data Files: S3, R2, MinIO
For heavier data objects and unstructured data storage, Amazon S3 (Simple Storage Service) and its alternatives Cloudflare R2 and MinIO are highly scalable and durable object storage services that provide secure, reliable, and cost-effective storage for a wide range of data types and use cases.
Amazon S3, Cloudflare R2 and MinIO all follow the same API schema and therefore have a common Windmill resource type.
Amazon S3 (Simple Storage Service) is a scalable and durable object storage service offered by Amazon Web Services (AWS), designed to provide developers and businesses with an effective way to store and retrieve any amount of data from anywhere on the web.
Create a bucket on S3.
Make sure the user associated with the resource has the right policies allowed in AWS Identity and Access Management (IAM).
You can find examples and premade S3 scripts on Windmill Hub.
Cloudflare R2 is a cloud-based storage service that provides developers and businesses with a cost-effective and secure way to store and access their data.
Create a bucket on R2.
For best performance, install MinIO locally.
MinIO is an open-source, high-performance, and scalable object storage server that is compatible with Amazon S3 APIs, designed for building private and public cloud storage solutions.
Then from Windmill, just fill the S3 resource type.
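Because the three providers share one API, a single resource shape can drive any of them by changing only the endpoint. The sketch below turns such a resource into keyword arguments for an S3 client such as boto3's. The helper name s3_client_kwargs is hypothetical, and the resource field names (endPoint, useSSL, region, accessKey, secretKey) are assumptions about the S3 resource type, so verify them in your workspace before relying on them.

```python
def s3_client_kwargs(resource: dict) -> dict:
    """Build keyword arguments for boto3.client("s3", **kwargs)."""
    kwargs = {
        "aws_access_key_id": resource["accessKey"],
        "aws_secret_access_key": resource["secretKey"],
    }
    if resource.get("region"):
        kwargs["region_name"] = resource["region"]
    end_point = resource.get("endPoint")
    if end_point:
        # R2 and MinIO are reached through a custom endpoint URL.
        scheme = "https" if resource.get("useSSL", True) else "http"
        kwargs["endpoint_url"] = f"{scheme}://{end_point}"
    return kwargs


# Example values for a local MinIO instance (all placeholders).
minio = {
    "endPoint": "localhost:9000",
    "useSSL": False,
    "accessKey": "minio-user",
    "secretKey": "minio-secret",
}
print(s3_client_kwargs(minio)["endpoint_url"])  # http://localhost:9000
```

With boto3 installed, the result could be passed as boto3.client("s3", **s3_client_kwargs(minio)); the rest of the script is then the same whether the bucket lives on S3, R2 or MinIO.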
Key-Value Stores: MongoDB Atlas, Redis, Upstash
Key-value stores are a popular choice for managing non-structured data, providing a flexible and scalable solution for various data types and use cases. In the context of Windmill, you can use MongoDB Atlas, Redis, and Upstash to store and manipulate non-structured data effectively.
MongoDB Atlas is a managed database-as-a-service platform that provides an efficient way to deploy, manage, and optimize MongoDB instances. As a document-oriented NoSQL database, MongoDB is well-suited for handling large volumes of unstructured data. Its dynamic schema enables the storage and retrieval of JSON-like documents with diverse structures, making it a suitable option for managing non-structured data.
To use MongoDB Atlas with Windmill:
You can find examples and premade MongoDB scripts on Windmill Hub.
Redis is an open-source, in-memory key-value store that can be used for caching, message brokering, and real-time analytics. It supports a variety of data structures such as strings, lists, sets, and hashes, providing flexibility for non-structured data storage and management. Redis is known for its high performance and low-latency data access, making it a suitable choice for applications requiring fast data retrieval and processing.
To use Redis with Windmill:
Upstash is a serverless, edge-optimized key-value store designed for low-latency access to non-structured data. It is built on top of Redis, offering similar performance benefits and data structure support while adding serverless capabilities, making it easy to scale your data storage needs.
To use Upstash with Windmill: