
Microsoft Excel integration

Windmill doesn't have a direct API integration with Microsoft Excel, but it provides powerful ways to work with Excel files through file uploads and processing. You can handle Excel files (.xlsx, .xls) using two main approaches: base64 encoded strings for smaller files or S3 object storage for larger files and better performance.

Excel files in Windmill can be handled in either of the following ways:

Base64 encoded strings

For smaller Excel files (< 10MB), you can use base64 encoded strings. This is the simplest approach for basic file processing.

import pandas as pd
import io
import openpyxl  # pandas needs openpyxl to read .xlsx files; importing it here lets Windmill resolve it as a dependency

def main(excel: bytes, sheet_name: str = None):
    """
    Read a specific sheet from an Excel file.

    Args:
        excel: The Excel file as bytes
        sheet_name: Optional sheet name (defaults to first sheet)

    Returns:
        Processed Excel data
    """
    excel_buffer = io.BytesIO(excel)

    # Read specific sheet or first sheet
    df = pd.read_excel(excel_buffer, sheet_name=sheet_name or 0)

    # Example processing: get summary statistics
    summary = {
        "rows": len(df),
        "columns": len(df.columns),
        "column_names": df.columns.tolist(),
        "data_preview": df.head(10).to_dict(orient='records'),
        "numeric_summary": df.describe().to_dict() if df.select_dtypes(include='number').shape[1] > 0 else None
    }

    return summary
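
If you need to produce an Excel file rather than read one, the same base64 approach works in reverse: build the workbook in memory and return it as a base64 encoded string, since script results must be JSON-serializable. The following is a minimal sketch; the rows input, the export.xlsx filename, and the shape of the returned object are illustrative assumptions rather than a fixed Windmill convention.

import base64
import io
import openpyxl  # pandas needs openpyxl to write .xlsx files; importing it here lets Windmill resolve it as a dependency
import pandas as pd

def main(rows: list):
    """Build an .xlsx file from a list of records and return it base64 encoded."""
    df = pd.DataFrame(rows)

    # Write the workbook to an in-memory buffer instead of a file on disk
    buffer = io.BytesIO()
    df.to_excel(buffer, index=False, engine='openpyxl')

    # Base64 encode the bytes so the result stays JSON-serializable
    return {
        "filename": "export.xlsx",
        "content_base64": base64.b64encode(buffer.getvalue()).decode(),
    }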

More details on how to use base64 encoded strings in scripts and flows can be found in the Handling files and binary data section.

S3 object storage

For better performance and larger files, use Windmill's workspace object storage integration.

import pandas as pd
import io
import openpyxl  # pandas needs openpyxl to read .xlsx files; importing it here lets Windmill resolve it as a dependency
import wmill
from wmill import S3Object

def main(excel_file: S3Object, sheet_name: str = None):
    """
    Process an Excel file from S3 storage.

    Args:
        excel_file: S3Object pointing to the Excel file
        sheet_name: Optional sheet name

    Returns:
        Processed data and summary
    """
    # Load file from S3 as bytes
    file_content = wmill.load_s3_file(excel_file)

    # Read Excel from an in-memory buffer
    df = pd.read_excel(io.BytesIO(file_content), sheet_name=sheet_name or 0)

    # Process data
    processed_data = {
        "shape": df.shape,
        "columns": df.columns.tolist(),
        "dtypes": df.dtypes.astype(str).to_dict(),  # dtype names as strings so the result stays JSON-serializable
        "sample_data": df.head(5).to_dict(orient='records'),
        "missing_values": df.isnull().sum().to_dict(),
        "numeric_stats": df.describe().to_dict() if len(df.select_dtypes(include='number').columns) > 0 else None
    }

    return processed_data
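
To write results back, you can reverse the process: load the file, transform it with pandas, and upload the new workbook to workspace storage. The following is a minimal sketch assuming the wmill Python client's write_s3_file helper; the processed/cleaned.xlsx output key and the dropna cleanup step are placeholders to adapt to your use case.

import pandas as pd
import io
import openpyxl  # pandas needs openpyxl to read and write .xlsx files; importing it here lets Windmill resolve it as a dependency
import wmill
from wmill import S3Object

def main(excel_file: S3Object, sheet_name: str = None):
    """Read an Excel sheet from S3, clean it, and write the result back to S3."""
    # Load the source file and read the requested sheet
    file_content = wmill.load_s3_file(excel_file)
    df = pd.read_excel(io.BytesIO(file_content), sheet_name=sheet_name or 0)

    # Example transformation: drop rows that are entirely empty
    cleaned = df.dropna(how='all')

    # Write the cleaned workbook to an in-memory buffer
    buffer = io.BytesIO()
    cleaned.to_excel(buffer, index=False, engine='openpyxl')

    # Upload to workspace object storage under a placeholder key
    output = S3Object(s3="processed/cleaned.xlsx")
    wmill.write_s3_file(output, buffer.getvalue())

    # Return the S3 reference so downstream flow steps can pick the file up
    return output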

More details on how to use S3 object storage in scripts and flows can be found in the Object storage in Windmill (S3) section.