Chapter 10: Working with Multiple Files - Consolidating Reports
Welcome to the final chapter of this guide! You have learned to read, write, format, and visualize data in Excel. You've also learned how to write robust, professional code. Now, we will combine these skills to solve a very common business problem: consolidating data from multiple files into one master file.
Imagine you receive a separate sales report every month. Your task is to combine them all into a single annual report. Doing this manually is tedious and prone to errors. With Python, you can automate the entire process.
The Goal
We will write a script that finds all monthly sales Excel files in a directory, extracts the data from each, and consolidates it into a single master workbook.
A New Tool for Finding Files: The glob Module
To find all the files that match a certain pattern (e.g., all files starting with "sales_"), we will use Python's built-in glob module. It's perfect for finding files without needing to know all their exact names in advance.
The Setup: Creating Sample Files
First, we need some sample monthly reports to work with. We'll start with a small Python script to generate three separate, simplified sales files: sales_jan_2023.xlsx, sales_feb_2023.xlsx, and sales_mar_2023.xlsx.
Create a script named generate_sample_files.py:
from openpyxl import Workbook
# Data for each month
sales_data = {
"sales_jan_2023.xlsx": [
["Product", "Revenue"],
["Laptop", 18000],
["Mouse", 3750]
],
"sales_feb_2023.xlsx": [
["Product", "Revenue"],
["Keyboard", 6000],
["Monitor", 13500],
["Webcam", 5000]
],
"sales_mar_2023.xlsx": [
["Product", "Revenue"],
["Desk Chair", 7500],
["Laptop", 24000],
["Pen Holder", 2000]
]
}
# Create the files
for filename, data in sales_data.items():
wb = Workbook()
ws = wb.active
for row in data:
ws.append(row)
wb.save(filename)
print(f"Created {filename}")
Run this script. It will create the three Excel files we need for our consolidation script.
The Consolidation Script: consolidate_reports.py
Now for the main event. This script will find our monthly sales files and merge their data. Create a new file named consolidate_reports.py.
Run the script. It will find the three monthly files, copy their data into a new file called annual_sales_report_2023.xlsx, and intelligently handle the header row so it isn't repeated.
What You've Accomplished and Next Steps
You have successfully automated a complex, real-world data processing task. You are no longer limited to working with single files; you can now write scripts that operate on entire directories of them.
Congratulations! You've gone from the basics of Python to building robust, practical Excel automation solutions. The skills you've learned in this book are the foundation for tackling even bigger and more complex challenges. The world of automation is now at your fingertips.
import glob
from openpyxl import load_workbook, Workbook
# --- Main part of the script ---
# Create a new workbook for the consolidated data
master_wb = Workbook()
master_ws = master_wb.active
master_ws.title = "Consolidated Annual Report"
# Find all sales files from 2023 using glob
# The asterisk (*) is a wildcard that matches any characters
sales_files = glob.glob("sales_*_2023.xlsx")
print(f"Found files: {sales_files}")
# A flag to check if we've added the header yet
is_header_added = False
# Loop through each sales file
for filename in sales_files:
# Open the source workbook
wb = load_workbook(filename)
ws = wb.active
# Loop through all rows in the source sheet
for row_index, row in enumerate(ws.iter_rows(values_only=True)):
# The header is in the first row (index 0)
if row_index == 0:
# Only add the header once from the first file
if not is_header_added:
master_ws.append(row)
is_header_added = True
else:
# Append all other data rows
master_ws.append(row)
print(f"Processed {filename}")
# Save the master report
output_filename = "annual_sales_report_2023.xlsx"
master_wb.save(output_filename)
print(f"\nConsolidation complete! Master report '{output_filename}' is ready.")