By Mohammad Khorasani, Data Scientist/Engineer Hybrid

Photo by Isaac Smith on Unsplash

Microsoft Excel and Word are without a shred of doubt two of the most widely used pieces of software in the corporate and non-corporate world. They are practically synonymous with the term ‘work’ itself. Hardly a week goes by without us firing up the two of them and putting them to use one way or another. For everyday tasks automation is rarely called for, but there are times when it becomes a necessity. When you have a multitude of charts, figures, tables, and reports to generate, doing it all by hand quickly becomes an exceedingly tedious undertaking. Well, it doesn’t have to be that way. You can build a pipeline in Python that integrates the two: it produces spreadsheets in Excel and then transfers the results to Word to generate a report in a matter of seconds.


Meet Openpyxl, arguably one of the most versatile Python packages for working with Excel, which makes interfacing with it a stroll in the park. Armed with it you can read and write the current Excel formats, i.e. xlsx and xlsm (along with the xltx and xltm templates). Note that the legacy xls format is not supported, so older files need to be converted first. Openpyxl allows you to populate rows and columns, write formulae into cells (Excel evaluates them when the file is opened), create 2D and 3D charts, label axes and titles, and plenty more that can come in handy. Most importantly, however, this package enables you to iterate over as many rows and columns as your sheet holds, thereby saving you from all that pesky manual number crunching and plotting that you had to do previously.
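To make that a little more concrete, here is a minimal sketch of what populating cells, writing a formula and adding a labelled chart might look like. The sheet name, the sample data and the build_workbook helper are all made up for illustration and are not part of the pipeline built later in this article. Keep in mind that Openpyxl stores the formula as text; Excel computes the result when the file is opened.

```python
import os
import tempfile

from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

# Hypothetical sample data, used only for this illustration.
ROWS = [("Month", "Units"), ("Jan", 120), ("Feb", 95), ("Mar", 143)]


def build_workbook(path):
    """Write the sample rows, a SUM formula and a bar chart, then save to path."""
    wb = Workbook()
    ws = wb.active
    ws.title = "Sales"

    # Populate rows: each append() call writes one row below the previous one.
    for row in ROWS:
        ws.append(row)

    # The formula is stored as a string; Excel evaluates it on opening.
    ws["A5"] = "Total"
    ws["B5"] = "=SUM(B2:B4)"

    # Build a 2D bar chart with a title and labelled axes.
    chart = BarChart()
    chart.title = "Units sold per month"
    chart.x_axis.title = "Month"
    chart.y_axis.title = "Units"

    # Data includes the header cell so it can be used as the series name.
    data = Reference(ws, min_col=2, min_row=1, max_row=4)
    categories = Reference(ws, min_col=1, min_row=2, max_row=4)
    chart.add_data(data, titles_from_data=True)
    chart.set_categories(categories)

    # Anchor the chart's top-left corner at cell D2.
    ws.add_chart(chart, "D2")

    wb.save(path)
    return wb


# Save into a temporary folder so the sketch runs anywhere.
demo_path = os.path.join(tempfile.mkdtemp(), "sales_demo.xlsx")
demo_wb = build_workbook(demo_path)
```

Opening the saved file in Excel should show the four rows, the total and the chart next to them.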


And then along comes Python-docx, a package that is to Word what Openpyxl is to Excel. If you haven’t already studied its documentation, then you should probably take a look. Python-docx is without exaggeration one of the simplest and most self-explanatory toolkits I have worked with since I started using Python. It allows you to automate document generation by inserting text, filling in tables and rendering images into your report with very little overhead.

Without further ado, let’s create our very own automated pipeline. Go ahead and fire up Anaconda (or any other Python environment of your choice) and install the following packages:

pip install openpyxl
pip install python-docx

Microsoft Excel Automation

Initially, we’ll load an Excel workbook that has already been created (shown below):

workbook =...
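For reference, loading an existing workbook and walking over its rows generally follows the pattern sketched below. The file name sales.xlsx and the read_rows helper are assumptions made for this illustration, not the file used in this article; the sketch first writes a small stand-in workbook so that it can run on its own.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def read_rows(path):
    """Load the workbook at path and return the active sheet's rows as tuples."""
    workbook = load_workbook(path)
    sheet = workbook.active
    # values_only=True yields plain cell values instead of Cell objects.
    return list(sheet.iter_rows(values_only=True))


# Write a small stand-in file (hypothetical name and contents).
sample_path = os.path.join(tempfile.mkdtemp(), "sales.xlsx")
stand_in = Workbook()
stand_in.active.append(("Month", "Units"))
stand_in.active.append(("Jan", 120))
stand_in.active.append(("Feb", 95))
stand_in.save(sample_path)

# Load it back and iterate over every row.
rows = read_rows(sample_path)
for row in rows:
    print(row)
```

With your own spreadsheet you would skip the stand-in part and simply pass the path of your file to load_workbook().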
