Reading PDF Data from Excel or Power BI with Power Query

We work with PDF files in day-to-day tasks, and we are in constant need of reading or processing the data in those PDFs (Bank Statements, Invoices, Exported data available in PDF format only, etc). We usually think of paid software or OCR systems to help us on this task, but did you know that both Power BI and MS Excel have an option to get data or tables from PDFs? The results aren’t always perfect but with a couple of transformation steps the output can be really useful!

For example, I receive monthly statements from by Bank with details of charges on my credit card, and I spend some time reviewing it, wouldn’t it be great if I could both review it and analyze my historical transactions in Excel or Power BI?

Step by step guide how to read PDF data with Power Query.

Note: Steps are very similar between Power BI and MS Excel.

Go to data tab, then Get Data > From File > From PDF.

Step1: Browse to the PDF file you want to read, select it, and click Import.

Step2: In the next screen you will both the pages and the tables identified by Power Query, you can select one or multiple items, depending on the analysis you’re building.

Step3: Select the item and in the right, you’ll see a preview of the data recognized for that table. When you find the table you’re interested in, just select it and click on “transform data”

This will take you to the query editor, you will notice some columns are showing split into two, in below case, Power Query recognized the transaction description in two columns instead of 1. So we need to aply a quick transformation step

Step4: Perform a “merge column” operations to data put into one single column, by selecting both columns, then right click, and finally clicking on “Merge Columns”.

Step5: Remove any nulls from the Date column to clean up some rows

Step6: click “Close and Load”

Wait a couple of seconds until the data has loaded to the Sheet.

You just read data from a PDF! Very easy and convenient, isn’t it!

Benefits of Reading PDF data from excel with Power Query

  • It’s for free: the functionality is available in all the recent versions of Excel at no additional cost, and no additional software.
  • You can read several pages at the same time by expanding all pages in the navigation.
  • You can apply calculations if needed to any column or data point present in the file.
  • You can transform the text into different final formats using all the Power Query features.
  • It’s very simple: just a couple of clicks and transformations, and you’re done!

 

Leave a Comment

Your email address will not be published. Required fields are marked *