Azure Function to combine PDF files

We will create an Azure Function that will merge PDF documents stored in a container of our storage account in Azure, this function will receive the file names to be merged/combined in the parameters of the URL and they will be separated by commas.

Below is an example of the POST request that the code presented in this blog will be able to process.

Set up your project

  1. Create an azure function project and use the HTTP Trigger
  2. Make sure you install the below packages before getting started with coding

Create the Function code

We are ready to start writing code. We need two files:

ResultClass.cs – will return the file(s) merged as list

using System;
using System.Collections.Generic;

namespace FunctionApp1
{

    public class Result
    {

        public Result(IList newFiles)
        {
            this.files = newFiles;
        }

        public IList files { get; private set; }
    }
}

Function1.cs – Code that will receive the file names in the URL, grab them from the Storage account, merge them into one and return a download URL

using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Configuration;
using Microsoft.WindowsAzure.Storage.Blob;
using Newtonsoft.Json;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

namespace FunctionApp1
{
    public class Function1
    {

        static Function1()
        {

            // This is required to avoid the "No data is available                         for encoding 1252" exception when saving the PdfDocument
System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);

        }

        [FunctionName("Function1")]
        public async Task SplitUploadAsync(
            [HttpTrigger(AuthorizationLevel.Anonymous, "post", Route = null)] HttpRequestMessage req,
            //container where files will be stored and accessed for retrieval. in this case, it's called temp-pdf
            [Blob("temp-pdf", Connection = "")] CloudBlobContainer outputContainer,
            ILogger log)
        {
            //get query parameters

            string uriq = req.RequestUri.ToString(); 
            string keyw = uriq.Substring(uriq.IndexOf('=') + 1);

            //get file name in query parameters
            String fileNames = keyw.Split("mergepfd&filenam=")[1];

            //split file name
            string[] files = fileNames.Split(',');

            //process merge
            var newFiles = await this.MergeFileAsync(outputContainer, files);

            return new Result(newFiles);

        }

        private async Task<IList> MergeFileAsync(CloudBlobContainer container, string[] blobfiles)
        {
            //init instance
            PdfDocument outputDocument = new PdfDocument();

            //loop through files sent in query
            foreach (string fileblob in blobfiles)
            {
                String intfile = $"" + fileblob;

                // get file
                CloudBlockBlob blob = container.GetBlockBlobReference(intfile);

                using (var memoryStream = new MemoryStream())
                {
                    await blob.DownloadToStreamAsync(memoryStream);

                    //get file content
                    string contents = blob.DownloadTextAsync().Result;
                   
                    //open document
                    var inputDocument = PdfReader.Open(memoryStream, PdfDocumentOpenMode.Import);

                    //get pages
                    int count = inputDocument.PageCount;
                    for (int idx = 0; idx < count; idx++)
                    {
                        //append
                        outputDocument.AddPage(inputDocument.Pages[idx]);
                    }


                }
            }


            var outputFiles = new List();
            var tempFile = String.Empty;

            //call save function to store output in container
            tempFile = await this.SaveToBlobStorageAsync(container, outputDocument);

            outputFiles.Add(tempFile);

            //return file(s) url
            return outputFiles;
        }

        private async Task SaveToBlobStorageAsync(CloudBlobContainer container, PdfDocument document)
        {

            //file name structure
            var filename = $"merge-{DateTime.Now.ToString("yyyyMMddhhmmss")}-{Guid.NewGuid().ToString().Substring(0, 4)}.pdf";

            // Creating an empty file pointer
            var outputBlob = container.GetBlockBlobReference(filename);

            using (var stream = new MemoryStream())
            {
                //save result of merge
                document.Save(stream);
                await outputBlob.UploadFromStreamAsync(stream);
            }

            //get sas token
            var sasBlobToken = outputBlob.GetSharedAccessSignature(new SharedAccessBlobPolicy()
            {
                SharedAccessExpiryTime = DateTime.UtcNow.AddMinutes(5),
                Permissions = SharedAccessBlobPermissions.Read
            });

            //return sas token
            return outputBlob.Uri + sasBlobToken;
        }
    }
}

Create Azure Function

Create the resource in Azure for the Azure Function. Below you can see how I did the set up for my function

Publish your Code

Now it’s time to Publish our Function to the Azure service. Right click on your solution and select the “Publish” option.

Follow the instructions in the next screens, once you reach the Service Dependencies section, make sure that you select the storage account where your PDF documents to merge are saved. Also, double check that the right container name is used in the connection string.

In below screenshot you can see how I selected storageaccountazure8721 as the one that contains my “temp-pdf” container. Note that this container can be the same where your Function is storing its files or it can a completely different.

Make sure that the container name in the connection string exists in the Storage account you select in the dependency – in this case “temp-pdf”

Click publish and wait for the success confirmation from Visual Studio.

You’re all set!

All is ready. You can merge any number of pdf documents into one and get a link that will allow direct download of the merged file.

Leave a Comment