We will create an Azure Function that merges PDF documents stored in a container of our Azure storage account. The function receives the names of the files to merge as a comma-separated list in the URL query parameters.
Below is an example of the POST request that the code presented in this blog will be able to process.
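To make the expected URL shape concrete, here is a minimal sketch that applies the same parsing steps as Function1.cs to a sample URL: take everything after the first `=`, split on the literal `mergepfd&filenam=`, then split the rest on commas. The class name `RequestUrlSketch`, the host, the `action` parameter name and the file names are hypothetical placeholders, not values from the actual request.

```csharp
using System;

public static class RequestUrlSketch
{
    // Mirrors the parsing in Function1.cs. The marker string is copied
    // verbatim from the function; the URL must contain it exactly.
    public static string[] ExtractFileNames(string requestUrl)
    {
        // Everything after the first '=' in the URL
        string afterFirstEquals = requestUrl.Substring(requestUrl.IndexOf('=') + 1);

        // The part that follows the marker is the comma-separated file list
        string fileNames = afterFirstEquals.Split("mergepfd&filenam=")[1];

        return fileNames.Split(',');
    }
}
```

Under those assumptions, calling `RequestUrlSketch.ExtractFileNames("https://example.azurewebsites.net/api/Function1?action=mergepfd&filenam=first.pdf,second.pdf")` returns the two names `first.pdf` and `second.pdf`. Note that `Split` with a string separator needs .NET Core 2.0 or later.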
Set up your project
- Create an Azure Functions project and choose the HTTP trigger
- Make sure you install the packages below before you start coding
Create the Function code
We are ready to start writing code. We need two files:
ResultClass.cs – holds the list of merged file URLs that the function returns
using System;
using System.Collections.Generic;

namespace FunctionApp1
{
    // Wraps the list of download URLs produced by the merge
    public class Result
    {
        public Result(IList<string> newFiles)
        {
            this.files = newFiles;
        }

        public IList<string> files { get; private set; }
    }
}
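As a rough sketch of what a caller might receive, the snippet below serializes a `Result` to JSON. It assumes the Functions runtime serializes the returned object as JSON, and it uses `System.Text.Json` purely as a stand-in for whatever serializer the runtime applies. The helper `ResultJsonSketch` is a hypothetical name added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Copy of the Result class from ResultClass.cs, with IList<string> as the list type.
public class Result
{
    public Result(IList<string> newFiles)
    {
        this.files = newFiles;
    }

    public IList<string> files { get; private set; }
}

public static class ResultJsonSketch
{
    // Serializes a Result roughly the way a JSON response body would carry it.
    // System.Text.Json is an assumption used for illustration only.
    public static string ToJson(IList<string> urls)
    {
        return JsonSerializer.Serialize(new Result(urls));
    }
}
```

The output is a JSON object with a single `files` array, one entry per merged file URL. In this post the array always holds exactly one URL, because all inputs are merged into one document.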
Function1.cs – receives the file names in the URL, downloads those files from the storage account, merges them into one PDF and returns a download URL
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Configuration;
using Microsoft.WindowsAzure.Storage.Blob;
using Newtonsoft.Json;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

namespace FunctionApp1
{
    public class Function1
    {
        static Function1()
        {
            // Required to avoid the "No data is available for encoding 1252" exception when saving the PdfDocument
            System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
        }

        [FunctionName("Function1")]
        public async Task<Result> SplitUploadAsync(
            [HttpTrigger(AuthorizationLevel.Anonymous, "post", Route = null)] HttpRequestMessage req,
            // Container where the files are stored and retrieved; in this case it is called temp-pdf
            [Blob("temp-pdf", Connection = "")] CloudBlobContainer outputContainer,
            ILogger log)
        {
            // Get the query string: everything after the first '=' in the URL
            string uriq = req.RequestUri.ToString();
            string keyw = uriq.Substring(uriq.IndexOf('=') + 1);

            // Get the comma-separated file names that follow the marker in the query
            string fileNames = keyw.Split("mergepfd&filenam=")[1];

            // Split into individual file names
            string[] files = fileNames.Split(',');

            // Merge the files and return the download URL(s)
            var newFiles = await this.MergeFileAsync(outputContainer, files);
            return new Result(newFiles);
        }

        private async Task<IList<string>> MergeFileAsync(CloudBlobContainer container, string[] blobfiles)
        {
            // Document that collects the pages of every input file
            PdfDocument outputDocument = new PdfDocument();

            // Loop through the files named in the query
            foreach (string fileblob in blobfiles)
            {
                // Get a reference to the blob
                CloudBlockBlob blob = container.GetBlockBlobReference(fileblob);

                using (var memoryStream = new MemoryStream())
                {
                    // Download the blob once, then rewind so the reader starts at the first byte
                    await blob.DownloadToStreamAsync(memoryStream);
                    memoryStream.Position = 0;

                    // Open the document in import mode so its pages can be copied
                    var inputDocument = PdfReader.Open(memoryStream, PdfDocumentOpenMode.Import);

                    // Append every page to the output document
                    int count = inputDocument.PageCount;
                    for (int idx = 0; idx < count; idx++)
                    {
                        outputDocument.AddPage(inputDocument.Pages[idx]);
                    }
                }
            }

            // Store the merged document in the container and return its URL
            var outputFiles = new List<string>();
            var tempFile = await this.SaveToBlobStorageAsync(container, outputDocument);
            outputFiles.Add(tempFile);
            return outputFiles;
        }

        private async Task<string> SaveToBlobStorageAsync(CloudBlobContainer container, PdfDocument document)
        {
            // File name structure: merge-<timestamp>-<4 characters of a GUID>.pdf (HH = 24-hour clock)
            var filename = $"merge-{DateTime.Now.ToString("yyyyMMddHHmmss")}-{Guid.NewGuid().ToString().Substring(0, 4)}.pdf";

            // Reference to the (not yet existing) output blob
            var outputBlob = container.GetBlockBlobReference(filename);

            using (var stream = new MemoryStream())
            {
                // Save the merged document, keep the stream open and rewind it before uploading
                document.Save(stream, false);
                stream.Position = 0;
                await outputBlob.UploadFromStreamAsync(stream);
            }

            // Create a read-only SAS token that expires after five minutes
            var sasBlobToken = outputBlob.GetSharedAccessSignature(new SharedAccessBlobPolicy()
            {
                SharedAccessExpiryTime = DateTime.UtcNow.AddMinutes(5),
                Permissions = SharedAccessBlobPermissions.Read
            });

            // Return the blob URL with the SAS token appended
            return outputBlob.Uri + sasBlobToken;
        }
    }
}
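The merged file gets a name built from a timestamp and the first four characters of a GUID. Here is a small sketch of that naming scheme, pulled out so it can be tried on its own. The class `MergedFileName` and its `Build` method are hypothetical names for illustration, and passing the timestamp and GUID as parameters is an assumption made to keep the result predictable. The sketch uses `HH` (24-hour clock) for the hour; with `hh` (12-hour clock) a file merged at 03:04:05 and one merged at 15:04:05 would get the same timestamp digits.

```csharp
using System;
using System.Globalization;

public static class MergedFileName
{
    // Builds "merge-<yyyyMMddHHmmss>-<first 4 GUID characters>.pdf"
    public static string Build(DateTime timestamp, Guid id)
    {
        // HH is the 24-hour clock; the invariant culture keeps the digits culture-independent
        string stamp = timestamp.ToString("yyyyMMddHHmmss", CultureInfo.InvariantCulture);

        // Guid.ToString() starts with eight hex digits, so the first four are always hex
        string suffix = id.ToString().Substring(0, 4);

        return $"merge-{stamp}-{suffix}.pdf";
    }
}
```

In the function itself this would be called as `MergedFileName.Build(DateTime.Now, Guid.NewGuid())`. Four GUID characters give only 65,536 possible suffixes per second, which is fine for occasional merges but not a strong uniqueness guarantee.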
Create Azure Function
Create the Azure Function resource in Azure. Below you can see how I set up my function.
Publish your Code
Now it’s time to publish our Function to the Azure service. Right-click your solution and select the “Publish” option.
Follow the instructions on the next screens. Once you reach the Service Dependencies section, make sure that you select the storage account where the PDF documents to merge are saved. Also, double-check that the right container name is used in the connection string.
In the screenshot below you can see how I selected storageaccountazure8721 as the account that contains my “temp-pdf” container. Note that this container can be the same one where your Function stores its files, or it can be a completely different one.
Click publish and wait for the success confirmation from Visual Studio.
You’re all set!
You can now merge any number of PDF documents into one and get a link for direct download of the merged file. The link carries a read-only SAS token that expires after five minutes.