Table of Contents
What is automated document understanding?
Automated document understanding is the process of using artificial intelligence (AI) models to analyze text, structure, and patterns inside a document to categorize them and/or extract key information from them, without manual intervention.
Basic OCR (Optical Character Recognition) only converts images or documents into digitized text, but automated document understanding can help teams to:
- Identify document types
- Recognize key fields
- Understand document layout (tables that span between pages, headers)
- Understand relationships between data points
- Transform unstructured content into structured data
In the context of Power Platform, automated document understanding is powered by AI Builder. In the higher scheme of the Microsoft ecosystem, document understanding and processing can be done from Azure’s Microsoft Foundry platform.
AI Builder
AI Builder is the official Power Platform component to integrate artificial intelligence capabilities to Power Apps applications and Power Automate workflows. It includes:
- Ready-to-use models for common business scenarios (like invoice data extraction, insurance cards)
- Custom models that users can create to automate document processing of their own forms, industry-specific documents or templates.
- Custom prompts to analyze documents with more flexibility and not rely on pre-defined formats.
The diagram below shows the Power Platform ecosystem and how AI builder fits into solutions development:
There are many out of the box actions that can be called from both Power Automate and Power Apps. When one of these actions is called, each action has its own input and output parameters. For example, when selecting “process invoices” it expects an invoice file as input:
How AI document understanding models work?
AI document understanding models work by combining traditional text extraction technologies (OCR), machine learning, and layout analysis to automate the interpretation of documents, instead of simply reading text.
The general process most document understanding models follow is:
- Document ingestion: the model receives a file as an input variable. The file can be in different formats, such as a PDF, scanned document, or image.
- Text extraction: if the document is scanned or an image, OCR converts the visual text into digitized text.
- Layout and structure analysis: diagrams, drawings, tables, headers, footers, labels and paragraphs.
- Field recognition and classification: depending on what’s expected from the model, it will extract pre-set fields or custom fields from the document or will perform classification of the document.
- Confidence scoring: most models return a confidence score about how certain the model is about each result. This is really helpful for example when a low confidence score is returned, the document can be sent for manual review.
- Structured data output: once all the analysis is done by the model, the extracted information is returned in structured format (usually JSON format).
Document understanding models in AI Builder
Document understanding models in Power Platform are powered by AI Builder, the AI capability that enables organizations to extract, classify, and process information from documents without requiring data science expertise. With AI Builder, businesses can train custom models or use pre-built models to read invoices, receipts, contracts, IDs, and other document types. By turning unstructured files—such as PDFs, scans, and images—into structured data, AI Builder allows companies to automate document-heavy workflows directly within Power Apps and Power Automate, improving accuracy, speed, and operational efficiency.
Custom model
Custom document processing models allow users to train AI using their own document samples, such as custom forms, industry-specific documents and templates.
The process of creating a custom model requires a few steps that can be performed from a visual interface:
- Choose document type
- Label the required fields (e.g., Name, application number, amount, etc)
- Upload sample documents
- Train the model (ie, tag the documents by mapping the required fields to where the data should be extracted)
- Use the model from Power Apps or Power Automate.
Pre-built models
Pre-built models are ready-to-use AI models trained by Microsoft for common business document types. These types of models require minimal setup and can be used pretty quickly from Power Automate and Power Apps.
Here is an example of the list of available pre-built actions from Power Automate.
Extract information from invoices
This AI model is ideal for accounts payable automation and allows the automatic analysis and extraction of supplier invoices.
Automatically capture:
- Invoice number
- Vendor details
- Invoice date
- Line items
- Total amount
- Tax values
Extract all the text in photos and PDF documents (OCR)
Useful for digitizing scanned documents.
Optical Character Recognition (OCR) extracts all readable text from:
- Scanned documents
- Images
- Printed PDFs
Extract information from receipts
This AI model is useful for expense management. Employees can submit their expenses for a trip through a Power Apps application and extract receipt data automatically with this AI builder model.
Capture:
- Merchant name
- Date
- Total amount
- Tax
- Payment method
Extract information from identity documents
This AI model can read data from identification documents such as passports, driver’s licenses or national IDs. Useful for onboarding and verification processes.
Capture:
- Name
- ID number
- Date of birth
Extract information from business cards
Helps digitize contacts into CRM systems.
Automatically capture:
- Name
- Job title
- Company
- Phone number
- Email address
Extract information from contracts
This AI model is useful because it allows processing data from both structured and unstructured documents.
Identify and extract:
- Key clauses
- Dates
- Parties involved
- Contract values
Bottom of Form
Extract information from health insurance card
Useful in healthcare and insurance workflows.
Capture:
- Member name
- Policy number
- Group number
- Provider details
Custom prompts with AI models
Custom prompts for document understanding allow users to extract specific information from documents using natural language instructions. Instead of using pre-built models or a custom model that relies on predefined fields or rigid templates, a prompt can be designed to describe what we want to extract from the document, and the model interprets the document accordingly.
Custom prompts can also become handy when documents include drawings, diagrams or more complex content.
Here is an example of a prompt built to analyze an invoice, classify the expense type and also extract some key information (invoice amount, date, supplier).
Once the prompt is built, it can be called from Power Automate, passing only the invoice content as an input parameter for the AI model to analyze:
Within AI Builder, custom prompts can be used to define:
- What type of document is being analyzed
- Which fields or clauses should be extracted
- How information should be formatted
- Conditions and exceptions to apply
The custom prompt approach increases flexibility, especially when working with semi-structured or variable document formats.
Microsoft Foundry
AI Builder is designed for users to quickly integrate AI and implement inside Power Platform solutions, but Microsoft offers a wide range of services that can be implemented from the Azure shop. These AI suite of services is now hosted under the “Microsoft Foundry” platform.
Behind the scenes, AI Builder runs on many of the Azure services, but it’s masked with an easier interface for set up. Microsoft Foundry requires Azure access and a subscription, so it needs someone from IT or an Admin to successfully link a service and configure it to the tenant’s official subscription. Pricing also differs between Power Platform’s AI Builder and Foundry.
The decision between using Foundry or AI Builder depends on the level of complexity and control the organization requires.
Content Understanding in Cognitive Services
Azure Open AI
The Content understanding service was previously known as “Document Intelligence”, and it’s an Azure AI service that helps users analyze and extract insights from diverse content types (images, PDFs, audio, video).
The catalog of pre-built models in Azure is wider than the available options in AI Builder:
- Traditional OCR
- Layout analysis
- Document fields
- Procurement: Invoices, receipts, hotel receipts, purchase orders, utility bills, credit memos
- US Tax forms: W-2, W-4, 1040, 1040 Senior, 1040-Schedule-A, 1040-Schedule-B, 1040-Schedule-C
- Identification documents: general, passport
- US Mortgage: 1003, 1004, 1005, 1008, closing disclosure
- Contracts
- Credit cards
- US checks
- US health insurance cards
- US pay stubs
- US marriage certificates
- Document search
- Call recordins
- Audio search
- Video search
- Image search
From a Microsoft foundry perspective, using Content Understanding provides the following benefits:
- Multimodal support. Works across document, text, image, audio, and video.
- Advanced AI capabilities. Unlock intelligent content processing with Generative AI.
- Lower cost, improved results. Optimize processing efficiency and operational cost.
Using Cognitive services for document understanding would, at a high level, require the following:
- Set up Azure resource group
- Create Microsoft Foundry resource
- Call the Document understanding API
Azure OpenAI provides access to large language models (LLMs) for different purposes. In the context of document understanding, with Azure Open AI users would be able to build a similar structure to what Custom Prompts would do in AI Builder.
Using Azure Open AI for document understanding would, at a high level, require the following:
- Set up Azure resource group
- Create Microsoft Foundry resource
- Deploy an LLM model (eg. Gpt 5.2)
- Call the Azure Open AI API, sending a prompt with instructions about the document extraction.
Custom model in Content Understanding
Same as in AI Builder, custom models can be built for documents that do not fall under any of the categories of the pre-built models. This can be done through a visual interface where everything is configurated under the content understanding studio:
Benefits of automating document understanding in Power Platform
By automating document processing, organizations can:
- Reduce manual effort
- Minimize human error
- Allow Intelligent workflow routing
- Improve operational efficiency
- Scale processes
- Foster better compliance and auditability
Some use cases that can benefit from automatic document processing, understanding and classification are:
- Accounts Payable Automation
- Employee onboarding documents review
- Bank statements digitalization
- Remittance statements processing
- Expense submission management
- Utility bills analysis
Automate document classification with Power GI
Let’s work together and let us help you automate document understanding with Power GI’s Power Platform expertise with our Power Automate Consulting services and our Agentic automation services.
Whether you’re ready to implement AI document processing or exploring what’s possible, contact us and discover the joy of automating document understanding processes.