JShoowa92
Regular Visitor

Extracting Data From Invoices

Hi SuperPowerAutomators,

 

I am currently working on an invoice processing project using AI Builder for my organization. The extraction is supposed to happen as soon as PDF invoices are received via Outlook. I have 5 collections of invoices to train my model on. The objective is to tag specific data in these invoices for extraction and map it into a defined Excel table.

 

My challenge is:

Some invoice collections have multiple records on one page, and each record needs to be mapped to the target Excel sheet as a single row. I am having a hard time figuring out how to train my model, since only some of these documents contain multiple customer records on a single page.

 

In the screenshot I have provided here, a single-page invoice contains 3 different customer records (in effect, three invoices merged into one), and there could be even more. Other invoices are a single page with a single record, which is easier to tag and extract. The red lines in the screenshot mark where each record begins and ends. How do I approach this problem and get a solution that works?

 

The required data on this invoice is:

IN,

OUT,

Nights,

GuestName,

RatePerNight (this will be multiplied by the number of Nights to calculate the AccommodationRate), and

TotalMeals+OtherCosts
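
For illustration, the target record and the AccommodationRate calculation described above could be sketched in Python as follows. This is a minimal sketch, not part of the AI Builder setup: the class name `GuestRecord`, the snake_case attribute names, and the column order in `to_row` are assumptions, and the example values are made up.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class GuestRecord:
    """One guest's stay; a single invoice page may yield several of these."""
    guest_name: str
    check_in: str              # IN, kept as printed on the invoice
    check_out: str             # OUT
    nights: int
    rate_per_night: Decimal
    meals_and_other: Decimal   # TotalMeals+OtherCosts

    @property
    def accommodation_rate(self) -> Decimal:
        # AccommodationRate = RatePerNight * Nights
        return self.rate_per_night * self.nights

    def to_row(self) -> list:
        # Column order is an assumption; match it to the real Excel table.
        return [self.guest_name, self.check_in, self.check_out, self.nights,
                self.rate_per_night, self.accommodation_rate,
                self.meals_and_other]


# Illustrative values only, not taken from a real invoice.
example = GuestRecord("Guest A", "01/01/2024", "04/01/2024", 3,
                      Decimal("750.00"), Decimal("210.00"))
```

Each record on a page would become one `GuestRecord`, and each `to_row()` result one row in the Excel table.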

 

If you need clarity on anything, kindly ask. Thanks in advance for your assistance.

5 REPLIES 5
ARB_wcc
Super User

Hi,

 

Do you have access to the GPT Prompt Builder connector in your environment?

GPT Prompt Builder goes GA and expands to EU,UK,AU - Power Platform Community (microsoft.com)

 

If so, this task can be automated in a couple of minutes with a custom prompt like the one below:

 

 

You are an AI trained to process invoices received as PDF files via email. Your task is to identify and extract specific information from invoices that may contain MULTIPLE CUSTOMER records on a single page. You need to recognize the beginning and end of *each* customer record on the invoice, and extract the following data points: 

1. Check-in date (IN) 
2. Check-out date (OUT) 
3. Number of nights stayed (Nights) 
4. Guest name (GuestName) 
5. Rate per night (RatePerNight) 
6. Total for meals and other costs (TotalMeals+OtherCosts) 

For **EACH** customer record, calculate the 'AccommodationRate' by multiplying 'RatePerNight' by the 'Number of nights stayed'. The 'TotalMeals+OtherCosts' is the sum of all other charges excluding the room rate. 

Please provide the extracted information in JSON format, with *SEPARATE ENTRIES* for each customer record. If any data point is unclear or missing, indicate it as "Data not available". 

You are supposed to handle invoices with a varying number of customer records per page. 

Here is the information from a sample invoice for your reference: 

[Start of OCR Text] 

Wee/ Hotel Erna r na joma Ave. - Invoice to: WALVIS BAY NAMIBIA Tax Invoice no. I Ref. No. AH-Your reference 22_ ._ . .... . our consultant HiIma 27/09/2023 Oty. Booking Details Guest Name Unit Total 1 Dinner re 61866 Abia 195.00 195.00 2 Coke ADM 15.00 30.00 2 Still Water 500m1 Abia 15.00 30.00 1 Dinner re 61862 Abia 195.00 195.00 1 Fruitree Guava Juice Abia 25.00 25.00 1 Standard Room single. Pax: 1 Abla 750.00 750.00 Breakfast IN: 18/09/2023, Out: 19/09/2023. Nights: 1 1 Dinner re 61871 Adelson 195.00 195.00 1 Fruaree Grape juice Adelson 25.00 25.00 1 Lunch re 61859 Aoilson 195.00 195.00 1 Fruitree Guava JuiceAdilson 25.00 25.00 1 Standard Room single. Pax: 1 Adelson 750.00 750.00 Breakfast IN: 18/09/2023. Out: 19/09/2023. Nights 1 1 Lunch re 61860 Jeremiah 0 195.00 195.00 1 Still Water 500m1 Jeremiah 0 15.00 15.00 1 Standard Room single. Pax: 1 Jeremiah 0 750.00 750.00 Breakfast IN: 18/09/2023, Out: 19/09/2023. Nights 1 All given prices in Namibia Dollar Subtotal: 3375.00 

[End of OCR Text] 

Using the above instructions, extract the relevant information. 

 

 

Output:

 

AIprompt.gif
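
Once the prompt returns JSON, each entry still has to become one Excel row. Below is a minimal Python sketch of that mapping, written outside Power Automate purely to illustrate the logic. The key names mirror the field names in the prompt, but the exact JSON shape the model returns is an assumption (the real output is only visible in the GIF), and `records_to_rows` and `to_decimal` are hypothetical helpers.

```python
import json
from decimal import Decimal, InvalidOperation

MISSING = "Data not available"  # placeholder the prompt asks the model to use


def to_decimal(value):
    """Return a Decimal, or None when the value is missing or not numeric."""
    if value is None or value == MISSING:
        return None
    try:
        return Decimal(str(value))
    except InvalidOperation:
        return None


def records_to_rows(response_text):
    """Turn the model's JSON array (assumed shape) into one dict per guest."""
    rows = []
    for rec in json.loads(response_text):
        nights = to_decimal(rec.get("Nights"))
        rate = to_decimal(rec.get("RatePerNight"))
        # Only compute the rate when both inputs were actually extracted.
        accommodation = None
        if nights is not None and rate is not None:
            accommodation = rate * nights
        rows.append({
            "GuestName": rec.get("GuestName", MISSING),
            "IN": rec.get("IN", MISSING),
            "OUT": rec.get("OUT", MISSING),
            "Nights": nights,
            "RatePerNight": rate,
            "AccommodationRate": accommodation,
            "TotalMeals+OtherCosts": to_decimal(rec.get("TotalMeals+OtherCosts")),
        })
    return rows
```

Recomputing AccommodationRate in code rather than trusting the model's arithmetic keeps the Excel values consistent with the extracted RatePerNight and Nights.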

 

For more info on how you can create this flow, refer to this post from @takolota - Extract Data From PDFs and Images With GPT - Power Platform Community (microsoft.com)

 

@gbego - New connector tested and approved! 😋

 

JShoowa92
Regular Visitor

Hi, @ARB_wcc 

 

Thank you for your response. I am not very experienced with Power Automate yet. I can confirm that I do have access to the GPT Prompt Builder connector in my environment, and I took a look at the flows you referred me to. However, I cannot seem to extract the data from my invoice as you demonstrated in AIprompt.gif above. I believe there is a bug in the new Power Automate designer experience. Kindly refer to the attached screenshots:

Kindly guide me on how you got the extraction done, if possible, step by step. Thank you in advance.

Hello,

It's great that you're working on an invoice processing project using AI Builder. Addressing the challenge of extracting data from invoices that contain multiple records on a single page can indeed be complex. Here's a suggested approach to tackle this problem:

1. Data Preprocessing:

  • Break down the PDFs into individual records or pages. This may involve extracting each record and saving it as a separate PDF or image file.

2. Data Labeling:

  • For training your model, label each individual record separately. This ensures that the model learns to recognize and extract information from each record independently.

3. Train the Model:

  • Use the labeled data to train your model. Ensure that your training data includes examples of single-record invoices as well as multi-record invoices. This diversity will help the model learn to handle both scenarios.

4. Bounding Boxes:

  • Use bounding boxes during labeling to specify the exact location of each data field within a record. This will help the model understand the spatial relationships between different pieces of information.

5. Evaluate Model Performance:

  • After training, evaluate the model's performance on a separate set of labeled data. Pay attention to its ability to correctly identify and extract information from both single-record and multi-record invoices.

6. Iterative Training:

  • If the initial model doesn't perform well on multi-record invoices, consider an iterative training approach. Add more labeled examples of multi-record invoices to your training set and retrain the model.

7. Consider Advanced Techniques:

  • Depending on the complexity of your invoices, you might need to explore advanced techniques such as sequence labeling or using custom models if the standard models provided by AI Builder prove insufficient.

8. Optimize for Accuracy:

  • Pay close attention to optimizing for accuracy, especially when dealing with multiple records on a single page. This might involve adjusting parameters, refining labeling, or experimenting with different models.

9. Test with Real-World Data:

  • Before deploying the model, test it with a diverse set of real-world invoices to ensure its accuracy and reliability in handling various scenarios.

10. Feedback Loop:

  • Establish a feedback loop where the model can continuously learn and improve based on new labeled examples and real-world usage.
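
The preprocessing idea in step 1 can also be applied to the OCR text rather than the PDF itself. The sketch below is a rough Python illustration that assumes every record ends with an "IN: ... Out: ... Nights ..." line, as in the OCR sample earlier in this thread. The pattern `RECORD_END` and the helper `split_records` are hypothetical, and the sample string is shortened from that OCR text.

```python
import re

# Assumed end-of-record marker: "IN: dd/mm/yyyy, Out: dd/mm/yyyy. Nights: n".
# OCR punctuation varies, so the separators and the colon are optional.
RECORD_END = re.compile(
    r"IN:\s*(?P<check_in>\d{2}/\d{2}/\d{4})[.,]?\s*"
    r"Out:\s*(?P<check_out>\d{2}/\d{2}/\d{4})[.,]?\s*"
    r"Nights:?\s*(?P<nights>\d+)",
    re.IGNORECASE,
)


def split_records(ocr_text):
    """Cut the OCR text after each marker; one dict per guest record."""
    records = []
    start = 0
    for match in RECORD_END.finditer(ocr_text):
        records.append({
            "text": ocr_text[start:match.end()].strip(),
            "check_in": match.group("check_in"),
            "check_out": match.group("check_out"),
            "nights": int(match.group("nights")),
        })
        start = match.end()
    # Anything after the last marker (totals, footer) is ignored.
    return records


sample = (
    "1 Dinner re 61866 Abia 195.00 195.00 "
    "1 Standard Room single. Pax: 1 Abia 750.00 750.00 "
    "Breakfast IN: 18/09/2023, Out: 19/09/2023. Nights: 1 "
    "1 Lunch re 61860 Jeremiah 195.00 195.00 "
    "1 Standard Room single. Pax: 1 Jeremiah 750.00 750.00 "
    "Breakfast IN: 18/09/2023. Out: 19/09/2023. Nights 1 "
    "All given prices in Namibia Dollar Subtotal: 1890.00"
)
records = split_records(sample)
```

Each resulting chunk contains one guest's line items, so it can be labeled or passed to a prompt as a single record. Invoices with a different layout would need a different marker.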

Hi @JShoowa92 

From the screenshots you shared, it appears you are running into issues with the preview action for GPT Prompts. A new action is now generally available, so I would recommend re-creating your flow with the new action, which should fix the issues.

Please see the documentation here for reference: https://learn.microsoft.com/en-us/ai-builder/create-a-custom-prompt 

Best,
Gwenael

davidkr
Advocate I

Hello, how did you solve this problem? I have the same issue. My invoices contain products from different POs, but each PO number appears only once, at the beginning of its record, like yours. So I cannot tag the PO number multiple times (once for each of its products).
