LLMs like GPT-4o on Azure allow us to upload images & prompt for text responses about them. See @DamoBird365’s video for a demonstration with single images. But GPT-4o does not yet support uploading & prompting on PDF files.
It is possible to convert the 1st page of a PDF to an image with the OneDrive Convert file action, but that is not very useful if your PDFs may have multiple pages. It is also possible to convert a PDF to an array of images with 3rd party connectors like Adobe & Encodian, but part of the point of using an Azure-hosted LLM is the higher data privacy standard Microsoft has promised, and sending all the document data to a 3rd party service negates much of that benefit.
This is why I have developed a template that uses an Azure Function to convert PDF data into an array of image data before passing it to a GPT-4o prompt. By running everything in the same Azure cloud environment we can reduce external dependencies & potential data privacy concerns.
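The core of that Azure Function is just a page-by-page render of the PDF into PNGs. Here is a hedged sketch of the idea (the actual script ships inside the flow's "Azure Function Python Script" action; the PyMuPDF library and the function & field names below are my own illustrative assumptions):

```python
import base64
import json

def pdf_b64_to_png_b64(pdf_b64: str, dpi: int = 150) -> list[str]:
    """Render every page of a base64-encoded PDF into a base64-encoded PNG."""
    import fitz  # PyMuPDF; an assumed dependency, imported lazily here
    pdf_bytes = base64.b64decode(pdf_b64)
    doc = fitz.open(stream=pdf_bytes, filetype="pdf")
    pages = [
        base64.b64encode(page.get_pixmap(dpi=dpi).tobytes("png")).decode("ascii")
        for page in doc
    ]
    doc.close()
    return pages

def to_response_body(images: list[str]) -> str:
    """Shape the result as JSON so the flow can loop over an images array."""
    return json.dumps({"images": images})
```

The flow can then append each base64 PNG from that array as an image input on the GPT-4o prompt.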
Example Flow Run
Import & Set-Up
Go to the bottom of this post & download the LLMVisionExtractPDFData_1_0_0_xx.zip file. Go to the Power Apps home page (https://make.powerapps.com/). Select Solutions on the left-side menu, select Import solution, Browse your files & select the LLMVisionExtractPDFData_1_0_0_xx.zip file you just downloaded. Then select Next & follow the menu prompts to apply or create the required connections for the solution flows. And finish importing the solution.
Once the solution is done importing, select the solution name in the list at the center of the screen. Once inside the solution click on the 3 vertical dots next to the flow name & select edit.
Now that the flow is imported & open, we need to set up the Azure Function for converting PDF base64 to images base64, and set up the Azure LLM.
If you have already worked with and deployed Azure Functions before, then you can skip the extra installations.
If you haven't deployed Azure Functions, you can go to the Microsoft Store & make sure you have VS Code & Python installed.
Once VS Code is installed, open it. Go to the Extensions icon (the 4 blocks) on the left-side menu to open the list of extensions. Search for Azure in the extensions & select to install Azure Functions. Azure Account & Azure Resources will automatically be installed too.
Once all the extensions are installed, go to the Azure A on the left side menu & select to sign in to Azure.
Next set up a project folder on your machine for Azure Functions & a sub-folder for this PDF-To-Images project.
Back in VS Code, select the button to create a new Azure Function. Follow the Function set-up instructions, selecting the PDF-To-Images project folder you just created, the Python language, the Model V2 programming model, and where in VS Code to open the new Azure Function project.
Once all the project files are loaded in VS Code, select the function_app.py file. Remove all the code in the file. Go back to the tab with the flow, open the "Azure Function Python Script" action, copy its contents & paste them into the function_app.py file in VS Code. Ctrl+S / Save the file.
Next go to the requirements.txt file. In the flow, go to the "Azure Function Requirements.txt" action & copy its contents. Paste the contents into the requirements.txt file in VS Code. Ctrl+S / Save the file.
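If you are curious what a V2-programming-model function_app.py generally looks like, here is a rough sketch — not the template's actual script; the route name, the request shape {"pdf": "&lt;base64&gt;"}, and the PyMuPDF dependency are all assumptions for illustration:

```python
import base64
import json

def extract_pdf_b64(raw_body: bytes) -> str:
    """Pull the base64 PDF string out of the request JSON.
    Hypothetical request shape: {"pdf": "<base64>"}."""
    return json.loads(raw_body)["pdf"]

try:
    import azure.functions as func  # provided by the azure-functions package

    app = func.FunctionApp(http_auth_level=func.AuthLevel.FUNCTION)

    @app.route(route="pdf_to_images", methods=["POST"])
    def pdf_to_images(req: func.HttpRequest) -> func.HttpResponse:
        import fitz  # PyMuPDF; requirements.txt would list azure-functions & PyMuPDF
        pdf_bytes = base64.b64decode(extract_pdf_b64(req.get_body()))
        doc = fitz.open(stream=pdf_bytes, filetype="pdf")
        images = [
            base64.b64encode(p.get_pixmap(dpi=150).tobytes("png")).decode("ascii")
            for p in doc
        ]
        return func.HttpResponse(json.dumps({"images": images}),
                                 mimetype="application/json")
except ImportError:
    pass  # azure-functions is only available where the Functions runtime/SDK is installed
```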
Go back to the Azure A on the left-side menu. Select the Deploy function button. Select Create New in the list of function apps. (If Create New doesn't appear, you may have to log in to Azure, navigate to Azure Functions & go through the process to create a new function app so it will appear in the list of deployment targets.) After a couple seconds VS Code will prompt you with a message to confirm you want to deploy & overwrite the function. Select Deploy.
Go to Azure & log in. Go to Function App. Find & select the newly deployed function app. Select the 1st function under Name. Select Get function URL and, in the pop-up menu, copy the Function key URL. Paste the URL into the "HTTP PDF B64 to PNG B64" URI input. Everything for the PDF base64 to images base64 conversion Azure Function should now be set.
Now to set up the Azure LLM. Back in the Azure tab search for Azure OpenAI. Select Create. Fill & follow the set-up prompts.
Once the resource is done deploying, select Go to resource. Then select Go to Azure OpenAI Studio.
In the OpenAI studio go to Models, select the gpt-4o model, select Deploy, and fill in the model deployment inputs before selecting Create.
Once the model is deployed, we need to get the HTTP URI. Go to the Chat tab, where it should have a chat open with the model you just deployed. Open the developer tools menu on your browser & go to Network. Submit a prompt/chat message to trigger a call to the deployment. Find & select the relevant call under the Name header in the developer tools Network tab. Copy the Request URL so you can paste it into the HTTP LLM action's URI input.
Next get the API key. In the chat window, select View code. Copy the API key & paste it into the value input of the api-key header line. That should finish the set-up for the call to the LLM.
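For reference, the JSON body the HTTP LLM action sends is a standard chat-completions payload where each PDF page rides along as a base64 PNG data URI in the message content. A minimal sketch of how such a body can be built (values like max_tokens are illustrative; the api-key header carries the key you just copied):

```python
import json

def build_vision_payload(prompt: str, png_b64_pages: list[str]) -> dict:
    """Chat-completions body mixing the text prompt with one image part per PDF page."""
    content = [{"type": "text", "text": prompt}]
    for b64 in png_b64_pages:
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"},
        })
    return {"messages": [{"role": "user", "content": content}],
            "max_tokens": 1000}

# The HTTP action then POSTs json.dumps(payload) to the Request URL with headers
# {"api-key": "<your key>", "Content-Type": "application/json"}.
```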
Now you can select a multi-page PDF in the Get file content action, edit what you want to extract in the Prompt action, & test run the flow.
Also note: if you swap out the Get file content action for another action that pulls in the file/attachment content, make sure the body expression in the "HTTP PDF B64 to PNG B64" action references just the base64 piece of the content, for example by adding the ['$content'] reference to the expression.
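To illustrate why that ['$content'] piece matters: file-content actions typically return a content envelope, and only the $content property holds the base64 string. A quick Python illustration of that envelope (the base64 value here is a truncated example):

```python
# Typical Power Automate file-content envelope returned by actions like
# Get file content (the base64 value is truncated for illustration):
file_content = {"$content-type": "application/pdf", "$content": "JVBERi0xLjc..."}

# The body expression should pass along just the base64 piece,
# i.e. the equivalent of appending ['$content'] to the content reference:
pdf_b64 = file_content["$content"]
```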
Thanks for any feedback,
Subscribe to my YouTube channel (https://youtube.com/@tylerkolota?si=uEGKko1U8D29CJ86).
And reach out on LinkedIn (https://www.linkedin.com/in/kolota/) if you want to hire me to consult or build more custom Microsoft solutions for you.
Which version is the latest? Please make a video on it. How to use it?
They’re both the same version. One is a solution package you can import on a Power Apps Solutions page, the other is a flow package you can import using the Legacy method on the Power Automate My flows page.
You can use this on any PDF file selected in a Get file content action (or, for example, pulled from the attachments of an email message) & edit the prompt in the Prompt action to extract different data for different PDF files.
You can also see a similar method with a video to do roughly the same thing with GPT-3.5 Turbo here: https://powerusers.microsoft.com/t5/Power-Automate-Cookbook/Extract-Data-From-PDFs-and-Images-With-G...