Update & Create Excel Records 50-100x Faster
I was able to develop an Office Script to update rows and an Office Script to create rows from Power Automate array data. So instead of a flow creating a new API call action for each individual row update or creation, this flow can just send an array of new data and the Office Scripts will match up primary key values, update each row they find, then create the rows they don't find.
And these scripts do not require manually entering or changing any column names in the script code.
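To illustrate the idea, here is a minimal sketch of the upsert matching logic as a standalone Office Script function. This is an assumed simplification for illustration only, not the actual template code (which is linked at the bottom of this post):

function upsertSketch(
  table: ExcelScript.Table,
  PrimaryKeyColumnName: string,
  UpdatedData: { [column: string]: string }[]
) {
  // Flatten the primary key column (header included) into a string array for lookups.
  const keys = table.getColumn(PrimaryKeyColumnName).getRange().getValues().map(r => String(r[0]));
  const notFound: { [column: string]: string }[] = [];
  for (const record of UpdatedData) {
    const i = keys.indexOf(record[PrimaryKeyColumnName]);
    if (i > 0) {
      // Key found: i - 1 is the data row index (index 0 is the header row).
      // ...write the record's values into that table row...
    } else {
      // Key not found: queue the record for the create script.
      notFound.push(record);
    }
  }
  return notFound;
}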
• In testing with batches of 1,000 updates or creates, it handles roughly 2,000 row updates or creates per minute, 50x faster than the standard Excel Create row or Update row actions at max 50 concurrency. And it accomplishes all the creates or updates in fewer than 4 actions, only 0.4% of the 1,000 API-call actions the standard approach requires.
• The Run script code for processing data has 2 modes: the Mode 2 batch method, which builds & updates a new in-memory instance of the table before posting batches of table ranges back to Excel, & the Mode 1 method, which updates the Excel table row by row.
The Mode 2 batch processing method activates for updates on tables with fewer than 1 million cells. It encounters more errors on larger tables because it loads & works with the entire table in memory.
Shoutout to Sudhi Ramamurthy for this great batch processing addition to the template!
Code Write-Up: https://docs.microsoft.com/en-us/office/dev/scripts/resources/samples/write-large-dataset
Video: https://youtu.be/BP9Kp0Ltj7U
The Mode 1 row-by-row method activates for Excel tables with more than 1 million cells. But it is still limited by the batch payload size, so updates on larger tables will need to run with smaller cloud flow batch sizes of less than 1,000 in a Do until loop.
The Mode 1 row-by-row method is also used whenever the ForceMode1Processing field is set to Yes.
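As a rough sketch, the mode selection boils down to something like this (the TableSize & TableSizeBatchProcLimit variables match the script code shown later in this post; the exact comparison is an assumed simplification):

function chooseMode(table: ExcelScript.Table, ForceMode1Processing: string): 1 | 2 {
  const TableRange = table.getRange();
  const TableSize = TableRange.getRowCount() * TableRange.getColumnCount();
  const TableSizeBatchProcLimit = 1000000; // matches the constant in the scripts below
  if (ForceMode1Processing === "Yes" || TableSize > TableSizeBatchProcLimit) {
    return 1; // Mode 1: update the live Excel table row by row.
  }
  return 2; // Mode 2: work on an in-memory copy, then post ranges back in batches.
}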
Be aware that some characters in column names, like \ / - _ . : ; ( ) & $, may cause errors when processing the data. Backslashes \ in the data, which are normally used to escape characters in strings, may also cause errors when processing the JSON.
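For a quick illustration of the backslash problem (example data only, not part of the template):

// A lone backslash in the data creates an invalid JSON escape sequence.
const raw = String.raw`{"Path":"C:\docs\report.xlsx"}`;
// JSON.parse(raw) would throw a SyntaxError here, since \d is not a valid JSON escape.
const safe = raw.split("\\").join("/"); // one workaround: swap \ for /
const parsed = JSON.parse(safe); // { Path: "C:/docs/report.xlsx" }

A flow-side version of the same workaround, using a replace() expression in the Select action, appears in the comments further down.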
Version 7 Note
Diverging from what is shown in the video, I was able to remove almost all the flow actions in the "Match new and existing data key values then batch update" scope after replicating their functions in the scripts themselves. The flow now goes directly from the "SelectGenerateData" action to the "Run script Update Excel rows" action, & the script handles matching up the UpdatedData JSON keys/field names to the destination table headers.
Version 7 also changes the primary key set-up in the SelectGenerateData action & changes the logic for skipping cell value updates & blanking out cell values.
Now the primary key column name from the destination table must be present in the SelectGenerateData action, paired with the dynamic content for the values you want to match on. There is no more 'PrimaryKey' line; the update script automatically references the primary key column in the SelectGenerateData data based on the PrimaryKeyColumnName input on the update script action in the flow.
Now leaving a blank "" in the SelectGenerateData action will make that cell value in the Excel table empty, while leaving a null will skip updating / not alter the cell value that currently exists in the Excel table.
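A tiny snippet showing that skip-versus-blank rule (an assumed simplification for illustration, not the script's actual code):

// null -> keep the existing cell value, "" -> blank out the cell
const incoming: (string | null)[] = ["New value", "", null];
const existing = ["Old A", "Old B", "Old C"];
const result = incoming.map((v, i) => (v === null ? existing[i] : v));
// result: ["New value", "", "Old C"]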
Version 6, which looks closer to the version shown in the video, can still be accessed here: https://powerusers.microsoft.com/t5/Power-Automate-Cookbook/Excel-Batch-Create-Update-and-Upsert/m-p...
Version 7 Set-Up Instructions
Go to the bottom of this post & download the BatchExcel_1_0_0_xx.zip file. Go to the Power Apps home page (https://make.powerapps.com/). Select Solutions on the left-side menu, select Import solution, Browse your files & select the BatchExcel_1_0_0_xx.zip file you just downloaded. Then select Next & follow the menu prompts to apply or create the required connections for the solution flows.
Once imported, find the Batch Excel solution in the list of solutions & click it to open it. Then click on the Excel Batch Upserts V7 item to open the flow. Once inside the flow, delete the PlaceholderValue Delete after import action.
Open the Office Script Batch Update compose action containing the BatchUpdateV7 script code. Select everything inside the compose input & Ctrl+C copy it to the clipboard.
Then find & open an Excel file in Excel Online. Go to the Automate tab, click on All Scripts & then click on New Script.
When the new script opens, select everything in the script & Ctrl+V paste the BatchUpdateV7 script from the clipboard into the editor. Then rename the script BatchUpdateV7 & save it. That should make BatchUpdateV7 referenceable in the later Run script flow action.
Do the same process to import the BatchCreateV7 script.
Then go to the List rows Sample source data action. If you are going to use an Excel table as the source of updated data, then you can fill in the Location, Document Library, File, & Table information on this action. If you are going to use a different source of updated data like SharePoint, SQL, Dataverse, an API call, etc., then delete the List rows Sample source data placeholder action & insert your new get data action for your other source of updated data.
Following that, go to the Excel batch update & create scope. Open the PrimaryKeyColumnName action, remove the placeholder values in the action input & input the column name of the unique primary key for your destination Excel table. For example, I use ID for the sample data.
Then go to the SelectGenerateData action.
If you replaced the List rows Sample source data action with a new get data action, then you will need to replace the values dynamic content from that sample action with the new values dynamic content of your new get data action (the dynamic content that outputs a JSON array of the updated data).
In either case, you will need to input the table header names from the destination Excel table on the left & the dynamic content for the updated values from the updated source action on the right. You MUST include the column header for the destination primary key column & the primary key values from the updated source data here so the Update script can match up which rows in the destination table need which updates from the source data. All the other columns are optional; you only need to include them if you want to update their values.
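For reference, the JSON array coming out of SelectGenerateData should look something like this (the ID, Status, & Notes column names are invented for the example; use your own destination table headers):

const UpdatedData = [
  { "ID": "1001", "Status": "Active", "Notes": "" },   // "" blanks the Notes cell
  { "ID": "1002", "Status": "Closed", "Notes": null }  // null leaves the Notes cell unchanged
];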
After you have added all the columns & updated data you want to the SelectGenerateData action, then you can move to the Run script Update Excel rows action. Here add the Location, Document Library, File Name, Script, Table Name, Primary Key Column Name, ForceMode1Processing, & UpdatedData. You will likely need to select the right-side button to switch the UpdatedData input to a single array input before inserting the dynamic content for the SelectGenerateData action.
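The Location, Document Library, File Name, & Script inputs just pick the workbook & script to run; the remaining inputs map directly onto the update script's main function parameters, quoted here (trimmed) from the BatchUpdateV7 code later in this post:

function main(workbook: ExcelScript.Workbook,
  TableName: string,
  PrimaryKeyColumnName: string,
  ForceMode1Processing: string,
  UpdatedData: updateddata[],
) { /* ... */ }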
Then open the Condition If new records continue else skip action & open the Run script Create Excel rows action. In this run script action input the Location, Document Library, Script, Table Name, & CreateData. You again will likely need to select the right-side button to change the CreateData input to a single array input before inserting the dynamic content for the Filter array Get records not found in table output.
If you need just a batch update, then you can remove the Filter array Get records not found in table & Run script Create Excel rows actions.
If you need just a batch create, then you can replace the Run script Batch update rows action with the Run script Batch create rows action, delete the update script action, & remove the remaining Filter array Get records not found in table action. Then any updated source data sent to the SelectGenerateData action will just be created; it won't check for rows to update.
Thanks for any feedback,
Please subscribe to my YouTube channel (https://youtube.com/@tylerkolota?si=uEGKko1U8D29CJ86).
And reach out on LinkedIn (https://www.linkedin.com/in/kolota/) if you want to hire me to consult or build more custom Microsoft solutions for you.
Office Script Code
(Also included in a Compose action at the top of the template flow)
Batch Update Script Code: https://drive.google.com/file/d/1kfzd2NX9nr9K8hBcxy60ipryAN4koStw/view?usp=sharing
Batch Create Script Code: https://drive.google.com/file/d/13OeFdl7em8IkXsti45ZK9hqDGE420wE9/view?usp=sharing
(ExcelBatchUpsertV7 is the core piece, ExcelBatchUpsertV7b includes a Do until loop set-up if you plan on updating and/or creating more than 1000 rows on large tables.)
Video: https://www.youtube.com/watch?v=HiEU34Ix5gA
@berdavnes & @SikY
I was finally able to replicate your issues by adding a row to an empty table (so the table was not empty before the batch update, letting it run & grab the headers in the flow), then manually deleting all rows again before the batch create. That showed the same error, with the rows posting below the table.
It looks like the script's additional safeguard of adding a row to an empty table failed to handle this because it would add a blank row, but the references captured before the new blank row was added did not automatically update with the new table data. I've updated the initial section of the Update & Create scripts below to help handle this...
Batch Update
function main(workbook: ExcelScript.Workbook,
  TableName: string,
  PrimaryKeyColumnName: string,
  ForceMode1Processing: string,
  UpdatedData: updateddata[], // the updateddata interface is declared later in the full script
) {
  // If it is a blank table, add an empty row to prevent errors.
  // (This now runs before any table references are captured below.)
  if (workbook.getTable(TableName).getRangeBetweenHeaderAndTotal().getRowCount() < 2) {
    workbook.getTable(TableName).addRow();
  }
  let table = workbook.getTable(TableName);
  let RowNum: number;
  let TableRange = table.getRange();
  let TableRangeNoHeader = table.getRangeBetweenHeaderAndTotal();
  let TableData = TableRange.getValues();
  // Flatten the primary key column (header included) into a string array for lookups.
  let ArrayPK = table.getColumn(PrimaryKeyColumnName).getRange().getValues().join("#|#").split("#|#");
  let ArrayPKErrors = new Array("");
  let ColumnCount = TableRange.getColumnCount();
  let TableSize = (TableRange.getRowCount()) * (ColumnCount);
  let TableSizeBatchProcLimit = 1000000;
Batch Create
function main(workbook: ExcelScript.Workbook,
  TableName: string,
  PrimaryKeyColumnName: string,
  ForceMode1Processing: string,
  CreateData: createdata[], // the createdata interface is declared later in the full script
) {
  // If it is a blank table, add an empty row to prevent errors.
  // (This now runs before any table references are captured below.)
  if (workbook.getTable(TableName).getRangeBetweenHeaderAndTotal().getRowCount() < 2) {
    workbook.getTable(TableName).addRow();
  }
  let table = workbook.getTable(TableName);
  let TableRange = table.getRange();
  let TableRangeNoHeader = table.getRangeBetweenHeaderAndTotal();
  let TableData = TableRange.getValues();
  let PKColumnIndex = table.getColumn(PrimaryKeyColumnName).getIndex();
  // Swap the generic "PK" keys in the incoming JSON for the primary key column's index.
  CreateData = JSON.parse(JSON.stringify(CreateData).split('"PK"').join(`"${PKColumnIndex}"`));
  let ColumnCount = TableRange.getColumnCount();
  let InitialNumberOfTableRows = TableRange.getRowCount();
  let TableSize = (InitialNumberOfTableRows) * (ColumnCount);
  let TableSizeBatchProcLimit = 1000000;
  //let BlankLine: string [];
  //BlankLine = new Array(ColumnCount);
  let RowNum: number;
  RowNum = InitialNumberOfTableRows;
  // -1 to 0-index the RowNum
  RowNum = RowNum - 1;
Script Links:
Update: https://drive.google.com/file/d/1kfzd2NX9nr9K8hBcxy60ipryAN4koStw/view?usp=share_link
Create: https://drive.google.com/file/d/13OeFdl7em8IkXsti45ZK9hqDGE420wE9/view?usp=share_link
Also if you want to absolutely ensure it runs correctly, you can add an Excel action to add a blank row before the List rows action that grabs the sample headers...
@AlegatzkeMIT
If you use the most recent versions of the scripts in the comment above & use the header name for your primary key, then it should all work even if you have no rows in your table.
Hi Takolota,
You're a script genius ^^
I had been looking for a way to do large Excel data transfers with Power Automate for a decade and came across your script by luck.
I'm currently trying to get V5.1b to work on an Excel file with 25,000 rows and around 20 columns.
I had to lower the BatchSize to 300 due to a 413 error with 1000.
Now I get an issue with the ComposeDestinationJSON action, not on the first loop, but later on.
The issue detail only says:
Unable to process template language expressions in action 'ComposeDestinationJSON' inputs at line '0' and column '0': 'The template language function 'json' parameter is not valid. The provided value '
Does this ring a bell? 🙂
Thanks in advance
Just saw that I have some [ and ] in the data I'm trying to update.
Could this be the reason why the flow doesn't work?
Good, got it: you can't have a '\' in the Excel file.
I should correct it before applying the flow.
Edit: Just need to add "replace(item()?['ColumnName'],'\','/')" in the SelectGenerateUpdateData action.
Also, according to your explanation, this process should perform around ~2,500 row updates or creates per minute.
On my side, with 25,000 rows, it took between 50 and 70 minutes, so fewer than 500 rows per minute.
Moreover, on my file, the delta between 2 updates should only be:
- +1,000 new rows
- -1,000 old rows
- ~24,000 row updates
Any idea why it's taking so long? Should I sort my data in a specific way?
@Anonymous
Sorry for any confusion. 2,500 is a low-estimate average for the non-loop version, averaged between Mode 1 & Mode 2 processing. I've seen Mode 2 processing handle up to 40,000 rows a minute on data tables with less than 1 million cells, but on larger tables with more than 1 million cells (& as the default on the loop version), Mode 1 processing can go as low as a few hundred rows per minute.
It largely has to do with limits on the total size of data the scripts can work with in memory, which constrain the different Mode 1 & Mode 2 processing.
@takolota, my file currently has 21 columns and 26,000 rows, so around 550,000 cells, less than 1 million.
I'm not able to set the BatchSize higher than 300 because I get the 413 error, so my flow takes around 1 hour.
@Anonymous
I'm a bit more rushed at my job right now, so it will be a bit before I can take a longer look. But any data size errors have to do either with the literal array being sent to the script from Power Automate being too large, or with some part of the script trying to pull more data than its limited memory capacity allows.
There might be some improvements to make on pieces of the script that pull more into memory, but the size limit for the array going from Power Automate to the script is a hard limit set either by Microsoft or JavaScript. I've already decreased the amount of extra character data added to the array payload as much as possible by formatting the JSON data with numbered keys instead of the direct column header names. You may want to share more of what your data looks like, because if you have things like many multi-line text columns with large amounts of text, that may increase the payload size beyond the limit.
And here is more on the 413 error: https://plainenglish.io/blog/how-to-fix-413-request-entity-too-large-error-in-node-js-226c0070e4f4
Sorry for the delay @takolota
My flow is running without major issues, taking between 40 and 50 minutes each day.
That's great.
However, I just noticed that with V5.1 it only adds and updates rows; it doesn't remove rows that are no longer present in the source Excel file.
Can you explain how to add this feature? (I see that you also have a delete script.)
Also, for information, here are some lines from my Excel file:
Aircraft_Fleet | Sous_Flotte | Tasks_Aircraft | Tasks_Barcode | TaskInf_Status | ATA | Config_Slot_4D | Task_Name_Only | Speciality | FoundDate | Task_Due_Date_Totale | EndDate | Fault_DeferralClass | Code |
777 | 77WS | F-GSQV | TRFX9006BN1B | ACTV | 45 | 45-00 | PTH/PTE/ LMAT MANQUANT (SIGMA ID: JDIS 03873310) | Avion | 18/04/2016 00:00 | 10/11/2037 23:59 | TLI | 77WSAvion | |
777 | 77WM | F-GZNF | TRFX9006BN2K | ACTV | 45 | 45-10 | PTH / LMAT AND ACCESSORY (SIGMA ID: MDIS 36787545) | Avion | 20/11/2015 00:00 | 18/12/2037 23:59 | NSRE Man | 77WMAvion | |
777 | 77WM | F-GZNG | TRFX9006BN41 | ACTV | 45 | 45-00 | PTH/PTE/ LMAT MANQUANT (SIGMA ID: JDIS 03871630) | Avion | 14/04/2016 00:00 | 30/12/2037 23:59 | TLI | 77WMAvion |