08-10-2023 06:46 AM - last edited 08-10-2023 07:10 AM
Find and Remove Duplicates From Any Datasource
Need to identify and/or remove records in a datasource based on duplicate values in one or more columns?
This template uses a SharePoint list to demonstrate, but the method pulls the data in from any source as a standard JSON array and processes it to identify all the duplicates based on the selected columns. From there the duplicates can be easily deleted.
So it will work for SharePoint, Excel, SQL, Dataverse, API GET/DELETE calls to external datasets, and any other dataset used in Power Automate.
Example...
Use any get data action to pull the data into the flow. Use built-in action settings like Order By, or later use the Power Automate sort( ) and reverse( ) expressions, to put your data in ascending or descending order by your chosen column (see the expression sketch below).
Sorted by date, Ascending will keep the oldest & delete the newest, while Descending will keep the newest & delete the oldest.
Sorted by number, Ascending will keep the smallest & delete the largest, while Descending will keep the largest & delete the smallest.
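For example, if your get data action has no built-in Order By, here is a minimal expression sketch (the action name Get_items and the date column Created are placeholders; swap in your own names):
sort(outputs('Get_items')?['body/value'], 'Created')
returns the records in ascending order by Created, and wrapping that in reverse( ) flips it to descending:
reverse(sort(outputs('Get_items')?['body/value'], 'Created'))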
Move to the Select DuplicateCheckFields action and fill in the column name & dynamic content value for each data source column you want to include in the check for duplicates. Only records where all listed columns match will count as duplicates.
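For example, to match the demo list, the Map table keeps the required RecordJSON row and adds one row per duplicate-check column (Title & Column2 are just the demo column names):
RecordJSON : AddProperty(item(), 'MakeUniqueGUID', guid())
Title : item()?['Title']
Column2 : item()?['Column2']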
The middle actions will process the records, identify the duplicate records, & return a JSON array of only duplicate records.
Put whatever action you will use to delete the records inside the loop, and use items('Apply_to_each')?['InsertColumnNameHere'] to reference fields within the apply to each loop. For example, this SharePoint Delete item action only requires the ID primary key field for each item to delete, so I used items('Apply_to_each')?['ID'] in the expression editor on the Id input.
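The same pattern works for other connectors: if your dataset's delete action needed a hypothetical primary key column named RecordID, the key input would be items('Apply_to_each')?['RecordID'].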
Starting Demo SharePoint List
Final Result on The Demo SharePoint List
Rows with IDs 14, 16, & 19 were removed as both the Title & Column 2 fields on those rows matched other rows in the dataset. Notice rows with IDs 15 & 20 were not removed. Although they contain duplicate Title values, the Column 2 values were still unique for those rows.
Thanks for any feedback, & please subscribe to my YouTube channel (https://youtube.com/@tylerkolota?si=uEGKko1U8D29CJ86)
V1.1 added a reverse( ) expression to the processing actions so the records kept by an ascending or descending sort match the user inputs more intuitively.
For anyone having trouble with the legacy import method, here is a post on an alternate import method: Re: Find and Remove Duplicates - Page 3 - Power Platform Community (microsoft.com)
Video walkthrough: https://www.youtube.com/watch?v=49xj4cN6OFI
Sorry, I can't find how to edit my last post. Here is what I see for the Enter value for the 'RecordJSON' key.
The expression for the RecordJSON value is...
AddProperty(item(), 'MakeUniqueGUID', guid())
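As I read the template, wrapping the whole record in RecordJSON while tacking on a guid( ) property guarantees every RecordJSON string is unique, even for rows that are complete duplicates, so the later IsDuplicate expression can split the dataset at each record's exact position.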
This and the similar flow you posted are elegant and simple solutions to one of the major issues (IMO) with Dynamics and Dataverse.
I've been able to follow along, more or less, using the definition.json file as a roadmap, but one thing is tripping me up. How are you passing the output of a find rows API call to the For Each block it's contained in? I tried enumerating rows and then passing that output into the iterator block, but this throws an error.
Wait, I just realized it's a scope block - I think I've got it now, thank you!
I have the same problem. When I enter the RecordJSON expression, it only gives me an error. And in the sample flow it is shown like this. Thank you for helping...
Hi @takolota,
Hope you are doing fine. Still working on your workflow. 🙂
With the Delete item action I get an error that I don't know how to solve. The column "E-Mail" is my parameter for finding all the duplicates. In this test there should be 5 duplicates, which is shown correctly, and the email address is also shown under the black bar. Thanks for your help!
You can Ctrl + C copy this to your clipboard, then go to create a new action in your flow, go to the My clipboard tab, and press Ctrl + V to paste it into the clipboard tab & select it.
{"id":"d59f1fe4-6ebf-470b-9c1d-7779c42081a3","brandColor":"#8C3900","connectionReferences":{"shared_sharepointonline":{"connection":{"id":"/providers/Microsoft.PowerApps/apis/shared_sharepointonline/connections/shared-sharepointonl-60d1f27e-bd8b-43f2-a8ad-a7be60200f79"}}},"connectorDisplayName":"Control","icon":"data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iMzIiIGhlaWdodD0iMzIiIHZlcnNpb249IjEuMSIgdmlld0JveD0iMCAwIDMyIDMyIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPg0KIDxwYXRoIGQ9Im0wIDBoMzJ2MzJoLTMyeiIgZmlsbD0iIzhDMzkwMCIvPg0KIDxwYXRoIGQ9Im04IDEwaDE2djEyaC0xNnptMTUgMTF2LTEwaC0xNHYxMHptLTItOHY2aC0xMHYtNnptLTEgNXYtNGgtOHY0eiIgZmlsbD0iI2ZmZiIvPg0KPC9zdmc+DQo=","isTrigger":false,"operationName":"Find_And_Remove_Duplicates","operationDefinition":{"type":"Scope","actions":{"Get_items":{"type":"OpenApiConnection","inputs":{"host":{"connectionName":"shared_sharepointonline","operationId":"GetItems","apiId":"/providers/Microsoft.PowerApps/apis/shared_sharepointonline"},"parameters":{"dataset":"@outputs('Placeholder_Please_Delete_After_Import')","table":"@outputs('Placeholder_Please_Delete_After_Import')","$orderby":"Created asc"},"authentication":{"type":"Raw","value":"@json(decodeBase64(triggerOutputs().headers['X-MS-APIM-Tokens']))['$ConnectionKey']"}},"runAfter":{},"description":"Pagination turned on & set to 100000 to get all records up to 100000. Order By ascending (asc) will keep the oldest/smallest & delete the newest/largest, Order By descending (desc) will keep the newest/largest & delete the oldest/smallest.","runtimeConfiguration":{"paginationPolicy":{"minimumItemCount":100000}},"metadata":{"operationMetadataId":"8c675d9e-b2a8-46f6-9864-7f46f72bc7e6"}},"Find_duplicates":{"type":"Scope","actions":{"Select_DuplicateCheckFields":{"type":"Select","inputs":{"from":"@outputs('Get_items')?['body/value']","select":{"RecordJSON":"@AddProperty(item(), 'MakeUniqueGUID', guid())","Title":"@item()?['Title']","Column2":"@item()?['Column2']"}},"runAfter":{},"description":"Do not remove the RecordJSON line in the Map table. Fill the column name & dynamic content value for each data source column you want to include in the check for duplicates. Only records where all listed columns match will count as duplicates.","metadata":{"operationMetadataId":"218007cf-3241-4ebf-906c-563b82e36850"}},"Filter_array_Get_duplicate_records":{"type":"Query","inputs":{"from":"@body('Select_ResetToRecordWithIsDuplicateField')","where":"@equals(item()?['IsDuplicate'], 1)"},"runAfter":{"Select_ResetToRecordWithIsDuplicateField":["Succeeded"]},"description":"Filter to only duplicates. If the original get data action sorts ascending it keeps the oldest/smallest & deletes the newest/largest. If the original get data action sorts descending it keeps the newest/largest & deletes the oldest/smallest.","metadata":{"operationMetadataId":"04080a21-d0d1-4018-9d1b-cb6411877800"}},"Select_ReformatRecordAndDuplicateChecks":{"type":"Select","inputs":{"from":"@reverse(body('Select_DuplicateCheckFields'))","select":{"RecordJSON":"@item()?['RecordJSON']","DuplicateFieldsJSON":"@removeProperty(item(), 'RecordJSON')"}},"runAfter":{"Select_DuplicateCheckFields":["Succeeded"]},"description":"Separates each item into the RecordJSON object & an object containing all the columns used in the duplicates check. 
Reverse ordering to make user order input for keeping records more intuitive.","metadata":{"operationMetadataId":"323c8da9-040a-47db-8d13-07d3bd68c1d7"}},"Select_ResetToRecordWithIsDuplicateField":{"type":"Select","inputs":{"from":"@body('Select_ReformatRecordAndDuplicateChecks')","select":"@addProperty(item()?['RecordJSON'], 'IsDuplicate', if(greater(length(split(join(skip(split(string(body('Select_ReformatRecordAndDuplicateChecks')), string(item())), 1), ''), string(item()?['DuplicateFieldsJSON']))), 1), 1, 0))"},"runAfter":{"Select_ReformatRecordAndDuplicateChecks":["Succeeded"]},"description":"For each, takes the original record data from the RecordJSON object & adds an IsDuplicate field, calculating if the DuplicateFieldsJSON columns object repeats in the remainder of the dataset - marking each item as a duplicate 1 or not duplicate 0.","metadata":{"operationMetadataId":"e8d367b9-f6a9-472f-899b-716cfa5b650b"}}},"runAfter":{"Get_items":["Succeeded"]},"description":"Take in a JSON array & columns to check for duplicates. Output a JSON array of only the duplicate records found for those columns.","metadata":{"operationMetadataId":"0c3ee8e3-6aec-4537-988e-f8849d28bca6"}},"Apply_to_each":{"type":"Foreach","foreach":"@body('Filter_array_Get_duplicate_records')","actions":{"Delete_item":{"type":"OpenApiConnection","inputs":{"host":{"connectionName":"shared_sharepointonline","operationId":"DeleteItem","apiId":"/providers/Microsoft.PowerApps/apis/shared_sharepointonline"},"parameters":{"dataset":"@outputs('Placeholder_Please_Delete_After_Import')","table":"@outputs('Placeholder_Please_Delete_After_Import')","id":"@items('Apply_to_each')?['ID']"},"authentication":{"type":"Raw","value":"@json(decodeBase64(triggerOutputs().headers['X-MS-APIM-Tokens']))['$ConnectionKey']"}},"runAfter":{},"description":"items('Apply_to_each')?['ID']","metadata":{"operationMetadataId":"6f676b71-f7e9-447c-ae3f-e5544bdb9bf7"}}},"runAfter":{"Find_duplicates":["Succeeded"]},"description":"Use items('Apply_to_each')?['InsertColumnNameHere'] to reference fields within the apply to each loop.","metadata":{"operationMetadataId":"7aaa57db-381b-4442-8ee0-956a4b5aea5f"}}},"runAfter":{"Placeholder_Please_Delete_After_Import":["Succeeded"]}}}
Then this Select action expression will likely be messed up, so you can remove/erase the expression, select the Map toggle on the middle right of the action twice to remove the disabled grey shade & bring back the editable single box input. Then you can input the following expression again to fix it...
addProperty(item()?['RecordJSON'], 'IsDuplicate', if(greater(length(split(join(skip(split(string(body('Select_ReformatRecordAndDuplicateChecks')), string(item())), 1), ''), string(item()?['DuplicateFieldsJSON']))), 1), 1, 0))
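In plain terms, the expression takes each record's original data from the RecordJSON object & adds an IsDuplicate field, set to 1 if that record's DuplicateFieldsJSON columns appear again in the remainder of the dataset & 0 otherwise; the Filter array action then keeps only the IsDuplicate 1 records.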
Hello, I'm copying and pasting the last bit of instructions and keep getting "expression is invalid". The only thing I changed in the template is the Map on Select DuplicateCheckFields in rows 2 and 3, to my column names and the corresponding data from dynamic content.