Change CSV Delimiter

takolota · ‎02-01-2022

Hello All,

I ran into some issues when pulling CSV data in from Power Automate Desktop because of commas in the actual data. I wanted to change the delimiter so I could more easily parse the data in a single Select action without commas in the actual data messing things up. I also may be parsing CSV files with hundreds or thousands of rows, so I didn’t want to use all my daily actions on this in a slow Apply to each loop.

Attached is the scope/flow I built so anyone can easily select their CSV data that has quotes around the comma-containing records, enter a new delimiter, and get the new delimiter separated data from the final compose action without the usual errors. And it only takes a few actions to do this, even on very large files.
I've found that many CSV files don't put quotes around their records with in-data commas, and this only works when there are quotes around those records. But if the file is saved as a text file, then it often puts quotes around the right records.
If you are using Power Automate Desktop, program the file to be saved as .txt and read that into your output variable.

It’s currently set to handle up to 50 comma-containing columns, but you can expand that to as many columns as needed by adding extra lines & expressions to the 1st Select action. Just follow the pattern by replacing some of the array numbers, like [50] with [51].

Also if your data has more unique values like an array with mixed quoted string data, Ex: ["string1", 2, 03/05/2022, "string3"], then this will create errors in the output.

The template for parsing CSV to JSON & entering it into a dataset uses the same change delimiter set-up: https://powerusers.microsoft.com/t5/Power-Automate-Cookbook/CSV-to-Dataset/m-p/1508191#M584

*Copying the template scope into another flow may not work as it may mess up the expressions. You may need to start with a copy of this template and copy & paste the rest of your flow into this template flow.

Thanks for any feedback, & please subscribe to my YouTube channel (https://youtube.com/@tylerkolota?si=uEGKko1U8D29CJ86)

Version 3 Uploaded 03/26/2022 (Adjusted the 1st Select input so it can now also deal with in-data commas in the 1st column. Added more lines to the 1st Select so it can now handle up to 50 columns with commas in them.)

Google Drive Link: https://drive.google.com/file/d/11uT15hXY0VjnOKDvFxdVgkuMtXqTmA0c/view?usp=sharing

Version 4 Uploaded 04/09/2022

(More minor fixes & additions.

I adjusted several expressions so it can now handle a few more scenarios with arrays in the CSV data. It should handle any array that doesn't include double quotes and any array that is all strings with double quotes, so ["String1", "String2", "String3"], but it will have issues if it is a mixed array with some double-quoted strings and some other values, for example ["String", 4, 03/05/2022, "String2"] won't work.

I also adjusted how the LineBreak setting is set-up so it now uses the /r/n for the LineBreak. I also provided this link in the flow so anyone can look up the right string for the decodeUriComponent expression(s) if they happen to have different LineBreak characters. This change also made it possible to differentiate between in-data line-breaks and CSV row line-breaks on the files I tested, so it should now replace the in-data line-breaks, like the multiple-choice fields some sites use, with semi-colons. That should make those records much easier to deal with & parse in later actions.
I also looked over a problem with in-data trailing commas. I added a line in the settings where anyone can toggle whether they want it to adjust for trailing OR leading commas in the data, it just can't handle both in one dataset. So if one column in one row has ",String1 String2" and another column in another row has "String 3 String4," then it will have errors.)

Google Drive Link: https://drive.google.com/file/d/1ZbhFGVKHSpaH2Duv8qXwMnNww8czdgc4/view?usp=sharing

Version 5

More adjustments for comma edge cases and for instances with one comma-containing value following another in the file data.

Google Drive Link: https://drive.google.com/file/d/1il_wI9fJRk11YaI4EPQvk2efrbBNRBwr/view?usp=sharing

Update 06/01/2022
Microsoft Power Platform & Paul Murana recently did a video demonstration of how to handle CSV & other files in dataflows: https://youtu.be/8IvHxRnwJ7Q
But it currently only outputs to a dataverse or Dataverse for teams table.

cc_c · ‎06-09-2023

OK thank you it works well !! 🙂

A last thing : do you know how to change the manual trigger to a scheduled one ?

takolota · ‎06-09-2023

@cc_c

Go to the 3 dots … at the to right of the trigger action, select delete, then find the Recurrence trigger.

cc_c · ‎06-09-2023

Super ! Thank you very much ! It's been one week I try to fix this problem !

biboy852 · ‎09-19-2023

Hi, I'm using Power automate on web and trying to run the flow based on copy to clipboard action. However, here is what I gets...

Lonewalker25 · ‎09-22-2023

HI,

Your workflow works great! That said, if a column has some text ending with a dot + line break, this workflow will fail. could you tell me how to handle such a scenario? Please see the example below. In row 3, column B, a user unintentionally pressed alt-enter after typing the text, resulting in an extra line. and that breaks the workflow.

Thanks!

takolota · ‎09-22-2023

@Lonewalker25

The defaults for the template are set to check for a combination of a carriage return & a linebreak so /r/n to split the rows.
Does the user-inserted linebreak include a carriage return too? Or did you change the default because the line delimiters for your CSV actually only use a single /n linebreak & thus the actual line delimiter matches a user-entered linebreak?

Unfortunately if the line delimiter exactly matches some text or thing a user has entered, I do not know of a simple way to fix that as I don't see any simple way to distinguish the two instances of characters.

Lonewalker25 · ‎09-26-2023

Thanks, takolota for your quick reply.

The CSV we use is generated from a software application. Column B3 in my screenshot is a "free-text" field from the application. Many people have the habit of doing a carriage return after they type a paragraph of text, and B3 is a result of that.

In Power Autoamte, the issue occurs right after the native "base64ToSTring" function. As you said, there's no easy way to parse the data correctly, but there could be a way to fix the data after it is parsed.

This is how the data looks like after the base64ToString step:

data1,I also looked over a problem with in-data trailing comm,ABC123

data2,"decodeUriComponent expression(s)

 ",EDF443

data3,just can't handle both in one dataset,GXG421

takolota · ‎09-27-2023

~~@Lonewalker25~~

You know, I bet one could do some rough workarounds. Right after all lines are split by the line linebreak, one could use a Select action with the From set to a range(0, length(OutputWithAllLines)) to get a count for each line.

~~Then use the Map input to iterate over each with OutputWithAllLines?[item()] but when referencing that:~~

~~Check if the current item line length is equal to the length of the header line, if yes just return OutputWithAllLines?[item()], if no continue...~~

... check the length of the previous item count's line to see if it is equal to the header line length, if yes return a blank "", if no then concat(OutputWithAllLines?[item()], OutputWithAllLines?[sub(item(),1)]) to combine the current item line with the previous item line.

~~Note- this would also require an outer condition to check if the current item() is 0, if yes then only execute if length equals header line length OutputWithAllLines?[item()], else return blank ""~~

~~Then have a Filter array action filter out any blanks~~

~~That should then handle some cases where there is an in-data line linebreak.~~
But it would only handle a single instance of that in-data set of characters in a single column. It would still break if there were multiple instances of those in-data characters. To adjust for more instances would require an ever increasing number of expressions if one wanted to keep it fast & in a Select action.
I'm sure there is another way to handle it more slowly & only for smaller datasets with a Do Until loop & something that would only write the next line to an array variable once a concatenation of any number of lines met the length of the header line.

Or maybe something could be done by joining all lines together on the column delimiter & then chunk that big array into an array of arrays the same length as the header line split. That way each chunk should contain each real line that could then be joined in a Select.

~~I’ll have to check if that can fit/integrate in after the in-data commas are handled though as it relies on a length calculation that relies on only real delimiter characters remaining.~~

Or maybe something could be done by putting many of the previous lines in each Select Map instance as long as their calculated item number isn’t less than 0 and if the current item line length does not equal the header line length, then check if the next item line length equals the header line length or if the current item is the last item, if yes then join many of the next lines on the column delimiter & take only the header line length number of things from a column delimiter split on that join, if no then return a blank "". All else if the current item line length does match the header length, then just return the current item line.
~~Then remove blanks "".~~
~~But I'd have to see if anything like that could fit in after the column delimiter change, as it relies on all the in-data column delimiters being handled before-hand.~~

Amirin · ‎10-13-2023

Hi @takolota .

Do you have a solution for lonewalker problem already? I have same problem, in a column there is a line break. your flow will detect it as a new line..

And this is the result..

takolota · ‎10-13-2023

@Amirin

No, as you might see above I thought I had a path forward with it, but continued thinking on it & kept finding errors in the logic/things that mean it wouldn’t work.