cancel
Showing results for 
Search instead for 
Did you mean: 
Reply

Wildcard Matching 2024

Hi,

 

First of all, I'm not a BI user, I do support for Excel.

 

Wildcard comparisons in PQ are a real challenge and everything I've found so far are rudimentary solutions or workarounds to the actual problem. And everything was created several years ago.

 

Now I've tried it myself and found a solution that works (for my purposes), but it contains at least one error.

 

The code below supports the wildcards "*", "?" and "#". One problem is when the search pattern contains a "*" before a "#" and I just can't see a solution to the problem. For example, comparing "08erig1" and "*8*#*1" should give a FALSE as result...

 

If anyone has an idea how to fix the error or has a better solution, I'm grateful for any suggestions.
Or if anyone notices any other problems, I'd like to know about them too. 😀

 

Andreas.

 

 

 

 

let
  fnCompareWildcards = (expression as any, pattern as any, optional comparer as nullable function) as logical =>
    let
      pList = Text.ToList(if pattern=null then "" else pattern), 
      tList = Text.ToList(if expression=null then "" else expression), 
      pCount = List.Count(pList), 
      tCount = List.Count(tList), 
      Compare = if comparer = null then Comparer.OrdinalIgnoreCase else comparer, 
      Match = (pIndex as number, tIndex as number) as logical =>
        if pIndex >= pCount then
          tIndex >= tCount
        else if tIndex >= tCount then
          pList{pIndex} = "*"
        else if pList{pIndex} = "?" then
          @match(pIndex + 1, tIndex + 1)
        else if pList{pIndex} = "*" then
          if pIndex=pCount then
            true
          else
            List.AnyTrue(List.Transform({tIndex..tCount}, each @match(pIndex + 1, _)))
        else if pList{pIndex} = "#" then
          if pIndex=pCount then
            List.Contains({"0".."9"}, tList{tCount})
          else
            if List.Contains({"0".."9"}, tList{tIndex}) then
              @match(pIndex + 1, tIndex + 1)
            else
              false
        else
          not Logical.From(Compare(tList{tIndex}, pList{pIndex})) and @match(pIndex + 1, tIndex + 1), 
      result = Match(0, 0)
    in
      result,
documentation = [
  Documentation.Name =  "fnCompareWildcards",
  Documentation.Description = "Compares expression with pattern",
  Documentation.LongDescription = Documentation.Description,
  Documentation.Category = "String",
  Documentation.Source = "andreas.killer@gmx.net",
  Documentation.Version = "1.0",
  Documentation.Author = "Andreas Killer",
  Documentation.Examples = {[
    Description = "
fnCompareWildcards(""Andreas Killer 1964"", ""*as*er?19##"")
", 
    Code = "
expression : The text to compare
pattern    : The pattern containing wildcards as follows:

Char - Matches in expression
   ? - Any single character
   * - Zero or more characters
   # - Any single digit (0..9)

comparer : A Comparer used for controlling the comparison, if omitted Comparer.OrdinalIgnoreCase.
", 
    Result = "
true or false"]}]
in  
  Value.ReplaceType(fnCompareWildcards, Value.ReplaceMetadata(Value.Type(fnCompareWildcards), documentation))

 

 

 
 

 

 

 
 

 

 

18 REPLIES 18

Hi Ron,

 

Thank you for the code change and the tip for my function, I made a few changes, test file updated. I have expanded the generator and included special characters that can appear in file paths.

 

Although the main use is to compare file paths and file names, I would still like to be able to compare unstructured data and thus numbers. I agree with you, this could be avoided by changing the data type. However, if I do it within the function, there are no significant disadvantages, but the function is more universally applicable.

 

At the moment it seems as if I only have to fix one problem with my function. But my experience tells me that I will encounter more.

 

Your routine already works much better, but there seems to be a problem when calling RegEx, the function often fails with an error.

 

Even though I am close to the goal and RegEx is painfully slow, I (and not only me) would appreciate it if you have a solution for this. I am sure that our codes will be used by many followers for a long time to come.

 

Before I wrote this post, I spent a few days researching and having conversations with ChatGPT, Gemini and Copilot... without any really useful results. A wildcard comparison is a real problem and is often requested, but I have not really found any working solutions or code in any high-level languages. There are only partial solutions and a lot of workarounds. RegEx is also often recommended, but there is no RegEx routine that accepts a wildcard pattern anywhere.

 

Thanks again for taking your time into this.

 

Andreas.

You write: "when calling RegEx, the function often fails with an error.".

Using my routine modified as above, I found no errors, even on the 10000 row Data table.

And I had removed your line in your test code that was filtering out the errors.

 

Here is my current function code: I have it set to case-insensitive, but it could be changed so that the case sensitivity is an option

(text,regex)=>

let 
    regList = Text.ToList(regex),
    translate = 
        List.ReplaceMatchingItems(regList, List.Zip({{"*","#","?","."}, {".*","[0-9]",".","\\."}})),
    reg = "\\b" & Text.Combine(translate,"") & "\\b",


    fx = Web.Page(
        "<script>
            var x='"&Text.From(text)&"';
            var y=new RegExp('"& reg &"','i');
            var b=x.match(y);
            document.write(b);
        </script>")[Data]{0}[Children]{0}[Children]{1}[Text]{0}
in
    fx <> "null"

 

and here is the current "test" code: (with the #"Replaced Errors" step commented out)

let
    Source = Data,
    #"Invoked Custom Function" = Table.Buffer(Table.AddColumn(Source, "fnRegExWildcard", each fnRegexExtr([text], [mask]))),
    //#"Replaced Errors" = Table.ReplaceErrorValues(#"Invoked Custom Function", {{"fnRegExWildcard", "Error"}}),
    #"Added Custom" = Table.AddColumn(#"Invoked Custom Function", "Correct", each [fnLIKE]=[fnRegExWildcard])
in
    #"Added Custom"

I would appreciate data that produces the errors.

You wrote: Using my routine modified as above, I found no errors, even on the 10000 row Data table.

 

If you open my text file as is, you can see that I got errors and the error is always the same:

Expression.Error: The operation could not be completed because the enumeration did not contain enough elements.

Even with your updated code, I get a lot of wrong results:

 

As you can see I have a German (DE) system, I that an issue for RefEx?
Which locale do you have?

 

Andreas.

Hi Ron,

I tested your routine on Win10 English (USA) system and get a lot of errors and wrong results.

 

It seem I got my routine working, I have to test many regular pattern during the next days, fingers crossed.

 

PQ function here:
https://www.dropbox.com/scl/fi/nw4nwwxa4g99dvbm0vl0s/fnCompareWildcards.pq?rlkey=kw8ze79eytk7uzdpm4m... 

 

Test file here:
https://www.dropbox.com/scl/fi/hnv86prqiewyajra5h0yr/CompareRegEx.xlsm?rlkey=70ge7534tuugp17ottiw2ww... 

 

Seems my routine is several 1000% faster then RegEx!

 

Andreas.

Glad you got yours working as it is clearly faster with large data.

 

Not sure what happened to my last response, but the reason for the errors and mismatches in my regex solution is two-fold.

  • There are characters in some of your masks (other than the dot) that have a special meaning in regex and need to be escaped. EG "( ) { } [ ] +"
  • I used the word boundary token to ensure that only whole words were returned. But some of your text strings begin or end with a non-Word character, so the word boundary is not at the beginning/end of the string. Would need to know a bit more about your data in order to properly translate this. (A word character is anything in the set "[A-Za-z0-9_]"). 

Hi Ron,

 

The main use is to filter paths and/or file names in order to reduce the amount of data loaded into the data model.

 

An example of file names in a folder:
Report 2023 Qtr-1.xlsx
Report 2023-Qtr2.xlsm
Whatever.pdf
Report 2023 Quarter 3.xlsx
Report 2024 Quarter1.xlsx
Report 2024 Qtr2.xlsx
Data 2024.xlsx
Report 2024 Q3.xlsx
Report 2024 Q4.xlsx

 

A user could request:
a) Analyze all reports from 2023 => Rep*2023*.xls* =>
Report 2023 Qtr-1.xlsx
Report 2023-Qtr2.xlsm
Report 2023 Quarter 3.xlsx

 

b) Compare the second quarter of all years => Rep*####*Q*2* =>
Report 2023-Qtr2.xlsm
Report 2024 Qtr2.xlsx

 

Here is a reduced example from our company, as you see it's the worst:
I:\LOG\Kapazitätsplanung\2013\offene Auftragszeiten 30.05.13 KW22.xlsx
I:\LOG\Kapazitätsplanung\2013\offene Auftragszeiten 31.01.13 KW05.xlsx
I:\LOG\Kapazitätsplanung\2013\offene Auftragszeiten 31.10.13 KW44.xlsx
I:\LOG\Kapazitätsplanung\2014\Kapazitätsplanung-KW24-01.xlsm
I:\LOG\Kapazitätsplanung\2014\Kapazitätsplanung-KW25.xlsm
I:\LOG\Kapazitätsplanung\2014\Kapazitätsplanung-KW26.xlsm
I:\LOG\Kapazitätsplanung\2014\offene Auftragszeiten 27.02.14 KW09.xlsm
I:\LOG\Kapazitätsplanung\2014\offene Auftragszeiten 27.03.14 KW13.xlsm
I:\LOG\Kapazitätsplanung\2014\offene Auftragszeiten 29.01.14 KW05.xlsx
I:\LOG\Kapazitätsplanung\2015\Fertigungsliste ab 07.04.15.xlsx
I:\LOG\Kapazitätsplanung\2015\Gesamt VK-Aufträge 2014.xlsx
I:\LOG\Kapazitätsplanung\2015\Kapazitätsplanung-KW11.xlsm
I:\LOG\Kapazitätsplanung\2015\Kapazitätsplanung-KW12.xlsm
I:\LOG\Kapazitätsplanung\2015\Kapazitätsplanung-KW12_alt.xlsm
I:\LOG\Kapazitätsplanung\2015\Kapazitätsplanung-KW13.xlsm
I:\LOG\Kapazitätsplanung\2024\Kapazitätsplanung-KW24-17.xlsm
I:\LOG\Kapazitätsplanung\2024\Kapazitätsplanung-KW24-18 - neu.xlsm
I:\LOG\Kapazitätsplanung\2024\Kapazitätsplanung-KW24-18.xlsm
I:\LOG\Kapazitätsplanung\2024\Kapazitätsplanung-KW24-20.xlsm
I:\LOG\Kapazitätsplanung\2024\Kapazitätsplanung-KW24-22.xlsm
I:\LOG\Kapazitätsplanung\2024\Kapazitätsplanung-KW24-24.xlsm
I:\LOG\Kapazitätsplanung\2024\Produktion Kapazitätsgrobplanung 2024.xlsx
I:\LOG\Kapazitätsplanung\2024\Produktion Kapazitätsgrobplanung 35000000.xlsx

 

I have 1604 files in just that folder... welcome to my world of crap data. 😆

 

We produce systems for the food industry worldwide, and almost every customer has different requirements. If you have a new request, you can remember "yes, we've done something like that for customer XY before." Where is the data for it?

 

And in most companies it's not much different, you start somehow and then over time you develop a system that grows.

 

Andreas.

That explains your problem better.

 

A suggestion: Instead of relying on the user to construct a valid wild-card "like" input, why not have them submit the parameters utilizing a User Form. Then you'd get valid data input and your filtering code would probably be much simpler.

That means I would have to split the process into PQ.

 

I would first have to read in all the files with PQ, then call up the user form, then filter the data with VBA and LIKE, write the result into a table, read this into PQ and now I can continue with the actual data import.

 

IMHO this is quite a big detour and fraught with further difficulties. Thanks, but no thanks.

Let’s stick to the topic: Wildcard comparison.

 

During my research I often read: Use RegEx. But I have never seen how to filter real-life data with RegEx.

 

Don't get me wrong, you can make amazing comparisons with RegEx that you would otherwise have to write an entire parser for, very impressive.

 

The big problem with RegEx is the syntax, it looks like a rattlesnake has walked across the screen, no normal user understands it and even for me as an experienced programmer it is too complicated. A wildcard comparison, on the other hand, is simple and intuitive to use.

 

And it seems as it is too difficult to do a wildcard comparison with file paths using RegEx for you too. (Don’t get me wrong, I didn’t manage it either). Or is there an update that I missed?

 

Andreas.

a. I stopped working on the regex when you reported you had your routine working, and also that the regex was considerably slower.  But if your routine is not returning what you need, I'll be happy to finish up the regex now that I have more information.

 

b. I didn't realize you would have to read in all the files if you chose to use a user form. My idea of that was just to replace whatever you are using now to obtain the wild card string from the user, with a form that had a more structured input.

 

Helpful resources

Announcements

Community will be READ ONLY July 16th, 5p PDT -July 22nd

Dear Community Members,   We'd like to let you know of an upcoming change to the community platform: starting July 16th, the platform will transition to a READ ONLY mode until July 22nd.   During this period, members will not be able to Kudo, Comment, or Reply to any posts.   On July 22nd, please be on the lookout for a message sent to the email address registered on your community profile. This email is crucial as it will contain your unique code and link to register for the new platform encompassing all of the communities.   What to Expect in the New Community: A more unified experience where all products, including Power Apps, Power Automate, Copilot Studio, and Power Pages, will be accessible from one community.Community Blogs that you can syndicate and link to for automatic updates. We appreciate your understanding and cooperation during this transition. Stay tuned for the exciting new features and a seamless community experience ahead!

Summer of Solutions | Week 4 Results | Winners will be posted on July 24th

We are excited to announce the Summer of Solutions Challenge!   This challenge is kicking off on Monday, June 17th and will run for (4) weeks.  The challenge is open to all Power Platform (Power Apps, Power Automate, Copilot Studio & Power Pages) community members. We invite you to participate in a quest to provide solutions in the Forums to as many questions as you can. Answers can be provided in all the communities.    Entry Period: This Challenge will consist of four weekly Entry Periods as follows (each an “Entry Period”)   - 12:00 a.m. PT on June 17, 2024 – 11:59 p.m. PT on June 23, 2024 - 12:00 a.m. PT on June 24, 2024 – 11:59 p.m. PT on June 30, 2024 - 12:00 a.m. PT on July 1, 2024 – 11:59 p.m. PT on July 7, 2024 - 12:00 a.m. PT on July 8, 2024 – 11:59 p.m. PT on July 14, 2024   Entries will be eligible for the Entry Period in which they are received and will not carryover to subsequent weekly entry periods.  You must enter into each weekly Entry Period separately.   How to Enter: We invite you to participate in a quest to provide "Accepted Solutions" to as many questions as you can. Answers can be provided in all the communities. Users must provide a solution which can be an “Accepted Solution” in the Forums in all of the communities and there are no limits to the number of “Accepted Solutions” that a member can provide for entries in this challenge, but each entry must be substantially unique and different.    Winner Selection and Prizes: At the end of each week, we will list the top ten (10) Community users which will consist of: 5 Community Members & 5 Super Users and they will advance to the final drawing. We will post each week in the News & Announcements the top 10 Solution providers.  At the end of the challenge, we will add all of the top 10 weekly names and enter them into a random drawing.  Then we will randomly select ten (10) winners (5 Community Members & 5 Super Users) from among all eligible entrants received across all weekly Entry Periods to receive the prize listed below. If a winner declines, we will draw again at random for the next winner.  A user will only be able to win once overall. If they are drawn multiple times, another user will be drawn at random.  Individuals will be contacted before the announcement with the opportunity to claim or deny the prize.  Once all of the winners have been notified, we will post in the News & Announcements of each community with the list of winners.   Each winner will receive one (1) Pass to the Power Platform Conference in Las Vegas, Sep. 18-20, 2024 ($1800 value). NOTE: Prize is for conference attendance only and any other costs such as airfare, lodging, transportation, and food are the sole responsibility of the winner. Tickets are not transferable to any other party or to next year’s event.   ** PLEASE SEE THE ATTACHED RULES for this CHALLENGE**   Week 1 Results: Congratulations to the Week 1 qualifiers, you are being entered in the random drawing that will take place at the end of the challenge. Community MembersNumber of SolutionsSuper UsersNumber of Solutions @anandm08  23 @WarrenBelz  31 @DBO_DV  10 @Amik  19 AmínAA 6 @mmbr1606  12 @rzuber  4 @happyume  7 @Giraldoj  3@ANB 6 (tie)   @SpongYe  6 (tie)     Week 2 Results: Congratulations to the Week 2 qualifiers, you are being entered in the random drawing that will take place at the end of the challenge. Community MembersSolutionsSuper UsersSolutions @anandm08  10@WarrenBelz 25 @DBO_DV  6@mmbr1606 14 @AmínAA 4 @Amik  12 @royg  3 @ANB  10 @AllanDeCastro  2 @SunilPashikanti  5 @Michaelfp  2 @FLMike  5 @eduardo_izzo  2   Meekou 2   @rzuber  2   @Velegandla  2     @PowerPlatform-P  2   @Micaiah  2     Week 3 Results: Congratulations to the Week 3 qualifiers, you are being entered in the random drawing that will take place at the end of the challenge.   Week 3:Community MembersSolutionsSuper UsersSolutionsPower Apps anandm0861WarrenBelz86DBO_DV25Amik66Michaelfp13mmbr160647Giraldoj13FLMike31AmínAA13SpongYe27     Week 4 Results: Congratulations to the Week 4 qualifiers, you are being entered in the random drawing that will take place at the end of the challenge.   Week 4:Community MembersSolutionsSuper UsersSolutionsPower Apps DBO-DV21WarranBelz26Giraldoj7mmbr160618Muzammmil_0695067Amik14samfawzi_acml6FLMike12tzuber6ANB8   SunilPashikanti8

Check Out | 2024 Release Wave 2 Plans for Microsoft Dynamics 365 and Microsoft Power Platform

On July 16, 2024, we published the 2024 release wave 2 plans for Microsoft Dynamics 365 and Microsoft Power Platform. These plans are a compilation of the new capabilities planned to be released between October 2024 to March 2025. This release introduces a wealth of new features designed to enhance customer understanding and improve overall user experience, showcasing our dedication to driving digital transformation for our customers and partners.    The upcoming wave is centered around utilizing advanced AI and Microsoft Copilot technologies to enhance user productivity and streamline operations across diverse business applications. These enhancements include intelligent automation, AI-powered insights, and immersive user experiences that are designed to break down barriers between data, insights, and individuals. Watch a summary of the release highlights.    Discover the latest features that empower organizations to operate more efficiently and adaptively. From AI-driven sales insights and customer service enhancements to predictive analytics in supply chain management and autonomous financial processes, the new capabilities enable businesses to proactively address challenges and capitalize on opportunities.    

Updates to Transitions in the Power Platform Communities

We're embarking on a journey to enhance your experience by transitioning to a new community platform. Our team has been diligently working to create a fresh community site, leveraging the very Dynamics 365 and Power Platform tools our community advocates for.  We started this journey with transitioning Copilot Studio forums and blogs in June. The move marks the beginning of a new chapter, and we're eager for you to be a part of it. The rest of the Power Platform product sites will be moving over this summer.   Stay tuned for more updates as we get closer to the launch. We can't wait to welcome you to our new community space, designed with you in mind. Let's connect, learn, and grow together.   Here's to new beginnings and endless possibilities!   If you have any questions, observations or concerns throughout this process please go to https://aka.ms/PPCommSupport.   To stay up to date on the latest details of this migration and other important Community updates subscribe to our News and Announcements forums: Copilot Studio, Power Apps, Power Automate, Power Pages

Users online (1,764)