Can extract each occurrence of a group individually but not as a repeating group


Question

I have many files with a version number as the last part of the name. For example:

Xxxxx V2.txt
Xxxxx V2.3.txt
Xxxxx V2.10.txt
Xxxxx V2.10.3.txt

I use a regex to extract the parts of the version number so that I can sequence the files correctly † and calculate the next version number ‡.

† For example: V2.2 comes before V2.10 and V2.2 comes before V2.2.3.

‡ For example: the next version after V2.9 is V2.10.
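
The ordering and increment rules in the two footnotes can be sketched in VBA as follows. This is only a minimal sketch, not the routines mentioned later in the question: the names CompareVersions and NextVersion are hypothetical, and it assumes each version number has already been extracted into an array of Long with at least one element.

```vba
' Hypothetical helpers; the names and the Long() representation are assumptions.

' Returns -1, 0 or 1 as version A is before, equal to or after version B.
' Parts are compared as numbers, so 2.2 sorts before 2.10.
' A version that is a prefix of another sorts first, so 2.2 sorts before 2.2.3.
Function CompareVersions(ByRef A() As Long, ByRef B() As Long) As Long

  Dim Inx As Long
  Dim Diff As Long

  Diff = LBound(B) - LBound(A)      ' Allows for arrays with different lower bounds
  For Inx = LBound(A) To UBound(A)
    If Inx + Diff > UBound(B) Then
      CompareVersions = 1           ' B is a prefix of A, so A is later
      Exit Function
    End If
    If A(Inx) <> B(Inx + Diff) Then
      CompareVersions = IIf(A(Inx) < B(Inx + Diff), -1, 1)
      Exit Function
    End If
  Next

  ' Every part of A matched; A is earlier only if B has further parts
  If UBound(B) - LBound(B) > UBound(A) - LBound(A) Then
    CompareVersions = -1
  Else
    CompareVersions = 0
  End If

End Function

' Returns a copy of Version with its last part incremented: 2.9 becomes 2.10
Function NextVersion(ByRef Version() As Long) As Long()

  Dim Result() As Long

  Result = Version
  Result(UBound(Result)) = Result(UBound(Result)) + 1
  NextVersion = Result

End Function
```

Comparing the parts as numbers rather than as text is what puts V2.2 before V2.10. A text comparison of "2" and "10" would give the opposite order.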

I can process each style of version number individually but I cannot generalise to create one regex pattern for all styles.

Text               Pattern                          Value(s) extracted
Xxxxx V2.txt       Xxxxx V(\d+)\.txt                2
Xxxxx V2.3.txt     Xxxxx V(\d+)\.(\d+)\.txt         2  3
Xxxxx V2.10.3.txt  Xxxxx V(\d+)\.(\d+)\.(\d+)\.txt  2  10  3
Xxxxx V2.10.3.txt  Xxxxx V(\d+){\.(\d+)}*\.txt      No match

I do not understand why the last pattern does not work for every style of version number. Any guidance appreciated.

New section in response to comments

I was hoping there was a simple mistake in my regex pattern and that my code was irrelevant. I tidied up my test code to create:

Sub CtrlTestCapture()

  Dim Patterns As Variant
  Dim Texts As Variant

  Texts = Array("Xxxxx V12.txt", _
                "Xxxxx V12.3.txt", _
                "Xxxxx V12.4.5.txt", _
                "Xxxxx V12.4.5.3.txt")

  Patterns = Array("Xxxxx V(\d+)\.txt", _
                   "Xxxxx V(\d+)\.(\d+)\.txt", _
                   "Xxxxx V(\d+)\.(\d+)\.(\d+)\.txt", _
                   "Xxxxx V(\d+){\.(\d+)}+\.txt", _
                   "Xxxxx V(\d+)(?:\.(\d+))?(?:\.(\d+))?\.txt" , _
                   "Xxxxx V(\d+)(\.(\d+))*\.txt")

  Call TestCapture(Patterns, Texts)

End Sub

Sub TestCapture(ByRef Patterns As Variant, ByRef Texts As Variant)

  Dim InxM As Long
  Dim InxS As Long
  Dim Matches As MatchCollection
  Dim PatternCrnt As Variant
  Dim RegEx As New RegExp
  Dim SubMatchCrnt As Variant
  Dim TextCrnt As Variant

  With RegEx
    .Global = True         ' Find all matches
    .MultiLine = False     ' Match cannot extend across linebreak
    .IgnoreCase = True

    For Each PatternCrnt In Patterns
     .Pattern = PatternCrnt

      For Each TextCrnt In Texts
        Debug.Print "==========================================="
        Debug.Print "   Pattern: """ & PatternCrnt & """"
        Debug.Print "      Text: """ & TextCrnt & """"
        If Not .test(TextCrnt) Then
          Debug.Print Space(12) & "Text does not match pattern"
        Else
          Set Matches = .Execute(TextCrnt)
          If Matches.Count = 0 Then
            Debug.Print Space(12) & "Match but no captures"
          Else
            For InxM = 0 To Matches.Count - 1
              Debug.Print "-------------------------------------------"
              With Matches(InxM)
                Debug.Print "     Match: " & InxM + 1
                Debug.Print "     Value: """ & .Value & """"
                Debug.Print "    Length: " & .Length
                Debug.Print "FirstIndex: " & .FirstIndex
                For InxS = 0 To .SubMatches.Count - 1
                  Debug.Print "  SubMatch: " & InxS + 1 & " """ & .SubMatches(InxS) & """"
                Next
              End With
            Next
          End If
        End If
      Next
    Next
    Debug.Print "==========================================="

  End With

End Sub

With this code, Wiktor Stribiżew's regex pattern produced better results than with my untidy code. I will have to review my original code to locate my mistake. With this code, the output for that pattern is:

===========================================
   Pattern: "Xxxxx V(\d+)(?:\.(\d+))?(?:\.(\d+))?\.txt"
      Text: "Xxxxx V12.txt"
-------------------------------------------
     Match: 1
     Value: "Xxxxx V12.txt"
    Length: 13
FirstIndex: 0
  SubMatch: 1 "12"
  SubMatch: 2 ""
  SubMatch: 3 ""
===========================================
   Pattern: "Xxxxx V(\d+)(?:\.(\d+))?(?:\.(\d+))?\.txt"
      Text: "Xxxxx V12.3.txt"
-------------------------------------------
     Match: 1
     Value: "Xxxxx V12.3.txt"
    Length: 15
FirstIndex: 0
  SubMatch: 1 "12"
  SubMatch: 2 "3"
  SubMatch: 3 ""
===========================================
   Pattern: "Xxxxx V(\d+)(?:\.(\d+))?(?:\.(\d+))?\.txt"
      Text: "Xxxxx V12.4.5.txt"
-------------------------------------------
     Match: 1
     Value: "Xxxxx V12.4.5.txt"
    Length: 17
FirstIndex: 0
  SubMatch: 1 "12"
  SubMatch: 2 "4"
  SubMatch: 3 "5"
===========================================
   Pattern: "Xxxxx V(\d+)(?:\.(\d+))?(?:\.(\d+))?\.txt"
      Text: "Xxxxx V12.4.5.3.txt"
            Text does not match pattern
===========================================

This has a fixed number of captures rather than the variable number I was attempting. I will also have to work out how to extend it to process "12.4.5.3", which is the most complicated version number style I have ever seen. This is not perfect but it is definitely an improvement on my current workaround. You are using regex characters I do not recognise so I will need to study this carefully.

With the above code, Tiw's regex pattern produced this output:

===========================================
   Pattern: "Xxxxx V(\d+)(\.(\d+))*\.txt"
      Text: "Xxxxx V12.txt"
-------------------------------------------
     Match: 1
     Value: "Xxxxx V12.txt"
    Length: 13
FirstIndex: 0
  SubMatch: 1 "12"
  SubMatch: 2 ""
  SubMatch: 3 ""
===========================================
   Pattern: "Xxxxx V(\d+)(\.(\d+))*\.txt"
      Text: "Xxxxx V12.3.txt"
-------------------------------------------
     Match: 1
     Value: "Xxxxx V12.3.txt"
    Length: 15
FirstIndex: 0
  SubMatch: 1 "12"
  SubMatch: 2 ".3"
  SubMatch: 3 "3"
===========================================
   Pattern: "Xxxxx V(\d+)(\.(\d+))*\.txt"
      Text: "Xxxxx V12.4.5.txt"
-------------------------------------------
     Match: 1
     Value: "Xxxxx V12.4.5.txt"
    Length: 17
FirstIndex: 0
  SubMatch: 1 "12"
  SubMatch: 2 ".5"
  SubMatch: 3 "5"
===========================================
   Pattern: "Xxxxx V(\d+)(\.(\d+))*\.txt"
      Text: "Xxxxx V12.4.5.3.txt"
-------------------------------------------
     Match: 1
     Value: "Xxxxx V12.4.5.3.txt"
    Length: 19
FirstIndex: 0
  SubMatch: 1 "12"
  SubMatch: 2 ".3"
  SubMatch: 3 "3"
===========================================

That is, it always seems to capture the first part, the last part including the dot, and the last part without the dot. Promising but not quite there.
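
The output above shows a repeated group keeping only its last iteration, so one pattern cannot return a variable number of captures here. One possible workaround is to make two passes: isolate the whole version first, then let a global match on "\d+" return each number as a separate match. The function name VersionNumbers is hypothetical, and the "\.[^.]+$" tail (any extension) is an assumption based on the ".xxx" names in Part 3 of the question. The sketch needs the same "Microsoft VBScript Regular Expressions 5.5" reference as the test code.

```vba
' Hypothetical helper: returns the numbers of a trailing version as a
' Collection of Long, or an empty Collection if no version is found.
' Assumes the name ends "V<version>.<extension>".
Function VersionNumbers(ByVal FileName As String) As Collection

  Dim Matches As MatchCollection
  Dim MatchCrnt As Match
  Dim Numbers As New Collection
  Dim RegEx As New RegExp

  With RegEx
    .IgnoreCase = True

    ' Pass 1: isolate the whole version, for example "12.4.5.3"
    .Global = False
    .Pattern = "V(\d+(?:\.\d+)*)\.[^.]+$"
    Set Matches = .Execute(FileName)

    If Matches.Count > 0 Then
      ' Pass 2: each run of digits within the version is a separate match
      .Global = True
      .Pattern = "\d+"
      For Each MatchCrnt In .Execute(Matches(0).SubMatches(0))
        Numbers.Add CLng(MatchCrnt.Value)
      Next
    End If
  End With

  Set VersionNumbers = Numbers

End Function
```

With this approach the number of items in the Collection follows the number of parts in the version, so "12.txt" and "12.4.5.3.txt" go through the same code path.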

Part 3

I had overlooked the request for a clear explanation of the result I seek.

I use version numbers on all my important files. I receive files from others that include version numbers, some of which are a lot more complicated than mine. I always have the version number as the last part of the filename and I always have a "V" before the version number. If I receive files that do not conform to my format, I rename them so they do. So I have files with names like:

  • Xxxxx VN.xxx
  • Xxxxx VN.N.xxx
  • Xxxxx VN.N.N.xxx
  • Xxxxx VN.N.N.N.xxx

I wish to extract the Ns to a variable-length array or a collection so I can process them using general-purpose routines. In fact, I already have those general-purpose routines. These routines rely on some messy VBA code that extracts the Ns. I thought using a regex would allow me to tidy up my code.

Recommended answer

Try this regex:

V(\d+(?:\.\d+)*)\.txt$

The required version is captured in Group 1. You can then split the contents of Group 1 on "." to get the individual numbers.


Code:

Dim objReg, strFile, objMatches, strVersion, arrVersion
strFile = "Xxxxx V2.3.txt"
Set objReg = New RegExp
objReg.Global = True
objReg.Multiline = True
objReg.Pattern = "V(\d+(?:\.\d+)*)\.txt$"

If objReg.Test(strFile) Then
    Set objMatches = objReg.Execute(strFile)
    strVersion = objMatches.Item(0).SubMatches.Item(0)   ' The full version number, here "2.3"
    arrVersion = Split(strVersion, ".")                  ' Each number of the version, as an array of strings
End If

Regex explanation:

  • V(\d+(?:\.\d+)*)\.txt$
  • V - matches V
  • (\d+(?:\.\d+)*) - matches one or more digits, then zero or more occurrences of a dot followed by one or more digits. The whole match is captured in Group 1 and is the required version number
  • \.txt - matches .txt
  • $ - asserts the end of the line.
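
Split returns the parts of Group 1 as strings, and comparing them as text would put "10" before "9". A small conversion step, sketched below, turns the version into numbers for any sorting or incrementing that follows. The function name VersionToLongs is hypothetical, and the sketch assumes its argument is a non-empty string of digits and dots such as the Group 1 value.

```vba
' Hypothetical helper: converts "2.10.3" to the Long array (2, 10, 3).
' Assumes strVersion is non-empty and holds only digits separated by dots.
Function VersionToLongs(ByVal strVersion As String) As Long()

  Dim arrText() As String
  Dim arrNum() As Long
  Dim i As Long

  arrText = Split(strVersion, ".")               ' Zero-based array of strings
  ReDim arrNum(LBound(arrText) To UBound(arrText))
  For i = LBound(arrText) To UBound(arrText)
    arrNum(i) = CLng(arrText(i))                 ' "10" becomes the number 10
  Next

  VersionToLongs = arrNum

End Function
```

Called with the strVersion value from the code above, this would give an array whose length matches the number of parts in the version, whatever that number is.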
