Extracting Text Between Brackets with Regex

Problem description

Given text like:

"[x] Alpha

[33] Beta"

I can extract an array of the bracketed data as ([x], [33])

using the VBA regex pattern:

"(\[x\])|(\[\d*\])"

I cannot directly extract the array of un-bracketed data as (x, 33)

using a pattern suggested by web resources:

"(?<=\[)(.*?)(?=\])"

Is this a VBA-specific problem (i.e. a limit of its regex implementation), or did I misunderstand lookahead and lookbehind patterns?

Public Function Regx( _
  ByVal SourceString As String, _
  ByVal Pattern As String, _
  Optional ByVal IgnoreCase As Boolean = True, _
  Optional ByVal MultiLine As Boolean = True, _
  Optional ByVal MatchGlobal As Boolean = True) _
  As Variant

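' Match and RegExp need a project reference to
' "Microsoft VBScript Regular Expressions 5.5" (early binding)
Dim oMatch As Match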
Dim arrMatches
Dim lngCount As Long

' Initialize to an empty array
arrMatches = Array()
With New RegExp
    .MultiLine = MultiLine
    .IgnoreCase = IgnoreCase
    .Global = MatchGlobal
    .Pattern = Pattern
    For Each oMatch In .Execute(SourceString)
        ReDim Preserve arrMatches(lngCount)
        arrMatches(lngCount) = oMatch.Value
        lngCount = lngCount + 1
    Next
End With

' Return the collected matches to the caller
Regx = arrMatches
End Function

Sub testabove()
    Call Regx("[x] Alpha" & Chr(13) & _
      "[33] Beta", "(\[x\])|(\[\d*\])")
End Sub

Solution

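The VBScript regex engine that VBA uses supports lookahead such as (?=\]) but not lookbehind, so (?<=\[) is not available there; this is a limit of the engine, not a misunderstanding of lookarounds. Instead, put capturing groups around the subpatterns that fetch the values you need.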

Use

"\[(x)\]|\[(\d*)\]"

(or \d+ if you need to match at least 1 digit, as * means zero or more occurrences, and + means one or more occurrences).
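To make the difference between * and + concrete, here is a minimal sketch. The procedure name DemoStarVsPlus and the sample string "[] [33]" are made up for illustration, and the code assumes late binding through CreateObject("VBScript.RegExp") so that no project reference is needed.

```vba
' Sketch only: DemoStarVsPlus and the sample input are hypothetical.
Sub DemoStarVsPlus()
    Dim re As Object
    ' Late binding: no reference to the VBScript regex library required
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True

    ' \d* allows zero digits, so the empty pair "[]" also matches
    re.Pattern = "\[(\d*)\]"
    Debug.Print re.Execute("[] [33]").Count   ' 2

    ' \d+ requires at least one digit, so only "[33]" matches
    re.Pattern = "\[(\d+)\]"
    Debug.Print re.Execute("[] [33]").Count   ' 1
End Sub
```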

Or, use the generic pattern to extract anything inside the square brackets without the brackets:

"\[([^\][]+)]"
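As a rough illustration of how the generic pattern reads and behaves, the sketch below breaks it down in comments and prints the captured values for the sample text from the question. DemoGenericBrackets is a hypothetical name, and late binding via CreateObject is an assumption made so the snippet runs without a project reference.

```vba
' Sketch only: DemoGenericBrackets is a hypothetical name.
Sub DemoGenericBrackets()
    Dim re As Object, m As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True

    ' \[         a literal opening bracket
    ' ([^\][]+)  group 1: one or more characters that are neither ] nor [
    ' ]          a literal closing bracket
    re.Pattern = "\[([^\][]+)]"

    ' The pattern has a single group, so SubMatches(0) holds the value
    For Each m In re.Execute("[x] Alpha" & vbCr & "[33] Beta")
        Debug.Print m.SubMatches(0)   ' prints x, then 33
    Next m
End Sub
```

Because there is only one capturing group here, no check of the submatch length is needed.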

With the alternation pattern, access the right SubMatches index by checking the submatch length: because of the alternation, one of the two submatches is always empty. (With the generic pattern there is only one group, so SubMatches(0) always holds the value.) Change your For loop to:

For Each oMatch In .Execute(SourceString)
    ReDim Preserve arrMatches(lngCount)
    If Len(oMatch.SubMatches(0)) > 0 Then
        arrMatches(lngCount) = oMatch.SubMatches(0)
    Else
        arrMatches(lngCount) = oMatch.SubMatches(1)
    End If
    ' Debug.Print arrMatches(lngCount) ' - This outputs x and 33 with your data
    lngCount = lngCount + 1
Next
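Putting the pieces together, one possible shape for the whole function is sketched below. It is not the original poster's code: RegxGroups and TestRegxGroups are hypothetical names, late binding via CreateObject is an assumption, and instead of hard-coding two groups the loop takes the first non-empty submatch, whatever the number of groups.

```vba
' Sketch only: RegxGroups and TestRegxGroups are hypothetical names.
Public Function RegxGroups( _
  ByVal SourceString As String, _
  ByVal Pattern As String, _
  Optional ByVal IgnoreCase As Boolean = True, _
  Optional ByVal MultiLine As Boolean = True, _
  Optional ByVal MatchGlobal As Boolean = True) _
  As Variant

    Dim re As Object, oMatch As Object
    Dim arrMatches As Variant
    Dim lngCount As Long, i As Long

    ' Initialize to an empty array
    arrMatches = Array()
    Set re = CreateObject("VBScript.RegExp")
    re.MultiLine = MultiLine
    re.IgnoreCase = IgnoreCase
    re.Global = MatchGlobal
    re.Pattern = Pattern

    For Each oMatch In re.Execute(SourceString)
        ReDim Preserve arrMatches(lngCount)
        ' Default to the whole match (used when the pattern has no
        ' groups, and also when every group is empty)
        arrMatches(lngCount) = oMatch.Value
        ' Otherwise take the first group that captured something
        For i = 0 To oMatch.SubMatches.Count - 1
            If Len(oMatch.SubMatches(i)) > 0 Then
                arrMatches(lngCount) = oMatch.SubMatches(i)
                Exit For
            End If
        Next i
        lngCount = lngCount + 1
    Next oMatch

    ' Return the collected values to the caller
    RegxGroups = arrMatches
End Function

Sub TestRegxGroups()
    Dim v As Variant
    For Each v In RegxGroups("[x] Alpha" & vbCr & "[33] Beta", _
                             "\[(x)\]|\[(\d+)\]")
        Debug.Print v   ' prints x, then 33
    Next v
End Sub
```

Falling back to the whole match keeps the function usable with patterns that contain no capturing groups, which is how the original version behaved.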
