VBScript RegEx - Find block of data between a pattern

Question

I am trying to use RegEx to get blocks of data from a multi-line string.

String to search

***** a.txt
17=xxx
570=N
55=yyy
***** b.TXT
17=XXX
570=Y
55=yyy
*****

***** a.txt
38=10500.000000
711=1
311=0000000006630265
***** b.TXT
38=10500.000000
311=0000000006630265
*****

What I need: everything between the ***** lines

17=xxx
570=N
55=yyy

17=XXX
570=Y
55=yyy

38=10500.000000
711=1
311=0000000006630265

38=10500.000000
311=0000000006630265

My code so far

Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.MultiLine = True
objRegEx.IgnoreCase = True
objRegEx.Pattern = "\*\*\*\*\*(?:.|\n|\r)*?\*\*\*\*\*"
Set strMatches = objRegEx.Execute(objExec.StdOut.ReadAll())
If strMatches.Count > 0 Then
    For Each strMatch In strMatches
        Wscript.Echo strMatch
    Next
End If
Set objRegEx = Nothing

Solution

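You need to turn the final \*\*\*\*\* part of your consuming pattern into a positive lookahead. Your pattern consumes the closing *****, so when that ***** is also the start of the next block (as in ***** b.TXT), the next match cannot begin there and that block is skipped. Also, it is highly recommended to replace (?:.|\n|\r)*? with [\s\S]*?, since the alternation slows down the matching process.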
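To see the skipping effect, here is a quick cross-check of the question's original pattern in Python. It is an illustration only: it uses Python's re module as a stand-in for VBScript.RegExp, assuming both engines treat the constructs involved here (lazy quantifiers, alternation, escaped asterisks) the same way. SAMPLE is the question's input with "\n" line breaks.

```python
import re

# The question's input, assuming "\n" line breaks
SAMPLE = (
    "***** a.txt\n17=xxx\n570=N\n55=yyy\n"
    "***** b.TXT\n17=XXX\n570=Y\n55=yyy\n"
    "*****\n"
    "\n"
    "***** a.txt\n38=10500.000000\n711=1\n311=0000000006630265\n"
    "***** b.TXT\n38=10500.000000\n311=0000000006630265\n"
    "*****\n"
)

# The question's pattern: each match consumes its closing *****
OLD_PATTERN = r"\*\*\*\*\*(?:.|\n|\r)*?\*\*\*\*\*"

# The group is non-capturing, so findall returns whole matches
matches = re.findall(OLD_PATTERN, SAMPLE)
for m in matches:
    print(repr(m))
```

Only three matches come back: the first a.txt block, the junk "*****\n\n*****" between the two groups, and the second b.TXT block. The 570=Y and 711=1 blocks never appear, because the ***** that starts each of them was already consumed by the previous match.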

Use

\*{5}(?!\s*\*{5}).*[\r\n]+([\s\S]*?)(?=\*{5})

and take the first item in Submatches. The .*[\r\n]+ part skips the rest of the line that starts with ***** (e.g. " a.txt") together with its line break.
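A minimal Python sketch of the same approach, handy for checking the pattern outside Windows. It assumes Python's re module behaves like VBScript.RegExp for these constructs; m.group(1) plays the role of Submatches(0), and SAMPLE is the question's input with "\n" line breaks.

```python
import re

# The question's input, assuming "\n" line breaks
SAMPLE = (
    "***** a.txt\n17=xxx\n570=N\n55=yyy\n"
    "***** b.TXT\n17=XXX\n570=Y\n55=yyy\n"
    "*****\n"
    "\n"
    "***** a.txt\n38=10500.000000\n711=1\n311=0000000006630265\n"
    "***** b.TXT\n38=10500.000000\n311=0000000006630265\n"
    "*****\n"
)

# Lazy body plus a lookahead: the closing ***** is only checked, not consumed,
# so it can start the next match
PATTERN = re.compile(r"\*{5}(?!\s*\*{5}).*[\r\n]+([\s\S]*?)(?=\*{5})")

# group(1) corresponds to strMatch.Submatches(0) in VBScript
blocks = [m.group(1) for m in PATTERN.finditer(SAMPLE)]
for block in blocks:
    print(block)
```

All four blocks are returned. Each captured block keeps its trailing line break, so you may want to trim it before further processing.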

Details:

  • \*{5} - 5 asterisks
  • (?!\s*\*{5}) - fail the match if the 5 asterisks are followed by 0+ whitespace characters and another 5 asterisks (this keeps a closing ***** line that is followed by the next block's ***** line from producing an empty match)
  • .*[\r\n]+ - match the rest of the line plus the following line break(s)
  • ([\s\S]*?) - Capturing group 1 (its value is stored in the Submatches property of the Match object), matching any 0+ chars, as few as possible, up to the first location that is...
  • (?=\*{5}) - followed by 5 asterisks; they are not consumed, only their presence is checked.
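To illustrate what the (?!\s*\*{5}) guard prevents, this Python sketch runs the pattern with and without it on the question's input. As before, Python's re module is assumed to behave like VBScript.RegExp for these constructs. Without the guard, the lone ***** line that closes the first group also matches, yielding an empty capture.

```python
import re

# The question's input, assuming "\n" line breaks
SAMPLE = (
    "***** a.txt\n17=xxx\n570=N\n55=yyy\n"
    "***** b.TXT\n17=XXX\n570=Y\n55=yyy\n"
    "*****\n"
    "\n"
    "***** a.txt\n38=10500.000000\n711=1\n311=0000000006630265\n"
    "***** b.TXT\n38=10500.000000\n311=0000000006630265\n"
    "*****\n"
)

WITH_GUARD = r"\*{5}(?!\s*\*{5}).*[\r\n]+([\s\S]*?)(?=\*{5})"
# Same pattern with the negative lookahead removed, for comparison only
WITHOUT_GUARD = r"\*{5}.*[\r\n]+([\s\S]*?)(?=\*{5})"

# With one capturing group, findall returns the group 1 strings
with_guard = re.findall(WITH_GUARD, SAMPLE)
without_guard = re.findall(WITHOUT_GUARD, SAMPLE)

print(len(with_guard), len(without_guard))  # 4 5
print(repr(without_guard[2]))               # '' from the lone ***** line
```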

See the regex demo

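If you unroll the lazy [\s\S]*? part (the "unroll-the-loop" technique), the regex looks uglier but is much more efficient: [^*]* consumes runs of non-asterisk characters in one step, and the (?!\*{4}) check only runs at an asterisk instead of the lookahead being tested at every position: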

\*{5}(?!\s*\*{5}).*[\r\n]+([^*]*(?:\*(?!\*{4})[^*]*)*)
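A quick Python check, again assuming Python's re module matches VBScript.RegExp for these constructs, that the unrolled pattern captures the same blocks as the lazy one. It also shows a difference worth knowing: the unrolled pattern does not require a closing *****, so when the input ends with ***** plus a line break, it returns one extra, empty capture.

```python
import re

# The question's input, assuming "\n" line breaks (note the final "*****\n")
SAMPLE = (
    "***** a.txt\n17=xxx\n570=N\n55=yyy\n"
    "***** b.TXT\n17=XXX\n570=Y\n55=yyy\n"
    "*****\n"
    "\n"
    "***** a.txt\n38=10500.000000\n711=1\n311=0000000006630265\n"
    "***** b.TXT\n38=10500.000000\n311=0000000006630265\n"
    "*****\n"
)

LAZY = r"\*{5}(?!\s*\*{5}).*[\r\n]+([\s\S]*?)(?=\*{5})"
# Unrolled: [^*]* eats non-asterisks in one go; a single * is allowed
# only when it is not the start of five asterisks
UNROLLED = r"\*{5}(?!\s*\*{5}).*[\r\n]+([^*]*(?:\*(?!\*{4})[^*]*)*)"

lazy = re.findall(LAZY, SAMPLE)
unrolled = re.findall(UNROLLED, SAMPLE)

print(len(lazy), len(unrolled))  # 4 5
# Dropping empty captures makes the two results identical
print([b for b in unrolled if b] == lazy)  # True
```

In the VBScript loop, the equivalent is to skip any strMatch whose Submatches(0) is an empty string.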

See another regex demo

VBS code:

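' objExec is assumed to be the WshScriptExec object from the question
' (for example, the object returned by WScript.Shell's Exec method)
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True   ' return every match, not just the first one
objRegEx.Pattern = "\*{5}(?!\s*\*{5}).*[\r\n]+([^*]*(?:\*(?!\*{4})[^*]*)*)"
Set strMatches = objRegEx.Execute(objExec.StdOut.ReadAll())
If strMatches.Count > 0 Then
    For Each strMatch In strMatches
        ' Submatches(0) holds capturing group 1: the block body without its ***** line
        Wscript.Echo strMatch.Submatches(0)
    Next
End If
Set objRegEx = Nothing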
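Command output read through objExec.StdOut on Windows often uses CRLF (\r\n) line breaks. The small Python check below, assuming Python's re module behaves like VBScript.RegExp for this pattern, confirms that the pattern from the VBS code still finds the four blocks in CRLF input. The captures keep their \r\n endings, so here each one is split into lines with splitlines(), a Python-side step that is not part of the original answer.

```python
import re

# The question's input with Windows-style CRLF line breaks (assumed)
SAMPLE_CRLF = (
    "***** a.txt\n17=xxx\n570=N\n55=yyy\n"
    "***** b.TXT\n17=XXX\n570=Y\n55=yyy\n"
    "*****\n"
    "\n"
    "***** a.txt\n38=10500.000000\n711=1\n311=0000000006630265\n"
    "***** b.TXT\n38=10500.000000\n311=0000000006630265\n"
    "*****\n"
).replace("\n", "\r\n")

# Same pattern as objRegEx.Pattern in the VBS code
PATTERN = re.compile(r"\*{5}(?!\s*\*{5}).*[\r\n]+([^*]*(?:\*(?!\*{4})[^*]*)*)")

blocks = []
for m in PATTERN.finditer(SAMPLE_CRLF):
    body = m.group(1)  # like strMatch.Submatches(0)
    if not body:
        continue  # skip the empty capture after the final ***** line
    # splitlines() drops both "\r\n" and "\n" line endings
    blocks.append(body.splitlines())

for lines in blocks:
    print(lines)
```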
