VBScript RegEx - Find block of data between a pattern
Problem description
I am trying to use RegEx to get blocks of data from a multi-line string.
String to search
***** a.txt
17=xxx
570=N
55=yyy
***** b.TXT
17=XXX
570=Y
55=yyy
*****

***** a.txt
38=10500.000000
711=1
311=0000000006630265
***** b.TXT
38=10500.000000
311=0000000006630265
*****
What I need: everything between the ***** lines
17=xxx
570=N
55=yyy

17=XXX
570=Y
55=yyy

38=10500.000000
711=1
311=0000000006630265

38=10500.000000
311=0000000006630265
My code so far
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.MultiLine = True
objRegEx.IgnoreCase = True
objRegEx.Pattern = "\*\*\*\*\*(?:.|\n|\r)*?\*\*\*\*\*"
Set strMatches = objRegEx.Execute(objExec.StdOut.ReadAll())
If strMatches.Count > 0 Then
    For Each strMatch In strMatches
        Wscript.Echo strMatch
    Next
End If
Set objRegEx = Nothing
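You need to turn the trailing \*\*\*\*\* part of your consuming pattern into a positive lookahead. As written, the closing ***** is consumed by the match, so it cannot also serve as the opening ***** of the next block. Also, it is highly advisable to replace (?:.|\n|\r)*? with [\s\S]*?, since the alternation slows down the matching process.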
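To illustrate the difference between consuming the closing delimiter and only looking ahead for it, here is a minimal sketch. The sample string is hypothetical (three delimiter lines enclosing two one-line blocks), and the patterns are simplified versions of the ones discussed here; run it with cscript.

```vbscript
Option Explicit

' Hypothetical sample: three ***** lines enclosing two blocks, "A" and "B".
Dim sample, re
sample = "*****" & vbLf & "A" & vbLf & "*****" & vbLf & "B" & vbLf & "*****"

Set re = CreateObject("VBScript.RegExp")
re.Global = True

' Closing delimiter is consumed: the middle ***** is used up by the
' first match, so block "B" has no opening delimiter left.
re.Pattern = "\*{5}[\s\S]*?\*{5}"
WScript.Echo "Consuming: " & re.Execute(sample).Count   ' 1

' Closing delimiter is only checked by a lookahead: the middle *****
' remains available as the start of the second match.
re.Pattern = "\*{5}[\s\S]*?(?=\*{5})"
WScript.Echo "Lookahead: " & re.Execute(sample).Count   ' 2
```

With the consuming pattern every second block is skipped, which is why the lookahead is needed when one delimiter line both ends a block and starts the next.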
Use
\*{5}(?!\s*\*{5}).*[\r\n]+([\s\S]*?)(?=\*{5})
and grab the first item in Submatches. The .*[\r\n]+ part skips the rest of the line that starts with ***** (the file name), so only the data lines are captured.
Details:
\*{5} - 5 asterisks
(?!\s*\*{5}) - fail the match if there are 0+ whitespace characters followed by 5 asterisks
.*[\r\n]+ - match the rest of the line together with the line break(s)
([\s\S]*?) - capturing group 1 (its value is stored in the Submatches property of the Match object) matching any 0+ chars, as few as possible, up to the first...
(?=\*{5}) - location followed by 5 asterisks that are not consumed; only their presence is checked.
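The pieces listed above can be tried out on a literal string. This is a minimal sketch: the sample is the first half of the data from the question, rebuilt with Join and vbCrLf (an assumption about the line endings of the real input), and the variable names are illustrative only.

```vbscript
Option Explicit

Dim lines, sample, re, m

' Assumed sample: first half of the question's data, joined with CRLF.
lines = Array("***** a.txt", "17=xxx", "570=N", "55=yyy", _
              "***** b.TXT", "17=XXX", "570=Y", "55=yyy", "*****")
sample = Join(lines, vbCrLf)

Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.Pattern = "\*{5}(?!\s*\*{5}).*[\r\n]+([\s\S]*?)(?=\*{5})"

' Each match covers one "***** name" line plus its data lines;
' SubMatches(0) holds only the data lines (group 1).
For Each m In re.Execute(sample)
    WScript.Echo "[" & m.SubMatches(0) & "]"
Next
```

Note that each captured block still ends with the line break that precedes the next ***** line, because group 1 runs right up to the position where the lookahead succeeds.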
If you unroll the regex, it will look uglier, but it is much more efficient:
\*{5}(?!\s*\*{5}).*[\r\n]+([^*]*(?:\*(?!\*{4})[^*]*)*)
VBS code:
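Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True   ' find all blocks, not just the first one
objRegEx.Pattern = "\*{5}(?!\s*\*{5}).*[\r\n]+([^*]*(?:\*(?!\*{4})[^*]*)*)"
' objExec comes from the asker's script (the process whose output is parsed)
Set strMatches = objRegEx.Execute(objExec.StdOut.ReadAll())
If strMatches.Count > 0 Then
    For Each strMatch In strMatches
        ' Submatches(0) is capturing group 1: the data lines only
        Wscript.Echo strMatch.Submatches(0)
    Next
End If
Set objRegEx = Nothing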
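One detail worth checking in practice: the unrolled pattern has no trailing lookahead, so if the input ends with a line break after the last ***** line, that last line can produce an extra match with an empty Submatches(0). The captured blocks also keep their trailing line break. The sketch below handles both; TrimLineBreaks is a hypothetical helper, not part of the original answer, and the sample string is an assumed, shortened input.

```vbscript
Option Explicit

' Hypothetical helper: strip trailing CR/LF characters from a block.
Function TrimLineBreaks(ByVal s)
    Do While Len(s) > 0
        If Right(s, 1) <> vbCr And Right(s, 1) <> vbLf Then Exit Do
        s = Left(s, Len(s) - 1)
    Loop
    TrimLineBreaks = s
End Function

Dim sample, re, m, block

' Assumed sample that ends with a line break after the final *****.
sample = Join(Array("***** a.txt", "17=xxx", _
                    "***** b.TXT", "17=XXX", "*****", ""), vbCrLf)

Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.Pattern = "\*{5}(?!\s*\*{5}).*[\r\n]+([^*]*(?:\*(?!\*{4})[^*]*)*)"

For Each m In re.Execute(sample)
    block = TrimLineBreaks(m.SubMatches(0))
    ' Skip the empty capture produced by the final delimiter line.
    If Len(block) > 0 Then WScript.Echo block
Next
```

For this sample the loop echoes 17=xxx and 17=XXX and silently drops the empty third match.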