Regex starting and ending with multi-character match

Question

How does one construct a regex in C# to parse the following?

"I don't care about text outside a block
START
Need this text captured1
END
I don't care about text outside a block
start
Need this text captured2$@#%*
end
I don't care about text outside a block"

I'm looking for a pattern that starts and ends a capture with case-insensitive matches on (START) and (END), excludes characters outside those blocks, and produces an array of strings:
[0]: Need this text captured1
[1]: Need this text captured2$@#%*
(Characters outside [A-Za-z0-9] should be captured as well, but the capture should stop at the word 'END'.)

Looking at the caret specifier, it seems to work only with sets of characters, and I can't figure out how to exclude an exact string: [^END] excludes the individual characters E, N and D rather than the word 'END', whereas (^(END)) seems to be incorrect syntax.

Thanks

Recommended answer

You need a "?" character:
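using System.Text.RegularExpressions;

// Lazy ".*?" stops at the first "end"; Singleline lets "." match the line breaks inside a block.
private static readonly Regex regex = new Regex(@"start(?<Data>.*?)end",
                                RegexOptions.IgnoreCase
                                | RegexOptions.CultureInvariant
                                | RegexOptions.Singleline
                                | RegexOptions.Compiled);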



[edit]Sorry, I was a bit rushed this morning!
The addition of the "?" after the ".*" changes it from "Any character, any number of repetitions", to "Any character, any number of repetitions, as few as possible". This means that when it matches the first "end" text, the match ends. Without the "?" the default behaviour is to match up to the last "end" text - this is known as a greedy expression.[/edit]
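To show how the lazy quantifier produces the string array asked for in the question, here is a minimal sketch. The class name BlockExtractor, the method ExtractBlocks, the \b word boundaries, and the Trim() call are illustrative assumptions and not part of the original answer. It also assumes the blocks span several lines, which is why RegexOptions.Singleline is set: by default "." does not match a newline.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BlockExtractor
{
    // \b keeps "start"/"end" from matching inside longer words (an added assumption).
    // Singleline lets "." match line breaks; ".*?" is lazy, so it stops at the first "end".
    private static readonly Regex BlockPattern = new Regex(
        @"\bstart\b(?<Data>.*?)\bend\b",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Singleline);

    // Returns the trimmed text of every START ... END block, in order of appearance.
    public static string[] ExtractBlocks(string input)
    {
        return BlockPattern.Matches(input)
            .Cast<Match>()
            .Select(m => m.Groups["Data"].Value.Trim())
            .ToArray();
    }
}
```

Calling BlockExtractor.ExtractBlocks on the sample text from the question should yield two elements, one per block, with the surrounding line breaks trimmed off. If a block can itself contain the standalone word "end", this pattern will stop there, so a more distinctive end marker would be needed in that case.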

