Regex: match keywords that are not in quotes
Question
How can I look for keywords that are not inside a string?
For example, if I have the text:
Hello this text is an example.
bla bla bla "this text is inside a string"
"random string" more text bla bla bla "foo"
I would like to match every occurrence of the word text that is not inside " ". In other words, I would like to match the occurrences of text in the first and third lines above.
Note that I do not want to match the text in the second line, because it is inside a string.
Possible solution:
I have been working on it, and this is what I have so far:
(?s)((?<q>")|text)(?(q).*?"|)
Note that the regex uses the conditional construct: (?(predicate) true alternative|false alternative)
So the regex reads:
Find " or text. If you find ", continue selecting until you find " again (.*?"); if you find text, do nothing.
When I run that regex, though, it matches the whole string. I am asking this question for learning purposes; I know I could remove all the strings first and then look for what I need.
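For comparison, the fallback mentioned above (remove every quoted string first, then search what is left) could be sketched in C# roughly as follows. This is a minimal sketch and not code from the original post: the names StripThenSearch and CountOutsideQuotes are hypothetical, and it assumes quotes are balanced and never escaped.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StripThenSearch
{
    // Counts occurrences of `keyword` after deleting every "..." run.
    // Assumption: quotes are balanced and contain no escaped quotes.
    public static int CountOutsideQuotes(string input, string keyword)
    {
        // Replace each quoted run with a space so the words around it stay separate.
        string stripped = Regex.Replace(input, "\"[^\"]*\"", " ");
        // Escape the keyword so it is searched for literally.
        return Regex.Matches(stripped, Regex.Escape(keyword)).Count;
    }

    public static void Main()
    {
        string sample =
            "Hello this text is an example.\n" +
            "bla bla bla \"this text is inside a string\"\n" +
            "\"random string\" more text bla bla bla \"foo\"";
        Console.WriteLine(CountOutsideQuotes(sample, "text")); // prints 2
    }
}
```

Because the quoted runs are deleted before searching, the positions of the remaining matches no longer correspond to positions in the original text, which is one reason a single-pass pattern is attractive.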
Recommended answer
Here is one answer:
(?<=^([^"]|"[^"]*")*)text
This means:
(?<= # preceded by...
^ # the start of the string, then
([^"] # either not a quote character
|"[^"]*" # or a full string
)* # as many times as you want
)
text # then the text
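To see the pattern in action, here is a minimal C# sketch that runs it over the sample text from the question and collects the position of each match. The names OutsideQuotesDemo and FindOutsideQuotes are made up for illustration; the sketch assumes balanced quotes and relies on .NET allowing variable-length lookbehind.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OutsideQuotesDemo
{
    // Lookbehind: from the start of the input up to the match there are only
    // non-quote characters or complete "..." runs, so the match is outside quotes.
    private const string Pattern = "(?<=^([^\"]|\"[^\"]*\")*)text";

    // Returns the start index of every "text" that is not inside quotes.
    public static List<int> FindOutsideQuotes(string input)
    {
        var positions = new List<int>();
        foreach (Match m in Regex.Matches(input, Pattern, RegexOptions.ExplicitCapture))
        {
            positions.Add(m.Index);
        }
        return positions;
    }

    public static void Main()
    {
        string sample =
            "Hello this text is an example.\n" +
            "bla bla bla \"this text is inside a string\"\n" +
            "\"random string\" more text bla bla bla \"foo\"";
        foreach (int index in FindOutsideQuotes(sample))
        {
            Console.WriteLine(index);
        }
    }
}
```

Only the occurrences on the first and third lines should be reported; the one inside the quoted string on the second line is skipped because the lookbehind cannot pair up the opening quote before it.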
You can easily extend this to handle strings containing escapes as well.
In C# code:
Regex.Match("bla bla bla \"this text is inside a string\"",
"(?<=^([^\"]|\"[^\"]*\")*)text", RegexOptions.ExplicitCapture);
Added from the comment discussion: an extended version that matches on a per-line basis and handles escapes. Use RegexOptions.Multiline for this:
(?<=^([^"\r\n]|"([^"\\\r\n]|\\.)*")*)text
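As a rough illustration of the per-line behaviour, the sketch below applies this pattern with RegexOptions.Multiline to three lines: one with escaped quotes inside a string, one with a quote that is never closed, and one with no quotes at all. The names PerLineDemo and FindOutsideQuotes and the sample input are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PerLineDemo
{
    // Written as a verbatim string, so a doubled "" stands for one quote
    // and backslashes reach the regex engine unchanged.
    private const string Pattern =
        @"(?<=^([^""\r\n]|""([^""\\\r\n]|\\.)*"")*)text";

    // Returns the start index of every "text" outside quotes, line by line.
    public static List<int> FindOutsideQuotes(string input)
    {
        var positions = new List<int>();
        var options = RegexOptions.Multiline | RegexOptions.ExplicitCapture;
        foreach (Match m in Regex.Matches(input, Pattern, options))
        {
            positions.Add(m.Index);
        }
        return positions;
    }

    public static void Main()
    {
        string sample =
            "say \"a \\\"text\\\" b\" then text\n" + // escaped quotes inside a string
            "open \"quote never closes text\n" +     // unbalanced quote on this line
            "next line text";                        // no quotes at all
        foreach (int index in FindOutsideQuotes(sample))
        {
            Console.WriteLine(index);
        }
    }
}
```

With Multiline set, ^ anchors at the start of each line, so the unclosed quote on the second line cannot affect the third line.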
In a C# string this looks like:
"(?<=^([^\"\r\n]|\"([^\"\\\\\r\n]|\\\\.)*\")*)text"
Since you now want to use ** instead of ", here is a version for that:
(?<=^([^*\r\n]|\*(?!\*)|\*\*([^*\\\r\n]|\\.|\*(?!\*))*\*\*)*)text
Explanation:
(?<= # preceded by
^ # start of line
( # either
[^*\r\n]| # not a star or line break
\*(?!\*)| # or a single star (star not followed by another star)
\*\* # or 2 stars, followed by...
([^*\\\r\n] # either: not a star or a backslash or a linebreak
|\\. # or an escaped char
|\*(?!\*) # or a single star
)* # as many times as you want
\*\* # ended with 2 stars
)* # as many times as you want
)
text # then the text
Since this version doesn't contain " characters, it's cleaner to use a verbatim string literal:
@"(?<=^([^*\r\n]|\*(?!\*)|\*\*([^*\\\r\n]|\\.|\*(?!\*))*\*\*)*)text"
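A short C# sketch of this last variant, under the assumption that ** delimits the quoted regions and a lone * is ordinary text. The names StarDelimiterDemo and FindOutsideStars and the sample sentence are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StarDelimiterDemo
{
    // Same idea as the quote version, with ** ... ** as the delimiters.
    private const string Pattern =
        @"(?<=^([^*\r\n]|\*(?!\*)|\*\*([^*\\\r\n]|\\.|\*(?!\*))*\*\*)*)text";

    // Returns the start index of every "text" that is not between ** and **.
    public static List<int> FindOutsideStars(string input)
    {
        var positions = new List<int>();
        var options = RegexOptions.Multiline | RegexOptions.ExplicitCapture;
        foreach (Match m in Regex.Matches(input, Pattern, options))
        {
            positions.Add(m.Index);
        }
        return positions;
    }

    public static void Main()
    {
        // A lone * does not open a region; only ** does.
        string sample = "2 * text is fine **but text here is quoted** and text again";
        foreach (int index in FindOutsideStars(sample))
        {
            Console.WriteLine(index);
        }
    }
}
```

In the sample sentence, the first and last occurrences should be reported and the one between the ** pairs skipped.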