Regex match keywords that are not in quotes

Problem description

How can I search for keywords that are not inside a string?

For example, if I have the text:

Hello this text is an example.

bla bla bla "this text is inside a string"

"random string" more text bla bla bla "foo"

I would like to be able to match every occurrence of the word text that is not inside " ". In other words, I would like to match:

Note: I do not want to match the text highlighted in red, because it is inside a string.

Possible solution:

I've been working on it, and this is what I have so far:

(?(小于q>中?。?)|文)((q)*| )

Note that the regex uses the conditional construct: (?(predicate)true alternative|false alternative)

So the regex reads:

Find " or text. If you find ", keep selecting until you find " again (.*?"); if you find text, do nothing...
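For comparison, here is a minimal sketch of that reading as a working .NET conditional, assuming System.Text.RegularExpressions. It is not the asker's regex: the pattern `(?(")"[^"]*"|text)` uses the literal predicate `"`, which .NET evaluates as a zero-width lookahead. At a quote, the true branch consumes the whole quoted string; anywhere else, the false branch tries `text`. Quoted strings therefore still show up as matches and are filtered out afterwards. The helper name `KeywordIndexes` is hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// (?(")yes|no): the predicate " acts as a zero-width lookahead.
// At a quote, "yes" consumes a whole "..." string; elsewhere, "no" tries "text".
var quotedOrKeyword = new Regex("(?(\")\"[^\"]*\"|text)");

// Hypothetical helper: start indexes of "text" outside quotes. Quoted strings
// are matched only so the scan skips past them, then dropped here.
int[] KeywordIndexes(string input) =>
    quotedOrKeyword.Matches(input)
                   .Cast<Match>()
                   .Where(m => m.Value[0] != '"')
                   .Select(m => m.Index)
                   .ToArray();

foreach (var line in new[]
{
    "Hello this text is an example.",
    "bla bla bla \"this text is inside a string\"",
    "\"random string\" more text bla bla bla \"foo\"",
})
{
    Console.WriteLine($"[{string.Join(", ", KeywordIndexes(line))}] {line}");
}
```

Because each quoted string is consumed in one piece, the scan never starts inside it, so a `text` between quotes is never tried on its own.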

When I run that regex, though, it matches the whole string. I'm asking this question for learning purposes; I know I could remove all the strings first and then search for what I need.
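The workaround mentioned at the end of the question (remove the strings, then search) can be sketched like this, assuming .NET's `Regex.Replace` overload that takes a `MatchEvaluator`. Rather than deleting each "..." span, the sketch overwrites it with the same number of spaces, so the indexes of the remaining matches still refer to the original text. `MaskQuoted` and `KeywordIndexes` are hypothetical helper names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: blank out every "..." span with the same number of
// spaces, so nothing inside a string can match but indexes are preserved.
string MaskQuoted(string input) =>
    Regex.Replace(input, "\"[^\"]*\"", m => new string(' ', m.Length));

// Hypothetical helper: indexes of keyword occurrences outside quotes.
int[] KeywordIndexes(string input, string keyword) =>
    Regex.Matches(MaskQuoted(input), Regex.Escape(keyword))
         .Cast<Match>()
         .Select(m => m.Index)
         .ToArray();

var line = "\"random string\" more text bla bla bla \"foo\"";
Console.WriteLine(MaskQuoted(line));
Console.WriteLine(string.Join(", ", KeywordIndexes(line, "text")));
```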

Recommended answer

Here is one answer:

(?<=^([^"]|"[^"]*")*)text

This means:

(?<=       # preceded by...
^          # the start of the string, then
([^"]      # either not a quote character
|"[^"]*"   # or a full string
)*         # as many times as you want
)
text       # then the text
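A minimal sketch that runs this pattern over the three sample lines from the question, assuming .NET. Unbounded lookbehind like this is supported by .NET but rejected by many other engines (Python's `re` module, for example, requires fixed-width lookbehind). The `Count` helper is illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// "text" preceded, from the start of the input, only by non-quote characters
// and complete "..." strings, i.e. by an even number of quotes.
var outsideQuotes = new Regex("(?<=^([^\"]|\"[^\"]*\")*)text",
                              RegexOptions.ExplicitCapture);

// Illustrative helper: how many keyword hits lie outside quotes.
int Count(string input) => outsideQuotes.Matches(input).Count;

string[] samples =
{
    "Hello this text is an example.",
    "bla bla bla \"this text is inside a string\"",
    "\"random string\" more text bla bla bla \"foo\"",
};

foreach (var s in samples)
    Console.WriteLine($"{Count(s)}  {s}");
```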

You can easily extend this to handle strings containing escapes as well.

In C# code:

Regex.Match("bla bla bla \"this text is inside a string\"",
            "(?<=^([^\"]|\"[^\"]*\")*)text", RegexOptions.ExplicitCapture);

Added from the comment discussion: an extended version that matches on a per-line basis and handles escapes. Use RegexOptions.Multiline for this:

(?<=^([^"\r\n]|"([^"\\\r\n]|\\.)*")*)text

In a C# string this looks like:

"(?<=^([^\"\r\n]|\"([^\"\\\\\r\n]|\\\\.)*\")*)text"

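To check the per-line, escape-aware version, here is a sketch that reuses the C# string above with RegexOptions.Multiline and reports the 1-based line number of each hit. The sample lines and the `MatchLines` helper are made up for illustration: the first line has `text` inside a string that contains \" escapes, the second has it after such a string, and the third has it after an unterminated quote.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Same pattern as the C# string above. Multiline makes ^ match at each line
// start; excluding \r and \n keeps the lookbehind on one line; \\. consumes an
// escaped character such as \" so it cannot close a string early.
var outsideQuotes = new Regex(
    "(?<=^([^\"\r\n]|\"([^\"\\\\\r\n]|\\\\.)*\")*)text",
    RegexOptions.Multiline | RegexOptions.ExplicitCapture);

// Illustrative helper: 1-based line number of every match.
int[] MatchLines(string input) =>
    outsideQuotes.Matches(input)
                 .Cast<Match>()
                 .Select(m => input.Take(m.Index).Count(c => c == '\n') + 1)
                 .ToArray();

string sample = string.Join("\n",
    "say \"a \\\"quoted\\\" text\" here", // inside a string with escaped quotes
    "\"a \\\"b\\\"\" then text",          // after a string containing escapes
    "unclosed \" text",                   // after an unterminated quote
    "plain text line");                   // no quotes at all

Console.WriteLine(string.Join(", ", MatchLines(sample)));
```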
Since you now want to use ** instead of ", here is a version for that:

(?<=^([^*\r\n]|\*(?!\*)|\*\*([^*\\\r\n]|\\.|\*(?!\*))*\*\*)*)text

Explanation:

(?<=       # preceded by
^          # start of line
 (         # either
 [^*\r\n]| #  not a star or line break
 \*(?!\*)| #  or a single star (star not followed by another star)
  \*\*     #  or 2 stars, followed by...
   ([^*\\\r\n] # either: not a star or a backslash or a linebreak
   |\\.        # or an escaped char
   |\*(?!\*)   # or a single star
   )*          # as many times as you want
  \*\*     # ended with 2 stars
 )*        # as many times as you want
)
text      # then the text

Since this version doesn't contain " characters, it's cleaner to write it as a verbatim string literal:

@"(?<=^([^*\r\n]|\*(?!\*)|\*\*([^*\\\r\n]|\\.|\*(?!\*))*\*\*)*)text"
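A short sketch exercising the ** version with the verbatim-string form above, assuming the same .NET options as before (Multiline, so ^ anchors at each line start). The test lines and the `Count` helper are illustrative: a single * is treated as ordinary text, while anything between ** and ** is skipped.

```csharp
using System;
using System.Text.RegularExpressions;

// The ** version as a verbatim string: backslashes are not doubled, and the
// regex engine itself interprets \r, \n, \* and \\.
var outsideStars = new Regex(
    @"(?<=^([^*\r\n]|\*(?!\*)|\*\*([^*\\\r\n]|\\.|\*(?!\*))*\*\*)*)text",
    RegexOptions.Multiline | RegexOptions.ExplicitCapture);

// Illustrative helper: number of keyword hits outside **...** spans.
int Count(string input) => outsideStars.Matches(input).Count;

foreach (var line in new[]
{
    "a **b text** c",     // inside a **...** span
    "**bold** then text", // after a closed span
    "2 * 3 text",         // a single * is ordinary text
    "open ** text",       // after an unterminated **
})
{
    Console.WriteLine($"{Count(line)}  {line}");
}
```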
