Regular expression to match a line that doesn't contain a word

Question

I know it's possible to match a word and then invert the result with another tool (e.g. grep -v). However, is it possible to match lines that do not contain a specific word, e.g. hede, using a regular expression alone?

Input:

hoho
hihi
haha
hede

Code:

grep "<Regex for 'doesn't contain hede'>" input

Desired output:

hoho
hihi
haha

Recommended answer

The notion that regex doesn't support inverse matching is not entirely true. You can mimic this behavior with a negative look-ahead:

^((?!hede).)*$

The regex above matches any string, or any line without a line break, that does not contain the (sub)string 'hede'. As mentioned, this is not something regex is "good" at (or should do), but it is still possible.
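As a quick check of the pattern against the sample input from the question, here is a minimal sketch using Python's re module. The choice of Python and the names NO_HEDE, lines and kept are assumptions made for illustration; the answer itself does not name a language.

```python
import re

# Pattern from the answer: before consuming each character, assert that
# "hede" does not start at the current position.
NO_HEDE = re.compile(r"^((?!hede).)*$")

# The four sample lines from the question.
lines = ["hoho", "hihi", "haha", "hede"]

# Keep only the lines the pattern matches, i.e. those without "hede".
kept = [line for line in lines if NO_HEDE.match(line)]
print(kept)  # ['hoho', 'hihi', 'haha']
```

This reproduces the desired output from the question without inverting the match in a second tool.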

If you need to match line break characters as well, use the DOT-ALL modifier (the trailing s in the following pattern):

/^((?!hede).)*$/s

or use it inline:

/(?s)^((?!hede).)*$/

(where /.../ are the regex delimiters, i.e., not part of the pattern)
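The effect of the DOT-ALL modifier can be sketched in Python, where the trailing s corresponds to the re.DOTALL flag and the inline form (?s) is written the same way. The sample strings and variable names below are assumptions chosen for illustration.

```python
import re

clean = "hoho\nhihi"   # two lines, no "hede"
dirty = "hoho\nhede"   # "hede" appears after the line break

plain = re.compile(r"^((?!hede).)*$")
with_flag = re.compile(r"^((?!hede).)*$", re.DOTALL)
inline = re.compile(r"(?s)^((?!hede).)*$")

# Without DOT-ALL the dot cannot cross the line break, so even the
# clean two-line string is rejected.
plain_clean = plain.match(clean) is not None

# With DOT-ALL (flag or inline) the whole multi-line string is checked.
flag_clean = with_flag.match(clean) is not None
inline_clean = inline.match(clean) is not None
flag_dirty = with_flag.match(dirty) is not None

print(plain_clean, flag_clean, inline_clean, flag_dirty)
# False True True False
```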

If the DOT-ALL modifier is not available, you can mimic the same behavior with the character class [\s\S], which matches any character, line breaks included:

/^((?!hede)[\s\S])*$/
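A small sketch, again assuming Python's re module, suggesting that the [\s\S] form behaves like the DOT-ALL form on multi-line input even though no flag is set. The sample strings are made up for this check.

```python
import re

# No flags: [\s\S] matches whitespace or non-whitespace, i.e. any character.
class_form = re.compile(r"^((?!hede)[\s\S])*$")
dotall_form = re.compile(r"^((?!hede).)*$", re.DOTALL)

samples = ["hoho\nhihi", "hoho\nhede", "haha", "a hede b"]

# Compare the two forms sample by sample.
class_results = [class_form.match(s) is not None for s in samples]
dotall_results = [dotall_form.match(s) is not None for s in samples]

print(class_results)   # [True, False, True, False]
print(dotall_results)  # [True, False, True, False]
```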

Explanation

A string is just a list of n characters. Before and after each character, there's an empty string, so a list of n characters has n+1 empty strings. Consider the string "ABhedeCD":

    ┌──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┐
S = │e1│ A │e2│ B │e3│ h │e4│ e │e5│ d │e6│ e │e7│ C │e8│ D │e9│
    └──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┘

index    0      1      2      3      4      5      6      7

where the e's are the empty strings. The regex (?!hede). looks ahead to check that the substring "hede" does not start at the current position, and if that is the case (so something else is seen), the . (dot) matches any character except a line break. Look-arounds are also called zero-width assertions because they don't consume any characters. They only assert/validate something.

So, in my example, every empty string is first validated to see that there's no "hede" directly ahead, before a character is consumed by the . (dot). The regex (?!hede). does that only once, so it is wrapped in a group and repeated zero or more times: ((?!hede).)*. Finally, the start and end of the input are anchored to make sure the entire input is consumed: ^((?!hede).)*$
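The three stages described above (single step, repeated group, anchored pattern) can be tried one at a time. This is a minimal sketch assuming Python's re module; the variable names are hypothetical.

```python
import re

s = "ABhedeCD"

# Single step: one look-ahead check followed by one consumed character.
single = re.match(r"(?!hede).", s).group()

# Repeated group without anchors: stops just before "hede" and still
# reports a (partial) match, which is why the anchors are needed.
repeated = re.match(r"((?!hede).)*", s).group()

# Anchored: the whole input must be consumed, so the match fails.
anchored = re.match(r"^((?!hede).)*$", s)

print(repr(single))    # 'A'
print(repr(repeated))  # 'AB'
print(anchored)        # None
```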

As you can see, the input "ABhedeCD" fails because at e3 the look-ahead (?!hede) fails (there is "hede" directly ahead!).
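To make the walk over e1..e9 concrete, the look-ahead can be tested on its own at every position of the string. This sketch assumes Python's re module and uses the pos argument of Pattern.match to start at a given index; the names lookahead, results and first_failure are illustrative only.

```python
import re

s = "ABhedeCD"
lookahead = re.compile(r"(?!hede)")

# Try the zero-width assertion at each empty string e1..e9 (indices 0..8).
results = []
for i in range(len(s) + 1):
    ok = lookahead.match(s, i) is not None
    results.append(ok)
    print(f"e{i + 1}: {'passes' if ok else 'fails'}")

# Position of the first failing assertion, as a 0-based index.
first_failure = results.index(False)
print(f"first failure at e{first_failure + 1}")  # first failure at e3
```

Only e3 fails, since that is the one position where "hede" begins; every other position passes on its own, but the repeated group can never get past e3.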
