Perl: Matching string not containing PATTERN
Question
While using a Perl regex to chop a string into usable pieces, I needed to match everything except a certain pattern. I solved it after finding this hint on Perl Monks:
/^(?:(?!PATTERN).)*$/; # Matches strings not containing PATTERN
Although I solved my initial problem, I have little idea how it actually works. I checked perlre, but it is a bit too formal to grasp.
The question "Regular expression to match a line that doesn't contain a word?" helps a lot in understanding, but why are the . and the ?: in my example, and how do the outer parentheses work?
Can someone break up the regex and explain in simple words how it works?
Recommended answer
Building it up piece by piece (and throughout assuming no newlines in the string or PATTERN):
This matches any string:
/^.*$/
But we don't want . to match a character that starts PATTERN, so replace . with:
(?!PATTERN).
This uses a negative look-ahead that tests a given pattern without actually consuming any of the string and only succeeds if the pattern does not match at the given point in the string. So it's like saying:
if PATTERN doesn't match at this point,
match the next character
This needs to be done for every character in the string, so * is used to match zero or more times, from the beginning to the end of the string.
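The reading "if PATTERN doesn't match here, take the next character, and repeat to the end" can be made concrete with an ordinary loop. This is only an illustrative sketch. The sub name contains_no_match_loop is hypothetical, and PATTERN is fixed to bar for the demo. It assumes a newline-free string and a simple PATTERN without look-behind, because each position is tested with substr and \A. The script prints the loop's answer next to the regex's answer for a few strings.

```perl
use strict;
use warnings;

# Hypothetical helper: a hand-written loop that mimics
# /^(?:(?!PATTERN).)*$/ for a newline-free string.
# $pattern is a compiled regex (qr//). Assumes PATTERN has no look-behind,
# because substr() hides the text before $pos from the regex.
sub contains_no_match_loop {
    my ($string, $pattern) = @_;
    for my $pos (0 .. length($string) - 1) {
        # (?!PATTERN): give up if PATTERN matches starting exactly at $pos.
        return 0 if substr($string, $pos) =~ /\A$pattern/;
        # . : otherwise this character is consumed and the loop moves on.
    }
    # Every character was consumed, so the end of the string ($) is reached.
    return 1;
}

for my $s ("hello world", "foobarbaz", "b a r", "") {
    my $loop  = contains_no_match_loop($s, qr/bar/);
    my $regex = $s =~ /^(?:(?!bar).)*$/ ? 1 : 0;
    print "<$s> loop=$loop regex=$regex\n";
}
```

The loop stops at the first position where PATTERN matches. That is exactly where (?:(?!PATTERN).)* can no longer advance. Because ^ and $ force the repetition to cover the whole string, the regex fails in the same cases.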
To make the * apply to the combination of the negative look-ahead and ., not just the ., it needs to be surrounded by parentheses. Since there's no reason to capture, they should be non-capturing parentheses (?: ):
(?:(?!PATTERN).)*
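The following Perl sketch shows what the grouping buys, with PATTERN set to bar purely for illustration. It compares a capturing group, the non-capturing group, and no group at all. Each version is anchored with ^ and $, as in the final regex, so that it has to cover the whole string.

```perl
use strict;
use warnings;

# PATTERN is "bar" here purely for illustration.
my $s = "abc";

# Capturing group: matches, but $1 only keeps the character from the
# last repetition, which is why there is no reason to capture.
my ($last) = $s =~ /^((?!bar).)*$/;

# Non-capturing group (?: ): same match, no capture is created.
my $grouped = $s =~ /^(?:(?!bar).)*$/ ? 1 : 0;

# No group at all: * applies only to ".", so the look-ahead is tested
# once, at the start, and "xbar" slips through.
my $ungrouped    = "xbar" =~ /^(?!bar).*$/     ? 1 : 0;
my $grouped_xbar = "xbar" =~ /^(?:(?!bar).)*$/ ? 1 : 0;

print "last capture: $last\n";          # c
print "grouped abc: $grouped\n";        # 1
print "ungrouped xbar: $ungrouped\n";   # 1 (wrong answer)
print "grouped xbar: $grouped_xbar\n";  # 0
```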
Finally, put back the anchors so that the test covers every position in the string, from start to end:
/^(?:(?!PATTERN).)*$/
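With a concrete pattern, the regex works as a filter. In this sketch the literal word bar stands in for PATTERN, which is an illustrative choice. The \Q...\E escapes the word so that any regex metacharacters in it are taken literally. Drop the \Q...\E if PATTERN is meant to be a regex.

```perl
use strict;
use warnings;

# Illustrative assumption: PATTERN is a literal word, so \Q...\E is used
# to escape any regex metacharacters it might contain.
my $word = "bar";
my $without_word = qr/^(?:(?!\Q$word\E).)*$/;

my @lines = ("foo baz", "foobar", "barn owl", "b a r", "");
my @kept  = grep { $_ =~ $without_word } @lines;

print "kept: <$_>\n" for @kept;   # <foo baz>, <b a r> and <>
```

The empty string is kept because zero repetitions of the group are enough for ^ and $ to match.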
Note that this solution is particularly useful as part of a larger match. For example, to match any string with foo and later baz, but no bar in between:
/foo(?:(?!bar).)*baz/
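Here is a quick check of the foo...baz example on a few illustrative strings. Each string is paired with the result the description above predicts.

```perl
use strict;
use warnings;

# Each string is paired with whether it should match /foo(?:(?!bar).)*baz/.
my @cases = (
    [ "foo baz",         1 ],  # foo ... baz, nothing forbidden between
    [ "foo bar baz",     0 ],  # bar sits between foo and baz
    [ "foobaz",          1 ],  # zero characters between them is fine
    [ "foo baz bar",     1 ],  # bar after baz is not "in between"
    [ "foo bar foo baz", 1 ],  # unanchored: the second foo starts a match
);

for my $case (@cases) {
    my ($s, $expected) = @$case;
    my $got = $s =~ /foo(?:(?!bar).)*baz/ ? 1 : 0;
    printf "%-18s got=%d expected=%d\n", "<$s>", $got, $expected;
}
```

The last case shows that the regex is unanchored. It fails starting at the first foo but succeeds starting at the second one. Add anchors or other context if that matters.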
If there aren't such considerations, you can simply do:
/^(?!.*PATTERN)/
to check that PATTERN does not match anywhere in the string.
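For strings without newlines, the short form and the anchored tempered form give the same answers. This sketch compares them, again using bar as an illustrative PATTERN. The sub names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helpers; PATTERN is "bar" purely for illustration.
sub short_form    { return $_[0] =~ /^(?!.*bar)/      ? 1 : 0 }
sub tempered_form { return $_[0] =~ /^(?:(?!bar).)*$/ ? 1 : 0 }

for my $s ("foo baz", "foobar", "barn", "b a r", "") {
    printf "%-10s short=%d tempered=%d\n", "<$s>", short_form($s), tempered_form($s);
}
```

The short form looks ahead only from the start of the string. That is why it cannot replace the tempered (?:(?!bar).)* inside a larger match such as the foo...baz example.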
About newlines: there are two problems with your regex and newlines.

First, . doesn't match newlines, so "foo\nbar" =~ /^(?:(?!baz).)*$/ doesn't match, even though the string does not contain baz. You need to add the /s flag to make . match any character; "foo\nbar" =~ /^(?:(?!baz).)*$/s correctly matches.

Second, $ doesn't match only at the end of the string; it can also match before a newline at the end of the string. So "foo\n" =~ /^(?:(?!\s).)*$/s does match, even though the string contains whitespace and you are trying to match only strings with no whitespace. \z always matches only at the very end, so "foo\n" =~ /^(?:(?!\s).)*\z/s correctly fails to match a string that does in fact contain a \s.

So the correct general-purpose regex is /^(?:(?!PATTERN).)*\z/s
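The four cases above can be checked directly in Perl. The helper lacks_pattern is a hypothetical name that wraps the general-purpose form and takes PATTERN as a qr// object.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the general-purpose form.
sub lacks_pattern {
    my ($string, $pattern) = @_;
    return $string =~ /^(?:(?!$pattern).)*\z/s ? 1 : 0;
}

my $no_s     = "foo\nbar" =~ /^(?:(?!baz).)*$/  ? 1 : 0;  # . stops at \n
my $with_s   = "foo\nbar" =~ /^(?:(?!baz).)*$/s ? 1 : 0;  # /s lets . cross \n
my $dollar   = "foo\n"    =~ /^(?:(?!\s).)*$/s  ? 1 : 0;  # $ matches before final \n
my $z_anchor = "foo\n"    =~ /^(?:(?!\s).)*\z/s ? 1 : 0;  # \z only at the very end

print "$no_s $with_s $dollar $z_anchor\n";  # 0 1 1 0
print lacks_pattern("foo\nbar", qr/baz/), "\n";  # 1
print lacks_pattern("foo\n", qr/\s/), "\n";      # 0
```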