Perl: Matching a string not containing PATTERN

Question

While using Perl regex to chop a string down into usable pieces I had the need to match everything except a certain pattern. I solved it after I found this hint on Perl Monks:

/^(?:(?!PATTERN).)*$/;    # Matches strings not containing PATTERN

Although I solved my initial problem, I have little clue about how it actually works. I checked perlre, but it is a bit too formal to grasp.

The question "Regular expression to match a line that doesn't contain a word?" helps a lot in understanding, but why are the . and the ?: in my example, and how do the outer parentheses work?

Can someone break up the regex and explain in simple words how it works?

Recommended answer

Building it up piece by piece (and throughout assuming no newlines in the string or PATTERN):

This matches any string:

/^.*$/

But we don't want . to match a character that starts PATTERN, so replace

.

with

(?!PATTERN).

This uses a negative look-ahead that tests a given pattern without actually consuming any of the string and only succeeds if the pattern does not match at the given point in the string. So it's like saying:

if PATTERN doesn't match at this point,
    match the next character
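To see a single step of this in isolation, here is a minimal sketch that takes PATTERN to be bar (an arbitrary choice for illustration). The helper one_step is hypothetical, not part of any module. It shows that (?!bar) consumes nothing and only the . after it takes a character.

```perl
use strict;
use warnings;

# One step of the construction with PATTERN = bar: (?!bar) is a
# zero-width test, and only the '.' after it consumes a character.
# one_step() is a hypothetical helper for illustration only.
sub one_step {
    my ($s) = @_;
    # Returns the single character consumed, or undef if bar starts here
    return $s =~ /^(?!bar)(.)/ ? $1 : undef;
}

for my $s ('xbar', 'barn') {
    my $c = one_step($s);
    print defined $c ? "$s: consumed '$c'\n" : "$s: no match\n";
}
```

For 'xbar' the look-ahead passes at position 0 and only 'x' is consumed; for 'barn' bar starts at position 0, so the step fails.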

This needs to be done for every character in the string, so * is used to match zero or more times, from the beginning to the end of the string.
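The pseudocode, repeated for every character, can be spelled out as an explicit loop. This is only an illustrative sketch, not how the regex engine is implemented: it assumes newline-free strings and simple patterns without ^ anchors or look-behinds (substr hides the text before position $i from them), and contains_no_match is a hypothetical name.

```perl
use strict;
use warnings;

# Spells out what /^(?:(?!PATTERN).)*$/ does, assuming no newlines:
# before consuming each character, check that PATTERN does not match
# starting at that position.
# contains_no_match() is a hypothetical helper for illustration only.
sub contains_no_match {
    my ($str, $pattern) = @_;            # $pattern is a qr// object
    for my $i (0 .. length($str) - 1) {
        # "if PATTERN doesn't match at this point ..."
        return 0 if substr($str, $i) =~ /^(?:$pattern)/;
        # "... match the next character": the loop moves on by one
    }
    return 1;    # reached the end without PATTERN matching anywhere
}

for my $s ('foo baz', 'foo bar baz') {
    printf "%-12s %s\n", $s, contains_no_match($s, qr/bar/) ? 'no bar' : 'has bar';
}
```

On ordinary strings this loop should agree with /^(?:(?!bar).)*$/, including the empty string, where neither the loop nor the regex ever tests PATTERN.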

To make the * apply to the combination of the negative look-ahead and ., not just the ., it needs to be surrounded by parentheses, and since there's no reason to capture, they should be non-capturing parentheses (?: ):

(?:(?!PATTERN).)*
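A quick sketch of why the grouping matters, again with bar standing in for PATTERN and sample strings chosen for illustration. Without the parentheses the look-ahead runs only once, at the start. A capturing group would accept the same strings but would also set $1.

```perl
use strict;
use warnings;

my $s = 'foo bar';

# Without the group, * applies only to '.', so the look-ahead is checked
# once, at the very start: "foo bar" does not start with bar, so it matches.
my $ungrouped = $s =~ /^(?!bar).*$/ ? 1 : 0;

# With (?: ), * repeats "look-ahead, then one character", so the check
# runs before every character and fails when it reaches "bar".
my $grouped = $s =~ /^(?:(?!bar).)*$/ ? 1 : 0;

# A capturing group matches the same strings but also sets $1, here to
# the last character consumed, which is rarely useful.
my $last = 'foo' =~ /^((?!bar).)*$/ ? $1 : undef;

print "ungrouped: $ungrouped, grouped: $grouped\n";    # ungrouped: 1, grouped: 0
print "last captured: $last\n";                        # last captured: o
```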

And putting back the anchors to make sure we test at every position in the string:

/^(?:(?!PATTERN).)*$/
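Putting it together, a minimal sketch that filters a list with the full regex. It assumes PATTERN = bar and newline-free strings; the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# The full construction, with PATTERN = bar; assumes no newlines.
my $no_bar = qr/^(?:(?!bar).)*$/;

my @strings = ('', 'foo', 'ba r', 'foobar', 'barfoo', 'crowbar');

# Keep only the strings in which bar appears nowhere
my @kept = grep { $_ =~ $no_bar } @strings;

print join(', ', map { "'$_'" } @kept), "\n";    # '', 'foo', 'ba r'
```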

Note that this solution is particularly useful as part of a larger match; e.g. to match any string with foo and later baz but no bar in between:

/foo(?:(?!bar).)*baz/
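A small sketch comparing this form with a plain .* between foo and baz; the test strings are invented for illustration. The plain .* happily runs over a bar, while the repeated look-ahead refuses to.

```perl
use strict;
use warnings;

# foo, then any characters that do not start "bar", then baz
my $no_bar_between = qr/foo(?:(?!bar).)*baz/;
# For comparison: .* can run straight over a "bar"
my $anything_between = qr/foo.*baz/;

for my $s ('foo 123 baz', 'foo bar baz', 'foo baz bar') {
    printf "%-12s no_bar_between=%d anything_between=%d\n", $s,
        ($s =~ $no_bar_between ? 1 : 0), ($s =~ $anything_between ? 1 : 0);
}
```

Only 'foo bar baz' separates the two patterns: it has a bar between foo and baz, so the first regex rejects it while the second accepts it.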

If there aren't such considerations, you can simply do:

/^(?!.*PATTERN)/

to check that PATTERN does not match anywhere in the string.
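For the whole-string case, here is a sketch checking that the single look-ahead agrees with the step-by-step regex. As an extra comparison not mentioned in the answer, it also checks a plain negated match with !~. The helper three_ways is hypothetical, and the strings are assumed to contain no newlines.

```perl
use strict;
use warnings;

# ^(?!.*bar): one look-ahead at the start; .* lets it reach every later
# position, so it fails if bar appears anywhere (assuming no newlines).
my $no_bar_simple = qr/^(?!.*bar)/;
# The step-by-step regex from the answer, for comparison
my $no_bar_full   = qr/^(?:(?!bar).)*$/;

# Hypothetical helper: returns 1/0 for the simple look-ahead,
# the step-by-step regex, and a plain negated match, in that order.
sub three_ways {
    my ($s) = @_;
    return ( ($s =~ $no_bar_simple ? 1 : 0),
             ($s =~ $no_bar_full   ? 1 : 0),
             ($s !~ /bar/          ? 1 : 0) );
}

for my $s ('', 'foo', 'ba r', 'foobar', 'barfoo') {
    print "'$s': ", join(' ', three_ways($s)), "\n";
}
```

When the test stands alone, $s !~ /bar/ is the most direct option. The look-ahead forms are mainly worth having when the check has to sit inside a larger regex.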

About newlines: there are two problems with your regex and newlines.

First, . doesn't match newlines, so "foo\nbar" =~ /^(?:(?!baz).)*$/ doesn't match, even though the string does not contain baz. You need to add the /s flag to make . match any character; "foo\nbar" =~ /^(?:(?!baz).)*$/s correctly matches.

Second, $ doesn't match only at the end of the string; it can also match before a newline at the end of the string. So "foo\n" =~ /^(?:(?!\s).)*$/s does match, even though the string contains whitespace and you are trying to match only strings with no whitespace. \z always matches only at the very end, so "foo\n" =~ /^(?:(?!\s).)*\z/s correctly fails to match a string that does in fact contain a \s.

So the correct general-purpose regex is:

/^(?:(?!PATTERN).)*\z/s
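The four newline cases from the paragraph above can be checked directly. The sketch below also wraps the general-purpose form in a hypothetical helper, lacks. Passing PATTERN as a qr// object is an assumption made for illustration.

```perl
use strict;
use warnings;

# Without /s, '.' refuses "\n", so a baz-free multi-line string is rejected
my $plain  = "foo\nbar" =~ /^(?:(?!baz).)*$/  ? 1 : 0;    # 0
# With /s, '.' also matches "\n"
my $dotall = "foo\nbar" =~ /^(?:(?!baz).)*$/s ? 1 : 0;    # 1
# $ may match just before a trailing "\n", so that "\n" is never required
# to pass (?!\s)
my $dollar = "foo\n" =~ /^(?:(?!\s).)*$/s     ? 1 : 0;    # 1
# \z matches only at the true end, so the trailing "\n" must pass (?!\s)
my $end_z  = "foo\n" =~ /^(?:(?!\s).)*\z/s    ? 1 : 0;    # 0

# Hypothetical helper using the general-purpose form; $pattern is a qr// object
sub lacks {
    my ($str, $pattern) = @_;
    return $str =~ /^(?:(?!$pattern).)*\z/s ? 1 : 0;
}

print "plain=$plain dotall=$dotall dollar=$dollar end_z=$end_z\n";
print "lacks baz: ", lacks("foo\nbar", qr/baz/), "\n";    # lacks baz: 1
```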
