Understanding Perl regular expression modifiers /m and /s


Question

I have been reading about Perl regular expressions with the modifiers s, m and g. I understand that //g is global matching, where it will be a greedy search.

But I am confused by the modifiers s and m. Can anyone explain the difference between s and m, with a code example to show how they differ? I have tried searching online, and it only gives the explanation in the link http://perldoc.perl.org/perlre.html#Modifiers. On Stack Overflow I have even seen people using s and m together. Isn't s the opposite of m?

//s 
//m 
//g

I am not able to match multiple lines using m.

use warnings;
use strict;
use 5.012;

my $file; 
{ 
 local $/ = undef; 
 $file = <DATA>; 
};
my @strings = $file =~ /".*"/mg; #returns all except the last string across multiple lines
#/"String"/mg; tried with this as well and returns nothing except String
say for @strings;

__DATA__
"This is string"
"1!=2"
"This is \"string\""
"string1"."string2"
"String"
"S
t
r
i
n
g"

Recommended answer

The documentation that you link to yourself seems very clear to me. It would help if you would explain what problem you had with understanding it, and how you came to think that /s and /m were opposites.

Very briefly, /s changes the behaviour of the dot metacharacter . so that it matches any character at all. Normally the dot matches anything except a newline "\n"; with /s the string is treated as a single line even if it contains newlines.
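As a minimal sketch of that difference (the sample string and variable names here are made up for illustration), the same pattern can be run with and without /s:

```perl
use strict;
use warnings;

# Hypothetical two-line sample string.
my $text = "first\nsecond";

# Without /s the dot stops at the newline, so only the first line is captured.
my ($plain) = $text =~ /(.*)/;

# With /s the dot also matches "\n", so the capture runs to the end of the string.
my ($dotall) = $text =~ /(.*)/s;

printf "without /s: %d characters\n", length $plain;   # 5
printf "with /s:    %d characters\n", length $dotall;  # 12
```

Only the behaviour of . changes here; the pattern contains no ^ or $, so /m would make no difference to it.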

/m modifies the caret ^ and dollar $ metacharacters so that they match at newlines within the string, treating it as a multi-line string. Normally they will match only at the beginning and end of the string (with $ also matching just before a trailing newline).
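A short sketch of the anchors in action may help; the sample text and variable names are assumptions chosen for illustration. The last match also suggests that /s and /m are not opposites, since each changes a different metacharacter and they can be combined:

```perl
use strict;
use warnings;

# Hypothetical three-line sample string.
my $text = "one\ntwo\nthree\n";

# Without /m, ^ matches only at the very start of the string.
my @first_only = $text =~ /^(\w+)/g;     # ("one")

# With /m, ^ also matches just after each embedded newline.
my @every_line = $text =~ /^(\w+)/mg;    # ("one", "two", "three")

# With /m, $ also matches just before each newline.
my @line_ends = $text =~ /(\w)$/mg;      # ("e", "o", "e")

# /m and /s together: ^ anchors at the start of a line, and . crosses newlines.
my ($rest) = $text =~ /^(two.*)/ms;      # "two\nthree\n"

print "first only: @first_only\n";
print "every line: @every_line\n";
print "line ends:  @line_ends\n";
```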

You shouldn't get confused with the /g modifier being "greedy". It is for global matches, which will find all occurrences of the pattern within the string. The term greedy is usually used for the behaviour of quantifiers within the pattern. For instance .* is said to be greedy because it will match as many characters as possible, as opposed to .*? which will match as few characters as possible.
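To keep the two ideas apart, here is a small sketch (the input string is invented for the example) that contrasts a greedy quantifier, a non-greedy quantifier, and the /g modifier:

```perl
use strict;
use warnings;

# Hypothetical sample containing two quoted strings on one line.
my $text = '"a" and "b"';

# Greedy: .* grabs as much as it can, so the match runs to the last quote.
my ($greedy) = $text =~ /(".*")/;     # "a" and "b"

# Non-greedy: .*? grabs as little as it can, so the match stops at the next quote.
my ($lazy) = $text =~ /(".*?")/;      # "a"

# /g is unrelated to greediness: it returns every match, not just the first.
my @all = $text =~ /".*?"/g;          # ("a", "b"), each including its quotes

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
print "all:    @all\n";
```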

Update

In your modified question you are using /".*"/mg, in which the /m is irrelevant because, as noted above, that modifier alters only the behaviour of the $ and ^ metacharacters, and there are none in your pattern.

Changing it to /".*"/sg improves things a little in that the . can now match the newline at the end of each line and so the pattern can match multi-line strings. (Note that it is the object string that is considered to be "single line" here, i.e. the match behaves just as if there were no newlines in it as far as . is concerned.) However, here the conventional meaning of greedy comes into play, because the pattern now matches everything from the first double-quote in the first line to the last double-quote at the end of the last line. I assume that isn't what you want.
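The effect can be seen on a cut-down input. This is only a sketch: the three quoted strings below are a hypothetical stand-in for the data in the question, and the variable names are invented.

```perl
use strict;
use warnings;

# Hypothetical input: two single-line strings and one that spans two lines.
my $file = qq{"one"\n"two"\n"th\nree"\n};

# /m changes nothing here (no ^ or \$ in the pattern), and the dot cannot
# cross a newline, so the multi-line string is never matched.
my @with_m = $file =~ /".*"/mg;

# /s lets the dot cross newlines, but the greedy .* then swallows everything
# from the first quote to the last one as a single match.
my @with_s = $file =~ /".*"/sg;

printf "with /mg: %d matches\n", scalar @with_m;   # 2
printf "with /sg: %d match\n",   scalar @with_s;   # 1
```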

There are a few ways to fix this. I recommend changing your pattern so that the string you want is a double-quote, followed by any sequence of characters except double-quotes, followed by another double-quote. This is written /"[^"]*"/g (note that the /s modifier is no longer necessary as there are now no dots in the pattern) and very nearly does what you want, except that the escaped double-quotes are seen as ending the string.

Take a look at this program and its output, noting that I have put a chevron >> at the start of each match so that they can be distinguished.

use strict;
use warnings;

my $file = do {
  local $/;
  <DATA>; 
};

my @strings = $file =~ /"[^"]*"/g;

print ">> $_\n\n" for @strings;

__DATA__
"This is string"
"1!=2"
"This is \"string\""
"string1"."string2"
"String"
"S
t
r
i
n
g"

Output

>> "This is string"

>> "1!=2"

>> "This is \"

>> ""

>> "string1"

>> "string2"

>> "String"

>> "S
t
r
i
n
g"

As you can see everything is now in order except that in "This is \"string\"" it has found two matches, "This is \", and "". Fixing that may be more complicated than you want to go but it's perfectly possible. Please say so if you need that fixed too.

Update

I may as well finish this off. To ignore escaped double-quotes and treat them as just part of the string, we need to accept either \" or any character except double-quote. That is done using the regex alternation operator | and must be grouped inside non-capturing parentheses (?: ... ). The end result is /"(?:\\"|[^"])*"/g (the backslash itself must be escaped so it is doubled up) which, when put into the above program, produces this output, which I assume is what you wanted.

>> "This is string"

>> "1!=2"

>> "This is \"string\""

>> "string1"

>> "string2"

>> "String"

>> "S
t
r
i
n
g"
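For completeness, here is one way the finished program might look with the final pattern dropped in. It is a sketch rather than the answerer's exact code: the input is held in a single-quoted heredoc (the END_INPUT label is arbitrary) instead of a __DATA__ section, which keeps the backslashes literal.

```perl
use strict;
use warnings;

# Sample input copied from the question. A single-quoted heredoc does no
# escape processing, so \" stays as a backslash followed by a double-quote.
my $file = <<'END_INPUT';
"This is string"
"1!=2"
"This is \"string\""
"string1"."string2"
"String"
"S
t
r
i
n
g"
END_INPUT

# A double-quote, then any run of escaped quotes (\") or non-quote
# characters, then the closing double-quote.
my @strings = $file =~ /"(?:\\"|[^"])*"/g;

print ">> $_\n\n" for @strings;
```

The order of the alternatives matters: \\" is tried first, so a backslash followed by a double-quote is consumed as a pair before [^"] gets a chance to take the backslash on its own.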
