Difference in matching end of line with PHP regex

Problem description

Given the code:

$my_str = '
Rollo is*
My dog*
And he\'s very*
Lovely*
';

preg_match_all('/\S+(?=\*$)/m', $my_str, $end_words);
print_r($end_words);

In PHP 7.3.2 (XAMPP) I get the unexpected output:

Array ( [0] => Array ( ) )

Whereas in PHPFiddle, on PHP 7.0.33, I get what I expected:

Array ( [0] => Array ( [0] => is [1] => dog [2] => very [3] => Lovely ) ) 

Can anyone tell me why I'm getting this difference, and whether something changed in regex behaviour after 7.0.33?

Recommended answer

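It seems that in your environment the PCRE library was compiled without the PCRE_NEWLINE_ANY (or PCRE_NEWLINE_ANYCRLF) option, so in multiline mode $ only matches before an LF character and . matches any character except LF. If the string contains CRLF line endings (for instance because the script file was saved with Windows line endings, which is an assumption here), a CR sits between each * and the LF, and \*$ cannot match.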
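The effect described above can be reproduced independently of how the local PCRE library was built. The sketch below is an illustration, not part of the original answer: it forces the LF-only convention with the (*LF) verb and runs the question's pattern against the same text with LF and with CRLF line endings. The variable names are made up for the example.

```php
<?php
// The question's text, built once with Unix (LF) and once with Windows (CRLF) line endings.
$lines = ['Rollo is*', 'My dog*', "And he's very*", 'Lovely*'];
$lf    = "\n" . implode("\n", $lines) . "\n";
$crlf  = "\r\n" . implode("\r\n", $lines) . "\r\n";

// (*LF) pins the newline convention to LF only, so the result does not
// depend on the default chosen when the local PCRE library was compiled.
$pattern = '/(*LF)\S+(?=\*$)/m';

preg_match_all($pattern, $lf, $lfMatches);
preg_match_all($pattern, $crlf, $crlfMatches);

print_r($lfMatches[0]);   // the four words that precede "*"
print_r($crlfMatches[0]); // empty: a CR sits between "*" and the LF
```

With LF line endings the four end words are found; with CRLF line endings the result is empty, which matches the unexpected output in the question.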

You can fix it by using the PCRE (*ANYCRLF) verb:

'~(*ANYCRLF)\S+(?=\*$)~m'
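As a quick check of this pattern, the sketch below applies it to the question's text with explicit CRLF line endings. The CRLF endings are an assumption about what the XAMPP script file contained; the question does not show them.

```php
<?php
// The question's string, written with explicit Windows (CRLF) line endings.
$my_str = "\r\nRollo is*\r\nMy dog*\r\nAnd he's very*\r\nLovely*\r\n";

// (*ANYCRLF) lets $ match before CR, LF or CRLF in multiline mode.
preg_match_all('~(*ANYCRLF)\S+(?=\*$)~m', $my_str, $end_words);

print_r($end_words[0]); // is, dog, very, Lovely
```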

(*ANYCRLF) specifies a newline convention that accepts any of (*CR), (*LF) or (*CRLF), and is equivalent to the PCRE_NEWLINE_ANYCRLF option. The broader PCRE_NEWLINE_ANY option corresponds to the (*ANY) verb. See the PCRE documentation:

PCRE_NEWLINE_ANY specifies that any Unicode newline sequence should be recognized.

In the end, this PCRE verb makes . match any character except CR and LF, and $ will match right before either of these two characters.

See more about this and other verbs at rexegg.com:

By default, when PCRE is compiled, you tell it what to consider to be a line break when encountering a . (as the dot doesn't match line breaks unless in dotall mode), as well as the ^ and $ anchors' behavior in multiline mode. You can override this default with the following modifiers:

(*CR) Only a carriage return is considered to be a line break
(*LF) Only a line feed is considered to be a line break (as on Unix)
(*CRLF) Only a carriage return followed by a line feed is considered to be a line break (as on Windows)
(*ANYCRLF) Any of the above three is considered to be a line break
(*ANY) Any Unicode newline sequence is considered to be a line break
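To compare the five modifiers side by side, the following sketch (an illustration, not taken from rexegg.com) runs the same multiline pattern under each one. The sample text and variable names are invented for the example; the text mixes an LF, a lone CR and a CRLF.

```php
<?php
// Sample text mixing all three line-break styles: LF, then CR, then CRLF.
$text = "one\ntwo\rthree\r\nfour";

$results = [];
foreach (['CR', 'LF', 'CRLF', 'ANYCRLF', 'ANY'] as $verb) {
    // Find "lines" made only of word characters under the given convention.
    preg_match_all('/(*' . $verb . ')^\w+$/m', $text, $m);
    $results[$verb] = $m[0];
}

// Print one row per modifier with the lines it recognized.
foreach ($results as $verb => $lines) {
    echo str_pad("(*$verb)", 12), implode(', ', $lines), PHP_EOL;
}
```

Only (*ANYCRLF) and (*ANY) recognize all four words as separate lines here; the narrower conventions see fewer line boundaries, so ^ and $ have fewer places to match.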

For instance, (*CR)\w+.\w+ matches Line1\nLine2 because the dot is able to match the \n, which is not considered to be a line break.
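The example above can be tried in PHP as follows. The (*LF) comparison is added here for contrast and is not part of the quoted text.

```php
<?php
$subject = "Line1\nLine2";

// Under (*CR), LF is an ordinary character, so the dot can consume it.
preg_match('/(*CR)\w+.\w+/', $subject, $withCr);

// Under (*LF), the dot cannot cross the LF, so the match stays on the first line.
preg_match('/(*LF)\w+.\w+/', $subject, $withLf);

var_dump($withCr[0] === $subject); // bool(true): both lines matched as one
var_dump($withLf[0]);              // string(5) "Line1"
```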
