How to replace different newline styles in PHP the smartest way?

Question

I have text that might contain different newline styles. I want to replace all newlines ('\r\n', '\n', '\r') with the same newline (in this case \r\n).

What's the fastest way to do this? My current solution looks like this, and it's way sucky:

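    // 1. Protect existing CRLFs with a placeholder so they are not doubled
    $sNicetext = str_replace("\r\n", '%%%%something%%%%', $sNicetext);
    // 2. Turn lone CRs into LF first, then every LF into CRLF; str_replace()
    //    applies the pairs in order, so mapping "\r" straight to "\r\n" would
    //    let the second pair rewrite that new "\n" and produce "\r\r\n"
    $sNicetext = str_replace(array("\r", "\n"), array("\n", "\r\n"), $sNicetext);
    // 3. Restore the protected CRLFs
    $sNicetext = str_replace('%%%%something%%%%', "\r\n", $sNicetext);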

The problem is that you can't do this with a single replace, because each existing \r\n would be turned into \r\n\r\n.
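If you would rather avoid regular expressions, PHP's built-in strtr() with an array argument sidesteps the duplication problem in one call: per the PHP manual it tries the longest keys first and never searches text it has already replaced, so "\r\n" is consumed whole before "\r" or "\n" can match. A minimal sketch (the function name toCrlf is just for illustration):

```php
<?php
// Single-pass newline normalization with strtr(). With an array argument,
// strtr() prefers the longest matching key and does not rescan replaced text,
// so an existing "\r\n" is never expanded to "\r\n\r\n".
function toCrlf(string $text): string
{
    return strtr($text, [
        "\r\n" => "\r\n", // already CRLF: keep as-is
        "\r"   => "\r\n", // lone CR (classic Mac style)
        "\n"   => "\r\n", // lone LF (Unix style)
    ]);
}

echo json_encode(toCrlf("a\nb\r\nc\rd")), "\n";
```

Unlike the \R approach, this only handles CR, LF and CRLF, which is exactly the set named in the question.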

Thanks for your help!

Recommended answer

    $string = preg_replace('~\R~u', "\r\n", $string);
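A small sketch wrapping the one-liner in a reusable function. The name normalizeNewlines and the $eol parameter are hypothetical; the null check is there because preg_replace() returns null on failure, which with the u modifier includes input that is not valid UTF-8.

```php
<?php
// Hypothetical helper around the answer's preg_replace() call.
function normalizeNewlines(string $text, string $eol = "\r\n"): string
{
    $result = preg_replace('~\R~u', $eol, $text);
    if ($result === null) {
        // With /u, malformed UTF-8 makes preg_replace() fail and return null
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result;
}

// LF, CRLF, CR, vertical tab, form feed and NEL (U+0085) all become CRLF
echo json_encode(normalizeNewlines("a\nb\r\nc\rd\ve\fg\u{0085}h")), "\n";
```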

If you don't want to replace all Unicode newlines, but only CR, LF, and CRLF, use:

    $string = preg_replace('~(*BSR_ANYCRLF)\R~', "\r\n", $string);
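To see how the two patterns differ, the sketch below runs both on a made-up string that mixes LF and CR with a vertical tab and a Unicode line separator (U+2028):

```php
<?php
$text = "one\ntwo\rthree\vfour\u{2028}five";

// Full Unicode set: LF, CR, VT and U+2028 all become CRLF
$all  = preg_replace('~\R~u', "\r\n", $text);
// CR, LF and CRLF only: the vertical tab and U+2028 are left untouched
$crlf = preg_replace('~(*BSR_ANYCRLF)\R~', "\r\n", $text);

echo json_encode($all), "\n";
echo json_encode($crlf), "\n";
```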

\R matches any of these newline sequences, and u is the pattern modifier that makes PCRE treat the pattern and the input string as UTF-8.
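One practical reason to keep the u modifier, assuming PCRE was built with the default (Unicode) \R behaviour: in byte mode \R also matches the single byte 0x85 (NEL), and that byte occurs inside the UTF-8 encoding of ordinary characters. "Å" (U+00C5) is encoded as C3 85, so a pattern without u cuts it in half. The sample word is invented for illustration.

```php
<?php
$word = "\u{00C5}ngstr\u{00F6}m"; // "Ångström"; Å is the bytes C3 85

// Byte mode: the 0x85 byte inside "Å" is treated as NEL and replaced
$bytes = preg_replace('~\R~', "\r\n", $word);
// UTF-8 mode: characters are matched whole, so the word is left intact
$utf8  = preg_replace('~\R~u', "\r\n", $word);

var_dump($bytes === $word);
var_dump($utf8 === $word);
```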

From the PCRE documentation:

What \R matches

By default, the sequence \R in a pattern matches any Unicode newline sequence, whatever has been selected as the line ending sequence. If you specify

     --enable-bsr-anycrlf

the default is changed so that \R matches only CR, LF, or CRLF. Whatever is selected when PCRE is built can be overridden when the library functions are called.

Newline sequences

Outside a character class, by default, the escape sequence \R matches any Unicode newline sequence. In non-UTF-8 mode \R is equivalent to the following:

    (?>\r\n|\n|\x0b|\f|\r|\x85)

This is an example of an "atomic group", details of which are given below. This particular group matches either the two-character sequence CR followed by LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab, U+000B), FF (formfeed, U+000C), CR (carriage return, U+000D), or NEL (next line, U+0085). The two-character sequence is treated as a single unit that cannot be split.
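The "cannot be split" property is easiest to see with a pattern that would have to backtrack into the newline. A sketch comparing \R, the equivalent atomic group quoted above, and a plain non-atomic group on the subject "\r\n":

```php
<?php
$subject = "\r\n";

// \R and the atomic group consume "\r\n" as one unit and never give back
// the "\n", so the following \n has nothing left to match
$withR     = preg_match('~^\R\n$~', $subject);
$atomic    = preg_match('~^(?>\r\n|\n|\x0b|\f|\r|\x85)\n$~', $subject);
// A plain (non-atomic) group can backtrack to the single "\r" alternative,
// leaving "\n" for the rest of the pattern
$nonAtomic = preg_match('~^(?:\r\n|\n|\x0b|\f|\r|\x85)\n$~', $subject);

printf("%d %d %d\n", $withR, $atomic, $nonAtomic); // prints "0 0 1"
```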

In UTF-8 mode, two additional characters whose codepoints are greater than 255 are added: LS (line separator, U+2028) and PS (paragraph separator, U+2029). Unicode character property support is not needed for these characters to be recognized.
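A quick check of the LS/PS behaviour on an invented sample string: with the u modifier both separators are replaced, while in byte mode their UTF-8 encodings (E2 80 A8 and E2 80 A9) contain no byte that \R recognizes.

```php
<?php
$text = "line1\u{2028}line2\u{2029}line3";

$utf8  = preg_replace('~\R~u', "\n", $text); // LS and PS are replaced
$bytes = preg_replace('~\R~', "\n", $text);  // byte mode: LS and PS survive

echo json_encode($utf8), "\n";
```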

It is possible to restrict \R to match only CR, LF, or CRLF (instead of the complete set of Unicode line endings) by setting the option PCRE_BSR_ANYCRLF either at compile time or when the pattern is matched. (BSR is an abbreviation for "backslash R".) This can be made the default when PCRE is built; if this is the case, the other behaviour can be requested via the PCRE_BSR_UNICODE option. It is also possible to specify these settings by starting a pattern string with one of the following sequences:

    (*BSR_ANYCRLF)   CR, LF, or CRLF only
    (*BSR_UNICODE)   any Unicode newline sequence

These override the default and the options given to pcre_compile() or pcre_compile2(), but they can be overridden by options given to pcre_exec() or pcre_dfa_exec(). Note that these special settings, which are not Perl-compatible, are recognized only at the very start of a pattern, and that they must be in upper case. If more than one of them is present, the last one is used. They can be combined with a change of newline convention; for example, a pattern can start with:

    (*ANY)(*BSR_ANYCRLF)

They can also be combined with the (*UTF8) or (*UCP) special sequences. Inside a character class, \R is treated as an unrecognized escape sequence, and so matches the letter "R" by default, but causes an error if PCRE_EXTRA is set.
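A sketch of the "last one is used" rule for the (*BSR_...) start-of-pattern settings. Because they live in the pattern itself, they are also a way to pin \R's behaviour from PHP regardless of how the underlying PCRE library was built, since the preg_* functions take no match-time option for it. The sample text is made up.

```php
<?php
$text = "a\u{2028}b\nc";

// (*BSR_UNICODE) comes last, so \R matches U+2028 as well as LF
$unicodeLast = preg_replace('~(*BSR_ANYCRLF)(*BSR_UNICODE)\R~u', '|', $text);
// (*BSR_ANYCRLF) comes last, so only the LF is replaced
$anycrlfLast = preg_replace('~(*BSR_UNICODE)(*BSR_ANYCRLF)\R~u', '|', $text);

echo json_encode($unicodeLast), "\n";
echo json_encode($anycrlfLast), "\n";
```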
