Non-breaking UTF-8 space (0xc2a0) and strange preg_replace behaviour
Question
In my string I have a UTF-8 non-breaking space (0xc2a0) and I want to replace it with something else.
When I use
$str=preg_replace('~\xc2\xa0~', 'X', $str);
it works.
But when I use
$str=preg_replace('~\x{C2A0}~siu', 'W', $str);
the non-breaking space is not found (and not replaced).
Why? What is wrong with the second regexp?
The format \x{C2A0} is correct, and I also used the u flag.
Recommended answer
Actually, the documentation about escape sequences in PHP is misleading on this point. The two syntaxes do not mean the same thing. With \xc2\xa0 and no u flag, the pattern matches the two raw bytes 0xC2 0xA0, which happen to be the UTF-8 encoding of the non-breaking space. With \x{c2a0} and the u flag, the value in braces is read as a Unicode code point (U+C2A0), and the engine looks for the UTF-8 encoding of that code point, which is not the byte pair C2 A0.
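To see the difference, it can help to look at the bytes each code point produces. This is only a small sketch, assuming PHP 7.0 or later (needed for the "\u{...}" string escape); the variable names are illustrative and not from the question.

```php
<?php
// In a double-quoted PHP string, "\u{...}" yields the UTF-8 encoding of a code point.
$nbsp  = "\u{00A0}"; // U+00A0 NO-BREAK SPACE
$other = "\u{C2A0}"; // U+C2A0, a different character (in the Hangul syllables range)

// bin2hex() shows the raw bytes behind each string.
$nbspHex  = bin2hex($nbsp);  // "c2a0"
$otherHex = bin2hex($other); // "ec8aa0"

echo $nbspHex, "\n", $otherHex, "\n";
```

So the bytes C2 A0 belong to U+00A0, while U+C2A0 is stored as three different bytes.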
A non-breaking space is U+00A0 in Unicode, but it is encoded as the bytes C2 A0 in UTF-8. So if you use the pattern ~\x{00a0}~siu, it will work as expected.
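The three patterns can be compared side by side. This is a minimal sketch, assuming a made-up sample string "a", non-breaking space, "b"; the variable names $bytes, $wrong and $right are only for illustration.

```php
<?php
// Sample input containing one UTF-8 non-breaking space (bytes C2 A0).
$str = "a\xc2\xa0b";

// No u flag: the pattern matches the two raw bytes C2 A0.
$bytes = preg_replace('~\xc2\xa0~', 'X', $str);

// u flag, but the UTF-8 byte values are used as a code point:
// U+C2A0 does not occur in the string, so nothing is replaced.
$wrong = preg_replace('~\x{C2A0}~siu', 'W', $str);

// u flag with the actual code point of the non-breaking space, U+00A0.
$right = preg_replace('~\x{00a0}~siu', 'W', $str);

// Each comparison is expected to be true.
var_dump($bytes === 'aXb', $wrong === $str, $right === 'aWb');
```

With the u flag, the value inside \x{...} should be the Unicode code point, not the UTF-8 byte sequence.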