How to replace a decoded non-breakable space (nbsp)
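Problem description
Assume I have the string "a s d d", in which the spaces are non-breakable spaces, and htmlentities turns it into "a&nbsp;s&nbsp;d&nbsp;d".
How can I replace those characters (using preg_replace) without first encoding them as entities?
I tried preg_replace('/[\xa0]/', '', $string); but it does not work. I am trying to remove those special characters from my string because I don't need them.
What options are there besides regular expressions?
Edit
The string I want to parse: http://pastebin.com/raw/7eNT9sZr
I process it with preg_replace('/[\r\n]+/', "[##]", $text) so that I can later call implode("</p><p>", explode("[##]", $text)).
My question is not exactly "how" to do this (I could encode the entities, remove the ones I don't need, and decode them again), but how to remove those characters with just str_replace or preg_replace.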
Answer
The problem is that you are specifying the non-breakable space incorrectly. In UTF-8, the non-breakable space is encoded as 0xC2A0, which consists of two bytes, C2 (194) and A0 (160). Your pattern specifies only half of the character's code.
You can replace it using the simple (and fast) str_replace, or using a more flexible regular expression, depending on your needs:

// faster solution
$regular_spaces = str_replace("\xc2\xa0", ' ', $original_string);
// more flexible solution
$regular_spaces = preg_replace('/\xc2\xa0/', ' ', $original_string);

Note that with str_replace you have to enclose the search string in double quotes ("), because str_replace does not understand the textual representation of character codes, so those codes must be converted into actual characters first. PHP does this automatically: strings enclosed in double quotes are processed, and special sequences (e.g. the newline character \n, textual representations of character codes, etc.) are replaced by the actual characters (e.g. 0x0A for \n in UTF-8) before the string value is used.
In contrast, the preg_replace function itself understands the textual representation of character codes, so you don't need PHP to convert them into actual characters, and you can enclose the search string in apostrophes (single quotes, ') in this case.
UTF-8 is a so-called variable-width character encoding, which means that a character's code consists of one to four (8-bit) bytes. In general, more frequently used characters have shorter codes, while more exotic characters have longer codes.
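The quoting rules and the two-byte encoding described in the answer can be checked with a short script. This is only a minimal sketch: the sample input and all variable names other than $original_string are made up for illustration, and the last variant (the /u modifier with \x{00A0}) is not part of the answer. It is an assumed alternative that relies on PCRE's UTF-8 mode.

```php
<?php
// Hypothetical sample input: "a s d d" with UTF-8 non-breakable spaces (bytes C2 A0).
$original_string = "a\xc2\xa0s\xc2\xa0d\xc2\xa0d";

// The attempt from the question matches only the byte 0xA0, so each
// leading 0xC2 byte is left behind and the result is not valid UTF-8.
$broken = preg_replace('/[\xa0]/', '', $original_string);
echo bin2hex($broken), "\n"; // prints 61c273c264c264

// Double quotes: PHP converts \xc2\xa0 into two raw bytes before
// str_replace ever sees the search string.
$with_str_replace = str_replace("\xc2\xa0", ' ', $original_string);

// Single quotes with str_replace: the search string is the literal
// 8-character text \xc2\xa0, which never occurs, so nothing changes.
$unchanged = str_replace('\xc2\xa0', ' ', $original_string);

// Single quotes with preg_replace: PCRE itself interprets \xc2\xa0.
$with_preg_replace = preg_replace('/\xc2\xa0/', ' ', $original_string);

// Assumed alternative (not from the answer): in UTF-8 mode (/u) the
// pattern can name the code point U+00A0 instead of its bytes.
$with_unicode_mode = preg_replace('/\x{00A0}/u', ' ', $original_string);

var_dump($with_str_replace === 'a s d d');
var_dump($with_preg_replace === 'a s d d');
var_dump($with_unicode_mode === 'a s d d');
var_dump($unchanged === $original_string);
```

To delete the characters instead of turning them into regular spaces, as the question asks, pass '' as the replacement in any of the working variants.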