How to replace a decoded non-breaking space (nbsp)

Question

Assume I have a string "a s d d", and htmlentities turns it into
"a&nbsp;s&nbsp;d&nbsp;d".

How can I replace the non-breaking spaces (using preg_replace) without first encoding them to entities?

I tried preg_replace('/[\xa0]/', '', $string); but it does not work. I want to remove those special characters from my string because I don't need them.

What are the options besides regular expressions?

Edit: The string I want to parse: http://pastebin.com/raw/7eNT9sZr
It is processed with preg_replace('/[\r\n]+/', "[##]", $text)
for a later implode("</p><p>", explode("[##]", $text))

My question is not exactly "how" to do this (I could encode the entities, remove the ones I don't need, and decode them again), but how to remove those characters with just str_replace or preg_replace.

Solution

The problem is that you are specifying the non-breaking space incorrectly. In UTF-8, the non-breaking space (U+00A0) is encoded as the two-byte sequence 0xC2 0xA0, that is, C2 (194) followed by A0 (160). Your pattern specifies only the second byte, which is half of the character's encoding.
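To see the two bytes for yourself, you can inspect the character directly. This is only a minimal sketch, assuming the string is UTF-8 encoded and PHP 7.0 or later (for the "\u{...}" escape); the variable names are illustrative and not from the original question.

```php
<?php
// A non-breaking space (U+00A0) written with PHP's Unicode escape.
// In UTF-8 this is the same as the byte string "\xc2\xa0".
$nbsp = "\u{00A0}";

// A short sample string: letter, non-breaking space, letter.
$string = "a" . $nbsp . "s";

echo bin2hex($nbsp), PHP_EOL;   // c2a0
echo strlen($string), PHP_EOL;  // 4 (strlen counts bytes, not characters)
```

Because the character occupies two bytes, a pattern that removes only \xa0 leaves the \xc2 byte behind, or matches nothing useful at all.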

You can replace it using the simple (and fast) str_replace or using a more flexible regular expression, depending on your needs:

// faster solution
$regular_spaces = str_replace("\xc2\xa0", ' ', $original_string);

// more flexible solution
$regular_spaces = preg_replace('/\xc2\xa0/', ' ', $original_string);

Note that with str_replace you have to enclose the search string in double quotes ("). str_replace does not understand the textual representation of character codes, so those codes must be converted into actual characters first. PHP does this automatically for double-quoted strings: escape sequences (e.g. the newline \n, hexadecimal character codes such as \xc2, etc.) are replaced by the actual bytes (e.g. 0x0A for \n) before the string value is used.

In contrast, the preg_replace function understands the textual representation of character codes by itself, so PHP does not need to convert them into actual characters, and you can enclose the pattern in apostrophes (single quotes, ') in this case.
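The quoting difference is easy to check side by side. The following is a small sketch under the assumption that the input is UTF-8; $original and the other variable names are made up for the example.

```php
<?php
// "a s d" with UTF-8 non-breaking spaces instead of regular spaces.
$original = "a\xc2\xa0s\xc2\xa0d";

// Double quotes: PHP turns \xc2\xa0 into two real bytes before
// str_replace ever sees the search string, so the replacement works.
$viaStrReplace = str_replace("\xc2\xa0", ' ', $original);

// Single quotes: str_replace receives the eight literal characters
// \xc2\xa0, which do not occur in the input, so nothing changes.
$unchanged = str_replace('\xc2\xa0', ' ', $original);

// Single quotes are fine here: the regex engine interprets \xc2\xa0 itself.
$viaPreg = preg_replace('/\xc2\xa0/', ' ', $original);

var_dump($viaStrReplace === 'a s d');  // bool(true)
var_dump($unchanged === $original);    // bool(true)
var_dump($viaPreg === 'a s d');        // bool(true)
```

To remove the characters entirely, as in the question, pass an empty string as the replacement instead of ' '.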

UTF-8 is a so-called variable-width character encoding, which means that each character is encoded as one to four (8-bit) bytes. ASCII characters take a single byte, while characters with higher code points, which are generally less frequently used, take more.
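As a rough illustration of the variable width, the byte length of a few characters can be printed with strlen. This sketch assumes PHP 7.0 or later; the sample characters are arbitrary choices, not taken from the question.

```php
<?php
// One sample character for each UTF-8 encoding length.
$samples = [
    'A'     => "A",          // U+0041, ASCII
    'nbsp'  => "\u{00A0}",   // U+00A0, non-breaking space
    'euro'  => "\u{20AC}",   // U+20AC, euro sign
    'emoji' => "\u{1F600}",  // U+1F600, outside the Basic Multilingual Plane
];

foreach ($samples as $name => $char) {
    // strlen counts bytes; bin2hex shows the encoded byte sequence.
    printf("%-5s %d byte(s): %s\n", $name, strlen($char), bin2hex($char));
}
```

The four samples come out as 1, 2, 3 and 4 bytes respectively, with the non-breaking space again showing up as c2a0.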
