Unicode Replacement Characters in the PHP htmlspecialchars function


Question

In the htmlspecialchars function, setting the ENT_SUBSTITUTE flag is supposed to make it replace invalid characters.

Which characters are replaced? And what is the mapping between the invalid characters and the ones used to replace them?

Recommended answer

There is only one universal replacement character: U+FFFD. If you are writing out UTF-8, this code point is emitted directly in its UTF-8 encoding. If not, you get the corresponding character reference &#xFFFD; instead.
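As a rough illustration of the two output forms, the sketch below feeds htmlspecialchars one invalid byte under two different charsets. The sample strings and variable names are made up for this example, and Shift_JIS is chosen only as one convenient non-UTF-8 multibyte charset that the function accepts.

```php
<?php
// Hypothetical inputs: each contains a byte that is invalid in the stated charset.
$utf8Input = "caf\xE9 <b>"; // a lone 0xE9 is not a valid UTF-8 sequence
$sjisInput = "abc\x81";     // 0x81 is a Shift_JIS lead byte with no trail byte

// Without ENT_SUBSTITUTE (or ENT_IGNORE), invalid input yields an empty string.
$strict = htmlspecialchars($utf8Input, ENT_QUOTES, 'UTF-8');

// UTF-8 output: the bad byte becomes U+FFFD itself (bytes EF BF BD).
$utf8Out = htmlspecialchars($utf8Input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// Non-UTF-8 output: the bad byte becomes the character reference instead.
$sjisOut = htmlspecialchars($sjisInput, ENT_QUOTES | ENT_SUBSTITUTE, 'Shift_JIS');

var_dump($strict);           // empty string
var_dump(bin2hex($utf8Out)); // hex dump makes the EF BF BD bytes visible
var_dump($sjisOut);          // ends with the &#xFFFD; reference
```

Passing the flags explicitly matters here, because newer PHP versions include ENT_SUBSTITUTE in the default flags.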

There is no reversible mapping. By definition, the original byte sequence was invalid, i.e. it does not have a value (valid = has a value), so every invalid sequence is replaced by the same character.
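A small sketch of why the substitution cannot be undone: two different invalid inputs (hypothetical sample strings) end up as byte-for-byte identical output, so the original bytes are not recoverable from the result.

```php
<?php
$flags = ENT_QUOTES | ENT_SUBSTITUTE;

// Two different byte strings; 0xFF and 0x80 are each invalid on their own in UTF-8.
$a = htmlspecialchars("x\xFFy", $flags, 'UTF-8');
$b = htmlspecialchars("x\x80y", $flags, 'UTF-8');

var_dump($a === $b);   // the two outputs are indistinguishable
var_dump(bin2hex($a)); // "x", then U+FFFD as EF BF BD, then "y"
```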

The bytes (not really "characters") that are replaced are those that are not valid in the assumed source encoding. For example, if your source encoding were UTF-16 and you had a lone surrogate, that would be "invalid" (though technically any text processor is supposed to abort fatally in that situation). As a better example, if the source encoding is ASCII, then any value above 127 is an invalid character.
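To make the "assumed source encoding" point concrete, the sketch below passes the same bytes to htmlspecialchars under two different charset arguments. ISO-8859-1 and UTF-8 are picked here purely for illustration because both are charsets the function accepts; the sample string is hypothetical.

```php
<?php
$flags = ENT_QUOTES | ENT_SUBSTITUTE;
$bytes = "caf\xE9"; // reads as "café" in ISO-8859-1

// Declared as ISO-8859-1, the 0xE9 byte is a valid character and passes through.
$asLatin1 = htmlspecialchars($bytes, $flags, 'ISO-8859-1');

// Declared as UTF-8, the same byte is an incomplete sequence and is replaced.
$asUtf8 = htmlspecialchars($bytes, $flags, 'UTF-8');

var_dump($asLatin1 === $bytes);          // unchanged
var_dump($asUtf8 === "caf\u{FFFD}");     // 0xE9 replaced by U+FFFD
```

Whether a byte counts as invalid therefore depends on the charset argument, not on the byte itself.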
