How to detect malformed utf-8 string in PHP?
Problem description
The iconv function sometimes gives me an error:
Notice: iconv() [function.iconv]: Detected an incomplete multibyte character in input string in [...]
Is there a way to detect illegal characters in a UTF-8 string before passing the data to iconv?
First, note that it is not possible to detect whether text belongs to a specific undesired encoding. You can only check whether a string is valid in a given encoding.
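You can make use of the UTF-8 validity check that preg_match [PHP Manual] performs when the u modifier is set, available since PHP 4.3.5. Given an invalid string, it reports no match (0, or false in current PHP versions) with no additional information in the return value. A valid string yields 1, because the empty pattern always matches:
$isUTF8 = (preg_match('//u', $string) === 1);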
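As a minimal sketch of that check in use, the snippet below wraps it in a helper and runs it over a few byte strings. The function name isValidUtf8Pcre and the sample strings are made up for illustration and are not part of the original answer; preg_last_error() and PREG_BAD_UTF8_ERROR are standard PHP and can tell you afterwards that the failure was a UTF-8 problem.

```php
<?php
// Hypothetical helper name; not part of PHP or the original answer.
function isValidUtf8Pcre(string $s): bool
{
    // An empty pattern with the u modifier matches any valid UTF-8 subject,
    // so any result other than 1 means PCRE rejected the subject.
    return preg_match('//u', $s) === 1;
}

// Illustrative inputs (assumptions, chosen to cover typical cases).
$samples = [
    'ascii'     => 'hello',
    'two-byte'  => "caf\xC3\xA9", // "café" encoded as UTF-8
    'truncated' => "caf\xC3",     // lead byte without its continuation byte
    'latin1'    => "caf\xE9",     // ISO-8859-1 byte sequence, not valid UTF-8
];

foreach ($samples as $label => $bytes) {
    $ok = isValidUtf8Pcre($bytes);
    echo $label, ': ', $ok ? 'valid' : 'invalid', PHP_EOL;
}

// preg_last_error() describes the most recent preg_* call.
preg_match('//u', "caf\xC3");
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR);
```

Comparing against 1 rather than relying on truthiness gives a plain boolean whether the failure is reported as 0 or as false.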
Another possibility is mb_check_encoding [PHP Manual]:
$validUTF8 = mb_check_encoding($string, 'UTF-8');
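mb_check_encoding is only available when the mbstring extension is loaded. A possible way to combine it with the PCRE check is sketched below; the helper name isValidUtf8 is hypothetical, and falling back to preg_match is an assumption about what suits your setup, not something the original answer prescribes.

```php
<?php
// Hypothetical helper: prefers mbstring and falls back to the PCRE check.
function isValidUtf8(string $s): bool
{
    if (function_exists('mb_check_encoding')) {
        // Returns true only if the whole string is valid in the given encoding.
        return mb_check_encoding($s, 'UTF-8');
    }
    // Fallback when mbstring is not installed.
    return preg_match('//u', $s) === 1;
}

var_dump(isValidUtf8("caf\xC3\xA9")); // well-formed UTF-8 input
var_dump(isValidUtf8("caf\xC3"));     // truncated multibyte sequence
```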
Another function you can use is mb_detect_encoding [PHP Manual]:
$validUTF8 = mb_detect_encoding($string, 'UTF-8', true) !== false;
It's important to set the strict parameter to true.
Additionally, iconv [PHP Manual] allows you to change or drop invalid sequences on the fly. (However, if iconv encounters such a sequence, it generates a notice; this behavior cannot be changed.)
echo 'TRANSLIT : ', iconv("UTF-8", "ISO-8859-1//TRANSLIT", $string), PHP_EOL;
echo 'IGNORE : ', iconv("UTF-8", "ISO-8859-1//IGNORE", $string), PHP_EOL;
You can suppress the notice with the @ operator and compare the length of the returned string with that of the original:
$isValid = strlen($string) === strlen(@iconv('UTF-8', 'UTF-8//IGNORE', $string));
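Since the question asks how to catch bad input before it reaches iconv, one option is to validate first and only convert strings that pass. The sketch below assumes ISO-8859-1 as the target encoding (taken from the examples above) and uses a hypothetical wrapper name, toLatin1OrNull; it also treats a false return from iconv as a failure, which can still happen when a character has no equivalent in the target encoding.

```php
<?php
// Hypothetical wrapper: returns null instead of letting iconv raise a notice
// about malformed input.
function toLatin1OrNull(string $utf8): ?string
{
    // Validate first, so iconv never sees a malformed UTF-8 sequence.
    if (preg_match('//u', $utf8) !== 1) {
        return null;
    }
    // iconv can still fail for characters that ISO-8859-1 cannot represent;
    // @ silences that notice and the false return is mapped to null.
    $converted = @iconv('UTF-8', 'ISO-8859-1', $utf8);
    return $converted === false ? null : $converted;
}

var_dump(toLatin1OrNull("caf\xC3\xA9")); // well-formed input
var_dump(toLatin1OrNull("caf\xC3"));     // malformed input
```

Returning null keeps the caller in control of what happens to rejected input, whether that is logging it, dropping it, or trying another source encoding.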
Check the examples on the iconv manual page as well.
You have not shared the source code that triggers the notice. You should add it if you want a more concrete suggestion.