How to detect a malformed UTF-8 string in PHP?


Question

The iconv function sometimes gives me a notice:

Notice: iconv() [function.iconv]: Detected an incomplete multibyte character in input string in [...]

Is there a way to detect illegal characters in a UTF-8 string before passing the data to iconv?

Solution

First, note that it is not possible to detect whether text belongs to a specific undesired encoding. You can only check whether a string is valid in a given encoding.

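You can use the UTF-8 validity check built into preg_match [PHP Manual], available since PHP 4.3.5. With the u modifier the subject must be valid UTF-8, so an empty pattern returns 1 only for a well-formed string. For an invalid string the call fails with no additional information in the return value: depending on the PHP version the result is 0 or false, so treat anything other than 1 as invalid: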

$isUTF8 = preg_match('//u', $string);
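As a minimal sketch of the preg_match approach, the check can be wrapped in a small function that compares the result strictly against 1, which sidesteps the 0-versus-false difference between PHP versions. The helper name is_valid_utf8_pcre is hypothetical, and the scalar type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper name; not part of PHP or of the original answer.
function is_valid_utf8_pcre(string $s): bool
{
    // The empty pattern matches any subject, so the only way this call
    // can return something other than 1 is the /u modifier rejecting
    // the subject bytes as malformed UTF-8.
    return preg_match('//u', $s) === 1;
}

var_dump(is_valid_utf8_pcre("caf\xC3\xA9")); // "é" encoded as C3 A9: well-formed
var_dump(is_valid_utf8_pcre("caf\xE9"));     // lone E9 byte (ISO-8859-1 "é"): malformed

// After a failed call, preg_last_error() says why it failed.
preg_match('//u', "\xE9");
$wasBadUtf8 = (preg_last_error() === PREG_BAD_UTF8_ERROR);
var_dump($wasBadUtf8);
```

Checking preg_last_error() is optional, but it distinguishes a UTF-8 failure from other PCRE errors such as a backtrack limit.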

Another possibility is mb_check_encoding [PHP Manual]:

$validUTF8 = mb_check_encoding($string, 'UTF-8');
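To illustrate what mb_check_encoding accepts and rejects, the sketch below runs it over a few hand-built byte strings. The sample strings and the $samples and $results names are made up for illustration, and the code assumes the mbstring extension is loaded.

```php
<?php
// Requires the mbstring extension. Sample data is illustrative only.
$samples = [
    'ascii'     => 'plain text',
    'two-byte'  => "caf\xC3\xA9", // "é" encoded as C3 A9
    'truncated' => "caf\xC3",     // lead byte with no continuation byte
    'latin1'    => "caf\xE9",     // ISO-8859-1 "é", not valid UTF-8
];

$results = [];
foreach ($samples as $label => $bytes) {
    // true when every byte sequence in the string is well-formed UTF-8
    $results[$label] = mb_check_encoding($bytes, 'UTF-8');
    printf("%-10s %s\n", $label, $results[$label] ? 'valid' : 'invalid');
}
```

The truncated case is the kind of input behind the "incomplete multibyte character" notice in the question.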

Another function you can use is mb_detect_encoding [PHP Manual]:

$validUTF8 = (false !== mb_detect_encoding($string, 'UTF-8', true));

It's important to set the strict parameter to true.

Additionally, iconv [PHP Manual] lets you transliterate or drop invalid sequences on the fly. (However, if iconv encounters such a sequence, it raises a notice; this behavior cannot be changed.)

echo 'TRANSLIT : ', iconv("UTF-8", "ISO-8859-1//TRANSLIT", $string), PHP_EOL;
echo 'IGNORE   : ', iconv("UTF-8", "ISO-8859-1//IGNORE", $string), PHP_EOL;

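You can suppress the notice with @ and compare the length of the returned string with the length of the original. If the lengths differ, iconv dropped something, so the input was not valid UTF-8: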

strlen($string) === strlen(@iconv('UTF-8', 'UTF-8//IGNORE', $string));

Check the examples on the iconv manual page as well.
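Since the question asks for a check that runs before the data reaches iconv, one possible arrangement is to validate first and convert only well-formed input. This is a sketch under assumptions, not a definitive implementation: the helper name utf8_to_latin1_or_null is hypothetical, the ISO-8859-1 target is borrowed from the examples above, and both the mbstring and iconv extensions are assumed to be loaded.

```php
<?php
// Hypothetical helper: returns the converted string, or null when the
// input is not well-formed UTF-8 or cannot be converted.
function utf8_to_latin1_or_null(string $s): ?string
{
    // Validate first so iconv() is never handed malformed bytes.
    if (!mb_check_encoding($s, 'UTF-8')) {
        return null;
    }
    // Valid UTF-8 can still contain characters that ISO-8859-1 cannot
    // represent; iconv() then returns false and raises a notice, which
    // is suppressed here and mapped to null.
    $out = @iconv('UTF-8', 'ISO-8859-1', $s);
    return $out === false ? null : $out;
}

var_dump(utf8_to_latin1_or_null("caf\xC3\xA9")); // converted: "é" becomes the single byte E9
var_dump(utf8_to_latin1_or_null("caf\xC3"));     // truncated sequence: null, iconv is never called
```

Returning null keeps the failure explicit for the caller, instead of silently passing on a shortened string as //IGNORE would.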

You have not shared the source code where the notice is resulting from. You should add it if you want a more concrete suggestion.
