How can I detect a malformed UTF-8 string in PHP?

Question

The iconv function sometimes gives me an error:

Notice:
iconv() [function.iconv]:
Detected an incomplete multibyte character in input string in [...]

Is there a way to detect illegal characters in a UTF-8 string before sending the data to iconv()?

Recommended answer

First, note that it is not possible to detect whether text belongs to a specific undesired encoding. You can only check whether a string is valid in a given encoding.

You can use the UTF-8 validity check built into preg_match [PHP Manual], available since PHP 4.3.5. With the u modifier, an invalid string makes the call return a falsy value (0 or false, depending on the PHP version) instead of 1, and the return value carries no additional information:

$isUTF8 = preg_match('//u', $string);
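
To make the check above easier to reuse, here is a minimal sketch that wraps it in a function. The helper name isValidUtf8Pcre and the sample byte strings are illustrative assumptions, not part of the original answer. Comparing against 1 sidesteps the question of whether a failure is reported as 0 or false, and preg_last_error() can be consulted afterwards to see why the match failed.

```php
<?php
// Hypothetical helper name; only preg_match() and preg_last_error() are PHP built-ins.
function isValidUtf8Pcre(string $s): bool
{
    // With the u modifier, PCRE validates the subject as UTF-8 before matching.
    // The empty pattern matches every valid string, so a result of 1 means "valid".
    return preg_match('//u', $s) === 1;
}

var_dump(isValidUtf8Pcre("caf\xC3\xA9")); // well-formed: "café"
var_dump(isValidUtf8Pcre("caf\xC3"));     // malformed: truncated two-byte sequence

// After a failed check, the error code tells you it was a UTF-8 problem.
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR);
```

The three var_dump calls should print bool(true), bool(false), and bool(true) in that order.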

Another possibility is mb_check_encoding [PHP Manual]:

$validUTF8 = mb_check_encoding($string, 'UTF-8');
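
As an illustration of what mb_check_encoding accepts and rejects, the sketch below runs it over a few hand-made byte strings. The labels and sample data are assumptions chosen for the example, and the code requires the mbstring extension to be loaded.

```php
<?php
// Hypothetical sample inputs, keyed by a descriptive label.
$inputs = [
    'ascii'     => "hello",
    'two-byte'  => "caf\xC3\xA9", // "café", well-formed
    'truncated' => "caf\xC3",     // lead byte without its continuation byte
    'overlong'  => "\xC0\xAF",    // overlong encoding of "/", not allowed in UTF-8
];

// Collect the labels of every input that is not valid UTF-8.
$invalid = [];
foreach ($inputs as $label => $bytes) {
    if (!mb_check_encoding($bytes, 'UTF-8')) {
        $invalid[] = $label;
    }
}

print_r($invalid); // lists 'truncated' and 'overlong'
```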

Another function you can use is mb_detect_encoding [PHP Manual]:

$validUTF8 = mb_detect_encoding($string, 'UTF-8', true) !== false;

It's important to set the strict parameter to true.

Additionally, iconv [PHP Manual] lets you change or drop invalid sequences on the fly. (However, iconv still raises a notice when it encounters such a sequence; this behavior cannot be changed.)

echo 'TRANSLIT : ', iconv("UTF-8", "ISO-8859-1//TRANSLIT", $string), PHP_EOL;
echo 'IGNORE   : ', iconv("UTF-8", "ISO-8859-1//IGNORE", $string), PHP_EOL;

You can suppress the notice with @ and compare the length of the returned string with the length of the original:

$validUTF8 = strlen($string) === strlen(@iconv('UTF-8', 'UTF-8//IGNORE', $string));
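
The length comparison can be sketched as a small function. The name isValidUtf8Iconv is a hypothetical helper, and the sketch assumes an iconv implementation that supports the //IGNORE suffix (glibc does; other implementations may not). It also treats a false return from iconv as "invalid", since some PHP and iconv combinations return false rather than a shortened string when they hit a bad sequence.

```php
<?php
// Hypothetical helper; assumes the underlying iconv library understands //IGNORE.
function isValidUtf8Iconv(string $s): bool
{
    // @ suppresses the notice iconv raises on a malformed sequence.
    $cleaned = @iconv('UTF-8', 'UTF-8//IGNORE', $s);

    // Some setups return false instead of a shortened string.
    if ($cleaned === false) {
        return false;
    }

    // If nothing was dropped, the input was already well-formed.
    return strlen($cleaned) === strlen($s);
}

var_dump(isValidUtf8Iconv("caf\xC3\xA9")); // well-formed input
var_dump(isValidUtf8Iconv("caf\xC3"));     // truncated sequence at the end
```

Unlike the preg_match and mb_check_encoding checks, this approach converts the whole string just to test it, so it is mainly useful when you want the cleaned string anyway.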

Check the examples on the iconv manual page as well.
