iconv() – how to detect offending character?


Question

I use iconv() to convert CSV data from UTF-8 to Windows-1252:

$converted = iconv("UTF-8", "Windows-1252", $csvData);

In some cases, iconv() fails quietly and returns false.

I also tried using //TRANSLIT, but iconv() returns false there as well.

When I append //IGNORE to the target charset, the conversion succeeds, but that means one or more characters were silently dropped.

I can stick with //IGNORE, but I would like to find out which characters are causing the problem.

How can I do that?

Recommended answer

It was a bad idea to work with the string as a char array (see the question comments), because of how the PHP string type works:


Internally, PHP strings are byte arrays. As a result, accessing or modifying a string using array brackets is not multi-byte safe, and should only be done with strings that are in a single-byte encoding such as ISO-8859-1.
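To make the byte/character distinction concrete, here is a small sketch using the same sample string as the answer's code. It assumes the mbstring extension is loaded and that the source file itself is saved as UTF-8.

```php
<?php
// Assumes the mbstring extension and a UTF-8 encoded source file.
$s = "test bad ☺ string";

$byteLength = strlen($s);                  // counts bytes: 19
$charLength = mb_strlen($s, "UTF-8");      // counts characters: 17

// Array brackets return a single byte, here the first of the three
// bytes that encode U+263A in UTF-8.
$firstByteHex = bin2hex($s[9]);            // "e2"

// mb_substr() returns the whole character at that character offset.
$wholeChar = mb_substr($s, 9, 1, "UTF-8"); // "☺"

var_dump($byteLength, $charLength, $firstByteHex, $wholeChar);
```

Passing a lone byte such as `$s[9]` to iconv() would make every multi-byte character look invalid, which is why the conversion has to be attempted per character and not per byte.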

So we can use mb_substr for UTF-8 and work with characters instead of bytes:

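// Hide the E_NOTICE that iconv() raises on unconvertible input.
// error_reporting() expects an integer bitmask, not a string.
error_reporting(E_ALL & ~E_NOTICE);
$yourString = "test bad ☺ string";
$convertString = '';
$badChars = [];

// Try the whole string first; fall back to per-character conversion on failure.
$converted = iconv("UTF-8", "Windows-1252", $yourString);

if ($converted === false) {
    for ($i = 0, $stringLength = mb_strlen($yourString, "UTF-8"); $i < $stringLength; $i++) {
        // mb_substr() extracts one whole character, not one byte.
        $char = mb_substr($yourString, $i, 1, "UTF-8");
        $convertChar = iconv("UTF-8", "Windows-1252", $char);

        if ($convertChar === false) {
            $badChars[$i] = $char; // key is the character offset
        } else {
            $convertString .= $convertChar;
        }
    }
} else {
    $convertString = $converted;
}

var_dump($badChars, $convertString);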

Result: array(1) { [9] => string(3) "☺" } string(16) "test bad  string"
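The same idea can be wrapped in a reusable function that also reports each offending character's Unicode code point, which is easier to look up than the raw glyph. This is only a sketch: `findUnconvertibleChars` is a hypothetical helper name, not part of the original answer, and it assumes PHP 7.4+ with mbstring (for `mb_str_split` and `mb_ord`) and an iconv implementation that returns false for unconvertible characters, as glibc and GNU libiconv do. Some implementations substitute a placeholder instead, in which case nothing would be reported.

```php
<?php
// Hypothetical helper, not from the original answer.
// Assumes PHP 7.4+ with the mbstring and iconv extensions.
function findUnconvertibleChars(string $text, string $target = "Windows-1252"): array
{
    $bad = [];
    foreach (mb_str_split($text, 1, "UTF-8") as $offset => $char) {
        // @ silences the E_NOTICE iconv() raises for an unconvertible character.
        if (@iconv("UTF-8", $target, $char) === false) {
            // Store the code point as U+XXXX, keyed by character offset.
            $bad[$offset] = sprintf("U+%04X", mb_ord($char, "UTF-8"));
        }
    }
    return $bad;
}

$report = findUnconvertibleChars("test bad ☺ string");
print_r($report);
```

For CSV input, calling the helper once per line and logging the line number together with the returned offsets should be enough to locate each problem character in the file.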

P.S. Next time I will give a more detailed answer with code. My mistake.
