Error: "Input is not proper UTF-8, indicate encoding !" using PHP's simplexml_load_string

Question

I'm getting the error:

parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xED 0x6E 0x2C 0x20

When trying to process an XML response using simplexml_load_string from a 3rd party source. The raw XML response does declare the content type:

Yet it seems that the XML is not really UTF-8. The language of the XML content is Spanish, and it contains words like Dublín.

I'm unable to get the 3rd party to sort out their XML.

How can I pre-process the XML and fix the encoding incompatibilities?

Is there a way to detect the correct encoding for an XML file?

Recommended answer

Your 0xED 0x6E 0x2C 0x20 bytes correspond to "ín, " in ISO-8859-1, so it looks like your content is in ISO-8859-1, not UTF-8. Tell your data provider about it and ask them to fix it, because if it doesn't work for you it probably doesn't work for other people either.
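The byte diagnosis above can be checked directly. This is a minimal sketch, assuming the mbstring extension is available; the variable names are illustrative only.

```php
<?php
// The four bytes reported by the parser error.
$bytes = "\xED\x6E\x2C\x20";

// 0xED opens a three-byte UTF-8 sequence, but 0x6E ("n") is not a
// continuation byte, so the string is not valid UTF-8.
$isUtf8 = mb_check_encoding($bytes, 'UTF-8');

// Read as ISO-8859-1, the same bytes are ordinary text: 0xED is "í".
$asUtf8 = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');

var_dump($isUtf8);                      // bool(false)
var_dump($asUtf8 === "\xC3\xADn, ");    // bool(true): "ín, " encoded as UTF-8
```

Note that this only shows the bytes are consistent with ISO-8859-1. Any byte string decodes as ISO-8859-1 without error, so the check cannot prove that this is the encoding the provider actually used.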

Now there are a few ways to work around it, which you should only use if you cannot load the XML normally. One of them would be to use utf8_encode(). The downside is that if that XML contains both valid UTF-8 and some ISO-8859-1, then the result will contain mojibake. Or you can try to convert the string from UTF-8 to UTF-8 using iconv() or mbstring, and hope they'll fix it for you. (They won't, but you can at least ignore the invalid characters so you can load your XML.)
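As a rough sketch of those two workarounds, assuming the mbstring and SimpleXML extensions are loaded: the sample payload below is hypothetical, and mb_convert_encoding() stands in for utf8_encode() in the first case (it performs the same ISO-8859-1 to UTF-8 conversion when given those encodings).

```php
<?php
// Hypothetical sample: ISO-8859-1 bytes behind a declaration that claims UTF-8.
$raw = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<city>Dubl\xEDn</city>";

// Workaround 1: assume the whole payload is ISO-8859-1 and convert it.
// Any part that was already valid UTF-8 would be double-encoded (mojibake).
$converted  = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');
$fromLatin1 = simplexml_load_string($converted);

// Workaround 2: convert UTF-8 to UTF-8 through mbstring. Invalid bytes are
// replaced by the substitute character, so the accented letter is lost,
// but the document becomes loadable.
$scrubbed     = mb_convert_encoding($raw, 'UTF-8', 'UTF-8');
$fromScrubbed = simplexml_load_string($scrubbed);

echo (string) $fromLatin1, "\n";        // the city name with its accent restored
var_dump($fromScrubbed !== false);      // the scrubbed document also parses
```

The first variant keeps the text intact when the payload really is pure ISO-8859-1; the second is lossy but makes no assumption about the source encoding.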

Or you can take the long, long road and validate/fix the sequences yourself. That will take you a while, depending on how familiar you are with UTF-8. Perhaps there are libraries out there that would do that, although I don't know of any.
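One way that manual route could look is sketched below. It is an illustration under stated assumptions, not a vetted library: the function name repair_mixed_utf8 is hypothetical, stray bytes are assumed to be ISO-8859-1, and mbstring is assumed to be available. The pattern keeps every well-formed multi-byte UTF-8 sequence and converts any other high byte on its own.

```php
<?php
// Hypothetical helper: keep well-formed UTF-8 sequences, and treat every
// other byte >= 0x80 as a single ISO-8859-1 character.
function repair_mixed_utf8(string $s): string
{
    $pattern = '/(
          [\xC2-\xDF][\x80-\xBF]               # 2-byte sequences
        | \xE0[\xA0-\xBF][\x80-\xBF]           # 3-byte, no overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}    # 3-byte, general
        | \xED[\x80-\x9F][\x80-\xBF]           # 3-byte, no surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}        # 4-byte, no overlongs
        | [\xF1-\xF3][\x80-\xBF]{3}            # 4-byte, general
        | \xF4[\x80-\x8F][\x80-\xBF]{2}        # 4-byte, up to U+10FFFF
        ) | ([\x80-\xFF])                      # any other high byte
    /x';

    $fixed = preg_replace_callback($pattern, function (array $m): string {
        // Group 2 is only present when a stray byte was matched.
        return isset($m[2])
            ? mb_convert_encoding($m[2], 'UTF-8', 'ISO-8859-1')
            : $m[0];
    }, $s);

    return $fixed ?? $s; // fall back to the input if PCRE reports an error
}

// "Dublín" in ISO-8859-1 next to "Málaga" in UTF-8.
$mixed    = "Dubl\xEDn, M\xC3\xA1laga";
$repaired = repair_mixed_utf8($mixed);
var_dump(mb_check_encoding($repaired, 'UTF-8')); // bool(true)
```

This remains a heuristic: ISO-8859-1 text that happens to form a valid UTF-8 sequence (for example the two characters "Ã©") is left untouched.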

Either way, notify your data provider that they're sending invalid data so that they can fix it.

Here's a partial fix. It will definitely not fix everything, but it will fix some of it. Hopefully enough for you to get by until your provider fixes their stuff.

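// Re-encodes every byte in the range 0xA1-0xFF that is NOT followed by two or
// more UTF-8 continuation bytes (0x80-0xBF), treating it as ISO-8859-1.
function fix_latin1_mangled_with_utf8_maybe_hopefully_most_of_the_time($str)
{
    return preg_replace_callback('#[\xA1-\xFF](?![\x80-\xBF]{2,})#', 'utf8_encode_callback', $str);
}

// Callback: converts the single matched byte from ISO-8859-1 to UTF-8.
// Note: utf8_encode() is deprecated as of PHP 8.2;
// mb_convert_encoding($m[0], 'UTF-8', 'ISO-8859-1') is the usual replacement.
function utf8_encode_callback($m)
{
    return utf8_encode($m[0]);
}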
