Error: "Input is not proper UTF-8, indicate encoding !" using PHP's simplexml_load_string

Question

I'm getting this error:

parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xED 0x6E 0x2C 0x20

I get it when trying to process an XML response from a 3rd-party source using simplexml_load_string. The raw XML response does declare its encoding:

<?xml version="1.0" encoding="UTF-8"?>

Yet it seems that the XML is not really UTF-8. The language of the XML content is Spanish, and it contains words like Dublín.

I'm unable to get the 3rd party to sort out their XML.

How can I pre-process the XML and fix the encoding incompatibilities?

Is there a way to detect the correct encoding of an XML file?

Answer

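Your 0xED 0x6E 0x2C 0x20 bytes correspond to "ín, " in ISO-8859-1, so it looks like your content is in ISO-8859-1, not UTF-8. (In UTF-8, 0xED would start a three-byte sequence and must be followed by continuation bytes in the range 0x80-0xBF; 0x6E is a plain "n".) Tell your data provider about it and ask them to fix it, because if it doesn't work for you, it probably doesn't work for other people either.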
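For the detection part of the question, a minimal sketch (assuming the mbstring extension is installed; the helper name describe_encoding() is hypothetical) can check whether the payload is valid UTF-8 and otherwise fall back to ISO-8859-1. Strictly speaking, an encoding can only be ruled out, not detected: every byte string is valid ISO-8859-1, so the fallback is a guess, and Windows-1252 would fit the same bytes.

```php
<?php
// Sketch: decide whether a payload can be handed to simplexml_load_string()
// as UTF-8. Requires the mbstring extension; describe_encoding() is a
// hypothetical helper name.
function describe_encoding(string $payload): string
{
    if (mb_check_encoding($payload, 'UTF-8')) {
        return 'UTF-8';
    }
    // Any byte sequence is valid ISO-8859-1, so this is only a fallback
    // guess; Windows-1252 would match the same bytes.
    return 'ISO-8859-1 (guess)';
}

// The four bytes reported in the error message.
$bytes = "\xED\x6E\x2C\x20";

echo describe_encoding($bytes), PHP_EOL;                          // ISO-8859-1 (guess)
echo mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1'), PHP_EOL; // ín,
```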

There are a few ways to work around it, which you should only use if you cannot load the XML normally. One of them is to use utf8_encode(). The downside is that if the XML contains both valid UTF-8 and some ISO-8859-1, the result will contain mojibake. Alternatively, you can try to convert the string from UTF-8 to UTF-8 using iconv() or mbstring and hope they'll fix it for you. (They won't, but you can at least ignore the invalid characters so that you can load your XML.)
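A minimal sketch of both workarounds, assuming the mbstring extension: mb_convert_encoding() from ISO-8859-1 does the same job as utf8_encode() (which is deprecated as of PHP 8.2), and mb_scrub() performs the UTF-8 to UTF-8 conversion, replacing each invalid sequence with the mbstring substitute character ('?' by default). It uses mb_scrub() rather than iconv() because iconv()'s //IGNORE handling depends on the platform's iconv implementation. The function names are hypothetical.

```php
<?php
// Workaround 1: treat the whole payload as ISO-8859-1 and convert it.
// Correct only if the payload contains no UTF-8 at all; bytes that were
// already UTF-8 come out as mojibake.
function latin1_to_utf8(string $payload): string
{
    return mb_convert_encoding($payload, 'UTF-8', 'ISO-8859-1');
}

// Workaround 2: "convert" UTF-8 to UTF-8. Valid sequences are kept and each
// invalid sequence is replaced by the substitute character ('?' by default).
// The original character is lost, but the XML becomes parseable.
function scrub_utf8(string $payload): string
{
    return mb_scrub($payload, 'UTF-8');
}

$latin1 = "<city>Dubl\xEDn</city>";
echo latin1_to_utf8($latin1), PHP_EOL; // <city>Dublín</city>
echo scrub_utf8($latin1), PHP_EOL;     // <city>Dubl?n</city>
```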

Or you can take the long, long road and validate/fix the sequences yourself. That will take you a while, depending on how familiar you are with UTF-8. Perhaps there are libraries out there that do this, although I don't know of any.
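If you do take the long road, one way to validate and fix the sequences yourself might look like this. It assumes the input mixes valid UTF-8 with stray ISO-8859-1 bytes: runs of well-formed UTF-8 (using the byte ranges from RFC 3629) are copied unchanged, and any other byte of 0x80 or above is treated as ISO-8859-1 and re-encoded. The function name repair_mixed_utf8() is hypothetical, and if the provider actually sends Windows-1252, 'ISO-8859-1' should be swapped for 'Windows-1252'.

```php
<?php
// Sketch: repair a string that mixes valid UTF-8 with stray ISO-8859-1 bytes.
// Assumption: every byte that is not part of a well-formed UTF-8 sequence
// is an ISO-8859-1 character.
function repair_mixed_utf8(string $str): string
{
    // Well-formed UTF-8 sequences (RFC 3629), matched byte by byte
    // (no /u modifier, so PCRE works on raw bytes).
    $validUtf8 = '(?:[\x00-\x7F]'
        . '|[\xC2-\xDF][\x80-\xBF]'
        . '|\xE0[\xA0-\xBF][\x80-\xBF]'
        . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
        . '|\xED[\x80-\x9F][\x80-\xBF]'
        . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'
        . '|[\xF1-\xF3][\x80-\xBF]{3}'
        . '|\xF4[\x80-\x8F][\x80-\xBF]{2})';

    // Group 1: a run of valid sequences. Group 2: one byte that is not valid UTF-8.
    $pattern = '/(' . $validUtf8 . '+)|([\x80-\xFF])/';

    $result = preg_replace_callback($pattern, function (array $m): string {
        if ($m[1] !== '') {
            return $m[1]; // already valid UTF-8, keep as-is
        }
        // A stray byte: interpret it as ISO-8859-1.
        return mb_convert_encoding($m[2], 'UTF-8', 'ISO-8859-1');
    }, $str);

    if ($result === null) {
        throw new RuntimeException('preg_replace_callback() failed: ' . preg_last_error_msg());
    }
    return $result;
}

echo repair_mixed_utf8("Dubl\xEDn, Dubl\xC3\xADn"), PHP_EOL; // Dublín, Dublín
```

Unlike the partial fix below, this leaves already-valid multibyte sequences such as 0xC3 0xAD untouched.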

Either way, notify your data provider that they're sending invalid data so that they can fix it.

Here's a partial fix. It will definitely not fix everything, but it will fix some of it. Hopefully that's enough to get by until your provider fixes their stuff.

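// Partial fix: re-encode any byte in 0xA1-0xFF as if it were ISO-8859-1,
// unless it is followed by two or more bytes in 0x80-0xBF (i.e. it looks like
// the lead byte of a 3- or 4-byte UTF-8 sequence).
// Caveat: bytes inside valid UTF-8 can match too (e.g. both bytes of "í" =
// 0xC3 0xAD), so already-valid UTF-8 may get double-encoded.
function fix_latin1_mangled_with_utf8_maybe_hopefully_most_of_the_time($str)
{
    return preg_replace_callback('#[\\xA1-\\xFF](?![\\x80-\\xBF]{2,})#', function ($m) {
        // Same conversion as utf8_encode(), which is deprecated as of PHP 8.2.
        return mb_convert_encoding($m[0], 'UTF-8', 'ISO-8859-1');
    }, $str);
}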
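Putting it together, a minimal sketch that only applies a workaround when the payload is not valid UTF-8, in line with the advice to use these tricks only if the XML cannot be loaded normally. Instead of rewriting bytes, it assumes (hypothetically) that the whole document is ISO-8859-1 and relabels the XML declaration so that libxml decodes it; SimpleXML then returns UTF-8 strings as usual. The function name load_xml_lenient() is hypothetical, and the rewrite only works when the document starts with an XML declaration (no BOM).

```php
<?php
// Sketch: load XML, relabelling the declaration as ISO-8859-1 only when the
// payload is not valid UTF-8. Requires the SimpleXML and mbstring extensions.
function load_xml_lenient(string $xml): SimpleXMLElement
{
    $previous = libxml_use_internal_errors(true); // collect errors instead of warnings
    try {
        if (!mb_check_encoding($xml, 'UTF-8')) {
            // Assumption: the whole document is ISO-8859-1. Change the declared
            // encoding="UTF-8" so libxml decodes the bytes correctly.
            $xml = preg_replace(
                '/^(<\?xml[^>]*?\bencoding\s*=\s*)(["\'])UTF-8\2/i',
                '$1$2ISO-8859-1$2',
                $xml,
                1
            );
        }
        $doc = simplexml_load_string($xml);
        if ($doc === false) {
            $messages = array_map(fn ($e) => trim($e->message), libxml_get_errors());
            throw new RuntimeException('XML parse failed: ' . implode('; ', $messages));
        }
        return $doc;
    } finally {
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
}

$payload = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<city>Dubl\xEDn</city>";
echo load_xml_lenient($payload), PHP_EOL; // Dublín
```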
