Error: "Input is not proper UTF-8, indicate encoding !" using PHP's simplexml_load_string


Problem description

I'm getting the following error when trying to process an XML response from a 3rd party source using simplexml_load_string:

parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xED 0x6E 0x2C 0x20

The raw XML response does declare its encoding:

<?xml version="1.0" encoding="UTF-8"?>

Yet it seems that the XML is not really UTF-8. The language of the XML content is Spanish, and it contains words like Dublín.

I'm unable to get the 3rd party to sort out their XML.

How can I pre-process the XML and fix the encoding incompatibilities?

Is there a way to detect the correct encoding for an XML file?

Solution

Your 0xED 0x6E 0x2C 0x20 bytes correspond to "ín, " in ISO-8859-1, so it looks like your content is in ISO-8859-1, not UTF-8. Tell your data provider about it and ask them to fix it, because if it doesn't work for you it probably doesn't work for other people either.
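As a quick check of that diagnosis, the bytes from the error message can be tested directly. This is only a sketch and assumes the mbstring extension is available; the variable names are illustrative. Note that it can show the bytes are not valid UTF-8, but it cannot prove they are ISO-8859-1, since almost any byte sequence decodes as ISO-8859-1 without error.

```php
<?php
// The four bytes reported by the parser error.
$bytes = "\xED\x6E\x2C\x20";

// 0xED opens a 3-byte UTF-8 sequence, but 0x6E ("n") is not a
// continuation byte, so the string is not valid UTF-8.
$isUtf8 = mb_check_encoding($bytes, 'UTF-8');
var_dump($isUtf8); // bool(false)

// Reading the same bytes as ISO-8859-1 gives readable text.
$decoded = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
var_dump($decoded === "\xC3\xADn, "); // bool(true), i.e. "ín, " in UTF-8
```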

There are a few ways to work around it, which you should only use if you cannot load the XML normally. One of them is to use utf8_encode(). The downside is that if the XML contains both valid UTF-8 and some ISO-8859-1, the result will contain mojibake. Alternatively, you can try converting the string from UTF-8 to UTF-8 using iconv() or mbstring and hope they'll fix it for you. (They won't, but you can at least ignore the invalid characters so that the XML loads.)
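The two workarounds might look roughly like the sketch below. It makes a few assumptions: the sample XML and the function names are made up for illustration, mb_convert_encoding() stands in for utf8_encode() (which is deprecated as of PHP 8.2), and the mbstring and SimpleXML extensions are loaded.

```php
<?php
// Hypothetical sample: declared as UTF-8 but actually ISO-8859-1
// ("Dublín" with a raw 0xED byte).
$xml = "<?xml version='1.0' encoding='UTF-8'?><city>Dubl\xEDn</city>";

// Option 1: treat the whole payload as ISO-8859-1 and convert it to UTF-8.
// Any bytes that were already valid UTF-8 would be double-encoded (mojibake).
function latin1_to_utf8(string $s): string
{
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

// Option 2: convert UTF-8 to UTF-8. Valid sequences pass through unchanged;
// invalid bytes are replaced by mbstring's substitution character ("?" by
// default), so the offending characters are lost but the XML can be parsed.
function replace_invalid_utf8(string $s): string
{
    return mb_convert_encoding($s, 'UTF-8', 'UTF-8');
}

$city = simplexml_load_string(latin1_to_utf8($xml));
echo $city, "\n"; // Dublín
```

Option 1 only makes sense when the whole document is in one encoding. Option 2 is lossy and is best kept as a last resort for getting the document to load at all.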

Or you can take the long, long road and validate/fix the sequences by yourself. That will take you a while depending on how familiar you are with UTF-8. Perhaps there are libraries out there that would do that, although I don't know any.
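One possible shape for that "long road" is sketched below. This is not the method from the answer, just an illustration: it keeps every well-formed UTF-8 sequence (as defined in RFC 3629) and treats any other byte as ISO-8859-1. The function name repair_mixed_utf8 is hypothetical, and the ISO-8859-1 fallback is an assumption about the data.

```php
<?php
// Keep well-formed UTF-8 sequences; re-encode any other byte from ISO-8859-1.
function repair_mixed_utf8(string $s): string
{
    // Group 1: a run of ASCII or one well-formed multi-byte sequence.
    // Group 2: any single byte that did not fit group 1 (a stray high byte).
    $pattern = '/
        ( [\x00-\x7F]+                          # ASCII run
        | [\xC2-\xDF][\x80-\xBF]                # 2-byte sequence
        | \xE0[\xA0-\xBF][\x80-\xBF]            # 3-byte, excluding overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}     # 3-byte
        | \xED[\x80-\x9F][\x80-\xBF]            # 3-byte, excluding surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}         # 4-byte, excluding overlongs
        | [\xF1-\xF3][\x80-\xBF]{3}             # 4-byte
        | \xF4[\x80-\x8F][\x80-\xBF]{2}         # 4-byte, up to U+10FFFF
        )
        | (.)                                   # stray byte
    /xs';

    $fixed = preg_replace_callback($pattern, function (array $m): string {
        // Group 2 is only present when the fallback alternative matched.
        if (isset($m[2])) {
            return mb_convert_encoding($m[2], 'UTF-8', 'ISO-8859-1');
        }
        return $m[0];
    }, $s);

    // preg_replace_callback() returns null on failure; fall back to the input.
    return $fixed ?? $s;
}

// Mixed input: a raw ISO-8859-1 "í" (0xED) next to a valid UTF-8 "é" (0xC3 0xA9).
$mixed = "Dubl\xEDn, caf\xC3\xA9";
$repaired = repair_mixed_utf8($mixed);
var_dump(mb_check_encoding($repaired, 'UTF-8')); // bool(true)
```

The approach is still a heuristic: two ISO-8859-1 characters that happen to form a valid UTF-8 sequence (for example the bytes 0xC3 0xA9, "Ã©" in ISO-8859-1) are indistinguishable from real UTF-8 and are left untouched.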

Either way, notify your data provider that they're sending invalid data so that they can fix it.


Here's a partial fix. It will definitely not fix everything, but it will fix some of it. Hopefully that's enough for you to get by until your provider fixes their stuff.

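// Partial fix: re-encode as UTF-8 every byte in 0xA1-0xFF that is not
// followed by at least two UTF-8 continuation bytes (0x80-0xBF).
// Note: utf8_encode() is deprecated as of PHP 8.2.
function fix_latin1_mangled_with_utf8_maybe_hopefully_most_of_the_time($str)
{
    return preg_replace_callback('#[\\xA1-\\xFF](?![\\x80-\\xBF]{2,})#', 'utf8_encode_callback', $str);
}

// Callback: $m[0] is the single matched byte, treated as ISO-8859-1.
function utf8_encode_callback($m)
{
    return utf8_encode($m[0]);
}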
