XMLReader encoding error
Question
I have a PHP script which is trying to parse a huge XML file. To do this I'm using the XMLReader library. During the parsing, I have this encoding error:
Input is not proper UTF-8, indicate encoding ! Bytes: 0xA0 0x32 0x36 0x30
I would like to know if there is a way to skip records with bad characters.
Thanks!
Recommended answer
First of all, make sure that your XML file is indeed UTF-8 encoded. If it is not, specify the actual encoding as the second parameter to XMLReader::open().
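As a minimal sketch of the encoding override: the byte 0xA0 from the error message is a non-breaking space in ISO-8859-1, so the file may simply have been saved in a single-byte encoding. The sample file, the element name price, and the choice of 'ISO-8859-1' are assumptions for illustration only; use whatever encoding the real file was written in.

```php
<?php
// Hypothetical sample: a tiny XML file with no encoding declaration that
// contains the raw byte 0xA0 (a non-breaking space in ISO-8859-1).
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, "<items><price>1\xA0260</price></items>");

$reader = new XMLReader();
// The second argument tells the parser which encoding to assume.
// 'ISO-8859-1' is an assumption about the real file, not a known fact.
$reader->open($path, 'ISO-8859-1');

$prices = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'price') {
        // readString() returns the text content of the current element.
        $prices[] = $reader->readString();
    }
}
$reader->close();
unlink($path);
```

If the guess about the encoding is right, the loop should run to the end of the file without the "Input is not proper UTF-8" error.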
If the encoding error is due to a genuinely malformed byte sequence in a UTF-8 document, and you're using PHP > 5.2.0, you could pass LIBXML_NOERROR and/or (depending on the error level) LIBXML_NOWARNING as a bitmask in the third parameter of XMLReader::open():
$xml = new XMLReader();
$xml->open('myxml.xml', null, LIBXML_NOERROR | LIBXML_NOWARNING);
If you're using PHP > 5.1.0, you can also tweak the libXML error handling:
// enable user error handling
libxml_use_internal_errors(true);
/* ... do your XML processing ... */
$errors = libxml_get_errors();
foreach ($errors as $error) {
    // handle errors here
}
libxml_clear_errors();
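To see what this error handling looks like around an actual XMLReader loop, here is a rough sketch. The sample file and the element name item are made up for illustration, and the counter $seen is only there to show how far the reader got before it stopped; the sketch does not settle whether parsing can continue past the bad bytes.

```php
<?php
// Hypothetical sample: the byte 0xA0 is not valid UTF-8 on its own,
// and the file declares no other encoding.
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, "<items><item>ok</item><item>1\xA0260</item></items>");

// Collect libxml errors internally instead of emitting them as warnings.
libxml_use_internal_errors(true);
libxml_clear_errors();

$reader = new XMLReader();
$reader->open($path);

$seen = 0;
// @ silences any PHP-level warning that read() itself may emit when it fails.
while (@$reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        $seen++;
    }
}
$reader->close();

$errors = libxml_get_errors();
foreach ($errors as $error) {
    // Each LibXMLError carries level, code, message, file, line and column.
    printf("line %d, column %d: %s\n", $error->line, $error->column, trim($error->message));
}
libxml_clear_errors();
unlink($path);
```

Comparing $seen with the number of records you expect gives a quick indication of whether the reader stopped at the first bad byte or carried on.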
I don't actually know whether the preceding two workarounds allow XMLReader to continue reading after an error, or whether they only suppress the error output. But it's worth a try.
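In response to the comment:
libXML defines XML_PARSE_RECOVER (1), but ext/libxml does not expose this constant as a PHP constant. It may be possible to pass the integer value 1 to the $options parameter instead: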
$xml = new XMLReader();
$xml->open('myxml.xml', null, LIBXML_NOERROR | LIBXML_NOWARNING | 1);