simplexml_load_string errors on big files occur on one system but not another


Question

I'm dealing with a third-party PHP library that I can't edit, and it has been working fine for almost a year. It calls simplexml_load_string on the response from a remote server. Lately it has been choking on large responses. The data is a feed of real estate listings, and the format looks something like this:

<?xml version="1.0"?>
<RETS ReplyCode="0" ReplyText="Operation Successful Reference ID: 9bac803e-b507-49b7-ac7c-d8e8e3f3aa89">
<COUNT Records="9506" />
<DELIMITER value="09" />
<COLUMNS>   sysid   1   2   3   4   5   6   </COLUMNS>
<DATA>  252370080   Residential 0.160   No  ADDR0   06051</DATA>
<DATA>  252370081   Residential 0.440   Yes ADDR0   06043</DATA>
<DATA>  252370082   Residential 1.010   No  ADDR0   06023</DATA>
<DATA>More tab delimited text</DATA>
<!-- snip 9000+ lines -->
</RETS>

I downloaded a sample response file (about 22MB); here is where my debugging and sanity checks ended up. Both servers are running PHP version 5.3.8, but note the different results. I'm as certain as I can be that both files are the same (I suppose the different filesize, strlen, and last 50 characters can be explained by Windows newlines having an extra carriage return character). Test script:

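<?php
// Report every error and print it to the output.
error_reporting(-1);
ini_set('display_errors', 1);
$file = 'error-example.xml';
$xml = file_get_contents($file);

// Size on disk versus size of the string read into memory.
echo 'filesize:              ';
var_dump(filesize($file));

echo 'strlen:                ';
var_dump(strlen($xml));

// simplexml_load_string() returns false on a parse failure.
echo 'simplexml object?      ';
var_dump(is_object(simplexml_load_string($xml)));

// Tail of the file, to confirm the download was not truncated.
echo 'Last 50 characters:    ';
var_dump(substr($xml, -50));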

Output locally on Windows:

filesize:              int(21893604)
strlen:                int(21893604)
simplexml object?      bool(true)
Last 50 characters:    string(50) "RD DR    CT  Watertown   203-555-5555            </DATA>
</RETS>"

Output on remote UNIX server:

filesize:              int(21884093)
strlen:                int(21884093)
simplexml object?      
Warning: simplexml_load_string(): Entity: line 9511: parser error : internal error in /path/to/test.php on line 19

Warning: simplexml_load_string(): AULTED CEILING IN FOYER, BRICK FP IN FR, NEW FLOORING IN LR DR FR FOYER KITCHEN  in /path/to/test.php on line 19

Warning: simplexml_load_string():                                                                                ^ in /path/to/test.php on line 19

Warning: simplexml_load_string(): Entity: line 9511: parser error : Extra content at the end of the document in /path/to/test.php on line 19

Warning: simplexml_load_string(): AULTED CEILING IN FOYER, BRICK FP IN FR, NEW FLOORING IN LR DR FR FOYER KITCHEN  in /path/to/test.php on line 19

Warning: simplexml_load_string():                                                                                ^ in /path/to/test.php on line 19
bool(false)
Last 50 characters:    string(50) "ORD DR   CT  Watertown   203-555-5555            </DATA>
</RETS>"
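
As a sanity check on the newline explanation: the two sizes differ by 21893604 - 21884093 = 9511 bytes, and the parser reports its error on line 9511, described as the next-to-last line. That is consistent with one extra carriage return per line break, though it does not prove the files are otherwise identical. A minimal sketch of how the two copies could be compared after normalizing line endings follows; the helper names are hypothetical and the inline strings stand in for the real files.

```php
<?php
// Hypothetical helper: convert Windows (CRLF) line endings to Unix (LF).
function normalize_newlines($s)
{
    return str_replace("\r\n", "\n", $s);
}

// Hypothetical helper: true when two strings are identical once their
// line endings have been normalized.
function same_modulo_newlines($a, $b)
{
    return md5(normalize_newlines($a)) === md5(normalize_newlines($b));
}

// Small inline stand-ins for the Windows and UNIX copies of the feed.
$windows = "<DATA>a</DATA>\r\n<DATA>b</DATA>\r\n</RETS>";
$unix    = "<DATA>a</DATA>\n<DATA>b</DATA>\n</RETS>";

var_dump(strlen($windows) - strlen($unix));      // size difference in bytes
var_dump(substr_count($windows, "\r\n"));        // number of CRLF line breaks
var_dump(same_modulo_newlines($windows, $unix)); // do only the newlines differ?
```

With the real files, the inline strings would be replaced by file_get_contents() on each copy. If the size difference equals the CRLF count and the normalized hashes match, the input data can be ruled out as the cause.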

Some replies to comments and additional info:

  • The XML itself appears to be valid as far as I can tell (and it does work on my system).

  • magic_quotes_runtime is definitely off.

  • The working server has libxml version 2.7.7 while the other has 2.7.6. Could that really make the difference? I could not find a libxml changelog, but it seems unlikely.

  • This seems to happen only when the response/file is over a certain size, and the error always occurs at the next-to-last line.

  • I am not running into memory issues; the test script runs instantly.

There are differences in the PHP configurations, which I can post if I know which ones are relevant. Any idea what the problem could be, or anything else I might want to check?

Recommended answer

The libxml2 changelog contains "608773 add a missing check in xmlGROW (Daniel Veillard)", which seems to be related to input buffering. Note that I don't know anything about libxml2 internals, but it seems conceivable that you have tickled a 2.7.6 bug that was fixed in 2.7.7.
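
To compare the two servers, the same short script can be run on each to print the libxml2 version that PHP reports. This is only a sketch: the 2.7.7 threshold comes from the guess above, not from a confirmed diagnosis.

```php
<?php
// Print the PHP version and the libxml2 version reported by PHP.
echo 'PHP:    ', PHP_VERSION, "\n";
echo 'libxml: ', LIBXML_DOTTED_VERSION, "\n";

// Assumption: 2.7.7 is the first release containing the xmlGROW fix
// mentioned above; builds older than that are flagged.
if (version_compare(LIBXML_DOTTED_VERSION, '2.7.7', '<')) {
    echo "libxml2 is older than 2.7.7\n";
}
```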

Check whether the behavior is any different when you use simplexml_load_file() directly, and try setting libxml parser-related options, e.g.

simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_COMPACT | LIBXML_PARSEHUGE)

Specifically, you might want to try the LIBXML_PARSEHUGE flag.

http://php.net/manual/en/libxml.constants.php
XML_PARSE_HUGE flag relaxes any hardcoded limit from the parser. This affects limits like maximum depth of a document or the entity recursion, as well as limits of the size of text nodes.
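
One way to try this from a test script is to collect libxml's errors instead of letting them surface as warnings, then retry the parse with LIBXML_PARSEHUGE when the default attempt fails. The sketch below uses a hypothetical helper name, and nothing here confirms that the flag resolves this particular failure.

```php
<?php
// Hypothetical helper: parse an XML string, retrying with LIBXML_PARSEHUGE
// if the default parse fails. Returns array($doc, $errors), where $doc is a
// SimpleXMLElement or false and $errors lists the first attempt's errors.
function parse_xml_with_fallback($xml)
{
    // Collect parser errors in memory instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $errors = array();

    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        foreach (libxml_get_errors() as $e) {
            $errors[] = sprintf('line %d: %s', $e->line, trim($e->message));
        }
        libxml_clear_errors();

        // Second attempt with the parser's hardcoded limits relaxed.
        $doc = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_PARSEHUGE);
        libxml_clear_errors();
    }

    // Restore the previous error-handling mode.
    libxml_use_internal_errors($previous);
    return array($doc, $errors);
}

list($doc, $errors) = parse_xml_with_fallback('<RETS><COUNT Records="2" /></RETS>');
var_dump($doc !== false);
var_dump(count($errors));
```

If the second attempt succeeds where the first failed, that points toward a parser limit. If both attempts fail in the same place, the libxml2 version difference becomes the more likely explanation.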
