Parsing small web page with xml2 throws XML_PARSE_HUGE error
Question
Recently, a user of my rNOMADS package in R began getting unexpected errors:
Error: Excessive depth in document: 256 use XML_PARSE_HUGE option [1]
We tracked the issue down to this command:
html.tmp <- xml2::read_html("http://nomads.ncep.noaa.gov/cgi-bin/filter_rap.pl?dir=%2Frap.20151120")
Upon following the link, the web page to be parsed appears no larger than other pages that parse fine, and is well under the 1 megabyte limit that should require the XML_PARSE_HUGE option. Furthermore, xml2::read_html actually has no XML_PARSE_HUGE option anyway. The only other potential solution, described here, is not appropriate for an official R package.
What is the cause of this error, and is it possible to resolve it without resorting to solutions outside the official CRAN repository?
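The error message points at nesting depth rather than file size: libxml2 refuses to parse any document nested more than 256 elements deep unless XML_PARSE_HUGE is set, so even a tiny page can fail. A minimal sketch that reproduces this, assuming an xml2 release recent enough to expose libxml2 parser flags through an `options` argument:

```r
library(xml2)

# Build a tiny document that is nevertheless nested deeper than
# libxml2's default depth limit of 256 elements.
depth <- 300
doc <- paste0(paste(rep("<a>", depth), collapse = ""),
              "x",
              paste(rep("</a>", depth), collapse = ""))

# Without XML_PARSE_HUGE, parsing fails despite the document's small size.
res <- tryCatch(read_xml(doc), error = function(e) conditionMessage(e))

# With the "HUGE" flag (available in later xml2 releases), the same
# document parses successfully.
huge <- read_xml(doc, options = "HUGE")
```

This shows the error is triggered by markup structure, not by how many bytes the page contains.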
Answer
The best I can do so far is to install shabbychef's forked version of xml2, which forces XML_PARSE_HUGE. You can install this version of xml2 via:
library(drat)                 # drat manages additional package repositories
drat:::add("shabbychef")      # register shabbychef's drat repository
install.packages('xml2')      # now installs the forked xml2 from that repository
For the time being, please use this workaround if you encounter XML_PARSE_HUGE errors in rNOMADS.
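If depending on a forked package is undesirable, note that xml2 releases published after this question (1.0 and later, an assumption worth checking against your installed version) expose libxml2 parser flags directly through an `options` argument on read_html(), including "HUGE". A sketch on a small in-memory page:

```r
library(xml2)

# Sketch, assuming a later xml2 release whose read_html() accepts an
# `options` argument of libxml2 parser flags; "HUGE" lifts the parser's
# hardcoded depth and size limits.
page <- "<html><body><p>ok</p></body></html>"
doc  <- read_html(page, options = c("RECOVER", "NOERROR", "NOBLANKS", "HUGE"))
xml_text(xml_find_first(doc, "//p"))
```

Against the NOMADS URL above, the same call would simply take the URL as its first argument; the vector keeps read_html's default options and appends "HUGE".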