Error while parsing a very large (10 GB) XML file in R, using the XML package
Question
Context
I'm currently working on a project involving OSM (OpenStreetMap) data. In order to manipulate geographic objects, I have to convert the data (an OSM XML file) into an R object. The osmar package lets me do this, but it fails to parse the raw XML data.
The error
Error in paste(file, collapse = "\n") : result would exceed 2^31-1 bytes
The code
require(osmar)
osmar_obj <- get_osm("anything", source = osmsource_file("my filename"))
Inside the get_osm function, the code calls ret <- xmlParse(raw), which triggers the error after a few seconds.
The question
How am I supposed to read a large XML file (here 10 GB), given that I have 64 GB of memory?
Thanks a lot!
Recommended answer
This is the solution I came up with, even though it is not 100% satisfying.
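- Transform the .osm file by removing all newline characters (except the last one) in the shell
- Run exactly the same code as before, skipping the paste step, which is no longer needed (you have just done the equivalent in the shell)
Profit :)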
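The first step can be sketched roughly as follows. This is only an illustration, assuming a POSIX shell with tr available; the file names are hypothetical, and a tiny sample file stands in for the real 10 GB extract.

```shell
#!/bin/sh
set -eu

# Hypothetical file names, for illustration only.
workdir=$(mktemp -d)
in_file="$workdir/sample.osm"
out_file="$workdir/sample_oneline.osm"

# Tiny stand-in for the real 10 GB extract.
printf '%s\n' \
  '<?xml version="1.0" encoding="UTF-8"?>' \
  '<osm version="0.6">' \
  '  <node id="1" lat="48.85" lon="2.35"/>' \
  '</osm>' > "$in_file"

# Drop every newline, then put a single one back at the end of the file.
tr -d '\n' < "$in_file" > "$out_file"
printf '\n' >> "$out_file"

# The result should now be a single line.
wc -l < "$out_file"
```

Deleting newlines outright is only safe when no tag is split across lines in a way that relies on the line break as a separator. If that is a concern, replacing each newline with a space (tr '\n' ' ') is a more conservative variant.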
Obviously, I'm not very happy with it, because modifying the data file in the shell is more of a trick than an actual solution :(