open-uri and SAX parsing for a giant XML document


Problem description

I need to connect to an external XML file to download and process it (300MB+), then run through the XML document and save elements in the database.

I am already doing this without problems on a production server, using Saxerator to be gentle on memory. It works great. Here is my issue now --

I need to use open-uri (though there could be alternative solutions?) to grab the file to parse. The problem is that open-uri has to load the whole file before parsing starts, which defeats the entire purpose of using a SAX parser to save on memory... are there any workarounds? Can I just read from the external XML document? I cannot load the entire file or it crashes my server, and since the document is updated every 30 minutes, I can't just save a copy of it on my server (though this is what I am doing currently to make sure everything is working).

I am doing this in Ruby, p.s.
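For reference, a minimal sketch of the current setup described above, assuming the feed has already been saved to a local file and that the repeating element is called `item` (both the path and the tag name are placeholders):

```ruby
require "saxerator"

# Sketch of the current approach: parse a locally saved copy of the feed
# element-by-element so memory stays flat even for a 300MB+ document.
# "tmp/feed.xml" and the :item tag are assumptions for illustration.
parser = Saxerator.parser(File.new("tmp/feed.xml"))

parser.for_tag(:item).each do |item|
  # `item` behaves like a small hash, so saving it might look like:
  # Item.create!(title: item["title"], guid: item["guid"])
end
```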

Recommended answer

You may want to try Net::HTTP's streaming interface instead of open-URI. This will give Saxerator (via the underlying Nokogiri::SAX::Parser) an IO object rather than the entire file.
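A minimal sketch of what that could look like, streaming the response through an `IO.pipe` so Saxerator reads the document while it downloads; the URL, the `:item` tag, and the lack of error handling are all simplifying assumptions:

```ruby
require "net/http"
require "uri"
require "saxerator"

# Sketch: stream the HTTP response into a pipe so Saxerator (via the
# underlying Nokogiri SAX parser) can read the document as it downloads
# instead of loading all 300MB+ into memory at once.
uri = URI("http://example.com/huge_feed.xml") # placeholder URL
reader, writer = IO.pipe

downloader = Thread.new do
  begin
    Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
      http.request_get(uri.request_uri) do |response|
        # read_body with a block yields the body in chunks as they arrive
        response.read_body { |chunk| writer.write(chunk) }
      end
    end
  ensure
    writer.close # signals EOF to the parser once the download finishes
  end
end

# Saxerator accepts an IO-like object, so it can consume the read end of
# the pipe while the download thread is still writing to it.
Saxerator.parser(reader).for_tag(:item).each do |item|
  # save each element to the database here
end

downloader.join
reader.close
```

The point of the pipe is that the parser never holds more than a chunk of the document at a time; when the download thread closes its end, the parser hits EOF and the enumeration finishes.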
