Serially process XML data in Perl

Problem Description

I'm wondering which XML parser people think would be best in my situation for Perl. I've done a lot of reading and have tried XML::LibXML and XML::SAX. The first used up too much memory and the second didn't seem that quick to me (even after switching off the pure Perl parser).

My needs are fairly specific. I am receiving a largish response of up to 50 MB via the Net::SSH library. I would like to pass this data to an XML library as I receive it, so as to keep the minimum amount of data in memory. I then need to look for data in certain tags and do something with it: in some cases sum a bunch of values, in other cases just extract values and write them to files. So I need an XML parser that works serially, is fast and uses minimal memory. The data arrives in chunks of up to 1024 bytes, so I would like to be able to do something like $myparser->sendData($mynewData) and then have functions called when a tag is opened or closed, similar to what XML::SAX does.

I don't necessarily need XPath or XSLT.
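
To make the callback style I have in mind concrete, here is a rough sketch using an XML::SAX handler class whose start_element, end_element and characters methods are called as the document is parsed. The tag name 'value' and the summing are only placeholders for whatever extraction is actually needed:

    use strict;
    use warnings;

    package SumHandler;
    use base 'XML::SAX::Base';

    sub start_document {
        my ($self) = @_;
        $self->{sum} = 0;
    }

    # Called for every opening tag
    sub start_element {
        my ($self, $el) = @_;
        if ($el->{Name} eq 'value') {     # 'value' is a placeholder tag name
            $self->{in_value} = 1;
            $self->{text}     = '';
        }
    }

    # Called for text content; may fire more than once per text node
    sub characters {
        my ($self, $data) = @_;
        $self->{text} .= $data->{Data} if $self->{in_value};
    }

    # Called for every closing tag
    sub end_element {
        my ($self, $el) = @_;
        if ($el->{Name} eq 'value') {
            $self->{sum} += $self->{text};
            $self->{in_value} = 0;
        }
    }

    package main;
    use XML::SAX::ParserFactory;

    my $handler = SumHandler->new;
    my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
    $parser->parse_string('<root><value>1</value><value>2</value></root>');
    print "sum: $handler->{sum}\n";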

Recommended Answer

You could also go with plain old XML::Parser, which does pretty much just what you ask for:


"This module provides ways to parse XML documents. It is built on top of XML::Parser::Expat, which is a lower level interface to James Clark's expat library. Each call to one of the parsing methods creates a new instance of XML::Parser::Expat which is then used to parse the document. Expat options may be provided when the XML::Parser object is created. These options are then passed on to the Expat object on each parse call. They can also be given as extra arguments to the parse methods, in which case they override options given at XML::Parser creation time."

"Expat is an event based parser. As the parser recognizes parts of the document (say the start or end tag for an XML element), then any handlers registered for that type of an event are called with suitable parameters."

I've used it for parsing Wikipedia XML dumps, which are several GB in size even after compression, and found it to work very well for that. Compared to that, a 50 MB file should be a piece of cake.
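
For the chunked data arriving over Net::SSH, XML::Parser's non-blocking mode should be a good fit: parse_start returns an XML::Parser::ExpatNB object whose parse_more method accepts the document in arbitrary pieces, and parse_done finishes the parse. Expat buffers internally, so a chunk boundary falling in the middle of a tag is not a problem. A minimal sketch along those lines (the 'value' tag and the read_next_chunk stub are placeholders, not part of any real API):

    use strict;
    use warnings;
    use XML::Parser;

    my $sum      = 0;
    my $in_value = 0;
    my $text     = '';

    my $parser = XML::Parser->new(
        Handlers => {
            # Called for every opening tag
            Start => sub {
                my ($expat, $tag, %attrs) = @_;
                if ($tag eq 'value') {      # 'value' is a placeholder tag name
                    $in_value = 1;
                    $text     = '';
                }
            },
            # Called for character data; may fire more than once per text node
            Char => sub {
                my ($expat, $str) = @_;
                $text .= $str if $in_value;
            },
            # Called for every closing tag
            End => sub {
                my ($expat, $tag) = @_;
                if ($tag eq 'value') {
                    $sum += $text;
                    $in_value = 0;
                }
            },
        },
    );

    # Stand-in for the Net::SSH read loop: split a sample document into 1024-byte chunks.
    my @chunks = unpack '(a1024)*', '<root><value>1</value><value>2</value></root>';
    sub read_next_chunk { shift @chunks }

    # parse_start returns a push-style (ExpatNB) parser; feed it chunks as they arrive.
    my $nb = $parser->parse_start;
    while (defined(my $chunk = read_next_chunk())) {
        $nb->parse_more($chunk);
    }
    $nb->parse_done;

    print "sum: $sum\n";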
