Perl + SAX2 = slow?


Question

Greetings, fellow XML folk.

I've just gotten started making SAX filters in Perl. I was hoping to
build an XML templating engine this way, but the performance of
XML::SAX::Expat and XML::SAX::Writer *appears* to be unthinkably bad.

This code:

XML::SAX::Expat->new(Handler => XML::SAX::Writer->new( Output => '>-'
))->parse_uri("test.xml");

takes 1 second to parse a 5-kilobyte piece of XML on my machine. On a
500 MHz box, that's 10 kilobytes per gigahertz-second.
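One quick way to sanity-check that number (a rough sketch, using the same
test.xml and capturing output into a scalar instead of the '>-' handle) is to
time module loading and the parse separately, since compile and startup cost
can easily dominate a one-shot run on a 5 KB file:

use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Load the SAX modules at runtime so their compile cost shows up in the
# first interval rather than being folded into the parse.
my $t0 = [gettimeofday];
require XML::SAX::Expat;
require XML::SAX::Writer;
my $t1 = [gettimeofday];

my $out = '';
my $parser = XML::SAX::Expat->new(
    Handler => XML::SAX::Writer->new(Output => \$out),
);
$parser->parse_uri("test.xml");
my $t2 = [gettimeofday];

printf STDERR "module load: %.3fs  parse+write: %.3fs\n",
    tv_interval($t0, $t1), tv_interval($t1, $t2);

If the second interval alone is well under a second, the headline figure is
mostly interpreter startup and module compilation rather than SAX throughput.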

Is this in any way normal? I was hoping to be able to process XML
about a hundred times faster than this, maybe 1 MB per gigahertz-second,
or about a thousand clock cycles per byte of consumed XML. I think that
sounds reasonable for the bare parsing/writing of XML in a zippy
language like Perl... so I have to assume I am doing something very,
very wrong in my setup :(

So am I somehow getting the PurePerl parser instead of Expat? I'm
asking for Expat by name, and die($parser) gives me
"XML::SAX::Expat=HASH(0x83b0090)".

Further, I simply cannot believe these results are typical. SAX was
invented to handle multi-megabyte documents that DOM can't fit in
memory, but at these rates it would take a dual 4 GHz Xeon server
twenty minutes just to parse a 100 MB XML document and write it back
out to disk unaltered. What happens when you want to plug in a
pipeline of any merit? I'm not really sure how fast DOM is, but big
servers can have 3 gigabytes of RAM (100 MB * 30x for DOM memory
bloat), and I know my web browser reads XHTML and builds DOM trees out
of it at better than 10 KB per gigahertz-second... So my results must,
must be flawed somehow.
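For a rough DOM data point on the same file (a sketch, assuming XML::LibXML
is installed; it is not part of the pipeline above), one can time a plain
libxml2 parse:

use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);
use XML::LibXML;

# Parse the same test file into a full DOM tree and time just the parse.
my $t0  = [gettimeofday];
my $doc = XML::LibXML->new->parse_file('test.xml');
printf "DOM parse: %.4fs\n", tv_interval($t0);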

Does anyone know what could be going wrong, or how fast that code
snippet should be parsing XML? If all I'm doing is waiting around and
then transforming only the nested XML nodes that match certain
criteria (custom tag names, attribute names, or maybe even just a
custom namespace), sort of like a templating engine (replace fake
template data with data pulled from a DB, or today's date, for
instance), would that mean there is an XML solution more efficient for
my goals than SAX? Twig maybe, or Essex? (I can't find much to read
about Essex.)
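XML::Twig, for what it's worth, is aimed at exactly that "rewrite a few
marked elements, stream everything else through" pattern. A minimal sketch
(the template:date element name and the file name are invented for
illustration; the handler just drops in today's date):

use strict;
use warnings;
use XML::Twig;
use POSIX qw(strftime);

my $twig = XML::Twig->new(
    twig_handlers => {
        # Fires as soon as each <template:date> element is fully parsed.
        'template:date' => sub {
            my ($twig, $elt) = @_;
            $elt->set_text(strftime('%Y-%m-%d', localtime));
            $twig->flush;   # print everything processed so far and free it
        },
    },
);

$twig->parsefile('template.xml');   # hypothetical input document
$twig->flush;                       # emit whatever follows the last handled element

Twig sits on top of XML::Parser (expat) underneath, and used with flush() it
only holds the chunk currently being built in memory, so it stays
streaming-friendly for large documents.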

I was hoping to become an XML evangelist because I love everything
about it (and even understand namespaces and encodings ;) but these
results kind of made it feel like my bubble had burst. Everyone I know
keeps molesting their XML projects with Regex, which just seems to me
so much like towing an inoperative car around with a team of horses.

Any insight will be appreciated.

- - Jesse Thompson
Lightsecond Technologies
http://www.lightsecond.com/

Answers

Jürgen Kahrs replied:

Jesse Thompson wrote:
> Further, I simply cannot believe these results are typical. SAX was
> invented to handle multi-megabyte documents that DOM can't fit [...]

Reading XML files of several GigaBytes length
with a SAX-implementation (Expat) took me some
minutes. I have tested this while integrating
Expat into GNU Awk.

> I was hoping to become an XML evangelist because I love everything
> about it (and even understand namespaces and encodings ;) but these
> results kind of made it feel like my bubble had burst. Everyone I know
> keeps molesting their XML projects with Regex, which just seems to me
> so much like towing an inoperative car around with a team of horses.

I cannot help you with Perl, but maybe xmlgawk
can help you. This is GNU Awk extended (experimentally)
with Expat.


Jürgen Kahrs <Ju*********************@vr-web.de> wrote in message news:<2q************@uni-berlin.de>...
> Reading XML files of several GigaBytes length
> with a SAX-implementation (Expat) took me some
> minutes. I have tested this while integrating
> Expat into GNU Awk.

Yeah, well, several gigabytes in several minutes (let's say 1 gigabyte
per minute) on a 1 GHz machine would be 16 MB/GHz*s. That is so
fast I wouldn't know what to do with myself: one thousand six hundred
times faster than my results are showing (10 KB/GHz*s). Break me off a
piece, would you? :)

> I cannot help you with Perl, but maybe xmlgawk
> can help you. This is GNU Awk extended (experimentally)
> with Expat.

As interesting as that sounds, I don't know anything about Awk... But
if it's fast in Awk, it must also be fast in Perl. I simply have to
believe my results are atypical for Perl::SAX.
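The closest Perl analogue to that xmlgawk test (a sketch, not something from
this thread) would be to drive XML::Parser, the expat binding that
XML::SAX::Expat wraps, directly with do-nothing handlers, and time that
against the SAX pipeline on the same file:

use strict;
use warnings;
use XML::Parser;

# Count events and do nothing else, so the measurement is dominated by expat
# itself rather than by handler work.
my ($starts, $chars) = (0, 0);
my $parser = XML::Parser->new(
    Handlers => {
        Start => sub { $starts++ },
        Char  => sub { $chars += length $_[1] },
    },
);
$parser->parsefile('test.xml');
print "$starts start tags, $chars characters of text\n";

If that runs orders of magnitude faster than the SAX2 pipeline on the same
input, the slowdown is in the Perl-side SAX layers rather than in expat.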

- - Jesse

