lisp is winner in DOM parsing contest! 8-]

Problem description

Hello, All!

I have a 3 MB XML document with about 150,000 lines (I think it has about
200,000 elements) which I want to parse into a DOM to work with.
At first I thought there would be no problems, but there were..
First I tried Python.. there's a special interest group that wants to "make
Python become the premier language for XML processing", so I thought there
would be no problems with this document..
I used xml.dom.minidom to parse it.. after it ate 400 megs of RAM I killed
it - I don't want processing like that.. I think this is because of that fat
class implementation - possibly Python has significant overhead for each
object instance, or something like that..

Then I asdf-installed the s-xml package and tried it. It ate only 25
megs for the LXML representation. I think interning element names helped a lot..
That was CLISP, which uses Unicode internally, so I think it could be even less
without Unicode..

Then I tried C++ - TinyXML. It was fast, but ate 65 megs.. yeah, looks like
interning helps a lot 8-]
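The interning remark can be made concrete in Python. This is not how s-xml, TinyXML, or any other parser above works internally; it is a minimal sketch of the mechanism only: if a parser creates a fresh string for every occurrence of a tag name, 1,000 `<item>` elements carry 1,000 copies of "item", whereas routing each name through `sys.intern` leaves one shared object. The helper names (`build_tags`, `distinct_objects`, `string_bytes`) are hypothetical.

```python
import sys

def build_tags(n, use_intern=False):
    """Create n tag names "item" the way a naive parser might: a fresh string
    object per element, optionally routed through sys.intern."""
    tags = []
    for _ in range(n):
        name = "".join(["ite", "m"])  # join() builds a new string object each time
        tags.append(sys.intern(name) if use_intern else name)
    return tags

def distinct_objects(tags):
    """Number of separate string objects (compared by identity, not value)."""
    return len({id(s) for s in tags})

def string_bytes(tags):
    """Bytes used by the string objects, counting each shared object once."""
    unique = {id(s): s for s in tags}
    return sum(sys.getsizeof(s) for s in unique.values())

plain = build_tags(1000)
interned = build_tags(1000, use_intern=True)
print(distinct_objects(plain))     # 1000 in CPython: one copy per element
print(distinct_objects(interned))  # 1: every element shares the same object
print(string_bytes(plain), string_bytes(interned))
```

The saving per name is small, but a tree with 200,000 elements repeats the same handful of tag names over and over, which is where sharing them pays off.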

Then I tried Perl XML::DOM.. it was better than Python - about 180 megs, but
it was the slowest.. at least it consumed memory more slowly than Python 8-]

And Java.. with the default parser it took 45 megs.. maybe it interned strings,
but there was overhead from the classes - storing trees is definitely what
Lisp is optimized for 8-]

So Lisp is the winner.. but it has no standard way (not even a non-standard
but simple one) to write the binary IEEE floating-point representation, so
Common Lisp sucks and I will use C++ for my task.. 8-]]]
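For comparison with Python, the thread's other main language, writing binary IEEE floats there takes only the standard `struct` module: with a standard-size prefix such as `<`, the `d` format code always packs an IEEE 754 binary64 value regardless of platform. A minimal sketch; the helper names and the file name `values.bin` are hypothetical.

```python
import os
import struct
import tempfile

def write_doubles(path, values):
    """Write each float as an 8-byte little-endian IEEE 754 double."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}d", *values))

def read_doubles(path):
    """Read back a file written by write_doubles."""
    with open(path, "rb") as f:
        data = f.read()
    return list(struct.unpack(f"<{len(data) // 8}d", data))

print(struct.pack("<d", 1.0).hex())  # 000000000000f03f (0x3FF0000000000000, little-endian)

path = os.path.join(tempfile.mkdtemp(), "values.bin")
write_doubles(path, [1.0, -2.5, 3.25])
print(read_doubles(path))  # [1.0, -2.5, 3.25]
```

Using `>` instead of `<` gives big-endian output, which matters if the file is read by a program on a different architecture.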

With best regards, Alex 'killer_storm' Mizrahi.

Recommended answer

Alex Mizrahi wrote:
> I have a 3 MB XML document with about 150,000 lines (I think it has about
> 200,000 elements) which I want to parse into a DOM to work with.
> At first I thought there would be no problems, but there were..
> First I tried Python.. there's a special interest group that wants to "make
> Python become the premier language for XML processing", so I thought there
> would be no problems with this document..
> I used xml.dom.minidom to parse it.. after it ate 400 megs of RAM I killed
> it - I don't want processing like that.. I think this is because of that fat
> class implementation - possibly Python has significant overhead for each
> object instance, or something like that..

Have you tried ElementTree?

http://effbot.org/zone/element-index.htm

HTH,

--
Hans Nowak (ha**@zephyrfalcon.org)
http://zephyrfalcon.org/
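Hans's link points to the standalone ElementTree package; the same API has shipped with Python's standard library as `xml.etree.ElementTree` since Python 2.5. A minimal sketch of parsing a document into a tree and querying it; the `<catalog>`/`<item>`/`<price>` content is a hypothetical stand-in for the poster's file, and `ET.parse(path).getroot()` would be used for a file on disk.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the poster's 3 MB document
xml_text = """<catalog>
  <item id="1"><price>9.5</price></item>
  <item id="2"><price>12.0</price></item>
</catalog>"""

root = ET.fromstring(xml_text)   # parse the whole document into an in-memory tree

for item in root.iter("item"):   # walk every <item> element, at any depth
    print(item.get("id"), item.findtext("price"))  # attribute and child text

total = sum(float(p.text) for p in root.iter("price"))
print(total)  # 21.5
```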

(message (Hello 'Hans)
(you :wrote :on '(Sun, 11 Jul 2004 21:32:11 -0400))
(
> I have a 3 MB XML document with about 150,000 lines (I think it has about
> 200,000 elements) which I want to parse into a DOM to work with.
> At first I thought there would be no problems, but there were..
> First I tried Python.. there's a special interest group that wants to "make
> Python become the premier language for XML processing", so I thought there
> would be no problems with this document..
> I used xml.dom.minidom to parse it.. after it ate 400 megs of RAM I killed
> it - I don't want processing like that.. I think this is because of that fat
> class implementation - possibly Python has significant overhead for each
> object instance, or something like that..

HN> Have you tried ElementTree?

no..
just tried it - it eats 133 megs and parses for quite a long time, but
it works..
I'll consider using it, because processing XML in C++ appears to be a pain in the
ass..

)
(With-best-regards '(Alex Mizrahi) :aka 'killer_storm)
(prin1 "Jane dates only Lisp programmers"))
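The megabyte figures in this thread appear to be whole-process observations. One rough way to repeat the minidom-versus-ElementTree comparison reproducibly is Python's `tracemalloc`, sketched below. It counts only allocations made through Python's memory allocators, so its numbers will not match process-level figures, and the generated document (`make_doc`) is a hypothetical stand-in for the poster's file; only the relative size of the two results is meaningful.

```python
import tracemalloc
import xml.dom.minidom
import xml.etree.ElementTree as ET

def make_doc(n):
    """Generate a synthetic document with n small <item> elements (hypothetical data)."""
    items = "".join(f'<item id="{i}"><name>n{i}</name></item>' for i in range(n))
    return f"<root>{items}</root>"

def peak_mib(parse, text):
    """Peak traced allocation, in MiB, while parse(text) runs and its tree is alive."""
    tracemalloc.start()
    try:
        tree = parse(text)
        _, peak = tracemalloc.get_traced_memory()  # returns (current, peak)
    finally:
        tracemalloc.stop()
    del tree
    return peak / (1024 * 1024)

doc = make_doc(20_000)
dom_peak = peak_mib(xml.dom.minidom.parseString, doc)
et_peak = peak_mib(ET.fromstring, doc)
print(f"minidom: {dom_peak:.1f} MiB, ElementTree: {et_peak:.1f} MiB")
```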


Alex Mizrahi wrote:
> I have a 3 MB XML document with about 150,000 lines (I think it has about
> 200,000 elements) which I want to parse into a DOM to work with.

Often, problems with performance come down to using the
wrong algorithm, or using the wrong architecture for the
problem at hand.

Are you absolutely certain that using a full in-memory DOM
representation is the best for your problem? It seems
very unlikely to me that it really is...

For example, there are approaches which can read in the
document incrementally (and I'm not just talking SAX here),
rather than read the whole thing at once.
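Peter does not name a specific approach; one Python technique that matches his description (incremental, but not SAX callbacks) is `xml.etree.ElementTree.iterparse`, which yields elements as they are completed so each one can be processed and then discarded. A minimal sketch, assuming a document made of repeated `<item>` records with a `<price>` child; those tag names and the generated data are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def sum_prices(source):
    """Stream through the document one <item> at a time instead of
    building the whole tree first."""
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)  # the first event is the start of the root element
    count, total = 0, 0.0
    for event, elem in events:
        if event == "end" and elem.tag == "item":  # the <item> subtree is complete
            total += float(elem.findtext("price"))
            count += 1
            root.clear()  # drop the items processed so far, so memory stays flat
    return count, total

xml_bytes = b"<catalog>" + b"".join(
    b'<item id="%d"><price>%d.5</price></item>' % (i, i) for i in range(1000)
) + b"</catalog>"

count, total = sum_prices(io.BytesIO(xml_bytes))
print(count, total)  # 1000 500000.0
```

Clearing the root rather than only the finished item matters: `elem.clear()` alone empties each `<item>` but leaves the empty shells attached to the root, which still grows with the document.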

I found your analysis fairly simplistic, on the whole...

-Peter
