cElementTree clear semantics


Question







Hi,
I am trying to understand how cElementTree's clear works: I have a
(relatively) large XML file, that I do not wish to load into memory.
So, naturally, I tried something like this:

from cElementTree import iterparse

count = 0
for event, elem in iterparse("data.xml"):
    if elem.tag == "schnappi":
        count += 1
        # clear only the <schnappi> elements
        elem.clear()

... which resulted in caching of all elements in memory except for
those named <schnappi> (i.e. the process's memory footprint grew more
and more). Then I thought about clear()'ing all elements that I did not
really need:

from cElementTree import iterparse

count = 0
for event, elem in iterparse("data.xml"):
    if elem.tag == "schnappi":
        count += 1
    # clear every element as soon as its 'end' event arrives
    elem.clear()

... which gave a suitably small memory footprint, *BUT* since
<schnappi> has a number of subelements, and I subscribe to
'end' events, the <schnappi> element is returned after all of its
subelements have been read and clear()'ed. So I do indeed see a
<schnappi> element, but calling its getiterator() gives me completely
empty subelements, which is not what I wanted :(
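
The effect described above can be reproduced on a tiny document. This is only a sketch: it assumes the standard-library xml.etree.ElementTree module in place of the standalone cElementTree package from the post, and the sample XML and the helper name texts_seen_at_end are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data, not taken from the original post.
SAMPLE = b"<root><schnappi><a>1</a><b>2</b></schnappi></root>"


def texts_seen_at_end(xml_bytes, tag):
    """Clear every element on its 'end' event and record what is left
    of the children when the 'end' event of <tag> finally arrives."""
    seen = []
    # With no events argument, iterparse reports 'end' events only.
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes)):
        if elem.tag == tag:
            seen.append([child.text for child in elem])
        # Children reach this line before their parent does, so their
        # text is already gone when the parent is inspected.
        elem.clear()
    return seen


print(texts_seen_at_end(SAMPLE, "schnappi"))  # prints [[None, None]]
```

The child elements are still attached to <schnappi>, but clear() has already reset their text, attributes and own children, which matches the "completely empty subelements" observation.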

Finally, I thought about keeping track of when to clear and when not
to by subscribing to start and end elements (so that I would collect
the entire <schnappi> subtree in memory and only then release it):

from cElementTree import iterparse

clear_flag = True
for event, elem in iterparse("data.xml", ("start", "end")):
    if event == "start" and elem.tag == "schnappi":
        # start collecting elements
        clear_flag = False
    if event == "end" and elem.tag == "schnappi":
        clear_flag = True
        # do something with elem
    # unless we are collecting elements, clear()
    if clear_flag:
        elem.clear()

This gave me the desired behaviour, but:

* It looks *very* ugly
* It's twice as slow as the version which sees 'end' events only.

Now, there *has* to be a better way. What am I missing?
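
One idiom that is often suggested for this situation, though it does not come from this thread, is to grab the root element from the first 'start' event and call clear() on the root after each finished <schnappi> has been handled, instead of clearing every element individually. The sketch below assumes the standard-library xml.etree.ElementTree module, made-up sample data, a hypothetical helper named iter_subtrees, and <schnappi> elements that are not nested inside each other. It still subscribes to 'start' events, and it makes no claim about speed.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data, not taken from the original post.
SAMPLE = (
    b"<root><other>x</other>"
    b"<schnappi><a>1</a><b>2</b></schnappi>"
    b"<schnappi><a>3</a></schnappi></root>"
)


def iter_subtrees(source, tag):
    """Yield each complete <tag> subtree, then release what was parsed."""
    context = iter(ET.iterparse(source, events=("start", "end")))
    # The first event is the 'start' of the document's root element.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            # All subelements are intact here: nothing was cleared yet.
            yield elem
            # Runs once the caller asks for the next subtree: detach
            # everything collected under the root so far.
            root.clear()


count = 0
collected = []
for schnappi in iter_subtrees(io.BytesIO(SAMPLE), "schnappi"):
    count += 1
    collected.append([child.text for child in schnappi])

print(count, collected)
```

Because the generator resumes only after the caller has finished with the yielded element, the subtree stays complete while it is being used, and no flag is needed to track whether a <schnappi> is currently being collected.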

Thanks in advance,

ivr
--
"...but it's HDTV -- it's got a better resolution than the real world."
-- Fry, "When aliens attack"

Recommended answer

Igor V. Rafienko wrote:
> This gave me the desired behaviour, but:
>
> * It looks *very* ugly
> * It's twice as slow as version which sees 'end'-events only.
>
> Now, there *has* to be a better way. What am I missing?

Try emailing the author for support.


D H wrote:
> Igor V. Rafienko wrote:
>> This gave me the desired behaviour, but:
>>
>> * It looks *very* ugly
>> * It's twice as slow as version which sees 'end'-events only.
>>
>> Now, there *has* to be a better way. What am I missing?
>
> Try emailing the author for support.

I don't think that's needed. He is one of the most active members
of c.l.py, and you should know that yourself.

Reinhold


Reinhold Birkenfeld wrote:
> D H wrote:
>> Igor V. Rafienko wrote:
>>> This gave me the desired behaviour, but:
>>>
>>> * It looks *very* ugly
>>> * It's twice as slow as version which sees 'end'-events only.
>>>
>>> Now, there *has* to be a better way. What am I missing?
>>
>> Try emailing the author for support.
>
> I don't think that's needed. He is one of the most active members
> of c.l.py, and you should know that yourself.

I would recommend emailing the author of a library when you have a
question about that library. You should know that yourself as well.

