Purge XML Twig inside sub handler
Problem description
I am parsing large XML files (60GB+) with XML::Twig and using it in an OO (Moose) script. I am using the twig_handlers option to process elements as soon as they have been read into memory. However, I'm not sure how to deal with the element and the twig.
Before I used Moose (and OO altogether), my script looked as follows (and worked):
my $twig = XML::Twig->new(
    twig_handlers => {
        $outer_tag => \&_process_tree,
    }
);
$twig->parsefile($input_file);

sub _process_tree {
    my ($fulltwig, $twig) = @_;
    $twig->cut;
    $fulltwig->purge;
    # Do stuff with twig
}
Now I am doing it like this:
my $twig = XML::Twig->new(
    twig_handlers => {
        $self->outer_tag => sub {
            $self->_process_tree($_);
        }
    }
);
$twig->parsefile($self->input_file);

sub _process_tree {
    my ($self, $twig) = @_;
    $twig->cut;
    # Do stuff with twig
    # But now the 'full twig' is not purged
}
The thing is that I now see that I am missing the purging of the fulltwig. I figured that, in the first, non-OO version, purging would help save memory by getting rid of the full twig as soon as I can. However, when using OO (and having to rely on an explicit sub{} inside the handler), I don't see how I can purge the full twig, because the documentation says that
$_ is also set to the element, so it is easy to write inline handlers like
para => sub { $_->set_tag( 'p'); }
So the documentation talks about the element you want to process, but not the full twig itself. How can I purge the full twig if it is not passed to the subroutine?
Recommended answer
The handler still gets the full twig; you're just not using it (you're using $_ instead).
As it turns out, you can still call purge on what you call the twig (which I usually call "element", or elt in the docs): $_->purge will work as expected, purging the full twig up to the current element in $_.
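A minimal sketch of that inline variant, assuming (as stated above) that purge can be called on the element. The sample document, the item tag, and the @seen array are made up for illustration; a real script would call parsefile on the large input instead of parse on a string.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample document; 'item' stands in for the outer tag.
my $xml = '<root><item id="1"/><item id="2"/><item id="3"/></root>';

my @seen;
my $twig = XML::Twig->new(
    twig_handlers => {
        item => sub {
            # The handler is called with the twig and the element;
            # $_ is also set to that same element.
            push @seen, $_->att('id');
            # Purge through the element: frees what has been parsed so far.
            $_->purge;
        },
    },
);
$twig->parse($xml);

print "processed: @seen\n";
```

Note that the element is read before it is purged; anything needed from it has to be extracted first.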
A cleaner (IMHO) way would be to actually get all of the parameters and purge the full twig explicitly:
my $twig = XML::Twig->new(
    twig_handlers => {
        $self->outer_tag => sub {
            $self->_process_tree(@_); # pass _all_ of the arguments
        }
    }
);
$twig->parsefile($self->input_file);

sub _process_tree {
    my ($self, $full_twig, $twig) = @_; # now you see them!
    $twig->cut;
    # Do stuff with twig
    $full_twig->purge; # now you don't
}
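For something that can be run end to end, here is a self-contained sketch of the same pattern. It is only an illustration: it uses a plain bless-based class instead of Moose to avoid the extra dependency, and the class name My::Parser, the run method, the count attribute, and the sample XML are all hypothetical.

```perl
use strict;
use warnings;
use XML::Twig;

package My::Parser;    # hypothetical class; plain OO stands in for Moose here

sub new {
    my ($class, %args) = @_;
    return bless { outer_tag => $args{outer_tag}, count => 0 }, $class;
}

sub outer_tag { $_[0]{outer_tag} }
sub count     { $_[0]{count} }

sub run {
    my ($self, $xml) = @_;
    my $tag  = $self->outer_tag;
    my $twig = XML::Twig->new(
        twig_handlers => {
            # The closure captures $self and forwards the twig and the element.
            $tag => sub { $self->_process_tree(@_) },
        },
    );
    $twig->parse($xml);    # a real script would use parsefile on the big file
    return $self->count;
}

sub _process_tree {
    my ($self, $full_twig, $elt) = @_;
    $elt->cut;             # detach the element from the tree
    $self->{count}++;      # stand-in for "do stuff with the element"
    $full_twig->purge;     # free everything parsed so far
    return 1;
}

package main;

my $parser = My::Parser->new(outer_tag => 'record');
my $n = $parser->run('<data><record>a</record><record>b</record></data>');
print "records processed: $n\n";
```

Copying the tag name into $tag before building the hash is just a defensive choice in this sketch, so the hash key is unambiguously a plain string.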