Purge XML Twig inside sub handler

Question

I am parsing large XML files (60GB+) with XML::Twig, and I am using it in an OO (Moose) script. I use the twig_handlers option to process elements as soon as they are read into memory. However, I'm not sure how to deal with the Element and the Twig.

Before I switched to Moose (and to OO altogether), my script looked as follows, and it worked:

my $twig = XML::Twig->new(
  twig_handlers => {
    $outer_tag => \&_process_tree,
  }
);
$twig->parsefile($input_file);


sub _process_tree {
  my ($fulltwig, $twig) = @_;

  $twig->cut;
  $fulltwig->purge;
  # Do stuff with twig
}

Now I am doing it like this:

my $twig = XML::Twig->new(
  twig_handlers => {
    $self->outer_tag => sub {
      $self->_process_tree($_);
    }
  }
);
$twig->parsefile($self->input_file);

sub _process_tree {
  my ($self, $twig) = @_;

  $twig->cut;
  # Do stuff with twig
  # But now the 'full twig' is not purged
}

The problem is that I am no longer purging the full twig. In the first, non-OO version, I figured that purging would help save memory by getting rid of the full twig as soon as possible. With OO, however, I have to rely on an explicit sub{} inside the handler, and I don't see how I can purge the full twig from there, because the documentation says that

$_ is also set to the element, so it is easy to write inline handlers like

para => sub { $_->set_tag( 'p'); }

So the documentation talks about the Element you want to process, but not about the full twig itself. How can I purge the full twig if it is not passed to the subroutine?

Recommended answer

The handler still gets the full twig as its first argument; you're just not using it (you are using $_ instead).

As it turns out, you can still call purge on the twig in $_ (which I usually call an "element", or elt, in the docs): $_->purge will work as expected, purging the full twig up to the current element in $_.

A cleaner (IMHO) way would be to actually take all of the parameters and purge the full twig explicitly:

my $twig = XML::Twig->new(
  twig_handlers => {
    $self->outer_tag => sub {
      $self->_process_tree(@_); # pass _all_ of the arguments
    }
  }
);
$twig->parsefile($self->input_file);

sub _process_tree {
  my ($self, $full_twig, $twig) = @_; # now you see them!

  $twig->cut;
  # Do stuff with twig
  $full_twig->purge;  # now you don't
}
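
For anyone who wants to try the pattern in isolation, here is a minimal, self-contained sketch of the same idea. It makes several assumptions that are not part of the question: it uses a plain blessed hash instead of Moose, a hypothetical class named Extractor with a hypothetical run method, and a tiny in-memory document instead of a 60GB file. The point it illustrates is only that the closure forwards @_, so the method receives both the full twig and the current element.

```perl
use strict;
use warnings;
use XML::Twig;

package Extractor;    # hypothetical class, stands in for the Moose object

sub new {
    my ($class, %args) = @_;
    return bless { outer_tag => $args{outer_tag}, seen => [] }, $class;
}

sub run {
    my ($self, $xml) = @_;
    my $twig = XML::Twig->new(
        twig_handlers => {
            # The closure captures $self and forwards both handler
            # arguments (the full twig and the current element).
            $self->{outer_tag} => sub { $self->_process_tree(@_) },
        },
    );
    $twig->parse($xml);    # parse a string; use parsefile for a file
    return $self->{seen};
}

sub _process_tree {
    my ($self, $full_twig, $elt) = @_;

    # "Do stuff" with the element: here, record the text of its <name> child.
    push @{ $self->{seen} }, $elt->first_child_text('name');

    # Free everything parsed so far, including this element.
    $full_twig->purge;
}

package main;

my $xml  = '<items><item><name>a</name></item><item><name>b</name></item></items>';
my $seen = Extractor->new(outer_tag => 'item')->run($xml);
print join(',', @$seen), "\n";    # prints "a,b"
```

Anything you need from the element has to be copied out before the purge, as the sketch does with the name text, because the element is no longer available afterwards.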
