Perl, XML::Twig: how to read fields with the same tag name


Question

I'm processing an XML file that I receive from a partner. I have no influence over the structure of this XML file. An extract of the XML is:

<?xml version="1.0" encoding="UTF-8"?>
<objects>
  <object>
    <id>VW-XJC9</id>
    <name>Name</name>
    <type>House</type>
    <description>
    <![CDATA[<p>some descrioption of the house</p>]]> </description>
    <localcosts>
      <localcost>
        <type>mandatory</type>
        <name>What kind of cost</name>
        <description>
          <![CDATA[Some text again, different than the first tag]]>
        </description>
      </localcost>
    </localcosts>
  </object>
</objects>

The reason I use Twig is that this XML file is about 11 GB (about 100,000 different objects). The problem is that when I reach the localcosts part, the three fields (type, name and description) are skipped, probably because these names have already been used earlier in the object.

The code I use to go through the XML file is as follows:

my $twig= new XML::Twig( twig_handlers => { 
                 id                            => \&get_ID,
                 name                          => \&get_Name,
                 type                          => \&get_Type,
                 description                   => \&get_Description,
                 localcosts                    => \&get_Localcosts
});

$lokaal="c:\\temp\\data3.xml";
getstore($xml, $lokaal);
$twig->parsefile("$lokaal");

sub get_ID          { my( $twig, $data)= @_;  $field[0]=$data->text; $twig->purge; } 
sub get_Name        { my( $twig, $data)= @_;  $field[1]=$data->text; $twig->purge; }
sub get_Type        { my( $twig, $data)= @_;  $field[3]=$data->text; $twig->purge; }
sub get_Description { my( $twig, $data)= @_;  $field[8]=$data->text; $twig->purge; }
sub get_Localcosts{

  my ($t, $item) = @_;

  my @localcosts = $item->children;
  for my $localcost ( @localcosts ) {
    print "$field[0]: $localcost->text\n";
    my @costs = $localcost->children;
    for my $cost (@costs) {
      $Type       =$cost->text if $cost->name eq q{type};
      $Name       =$cost->text if $cost->name eq q{name};
      $Description=$cost->text if $cost->name eq q{description};
      print "Fields: $Type, $Name, $Description\n";
    }
  }
  $t->purge;    
}

When I run this code, the main fields are read without issues, but when the code reaches the localcosts part, the inner for loop is never executed. When I change the field names in the XML to unique ones, this code works perfectly.

Can anyone help me?

Thanks

Answer

The problem is that the id, name, type and description handlers are executed for every element with that tag name, including the ones inside localcost. You will find that the contents of @fields come from the localcost values, because the data from the object values has been overwritten.

Also, while handling the localcost elements, those handlers have already done a $twig->purge, which removes the data from memory. So when the localcosts handler is called, it finds the element empty.
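The first point is easy to reproduce. The sketch below assumes XML::Twig is installed and parses a cut-down copy of the sample XML from a string rather than the 11 GB file; the @seen array exists only for this demonstration. It registers a handler for the bare tag name 'name' and records the parent of every element that triggers it:

```perl
use strict;
use warnings;
use XML::Twig;

# A cut-down copy of the partner's XML from the question (illustrative only).
my $xml = <<'XML';
<objects>
  <object>
    <id>VW-XJC9</id>
    <name>Name</name>
    <localcosts>
      <localcost>
        <name>What kind of cost</name>
      </localcost>
    </localcosts>
  </object>
</objects>
XML

# Record every element the 'name' handler fires on, with its parent's tag.
my @seen;
my $twig = XML::Twig->new(
    twig_handlers => {
        name => sub {
            my ($t, $elt) = @_;
            push @seen, $elt->parent->tag . '/name = ' . $elt->text;
            return 1;
        },
    },
);
$twig->parse($xml);

print "$_\n" for @seen;
# object/name = Name
# localcost/name = What kind of cost
```

Because the handler is keyed on the tag name alone, it fires for both the object's name and the localcost's name, so any variable it writes to ends up holding the last value seen.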

I think the easiest way to do this is to write a single handler that processes each object node in one go and then purges it.

This program demonstrates. Note that I have used Data::Dumper only so that you can see the contents of @fields once it has been populated.

It is very important that you use strict and use warnings at the top of every Perl program, especially if you are asking for help with it. It is a simple measure that can reveal many straightforward errors that you might otherwise waste a lot of time searching for.
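As a small illustration of what strict catches, this core-Perl sketch compiles a snippet that assigns to an undeclared variable inside a string eval. The variable name $Typo is made up for the example:

```perl
use strict;
use warnings;

# Compile a snippet that assigns to a variable never declared with 'my'.
# Under 'use strict' this is a compile-time error, so the string eval
# returns undef and $@ holds the explanation.
my $ok = eval q{
    use strict;
    $Typo = 'mandatory';   # undeclared: rejected by strict 'vars'
    1;
};

print $ok ? "compiled\n" : "strict caught it: $@";
```

The undeclared globals in the question's code ($Type, $Name, $Description, @field and others) would be reported the same way, which makes typos and accidental sharing of variables between handlers much easier to spot.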

Note also that the "indirect object" form of method calls is discouraged: you should write XML::Twig->new(...) instead of new XML::Twig (...).

And if you use single quotes instead of double quotes, a backslash inside a string doesn't need to be doubled unless it is the last character of the string. But Perl is also quite happy if you use forward slashes as a path separator, even on Windows.
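A short core-Perl sketch of these quoting rules, using the path from the question:

```perl
use strict;
use warnings;

# Three spellings of the same Windows path.
my $double  = "c:\\temp\\data3.xml";   # double quotes: each backslash doubled
my $single  = 'c:\temp\data3.xml';     # single quotes: \t stays a literal backslash + t
my $forward = 'c:/temp/data3.xml';     # forward slashes also work on Windows

print "same string\n" if $double eq $single;

# The one exception: a backslash at the end of a single-quoted string must be
# doubled, otherwise \' would escape the closing quote.
my $dir = 'c:\temp\\';
print length($dir), "\n";   # 8 characters: c : \ t e m p \
```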

I hope this helps.

use strict;
use warnings;

use XML::Twig;
use Data::Dumper;
$Data::Dumper::Useqq = 1;

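# A single handler for the whole <object> element. It fires at the closing
# </object> tag, so all of the object's children, including the nested
# <localcosts>, are still in memory at that point.
my $twig= XML::Twig->new( twig_handlers => { object => \&get_Object });

my $lokaal = 'c:\temp\data3.xml';

my @fields;
$twig->parsefile($lokaal);


sub get_Object {

  my ($twig, $object) = @_;

  # Relative paths select direct children of <object>, so the same-named
  # children of <localcost> do not interfere here.
  $fields[0] = $object->findvalue('id');
  $fields[1] = $object->findvalue('name');
  $fields[3] = $object->findvalue('type');
  $fields[8] = $object->findvalue('description');

  print Dumper \@fields;

  # Each <localcost> is then queried relative to itself.
  my @localcosts = $object->findnodes('localcosts/localcost');

  for my $localcost (@localcosts) {

    my $type        = $localcost->findvalue('type');
    my $name        = $localcost->findvalue('name');
    my $description = $localcost->findvalue('description');

    print "$type, $name, $description\n";
  }

  # Release everything parsed so far before moving on to the next object.
  $twig->purge;    
}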

Output

$VAR1 = [
          "VW-XJC9",
          "Name",
          undef,
          "House",
          undef,
          undef,
          undef,
          undef,
          "<p>some descrioption of the house</p> "
        ];
mandatory, What kind of cost, Some text again, different than the first tag
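The undef entries in the dump are not missing data. The program only assigns indices 0, 1, 3 and 8, and Perl fills the skipped positions of an array with undef. A core-Perl sketch of the same effect, using placeholder values:

```perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Useqq = 1;

# Assign the same sparse set of indices that get_Object uses.
my @fields;
$fields[0] = 'VW-XJC9';
$fields[1] = 'Name';
$fields[3] = 'House';
$fields[8] = 'description text';

# The array now has 9 elements; indices 2 and 4..7 were never set.
my @unset = grep { !defined $fields[$_] } 0 .. $#fields;
print scalar(@fields), " elements, unset indices: @unset\n";
print Dumper \@fields;
```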