Parsing an XML file with Perl XML::Simple


Problem description

I'm trying to parse an XML-like file with the following structure:

Edit: I tried to omit most of the huge XML file to simplify things, but copied and pasted it incorrectly. Here's the full file (900 KB!) that actually has this issue: https://docs.google.com/file/d/0B3ustNI1qZh1UURrYWZJQk0wVlU/edit?usp=sharing

<CIM CIMVERSION="2.0" DTDVERSION="2.0">

  <DECLARATION>
    <DECLGROUP>
      <LOCALNAMESPACEPATH>
        <NAMESPACE NAME="signalingsystem"/>
      </LOCALNAMESPACEPATH>

      <VALUE.OBJECT>
        <INSTANCE CLASSNAME="SharedGtTranslator">
          <PROPERTY NAME="Name" TYPE="string">
            <VALUE>AUC$4,1,6,4,26202*-->AUC RemoteSPC: 300 SSN: 10</VALUE>
          </PROPERTY>
          <PROPERTY NAME="NatureOfAddress" TYPE="sint32">
            <VALUE>4</VALUE>
          </PROPERTY>
        </INSTANCE>
      </VALUE.OBJECT>

      <VALUE.OBJECT>
        <INSTANCE CLASSNAME="SharedGtTranslator">
          <PROPERTY NAME="Name" TYPE="string">
            <VALUE>AUC$4,2,6,4,26202*-->AUC AUC LocalSPC: 410 SSN: 10</VALUE>
          </PROPERTY>
          <PROPERTY NAME="NatureOfAddress" TYPE="sint32">
            <VALUE>4</VALUE>
          </PROPERTY>
            <VALUE>2</VALUE>
          </PROPERTY>
        </INSTANCE>
      </VALUE.OBJECT>
    </DECLGROUP>

  </DECLARATION>
</CIM>

I'm using XML::Simple to parse that structure. I need to get the VALUE of every PROPERTY with NAME="Name" in each INSTANCE that has CLASSNAME="SharedGtTranslator".

This is what I'm trying to do:

#!/usr/bin/perl
use strict;
use warnings;
# use module
use XML::Simple;
use Data::Dumper;

my $file1 = $ARGV[0];
# create object
my $xml = new XML::Simple;

# read XML file
my $data = $xml->XMLin($file1);
foreach my $object (@{$data->{DECLARATION}->{DECLGROUP}->{'VALUE.OBJECT'}}) {
        if ($object->{INSTANCE}->{CLASSNAME} eq 'SharedGtTranslator') {
                foreach my $property (@{$object->{INSTANCE}->{PROPERTY}}) {
                        if ($property->{NAME} eq 'Name') {
                                print $property->{VALUE} . "\n";
                        }
                }

        }
}

I get the message

"Pseudo-hashes are deprecated"

and nothing is printed.

Help is highly appreciated!

Solution

Your code works fine for me as it stands. Is that the full program? There is no use of pseudo-hashes in that code.

The only problem I can see is that your XML data isn't well-formed. There is a spurious

  <VALUE>2</VALUE>
</PROPERTY>

at the end of the last INSTANCE element. Once this is fixed your program runs fine.
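
A quick way to catch this kind of problem before handing a file to XML::Simple is to let a parser report whether the document is well-formed. The following is only a minimal sketch, assuming XML::LibXML is installed; the helper name well_formedness_error and the shortened sample string are made up for illustration.

```perl
use strict;
use warnings;

use XML::LibXML;

# Hypothetical helper: returns an empty string if $xml_text is well-formed,
# otherwise the parser's error message.
sub well_formedness_error {
    my ($xml_text) = @_;

    # load_xml dies on malformed input, so trap the exception with eval
    my $ok = eval { XML::LibXML->load_xml(string => $xml_text); 1 };

    return $ok ? '' : "$@";
}

# Shortened stand-in for the sample data, with the same stray VALUE/PROPERTY pair
my $broken = '<INSTANCE><PROPERTY><VALUE>4</VALUE></PROPERTY>'
           . '<VALUE>2</VALUE></PROPERTY></INSTANCE>';

my $error   = well_formedness_error($broken);
my $message = $error eq '' ? "Well-formed\n" : "Not well-formed: $error\n";
print $message;
```

The error text produced by the parser normally includes the line and column of the mismatched tag, which helps when the file is as large as the real one.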

XML::Simple seems to be working for you, so it's probably appropriate to stick with it. But I don't generally recommend that people use this module. It can be far from simple to get working, and the structure it builds doesn't fully reflect the XML data, so something like XML::Twig or XML::LibXML is often much better.


Update

Working with your real data, the structure generated by XML::Simple looks quite unlike what is generated for the short example. There are arrays intermingled with the hashes that weren't there before.

This program seems to generate what you need. It produces 170 lines of output.

use strict;
use warnings;

use XML::Simple;

my $file1 = 'active_7v19.om.cim';

my $xml  = XML::Simple->new;
my $data = $xml->XMLin($file1);

for my $declgroup (@{ $data->{DECLARATION}{DECLGROUP} }) {

    foreach my $object (@{ $declgroup->{'VALUE.OBJECT'} }) {

        my $instance   = $object->{INSTANCE};
        my $classname  = $instance->{CLASSNAME};
        my $properties = $instance->{PROPERTY};

        next unless $classname eq 'SharedGtTranslator';

        for my $property (@$properties) {

            my $name  = $property->{NAME};
            my $value = $property->{VALUE};

            print $value, "\n" if $name eq 'Name';
        }
    }
}
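
The structure changes between the short sample and the real file because XML::Simple, by default, represents an element that occurs once as a hash and an element that repeats as an array. One way to get the same shape in both cases is the ForceArray option. This is a sketch rather than a drop-in replacement: the helper name translator_names is made up, and the list of element names passed to ForceArray is an assumption based on the sample data shown in the question.

```perl
use strict;
use warnings;

use XML::Simple;

# Hypothetical helper: returns the Name values of all SharedGtTranslator
# instances. $source may be a file name or a string of XML.
sub translator_names {
    my ($source) = @_;

    my $data = XMLin(
        $source,
        # Always build arrays for these elements, even if they occur once
        ForceArray => [ 'DECLGROUP', 'VALUE.OBJECT', 'PROPERTY' ],
        # Disable folding of child elements into hashes keyed on an attribute
        KeyAttr    => [],
    );

    my @names;
    for my $declgroup (@{ $data->{DECLARATION}{DECLGROUP} || [] }) {
        for my $object (@{ $declgroup->{'VALUE.OBJECT'} || [] }) {
            my $instance = $object->{INSTANCE} or next;
            next unless ($instance->{CLASSNAME} || '') eq 'SharedGtTranslator';

            for my $property (@{ $instance->{PROPERTY} || [] }) {
                push @names, $property->{VALUE}
                    if ($property->{NAME} || '') eq 'Name';
            }
        }
    }
    return @names;
}

# Usage: perl script.pl active_7v19.om.cim
if (@ARGV) {
    print "$_\n" for translator_names($ARGV[0]);
}
```

With the arrays forced, the same loops work whether the file contains one DECLGROUP or many, so the code never treats an array reference as a hash.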

However, I am more sure now that you would be better off with a "real" XML library. This code uses XML::LibXML to produce the same output.

use strict;
use warnings;

use XML::LibXML;

my $file1 = 'active_7v19.om.cim';

my $doc = XML::LibXML->load_xml(location => $file1, no_blanks => 1);

my @properties = $doc->findnodes('//INSTANCE[@CLASSNAME = "SharedGtTranslator"]/PROPERTY[@NAME = "Name"]');

for my $property (@properties) {
    # findvalue evaluates the XPath 'VALUE' relative to this PROPERTY element
    print $property->findvalue('VALUE'), "\n";
}

All the work is done by the XPath expression, which selects all PROPERTY elements with a NAME attribute of Name that are children of an INSTANCE element, anywhere in the document, that has a CLASSNAME attribute of SharedGtTranslator. The subsequent for loop prints the value of the VALUE element within each PROPERTY. It is clearly a lot more concise; it is also faster to run and more flexible if you need to extract different information.
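
To illustrate the flexibility point, here is a sketch that pulls two properties per instance by running XPath expressions relative to each INSTANCE node. The helper name translator_pairs and the inline sample document are made up for illustration; only the element and attribute names come from the question.

```perl
use strict;
use warnings;

use XML::LibXML;

# Hypothetical helper: returns one [Name, NatureOfAddress] pair per
# SharedGtTranslator instance in the parsed document.
sub translator_pairs {
    my ($doc) = @_;

    my @pairs;
    for my $instance ($doc->findnodes('//INSTANCE[@CLASSNAME = "SharedGtTranslator"]')) {
        # These XPath expressions are evaluated relative to the current INSTANCE
        my $name   = $instance->findvalue('PROPERTY[@NAME = "Name"]/VALUE');
        my $nature = $instance->findvalue('PROPERTY[@NAME = "NatureOfAddress"]/VALUE');
        push @pairs, [ $name, $nature ];
    }
    return @pairs;
}

# Small made-up document with the same element names as the real data
my $doc = XML::LibXML->load_xml(string => <<'XML');
<CIM>
  <INSTANCE CLASSNAME="SharedGtTranslator">
    <PROPERTY NAME="Name" TYPE="string"><VALUE>first</VALUE></PROPERTY>
    <PROPERTY NAME="NatureOfAddress" TYPE="sint32"><VALUE>4</VALUE></PROPERTY>
  </INSTANCE>
</CIM>
XML

printf "%s => %s\n", @$_ for translator_pairs($doc);
```

Changing which properties are extracted only means changing the XPath strings; the surrounding loop stays the same.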
