Parsing XML in Perl using SAX


Problem description

I need to parse an XML file in Perl using SAX so that I can run the following validation checks on each account.

  • If the 'Id' contains only alphanumeric characters and its length is between 5 and 10
  • If the 'LastLoginDate' is not older than 'CreationDate'
  • If 'SubscriptionMonthlyFee' = 0 && 'SubscriptionType' != free
  • If 'PaymentMode' is undefined && 'SubscriptionType'!= free
  • If 'Provision' < 0
  • Whether 'InternalMail' exists
  • Whether 'ExternalMail' exists
  • If 'InternalMail' = 'ExternalMail'

Otherwise, return an alert (print a message to notify).

accounts.xml

<?xml version="1.0" encoding="utf-8"?>
<Accounts locale="en_US">
  <Account>
    <Id>abcd</Id>
    <OwnerLastName>asd</OwnerLastName>
    <OwnerFirstName>zxc</OwnerFirstName>
    <Locked>false</Locked>
    <Database>mail</Database>
    <Customer>mail</Customer>
    <CreationDate year="2011" month="8" month-name="fevrier" day-of-month="19" hour-of-day="15" minute="23" day-name="dimanche"/>
    <LastLoginDate year="2015" month="04" month-name="avril" day-of-month="22" hour-of-day="11" minute="13" day-name="macredi"/>
    <LoginsCount>10405</LoginsCount>
    <Locale>nl</Locale>
    <Country>NL</Country>
    <SubscriptionType>free</SubscriptionType>
    <ActiveSubscriptionType>free</ActiveSubscriptionType>
    <SubscriptionExpiration year="1980" month="1" month-name="janvier" day-of-month="1" hour-of-day="0" minute="0" day-name="jeudi"/>
    <SubscriptionMonthlyFee>0</SubscriptionMonthlyFee>
    <PaymentMode>Undefined</PaymentMode>
    <Provision>0</Provision>
    <InternalMail>asdf@asdf.com</InternalMail>
    <ExternalMail>fdsa@zxczxc.com</ExternalMail>
    <GroupMemberships>
      <Group>werkgroep X.Y.Z.</Group>
    </GroupMemberships>
    <SynchroCount>6</SynchroCount>
    <LastSynchroDate year="2003" month="12" month-name="decembre" day-of-month="5" hour-of-day="12" minute="48" day-name="mardi"/>
    <HasActiveSync>false</HasActiveSync>
    <Company/>
  </Account>
  <Account>
    <Id>mnbv</Id>
    <OwnerLastName>cvbb</OwnerLastName>
    <OwnerFirstName>bvcc</OwnerFirstName>
    <Locked>true</Locked>
    <Database>mail</Database>
    <Customer>mail</Customer>
    <CreationDate year="2012" month="10" month-name="octobre" day-of-month="10" hour-of-day="10" minute="18" day-name="jeudi"/>
    <LastLoginDate/>
    <LoginsCount>0</LoginsCount>
    <Locale>fr</Locale>
    <Country>BE</Country>
    <SubscriptionType>free</SubscriptionType>
    <ActiveSubscriptionType>free</ActiveSubscriptionType>
    <SubscriptionExpiration year="1970" month="1" month-name="janvier" day-of-month="1" hour-of-day="1" minute="0" day-name="jeudi"/>
    <SubscriptionMonthlyFee>0</SubscriptionMonthlyFee>
    <PaymentMode>Undefined</PaymentMode>
    <Provision>0</Provision>
    <InternalMail/>
    <ExternalMail>qweqwe@qwe.com</ExternalMail>
    <GroupMemberships/>
    <SynchroCount>0</SynchroCount>
    <LastSynchroDate year="1970" month="1" month-name="janvier" day-of-month="1" hour-of-day="1" minute="0" day-name="jeudi"/>
    <HasActiveSync>false</HasActiveSync>
    <Company/>
  </Account>
</Accounts>

I have made several unsuccessful attempts (shown below) and would greatly appreciate your help.

My attempt at the parsing (I was unable to retrieve values from the inner hash):

use warnings;
use strict;
use XML::SAX;
my $parser = XML::SAX::ParserFactory->parser(Handler => MySAXHandler->new);
$parser->parse_uri("accounts.xml");

package MySAXHandler;
use base qw(XML::SAX::Base);

  sub start_element {
    my ($self, $el) = @_;

    my $ElementName = $el->{Name};
    my $attr = %{$el->{Attributes}};
    my $attr_value = %{$el->{Attributes}->{'LocalName'}};

    print my $loginID, "\n";      
    print $ElementName, "\n";
    print $attr_value, "\n";
  }

My attempt at the validation checks:

    print "Currently looking into ".(scalar @account)."elements";

    #Checking If Login only includes Alphanumeric characters and has acceptable length
    print "ALERT - ID contains invalid characters" unless ($login =~ m/[a-zA-Z@.]+$/);
    # Or print "ALERT - ID contains invalid characters" unless ($accountRef->{"Login"} =~ /^[a-zA-Z]$/);
    print "ALERT - ID length is greater than 8 characters" unless (length.$account[0] > 20);

    #print "Suspicious ALERT - Account Creation and Login time is same" unless ($account[4] != $account[5]);
    print "Suspicious ALERT - Last Login was before the account creation" unless ($account[5] > $account[4]);
    print "Suspicious ALERT - Incorrect Login Counts" unless ($account[6] > 0 && $account[5] > $account[4]);

    #Checking if Subscription Type & Active Subscription Type is same - DISCARDED
    #print "ALERT - Preferred Subscription & Current Subscription Type is not same" unless ($account[9] eq $account[10]);

    #Checking if Subscription Fee matches the Subscription Type
    if( $account[9] eq "free" && account[12] = 0) {
        #print "The user subscription is on free subscription and there are no charges" 
        return 0;
    } elsif((account[9] eq "light" || account[9] eq "regular" || account[9] eq "advanced") && account[12] <= 0) {
        print "ALERT - The user subscription is" account[9] "and he/she is not getting charged";
    } else {
        #print "The user subscription is " $account[9] "and he/she is getting charged" account[12];
        return 0;
    }

    #Checking if the Payment Mode is undefined and the subscription type is not free
    if($account[9] ne "free" && account[13] eq 'undefined') {print "ALERT - Payment mode is not being defined and the subscription type is not free"};

    #Checking if Provision is less than 0
    print "ALERT - The user balance is in negative" unless ($account[14] >= 0 );

    #Checking if Internal Email Exists or not
    print "ALERT - The user doesn't have an internal email address" unless ($account[15] != "" );

    #Checking if External Email Exists or not
    print "ALERT - The user doesn't have an external email address" unless ($account[16] != "" );

    #Checking if External Email Exists or not
    print "ALERT - The user doesn't have an external and internal email addresses are same" unless ($account[15] ne $account[16]);

    }

Solution

XML::LibXML::Reader provides a SAX-like interface, but you can inflate the element into a full XML::LibXML object when needed. The first two conditions would be tested as follows:

#!/usr/bin/perl
use warnings;
use strict;

use XML::LibXML::Reader;

my $r = 'XML::LibXML::Reader'->new(location => 'file.xml') or die;
while ($r->nextElement('Account')) {
    my $xml = $r->copyCurrentNode(1);

    my $id = $xml->findvalue('Id');
    if ($id !~ /^[[:alnum:]]+$/ || 5 > length $id || 10 < length $id) {
        print "Invalid Id: $id.\n";
        next
    }

    my @dates = map $xml->findnodes($_), qw( CreationDate LastLoginDate );
    my @date_strings = map sprintf('%4d%02d%02d%02d%02d',
                                   @$_{qw{ year month day-of-month hour-of-day minute }}),
                           @dates;
    if ($date_strings[0] gt $date_strings[1]) {
        print "Invalid dates for $id: @date_strings.\n";
    }

    ...
}
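The remaining conditions could be handled in the same loop. The following is only a sketch under stated assumptions: `check_account` is a hypothetical helper (not part of XML::LibXML), it expects a hash ref of element text keyed by element name, it treats an empty or "Undefined" PaymentMode as undefined, and it reads the question's list as "alert when the condition holds".

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: returns a list of alert messages for one account.
# $acct is a hash ref such as { SubscriptionType => 'free', Provision => 0, ... }.
sub check_account {
    my ($acct) = @_;
    my @alerts;

    my $type = $acct->{SubscriptionType}       // '';
    my $fee  = $acct->{SubscriptionMonthlyFee} // 0;
    my $mode = $acct->{PaymentMode}            // '';
    my $paid = lc($type) ne 'free';

    # Assumption: a paid subscription should carry a non-zero monthly fee.
    push @alerts, 'no monthly fee on a non-free subscription'
        if $paid && $fee == 0;

    # Assumption: "Undefined" (any case) or an empty value means no payment mode.
    push @alerts, 'payment mode undefined on a non-free subscription'
        if $paid && ($mode eq '' || lc($mode) eq 'undefined');

    push @alerts, 'negative provision'
        if ($acct->{Provision} // 0) < 0;

    my ($int, $ext) = map { $acct->{$_} // '' } qw(InternalMail ExternalMail);
    push @alerts, 'internal mail missing' if $int eq '';
    push @alerts, 'external mail missing' if $ext eq '';
    push @alerts, 'internal and external mail are identical'
        if $int ne '' && $int eq $ext;

    return @alerts;
}

# Values taken from the second account in the sample file.
my %account = (
    SubscriptionType       => 'free',
    SubscriptionMonthlyFee => 0,
    PaymentMode            => 'Undefined',
    Provision              => 0,
    InternalMail           => '',
    ExternalMail           => 'qweqwe@qwe.com',
);
print "ALERT - $_\n" for check_account(\%account);
```

Inside the `while` loop, the hash could be filled with something like `map { $_ => $xml->findvalue($_) } qw( SubscriptionType SubscriptionMonthlyFee PaymentMode Provision InternalMail ExternalMail )`, since `findvalue` returns an empty string for an empty or missing element.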

Note that LastLoginDate for the second account in the sample ("mnbv") is empty, so it can't be compared to CreationDate.
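If you would rather stay with XML::SAX, the attribute values can be read from the inner hash as sketched below. This is a minimal illustration, not a complete validator: `AccountHandler` and the keys it stores on `$self` are hypothetical names, and the inline XML is a trimmed version of the sample. In the Perl SAX binding, `$el->{Attributes}` is a hash keyed by `{NamespaceURI}LocalName` (so `{}year` for an attribute without a namespace), and each entry is itself a hash with `LocalName` and `Value` keys, which is why dereferencing `->{'LocalName'}` directly on `Attributes` returns nothing.

```perl
#!/usr/bin/perl
use warnings;
use strict;

package AccountHandler;    # hypothetical handler name
use base qw(XML::SAX::Base);

sub start_document {
    my ($self) = @_;
    $self->{accounts} = [];    # one hash ref per <Account>
}

sub start_element {
    my ($self, $el) = @_;
    my $name = $el->{LocalName};
    if ($name eq 'Account') {
        $self->{current} = {};
        return;
    }
    return unless $self->{current};    # ignore elements outside <Account>

    $self->{field} = $name;
    $self->{text}  = '';

    # Flatten { '{}year' => { LocalName => 'year', Value => '2011', ... } }
    # into { year => '2011', ... }.
    my %attr = map { $_->{LocalName} => $_->{Value} }
               values %{ $el->{Attributes} || {} };
    $self->{current}{$name} = \%attr if %attr;
}

sub characters {
    my ($self, $chars) = @_;
    # Text may arrive in several chunks, so accumulate it.
    $self->{text} .= $chars->{Data} if defined $self->{field};
}

sub end_element {
    my ($self, $el) = @_;
    my $name = $el->{LocalName};
    if ($name eq 'Account') {
        push @{ $self->{accounts} }, delete $self->{current};
        return;
    }
    return unless defined $self->{field} && $self->{field} eq $name;
    # Keep the attribute hash for date elements, the text for everything else.
    $self->{current}{$name} = $self->{text}
        unless ref $self->{current}{$name};
    $self->{field} = undef;
}

package main;
use XML::SAX::ParserFactory;

my $xml = <<'XML';
<Accounts locale="en_US">
  <Account>
    <Id>abcd</Id>
    <CreationDate year="2011" month="8" day-of-month="19"/>
    <InternalMail>asdf@asdf.com</InternalMail>
  </Account>
</Accounts>
XML

my $handler = AccountHandler->new;
XML::SAX::ParserFactory->parser(Handler => $handler)->parse_string($xml);

for my $acct (@{ $handler->{accounts} }) {
    print "Id: $acct->{Id}, created in $acct->{CreationDate}{year}\n";
}
```

Collecting each account into a plain hash first keeps the SAX callbacks small and lets the validation checks run once per account, after `end_element` for `Account`.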
