Extracting and Storing XML Data with libXML/XPath


Problem Description

use strict;
use warnings;
use XML::LibXML;
use Data::Dumper;

# parse the file
my $dom = XML::LibXML->new->parse_file('sample.xml');

my $context = XML::LibXML::XPathContext->new( $dom->documentElement() );
$context->registerNs('u', 'http://uniprot.org/uniprot');

# print the file to make sure it looks OK
print $dom, "\n";

# find shortNames
my $sn = $context->findnodes('//u:shortName');
print 'ShortName: '.$sn, "\n";

# find dbReference ids that are of type EC
my $ids = $context->findnodes('//u:dbReference[@type="EC"]/@id');
my $number =()= $ids =~ /\./gi;
print 'EC Values: '.$ids, "\n";

# find sequences that have a length
my $seq = $context->findnodes('//u:sequence[@length>1]');
$seq =~ s/" "/"\n"/;
print 'Sequence: '.$seq, "\n";

I currently have this code, which runs on this XML file containing 10 entries (https://www.dropbox.com/s/dq8ir9f22cnfwrz/Sample.xml). As it stands, it extracts the shortName, dbReference, and sequence values from all 10 entries and prints them concatenated together. What I would like is a separate shortName, dbReference, and sequence for each entry in the XML file. Is it possible to have the script look up these data one entry at a time? My end goal is to format them in a specific way for output.

I was thinking of having some code run before this that extracts only the entries and then passes each one to the rest of the code for data extraction.

Thanks.

Recommended Answer

First, query for the node-set of entries. In list context, findnodes returns the matching nodes as a Perl list, one element per entry:

my @entries = $context->findnodes('//u:entry');
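A minimal sketch of the difference between list and scalar context, run against a small inline document. The XML below is a hypothetical two-entry stand-in for the linked Sample.xml that uses the same UniProt namespace; the entry names are made up. In list context each array element is a single element node, while in scalar context findnodes returns one XML::LibXML::NodeList holding every match. Printing that single NodeList is what produces the run-together output described in the question.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical two-entry stand-in for Sample.xml (same UniProt namespace)
my $xml = <<'XML';
<uniprot xmlns="http://uniprot.org/uniprot">
  <entry><name>ENTRY_A</name></entry>
  <entry><name>ENTRY_B</name></entry>
</uniprot>
XML

my $dom     = XML::LibXML->new->parse_string($xml);
my $context = XML::LibXML::XPathContext->new($dom->documentElement());
$context->registerNs('u', 'http://uniprot.org/uniprot');

# List context: one Perl scalar per matching <entry> element
my @entries = $context->findnodes('//u:entry');

# Scalar context: a single XML::LibXML::NodeList containing all matches
my $list = $context->findnodes('//u:entry');

printf "list context: %d nodes (%s)\n", scalar(@entries), ref $entries[0];
printf "scalar context: %s with %d nodes\n", ref $list, $list->size;
```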

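Then, for each node, run a contextual XPath expression with findnodes(expression, context-node), passing the node as the second argument. The expression must be relative to that node, such as u:name or .//u:shortName; an expression that starts with // is always evaluated from the document root, so it would match nodes in every entry regardless of the context node. For example: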

foreach my $entry (@entries) {
    my $entryName = $context->findnodes('u:name', $entry);
    ...
}
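The absolute/relative distinction is easy to check directly. This sketch again uses a hypothetical inline document (two made-up entries with nested shortName elements) and passes the first entry as the context node to both queries: //u:shortName still returns every shortName in the document, while .//u:shortName returns only the descendants of that entry.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical stand-in: two entries, each with nested shortName elements
my $xml = <<'XML';
<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <name>ENTRY_A</name>
    <protein><recommendedName><shortName>A1</shortName><shortName>A2</shortName></recommendedName></protein>
  </entry>
  <entry>
    <name>ENTRY_B</name>
    <protein><recommendedName><shortName>B1</shortName></recommendedName></protein>
  </entry>
</uniprot>
XML

my $dom     = XML::LibXML->new->parse_string($xml);
my $context = XML::LibXML::XPathContext->new($dom->documentElement());
$context->registerNs('u', 'http://uniprot.org/uniprot');

my ($first) = $context->findnodes('//u:entry');    # first <entry> only

# Absolute path: evaluated from the document root, context node is ignored
my @absolute = $context->findnodes('//u:shortName', $first);

# Relative path: searches only the descendants of the context node
my @relative = $context->findnodes('.//u:shortName', $first);

print 'absolute: ', join(', ', map { $_->textContent } @absolute), "\n";
print 'relative: ', join(', ', map { $_->textContent } @relative), "\n";
```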

Here is an attempt using your code:

use strict;
use warnings;
use XML::LibXML;
use Data::Dumper;

# parse the file
my $dom = XML::LibXML->new->parse_file('sample.xml');

my $context = XML::LibXML::XPathContext->new( $dom->documentElement() );
$context->registerNs('u', 'http://uniprot.org/uniprot');

my @entries = $context->findnodes('//u:entry');
foreach my $entry (@entries) {

    # relative paths plus $entry as context node: each lookup stays inside this entry
    my $entryName  = $context->findnodes('u:name', $entry);
    my @shortNames = $context->findnodes('.//u:shortName', $entry);
    my @dbRefs     = $context->findnodes('.//u:dbReference[@type="EC"]/@id', $entry);
    my $sequence   = $context->findnodes('.//u:sequence[@length>1]', $entry);

    print "============================================================\n";
    print "\nName: ".$entryName."\n";

    print "\nShort Names: \n";
    my $i = 0;
    foreach my $shortName (@shortNames) {
        print ++$i.': '.$shortName->textContent, "\n";
    }

    print "\nEC Values: \n";
    $i = 0;
    foreach my $dbRef (@dbRefs) {
        print ++$i.': '.$dbRef->nodeValue, "\n";
    }

    # replace spaces with newlines (the pattern " " with quotes matched literal quote characters)
    $sequence =~ s/ /\n/g;
    print "\nSequence: ".$sequence, "\n";
}
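Since the stated end goal is formatting the data in a specific way, and the title mentions storing it, one option is to collect each entry into a Perl data structure first and format it afterwards. This is a minimal sketch under stated assumptions: the inline XML is a hypothetical one-entry stand-in for Sample.xml, and the hash keys (name, short_names, ec, sequence), the whitespace stripping of the sequence, and the printf layout are all illustrative choices rather than part of the answer above.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical one-entry stand-in for Sample.xml (same UniProt namespace)
my $xml = <<'XML';
<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <name>ENTRY_A</name>
    <protein><recommendedName><shortName>A1</shortName></recommendedName></protein>
    <dbReference type="EC" id="1.2.3.4"/>
    <dbReference type="PDB" id="XXXX"/>
    <sequence length="8">MKTA YIAK</sequence>
  </entry>
</uniprot>
XML

my $dom     = XML::LibXML->new->parse_string($xml);
my $context = XML::LibXML::XPathContext->new($dom->documentElement());
$context->registerNs('u', 'http://uniprot.org/uniprot');

# Store one hash reference per <entry>; every path is relative to $entry
my @records;
for my $entry ($context->findnodes('//u:entry')) {
    my $name = $context->findvalue('u:name', $entry);

    # assumption: whitespace inside the sequence text is not meaningful
    (my $sequence = $context->findvalue('u:sequence', $entry)) =~ s/\s+//g;

    push @records, {
        name        => $name,
        short_names => [ map { $_->textContent }
                         $context->findnodes('.//u:shortName', $entry) ],
        ec          => [ map { $_->nodeValue }
                         $context->findnodes('u:dbReference[@type="EC"]/@id', $entry) ],
        sequence    => $sequence,
    };
}

# Format the stored records in a second pass (hypothetical output layout)
for my $r (@records) {
    printf "%s\n  short names: %s\n  EC: %s\n  sequence (%d residues): %s\n",
        $r->{name},
        join(', ', @{ $r->{short_names} }),
        join(', ', @{ $r->{ec} }),
        length $r->{sequence},
        $r->{sequence};
}
```

Separating extraction from formatting this way means the output layout can change without touching any of the XPath queries.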
