Extracting and Storing XML Data with libXML/XPath
Problem Description
use XML::LibXML;
use Data::Dumper;
#parsing file
my $dom = XML::LibXML->new->parse_file('sample.xml');
my $context = XML::LibXML::XPathContext->new( $dom->documentElement() );
$context->registerNs('u', 'http://uniprot.org/uniprot');
#print file to make sure it looks ok
print $dom, "\n";
#finds shortnames
my $sn = $context->findnodes('//u:shortName');
print 'ShortName: '.$sn, "\n";
#finds dbReference ids that are of type EC
my $ids = $context->findnodes('//u:dbReference[@type="EC"]/@id');
my $number =()= $ids =~ /\./gi;
print 'EC Values: '.$ids, "\n";
#finds sequences that have a length
my $seq = $context->findnodes('//u:sequence[@length>1]');
$seq =~ s/" "/"\n"/;
print 'Sequence: '.$seq, "\n";
I currently have this code, which runs on this XML file containing 10 entries (https://www.dropbox.com/s/dq8ir9f22cnfwrz/Sample.xml). At the moment it extracts the shortName, dbReference, and sequence values from all 10 entries and concatenates them into a single printout. What I would like is to get a shortName, dbReference, and sequence for each entry in the XML file separately. Is it possible to have the script look up these data one entry at a time? My end goal is to format them in a specific way for output.
I was thinking of adding a step before this code that extracts only the entries and then passes each one to the rest of the code for data extraction.
Thanks
Recommended Answer
First, query for the node-set of entries. Called in list context, findnodes returns the matching nodes as a Perl list:
my @entries = $context->findnodes('//u:entry');
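The list-versus-scalar distinction matters here, because the question's code calls findnodes in scalar context, where it returns a single XML::LibXML::NodeList object that stringifies to the concatenated text of every match. A minimal sketch of both forms, assuming a small hypothetical inline document in the UniProt namespace (the XML and entry names are made up for illustration):

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical two-entry document in the UniProt namespace (illustration only)
my $xml = <<'XML';
<uniprot xmlns="http://uniprot.org/uniprot">
  <entry><name>ENTRY1_TEST</name></entry>
  <entry><name>ENTRY2_TEST</name></entry>
</uniprot>
XML

my $dom     = XML::LibXML->load_xml(string => $xml);
my $context = XML::LibXML::XPathContext->new($dom->documentElement());
$context->registerNs('u', 'http://uniprot.org/uniprot');

# List context: findnodes returns a plain Perl list of nodes
my @entries = $context->findnodes('//u:entry');

# Scalar context: findnodes returns one XML::LibXML::NodeList object
my $nodelist = $context->findnodes('//u:entry');

printf "list: %d entries, NodeList: %d entries\n", scalar(@entries), $nodelist->size;
```

Iterating over @entries gives one node per entry, which is what the per-entry loop needs.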
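Then, for each node, run a relative XPath expression with findnodes($expression, $context_node), passing the node as the second argument. The expression must be relative (for example u:name or .//u:shortName); a path that begins with // searches the whole document no matter which node is passed. For example: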
foreach my $entry (@entries) {
    my $entryName = $context->findnodes('u:name', $entry);
    # ... further queries relative to $entry ...
}
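To see what the context node changes, here is a minimal sketch comparing a relative path with an absolute one when both are given the same context node. The two-entry document is hypothetical, and the shortName nesting is simplified compared with real UniProt files:

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical document: two entries, each with one shortName (illustration only)
my $xml = <<'XML';
<uniprot xmlns="http://uniprot.org/uniprot">
  <entry><protein><shortName>AAA</shortName></protein></entry>
  <entry><protein><shortName>BBB</shortName></protein></entry>
</uniprot>
XML

my $dom     = XML::LibXML->load_xml(string => $xml);
my $context = XML::LibXML::XPathContext->new($dom->documentElement());
$context->registerNs('u', 'http://uniprot.org/uniprot');

my @entries = $context->findnodes('//u:entry');
my $second  = $entries[1];

# Relative path: searches only below the node passed as the second argument
my @relative = $context->findnodes('.//u:shortName', $second);

# Leading //: starts at the document root, so the context node has no effect
my @absolute = $context->findnodes('//u:shortName', $second);

print 'relative: ', join(',', map { $_->textContent } @relative), "\n";  # prints "relative: BBB"
print 'absolute: ', join(',', map { $_->textContent } @absolute), "\n";  # prints "absolute: AAA,BBB"
```

The same reasoning applies to any query inside the loop: if the context node argument is omitted, the query falls back to the context's own node (here the document element) and matches every entry.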
Here is an attempt using your code:
use strict;
use warnings;
use XML::LibXML;
use Data::Dumper;

#parsing file
my $dom = XML::LibXML->new->parse_file('sample.xml');
my $context = XML::LibXML::XPathContext->new( $dom->documentElement() );
$context->registerNs('u', 'http://uniprot.org/uniprot');

my @entries = $context->findnodes('//u:entry');
foreach my $entry (@entries) {
    # every query below passes $entry, so it only sees this entry's subtree
    my $entryName  = $context->findvalue('u:name', $entry);
    my @shortNames = $context->findnodes('.//u:shortName', $entry);
    my @dbRefs     = $context->findnodes('.//u:dbReference[@type="EC"]/@id', $entry);
    my $sequence   = $context->findvalue('.//u:sequence[@length>1]', $entry);

    print "============================================================\n";
    print "\nName: " . $entryName . "\n";

    print "\nShort Names: \n";
    my $i = 0;
    foreach my $shortName (@shortNames) {
        print ++$i . ': ' . $shortName->textContent, "\n";
    }

    print "\nEC Values: \n";
    $i = 0;
    foreach my $dbRef (@dbRefs) {
        print ++$i . ': ' . $dbRef->nodeValue, "\n";
    }

    # replace each space with a newline (quotes inside s/// would be matched literally)
    $sequence =~ s/ /\n/g;
    print "\nSequence: " . $sequence, "\n";
}
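Since the stated end goal is custom output formatting, one option is to store each entry's values first and format them in a separate pass. The sketch below collects one hash reference per entry; the inline XML, the hash keys (name, short_names, ec_numbers, sequence), and the tab-separated layout are all hypothetical choices for illustration, not part of the original answer:

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample; real UniProt entries contain many more elements
my $xml = <<'XML';
<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <name>TEST1_HUMAN</name>
    <protein><recommendedName><shortName>T1</shortName></recommendedName></protein>
    <dbReference type="EC" id="1.1.1.1"/>
    <sequence length="5">MKTAY</sequence>
  </entry>
  <entry>
    <name>TEST2_HUMAN</name>
    <protein><recommendedName><shortName>T2</shortName></recommendedName></protein>
    <sequence length="4">MSSA</sequence>
  </entry>
</uniprot>
XML

my $dom     = XML::LibXML->load_xml(string => $xml);
my $context = XML::LibXML::XPathContext->new($dom->documentElement());
$context->registerNs('u', 'http://uniprot.org/uniprot');

# Store one hash reference per entry (hypothetical key names) instead of printing
my @records;
foreach my $entry ($context->findnodes('//u:entry')) {
    push @records, {
        name        => $context->findvalue('u:name', $entry),
        short_names => [ map { $_->textContent } $context->findnodes('.//u:shortName', $entry) ],
        ec_numbers  => [ map { $_->nodeValue } $context->findnodes('.//u:dbReference[@type="EC"]/@id', $entry) ],
        sequence    => $context->findvalue('u:sequence', $entry),
    };
}

# Format the stored records afterwards: one tab-separated line per entry,
# with '-' when an entry has no EC reference
foreach my $r (@records) {
    print join("\t",
        $r->{name},
        join(',', @{ $r->{short_names} }),
        join(',', @{ $r->{ec_numbers} }) || '-',
        $r->{sequence}), "\n";
}
```

Keeping extraction and formatting apart means the output layout can change without touching any XPath queries.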