Trouble Parsing XML File Using Perl

Views: 120 Published: 2020/5/5 12:58:27 Tags: xml perl parsing manifest


Problem Description

I'm trying to parse an XML manifest file for an Articulate eLearning course (imsmanifest.xml).

An excerpt of the XML structure is provided below (I'm trying to drill down to adlcp:masteryscore):

<?xml version="1.0" encoding="UTF-8"?>
<manifest xsi:schemaLocation="http://www.imsproject.org/xsd/imscp_rootv1p1p2 imscp_rootv1p1p2.xsd http://www.imsglobal.org/xsd/imsmd_rootv1p2p1 imsmd_rootv1p2p1.xsd http://www.adlnet.org/xsd/adlcp_rootv1p2 adlcp_rootv1p2.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:adlcp="http://www.adlnet.org/xsd/adlcp_rootv1p2" xmlns="http://www.imsproject.org/xsd/imscp_rootv1p1p2" version="1.0" identifier="Electrical_Design_Part_3">
    <metadata/>
    <organizations default="Electrical_Design_Part_3_ORG">
      <organization identifier="Electrical_Design_Part_3_ORG">
        <title>Electrical Design - Part 3</title>
        <item identifier="Electrical_Design_Part_3_SCO" identifierref="Articulate_Presenter_RES" isvisible="true">
          <title>Electrical Design - Part 3</title>
          <adlcp:masteryscore>65</adlcp:masteryscore>
        </item>
      </organization>
    </organizations>
    <resources/>
</manifest>

I've tried using XML::Simple and XML::LibXML. I can get these modules to work fine with simpler XML files, but not the manifest file I actually need to parse.

The following code shows my attempt to use XML::LibXML to drill down to the title tag:

use XML::LibXML;
$filename = "imsmanifest.xml";
$parser = XML::LibXML->new();
$xmldoc = $parser->parse_file($filename);

for my $sample ($xmldoc->findnodes('/manifest/organizations/organization/item/title')) {
    for my $property ($sample->findnodes('./*')) {
        print $property->nodeName(), ": ", $property->textContent(), "\n";
    }
    print "\n"; 
};

How does one deal with the colon in the adlcp:masteryscore tag? Whenever I try to use this, I get an error - but maybe I'm not doing it right.

Could someone please show me the correct way to drill down to adlcp:masteryscore?

Thank you very much.

Solution

You're asking to locate elements named manifest in the null namespace, but you want elements named manifest in the http://www.imsproject.org/xsd/imscp_rootv1p1p2 namespace.
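To see the mismatch concretely, here is a minimal sketch that assumes a cut-down manifest string rather than the real file. The helper count_matches and the sample XML are made up for illustration; the XML::LibXML calls are the same kind used in the fix. The unprefixed path looks in the null namespace and matches nothing, while the path with a registered prefix matches the root element.

```perl
use strict;
use warnings;

use XML::LibXML               qw( );
use XML::LibXML::XPathContext qw( );

# Hypothetical helper: count the nodes an XPath matches in an XML string,
# after registering any prefix => namespace URI pairs passed in.
sub count_matches {
    my ($xml, $xpath, %ns) = @_;
    my $doc = XML::LibXML->new( no_network => 1 )->parse_string($xml);
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs( $_ => $ns{$_} ) for keys %ns;
    my @nodes = $xpc->findnodes($xpath);
    return scalar @nodes;
}

my $ims = 'http://www.imsproject.org/xsd/imscp_rootv1p1p2';

# Cut-down stand-in for imsmanifest.xml: the default namespace is the IMS one.
my $xml = qq{<manifest xmlns="$ims"><organizations/></manifest>};

print count_matches($xml, '/manifest'), "\n";               # prints 0
print count_matches($xml, '/i:manifest', i => $ims), "\n";  # prints 1
```

An XPath name without a prefix never picks up the document's default namespace, which is why the original query returned no nodes rather than raising an error.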

Fixes:

use strict;
use warnings;

use XML::LibXML               qw( );
use XML::LibXML::XPathContext qw( );

my $xml_qfn = 'imsmanifest.xml';

my $parser = XML::LibXML->new( no_network => 1 );
my $doc = $parser->parse_file($xml_qfn);

my $xpc = XML::LibXML::XPathContext->new();
$xpc->registerNs( a => "http://www.adlnet.org/xsd/adlcp_rootv1p2" );
$xpc->registerNs( i => "http://www.imsproject.org/xsd/imscp_rootv1p1p2" );

for my $item ($xpc->findnodes('/i:manifest/i:organizations/i:organization/i:item', $doc)) {
    my $title   = $xpc->find('i:title/text()', $item);
    my $mastery = $xpc->find('a:masteryscore/text()', $item);
    print "$title: $mastery\n"; 
}

Note: The choice of prefixes used in the XPath expressions (a and i) is arbitrary. You can pick whatever you want, just as you can when you compose an XML document.
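As a small illustration of that point, the sketch below binds two different prefixes to the adlcp namespace URI and gets the same value each time, even though the document itself uses the prefix adlcp. The helper mastery_score and the trimmed sample XML are assumptions for the demo, not part of the original manifest handling.

```perl
use strict;
use warnings;

use XML::LibXML               qw( );
use XML::LibXML::XPathContext qw( );

# Hypothetical helper: read the first masteryscore value, using whatever
# prefix the caller chooses for the adlcp namespace.
sub mastery_score {
    my ($xml, $prefix) = @_;
    my $doc = XML::LibXML->new( no_network => 1 )->parse_string($xml);
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs( $prefix => 'http://www.adlnet.org/xsd/adlcp_rootv1p2' );
    return $xpc->findvalue("//${prefix}:masteryscore");
}

# Trimmed stand-in for the manifest; only the namespace declarations matter here.
my $xml = <<'XML';
<manifest xmlns="http://www.imsproject.org/xsd/imscp_rootv1p1p2"
          xmlns:adlcp="http://www.adlnet.org/xsd/adlcp_rootv1p2">
  <item><adlcp:masteryscore>65</adlcp:masteryscore></item>
</manifest>
XML

print mastery_score($xml, 'a'),     "\n";   # prints 65
print mastery_score($xml, 'score'), "\n";   # prints 65
```

What has to match is the namespace URI passed to registerNs, not the prefix text that appears in the file.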

Note: I added no_network => 1 to prevent libxml from fetching the DTDs every time you parse the XML doc.
