Why is XML::Simple Discouraged?


Question

From the documentation of XML::Simple:

The use of this module in new code is discouraged. Other modules are available which provide more straightforward and consistent interfaces. In particular, XML::LibXML is highly recommended.

The major problems with this module are the large number of options and the arbitrary ways in which these options interact - often with unexpected results.

Can someone clarify for me what the key reasons for this are?

Recommended answer

The real problem is that what XML::Simple primarily tries to do is take XML and represent it as a perl data structure.

As you'll no doubt be aware from perldata, the two key data structures you have available are the hash and the array.

  • Arrays are ordered scalars.
  • Hashes are unordered key-value pairs.

And XML doesn't really do either. It has elements which are:

  • non-uniquely named (which means hashes don't "fit").
  • ... but are "ordered" within the file.
  • may have attributes (which you could insert into a hash)
  • may have content (but might not, and could be a unary tag)
  • may have children (of any depth)

And these things don't map directly to the available perl data structures. At a simplistic level, a nested hash of hashes might fit, but it can't cope with elements with duplicated names. Nor can you easily differentiate between attributes and child nodes.
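To make the duplicate-name problem concrete, here is a minimal sketch in plain Perl, with no XML module involved. The (name, text) pairs are made up for illustration and simply stand in for sibling elements.

```perl
use strict;
use warnings;

# Hypothetical sibling elements as (name, text) pairs, in document order.
my @elements = (
    [ another_child => 'first'  ],
    [ another_child => 'second' ],
    [ child         => 'third'  ],
);

# Naive mapping: one hash key per element name.
my %by_name;
$by_name{ $_->[0] } = $_->[1] for @elements;

# The second 'another_child' silently overwrote the first, and the
# hash does not record the order in which the elements appeared.
my $kept   = scalar keys %by_name;
my $winner = $by_name{another_child};

print "kept $kept of ", scalar(@elements), " elements; another_child = $winner\n";
```

Any mapping onto hashes therefore has to switch to an array somewhere once a name repeats.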

So XML::Simple tries to guess based on the XML content, takes 'hints' from the various option settings, and then, when you try to output the content, it (tries to) apply the same process in reverse.

As a result, for anything other than the simplest XML, it becomes unwieldy at best, or loses data at worst.

Consider:

<xml>
   <parent>
       <child att="some_att">content</child>
   </parent>
   <another_node>
       <another_child some_att="a value" />
       <another_child different_att="different_value">more content</another_child>
   </another_node>
</xml>

When parsed through XML::Simple, this gives you:

$VAR1 = {
          'parent' => {
                      'child' => {
                                 'att' => 'some_att',
                                 'content' => 'content'
                               }
                    },
          'another_node' => {
                            'another_child' => [
                                               {
                                                 'some_att' => 'a value'
                                               },
                                               {
                                                 'different_att' => 'different_value',
                                                 'content' => 'more content'
                                               }
                                             ]
                          }
        };

Note that under parent you now have plain anonymous hashes, but under another_node you have an array of anonymous hashes.

So to access the content of child:

my $child = $xml->{parent}->{child}->{content};

Note how you've got a 'child' node with a 'content' key beneath it. That key is not there because the XML has a <content> element; it is there because the element has text content.

But to access the content beneath the second another_child element (the first one has no content):

my $another_child = $xml->{another_node}->{another_child}->[1]->{content};

Note how, because there are multiple <another_child> elements, the XML has been parsed into an array, where it wasn't with a single one. (If you did have an element called content beneath it, you would end up with something else yet.) You can change this by using ForceArray, but then you end up with a hash of arrays of hashes of arrays of hashes of arrays - although it is at least consistent in its handling of child elements. Note, following discussion: this is a bad default, rather than a flaw with XML::Simple.
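With the default settings, code that walks the result has to check which shape it was given. A minimal sketch of that check in plain Perl follows; children_of is a hypothetical helper, and the two structures are hand-written copies of the shapes in the dump above rather than real parser output.

```perl
use strict;
use warnings;

# Return the children stored under $key as a list, whether the parser
# produced a single hash ref (one element) or an array ref (several).
sub children_of {
    my ( $node, $key ) = @_;
    my $value = $node->{$key};
    return ()      unless defined $value;
    return @$value if ref $value eq 'ARRAY';
    return ($value);
}

# Hand-written copies of the two shapes from the dump above.
my $parent       = { child => { att => 'some_att', content => 'content' } };
my $another_node = {
    another_child => [
        { some_att => 'a value' },
        { different_att => 'different_value', content => 'more content' },
    ],
};

my @one  = children_of( $parent,       'child' );
my @many = children_of( $another_node, 'another_child' );

print scalar(@one), " child, ", scalar(@many), " another_child\n";
print $many[1]{content}, "\n";
```

Every access path needs a wrapper like this unless the parser is told to be consistent.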

You should set:

ForceArray => 1, KeyAttr => [], ForceContent => 1

If you apply this to the XML above, you get instead:

$VAR1 = {
          'another_node' => [
                            {
                              'another_child' => [
                                                 {
                                                   'some_att' => 'a value'
                                                 },
                                                 {
                                                   'different_att' => 'different_value',
                                                   'content' => 'more content'
                                                 }
                                               ]
                            }
                          ],
          'parent' => [
                      {
                        'child' => [
                                   {
                                     'att' => 'some_att',
                                     'content' => 'content'
                                   }
                                 ]
                      }
                    ]
        };

This gives you consistency, because single-node elements are no longer handled differently from multi-node ones.

But you still:

  • Have a tree five references deep to get at a value.

For example:

print $xml->{parent}->[0]->{child}->[0]->{content};

You still have the content and child hash elements treated as if they were attributes, and because hashes are unordered, you simply cannot reconstruct the input. So basically, you have to parse it, then run it through Dumper to figure out where you need to look.

But with an xpath query, you get at that node with:

findnodes("/xml/parent/child"); 
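The line above does not say which module it belongs to. As a sketch, assuming XML::LibXML (whose nodes and documents provide findnodes) and the sample XML from earlier, a complete script might look like this:

```perl
use strict;
use warnings;
use XML::LibXML;

my $xml_text = <<'XML';
<xml>
   <parent>
       <child att="some_att">content</child>
   </parent>
   <another_node>
       <another_child some_att="a value" />
       <another_child different_att="different_value">more content</another_child>
   </another_node>
</xml>
XML

my $doc = XML::LibXML->load_xml( string => $xml_text );

# findnodes returns every node matching the XPath expression.
my ($child) = $doc->findnodes('/xml/parent/child');
my $text    = $child->textContent;
my $att     = $child->getAttribute('att');

# Repeated elements come back as a list, in document order.
my @others = $doc->findnodes('//another_child');

print "$text ($att), ", scalar(@others), " another_child elements\n";
```

Attributes and text are reached through separate calls, so there is no ambiguity between an attribute named content and the element's text.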

What you don't get in XML::Simple that you do in XML::Twig (and, I presume, XML::LibXML, but I know it less well):

  • xpath support. xpath is an XML way of expressing a path to a node. So you can 'find' a node in the above with get_xpath('//child'). You can even use attributes in the xpath, like get_xpath('//another_child[@different_att]'), which will select exactly the one you wanted. (You can iterate on matches too.)
  • cut and paste to move elements around.
  • parsefile_inplace to allow you to modify XML with an in-place edit.
  • pretty_print options, to format XML.
  • twig_handlers and purge, which allow you to process really big XML without having to load it all in memory.
  • simplify, if you really must make it backwards compatible with XML::Simple.
  • the code is generally way simpler than trying to follow daisy chains of references to hashes and arrays, which can never be done consistently because of the fundamental differences in structure.
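Two of the features listed above can be sketched in a few lines, using the sample XML from earlier. The first parse uses twig_handlers with purge and keeps only the text it needs; the second parses the whole document and selects by attribute with get_xpath. The variable names and the choice of what to collect are illustrative only.

```perl
use strict;
use warnings;
use XML::Twig;

my $xml_text = <<'XML';
<xml>
   <parent>
       <child att="some_att">content</child>
   </parent>
   <another_node>
       <another_child some_att="a value" />
       <another_child different_att="different_value">more content</another_child>
   </another_node>
</xml>
XML

# Streaming style: handle each <another_child> as soon as it is complete.
my @seen;
my $streaming = XML::Twig->new(
    twig_handlers => {
        another_child => sub {
            my ( $t, $elt ) = @_;
            push @seen, $elt->text;
            $t->purge;    # discard everything that has been fully parsed so far
        },
    },
);
$streaming->parse($xml_text);

# Whole-document style: pick out the element that carries a given attribute.
my $full    = XML::Twig->new->parse($xml_text);
my ($match) = $full->get_xpath('//another_child[@different_att]');

print scalar(@seen), " handled; matched text: ", $match->text, "\n";
```

For a genuinely large file you would call parsefile on a path instead of parsing a string; the handler logic stays the same.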

It's also widely available - easy to download from CPAN, and distributed as an installable package on many operating systems. (Sadly it's not a default install. Yet.)

See also: XML::Twig quick reference

For comparison:

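use XML::Simple;
use Data::Dumper;

# Parse the XML held in the script's __DATA__ section.
my $xml = XMLin( \*DATA, ForceArray => 1, KeyAttr => [], ForceContent => 1 );

print Dumper $xml;
print $xml->{parent}->[0]->{child}->[0]->{content};

Versus:

use XML::Twig;

# Same input, read from __DATA__ in a separate script.
my $twig = XML::Twig->parse( \*DATA );
print $twig->get_xpath( '/xml/parent/child', 0 )->text;
print $twig->root->first_child('parent')->first_child_text('child');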
