Why is XML::Simple Discouraged?

Question

From the documentation of XML::Simple:

The use of this module in new code is discouraged. Other modules are available which provide more straightforward and consistent interfaces. In particular, XML::LibXML is highly recommended.

The major problems with this module are the large number of options and the arbitrary ways in which these options interact - often with unexpected results.

Can someone clarify for me what the key reasons for this are?

Recommended answer

The real problem is that what XML::Simple primarily tries to do is take XML, and represent it as a perl data structure.

As you'll no doubt be aware from perldata, the two key data structures you have available are the hash and the array.

  • Arrays are ordered scalars.
  • Hashes are unordered key-value pairs.

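And XML doesn't really do either. It has elements which are:

  • non-uniquely named (which means hashes don't "fit").
  • ... but are 'ordered' within the file.
  • may have attributes (which you could insert into a hash)
  • may have content (but might not, and could be a unary tag)
  • may have children (of any depth)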

And these things don't map directly to the available perl data structures - at a simplistic level, a nested hash of hashes might fit - but it can't cope with elements with duplicated names. Nor can you differentiate easily between attributes and child nodes.
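The duplicate-name problem can be seen without any XML module at all. The following is a minimal sketch in core Perl; the `@elements` list and the `%by_name` hash are hypothetical stand-ins for sibling elements and a naive one-key-per-name mapping, not anything XML::Simple does internally.

```perl
use strict;
use warnings;

# Hypothetical (name, text) pairs standing in for three sibling elements,
# listed in document order. Two of them share the name 'another_child'.
my @elements = (
    [ another_child => 'first'  ],
    [ child         => 'only'   ],
    [ another_child => 'second' ],
);

# Naive mapping: one hash key per element name.
my %by_name;
$by_name{ $_->[0] } = $_->[1] for @elements;

# The second 'another_child' has overwritten the first, and a hash makes
# no promise about the order in which keys() returns what is left.
printf "%d elements in, %d keys out\n", scalar @elements, scalar keys %by_name;
print "another_child => $by_name{another_child}\n";
```

Three elements go in and only two keys come out, which is why any hash-based representation has to fall back on arrays (and guesswork) as soon as names repeat.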

So XML::Simple tries to guess based on the XML content, and takes 'hints' from the various option settings, and then when you try and output the content, it (tries to) apply the same process in reverse.

As a result, for anything other than the most simple XML, it becomes unwieldy at best, or loses data at worst.

Consider:

<xml>
   <parent>
       <child att="some_att">content</child>
   </parent>
   <another_node>
       <another_child some_att="a value" />
       <another_child different_att="different_value">more content</another_child>
   </another_node>
</xml>

This - when parsed through XML::Simple gives you:

$VAR1 = {
          'parent' => {
                      'child' => {
                                 'att' => 'some_att',
                                 'content' => 'content'
                               }
                    },
          'another_node' => {
                            'another_child' => [
                                               {
                                                 'some_att' => 'a value'
                                               },
                                               {
                                                 'different_att' => 'different_value',
                                                 'content' => 'more content'
                                               }
                                             ]
                          }
        };

Note that under parent you now have just anonymous hashes, but under another_node you have an array of anonymous hashes.

So in order to access the content of child:

my $child = $xml -> {parent} -> {child} -> {content};

Note how you've got a 'child' node, with a 'content' node beneath it, which isn't because it's ... content.

But to access the content beneath the second another_child element (the first one is empty):

 my $another_child = $xml -> {another_node} -> {another_child} -> [1] -> {content};

Note how, because there are multiple <another_child> elements, the XML has been parsed into an array, where it wasn't with a single one. (If you did have an element called content beneath it, then you end up with something else yet.) You can change this by using ForceArray, but then you end up with a hash of arrays of hashes of arrays of hashes of arrays, although it is at least consistent in its handling of child elements. Note, following discussion: this is a bad default, rather than a flaw with XML::Simple.
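The shape change described above can be reproduced with two tiny documents. This is a minimal sketch, assuming XML::Simple is installed; the `<item>` element and the variable names are made up for illustration and do not come from the example XML.

```perl
use strict;
use warnings;
use XML::Simple;

# The same element name, once on its own and once repeated.
my $one = XMLin('<xml><item>a</item></xml>');
my $two = XMLin('<xml><item>a</item><item>b</item></xml>');

# With default options the single item is a plain string, while the
# repeated item becomes an array reference.
printf "one item:  %s\n", ref( $one->{item} ) || 'plain scalar';
printf "two items: %s\n", ref( $two->{item} ) || 'plain scalar';

# ForceArray => 1 makes the single case an array reference as well.
my $forced = XMLin( '<xml><item>a</item></xml>', ForceArray => 1 );
printf "forced:    %s\n", ref( $forced->{item} ) || 'plain scalar';
```

Code written against the first shape breaks as soon as a second `<item>` shows up in the input, which is the practical argument for setting ForceArray up front.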

You should set:

ForceArray => 1, KeyAttr => [], ForceContent => 1

If you apply this to the XML as above, you get instead:

$VAR1 = {
          'another_node' => [
                            {
                              'another_child' => [
                                                 {
                                                   'some_att' => 'a value'
                                                 },
                                                 {
                                                   'different_att' => 'different_value',
                                                   'content' => 'more content'
                                                 }
                                               ]
                            }
                          ],
          'parent' => [
                      {
                        'child' => [
                                   {
                                     'att' => 'some_att',
                                     'content' => 'content'
                                   }
                                 ]
                      }
                    ]
        };

This will give you consistency, because single-node elements are no longer handled differently from multi-node ones.

But you still:

  • Have to walk a tree five references deep to get at a value.

E.g.:

print $xml -> {parent} -> [0] -> {child} -> [0] -> {content};

You still have content and child hash elements treated as if they were attributes, and because hashes are unordered, you simply cannot reconstruct the input. So basically, you have to parse it, then run it through Dumper to figure out where you need to look.

But with an xpath query, you get at that node with:

findnodes("/xml/parent/child"); 
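The bare findnodes call needs a parsed document to run against. Here is a minimal sketch of one way to do that, assuming XML::LibXML is installed; the variable names are illustrative, and the XML is the example document from earlier in the answer.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml( string => <<'XML' );
<xml>
   <parent>
       <child att="some_att">content</child>
   </parent>
   <another_node>
       <another_child some_att="a value" />
       <another_child different_att="different_value">more content</another_child>
   </another_node>
</xml>
XML

# findnodes returns every node matching the XPath expression.
my @children = $doc->findnodes('/xml/parent/child');
for my $child (@children) {
    printf "%s (att=%s)\n", $child->textContent, $child->getAttribute('att');
}

# An attribute predicate picks out one of the two <another_child> elements;
# findvalue returns the string value of the match.
my $picked = $doc->findvalue('//another_child[@different_att]');
print "$picked\n";
```

Attributes and text content stay distinct here (getAttribute versus textContent), so there is no need to guess whether a hash key was an attribute, a child, or a synthetic 'content' entry.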

What you don't get in XML::Simple that you do in XML::Twig (and I presume XML::LibXML but I know it less well):

  • xpath support. xpath is an XML way of expressing a path to a node. So you can 'find' a node in the above with get_xpath('//child'). You can even use attributes in the xpath - like get_xpath('//another_child[@different_att]') which will select exactly which one you wanted. (You can iterate on matches too).
  • cut and paste to move elements around
  • parsefile_inplace to allow you to modify XML with an in place edit.
  • pretty_print options, to format XML.
  • twig_handlers and purge - which allows you to process really big XML without having to load it all in memory.
  • simplify if you really must make it backwards compatible with XML::Simple.
  • the code is generally way simpler than trying to follow daisy chains of references to hashes and arrays, that can never be done consistently because of the fundamental differences in structure.
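To make the twig_handlers and purge item concrete, here is a minimal sketch, assuming XML::Twig is installed. The handler body and the `@seen` array are illustrative choices; on a genuinely large file you would typically pass a filename to parsefile rather than an inline string.

```perl
use strict;
use warnings;
use XML::Twig;

my @seen;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time an <another_child> element has been fully parsed.
        another_child => sub {
            my ( $t, $elt ) = @_;
            push @seen, $elt->text;    # empty string for the empty element
            $t->purge;                 # free everything parsed so far
        },
    },
);

$twig->parse(<<'XML');
<xml>
   <parent>
       <child att="some_att">content</child>
   </parent>
   <another_node>
       <another_child some_att="a value" />
       <another_child different_att="different_value">more content</another_child>
   </another_node>
</xml>
XML

print scalar(@seen), " another_child elements handled\n";
```

Because each element is handled and then purged as soon as it closes, memory use depends on the size of one element rather than on the size of the whole document.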

It's also widely available - easy to download from CPAN, and distributed as an installable package on many operating systems. (Sadly it's not a default install. Yet)

See: XML::Twig quick reference

For the sake of comparison:

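use XML::Simple;
use Data::Dumper;

# Parse the XML held in the __DATA__ section of the script.
my $xml = XMLin( \*DATA, ForceArray => 1, KeyAttr => [], ForceContent => 1 );

print Dumper $xml;
print $xml->{parent}->[0]->{child}->[0]->{content};

Vs.

use XML::Twig;

# Same input, addressed by path instead of by nested references.
my $twig = XML::Twig->parse( \*DATA );
print $twig->get_xpath( '/xml/parent/child', 0 )->text;
print $twig->root->first_child('parent')->first_child_text('child');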
