Is Python bad at XML?

Question

The use of the phrase "bad at XML" in this question has been a point of contention, so I'd like to start out by providing a very clear definition of what I mean by this term in this context: if support for standard XML APIs is poor, and forces one to use a language-specific API, in which namespaces seem to be an afterthought, then I would be inclined to characterize that language as being not as well suited to using XML as other mainstream languages that do not have these issues. "Bad at XML" is just a shorthand for these conditions, and I think it is a fair way to characterize it. As I will describe, my initial experience with Python has raised concerns about whether it fulfils these conditions; but, because in general my experience with Python has been quite positive, it seems likely that I'm missing something, thus motivating this question.

I'm trying to do some very simple XML processing with Python. I had initially hoped to reuse my knowledge of the standard W3C DOM APIs, and was happy to find that the xml.dom and xml.dom.minidom modules do a good job of supporting them. Unfortunately, serialization proved to be problematic, for the following reasons:

  • xml.dom does not come with a serializer
  • the PyXML library, which includes a serializer for xml.dom, is no longer maintained, AND
  • minidom does not support serialization of namespaces, even though namespaces are supported in the API
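
The third point can be illustrated with a minimal sketch using only the standard library. The namespace URI and the `ex` prefix are hypothetical, chosen for illustration. The element carries its namespace in the DOM API, but the serialized output has no `xmlns` declaration unless one is added by hand as an ordinary attribute:

```python
from xml.dom import minidom

NS = "http://example.org/ns"  # hypothetical namespace URI, for illustration only

doc = minidom.Document()
root = doc.createElementNS(NS, "ex:root")
doc.appendChild(root)

# The DOM API itself records the namespace of the element.
node_namespace = root.namespaceURI

# Serializing writes only the qualified name; no xmlns declaration is emitted.
without_decl = root.toxml()

# Workaround: declare the prefix manually as a plain attribute before serializing.
root.setAttribute("xmlns:ex", NS)
with_decl = root.toxml()
```

The manual declaration keeps the output parseable by a namespace-aware parser, at the cost of the caller tracking which prefixes have already been declared.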

I looked through the list of other W3C-like libraries here:

http://wiki.python.org/moin/PythonXml#W3CDOM-likelibraries

I found that many other libraries, such as 4Suite and libxml2dom, are also not maintained.

On the other hand, itools at first glance appears to be maintained, but there does not appear to be an Ubuntu/Debian package available, so it would be difficult to deploy and maintain.

At this point, it seemed like trying to use the W3C DOM APIs in my Python application was going to be a dead end, and I began to look at the ElementTree API. But I find the way the etree API supports namespaces horribly ugly: it requires string concatenation every time an element in a particular namespace is created:

http://lxml.de/tutorial.html#namespaces
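
For readers unfamiliar with the notation being complained about, here is a minimal sketch using the standard library's `xml.etree.ElementTree`, which shares the `{uri}tag` convention with lxml. The namespace URI, the `ex` prefix, and the `qname` helper are hypothetical names introduced for illustration:

```python
import xml.etree.ElementTree as ET

NS = "http://example.org/ns"  # hypothetical namespace URI, for illustration only


def qname(tag, namespace=NS):
    """Build the '{uri}tag' form that the ElementTree API expects."""
    return "{%s}%s" % (namespace, tag)


# Optional: choose the prefix used on output instead of an auto-generated one.
ET.register_namespace("ex", NS)

root = ET.Element(qname("root"))
item = ET.SubElement(root, qname("item"))
item.text = "hello"

# Unlike minidom, the serializer emits the xmlns declaration itself.
xml_text = ET.tostring(root, encoding="unicode")
```

Wrapping the concatenation in one small helper confines the ugliness to a single place, though every element name still has to pass through it.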

So, my question is, have I overlooked something, or is support for XML (in particular W3C DOM) actually quite bad in Python?

Here follows a list of more precise questions, the answers to which would really help me:


  • Is there reasonable support for W3C DOM in Python?
  • If not xml.dom, do you use e.g. etree instead of W3C DOM?
  • If so, which library is best, and how do you overcome the issues regarding namespacing in the API?
  • If you use W3C DOM instead, are you aware of a library that implements serialization with support for namespaces?

Recommended answer

I would say python handles XML pretty well. The number of different libraries available speaks to that - you have lots of options. And if there are features missing from libraries that you would like to use, feel free to contribute some patches!

I personally use the DOM and lxml.etree (etree is really fast). However, I feel your pain about the namespace thing. I wrote a quick helper function to deal with it:

import re

DEFAULT_NS = "http://www.domain.org/path/to/xml"

# Matches path steps that are plain element names (not '.', '*', 'text()' or empty).
TAG_PATTERN = re.compile(r'^[A-Za-z0-9-]+$')

def add_xml_namespace(path, namespace=DEFAULT_NS):
    """Adds namespaces to an XPath-ish expression path for etree

    Test simple expression:
    >>> add_xml_namespace('image/namingData/fileBaseName')
    '{http://www.domain.org/path/to/xml}image/{http://www.domain.org/path/to/xml}namingData/{http://www.domain.org/path/to/xml}fileBaseName'

    More complicated expression
    >>> add_xml_namespace('.//image/*')
    './/{http://www.domain.org/path/to/xml}image/*'

    >>> add_xml_namespace('.//image/text()')
    './/{http://www.domain.org/path/to/xml}image/text()'
    """
    tags = path.split('/')
    for i, tag in enumerate(tags):
        # Qualify only plain element names; leave wildcards and functions alone.
        if TAG_PATTERN.match(tag):
            tags[i] = "{%s}%s" % (namespace, tag)
    return '/'.join(tags)

I use it like this:

from lxml import etree
from utilities import add_xml_namespace as ns

tree = etree.parse('file.xml')
# getroot() returns the root element of the parsed tree
node = tree.getroot().find(ns('root/group/subgroup'))
# etc.
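
As an alternative to rewriting the path, `find()` and `findall()` also accept a prefix-to-URI mapping, so paths can be written with short prefixes. This is a minimal sketch using the standard library's `xml.etree.ElementTree` rather than lxml; the `d` prefix and the sample document are assumptions made for illustration:

```python
import xml.etree.ElementTree as ET

# Prefix chosen for lookups only; it need not match any prefix in the document.
NSMAP = {"d": "http://www.domain.org/path/to/xml"}

# Hypothetical sample document using the namespace as its default namespace.
SAMPLE = """<root xmlns="http://www.domain.org/path/to/xml">
  <group><subgroup>value</subgroup></group>
</root>"""

root = ET.fromstring(SAMPLE)

# Each step is qualified through the mapping instead of string concatenation.
node = root.find("d:group/d:subgroup", NSMAP)
subgroup_text = node.text
```

The trade-off is that every step in the path needs an explicit prefix, whereas the helper above qualifies all plain steps with one namespace automatically.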

If you don't know the namespace ahead of time, you can extract it from a root node:

tree = etree.parse('file.xml')
root = tree.getroot().tag
namespace = root[1:root.index('}')]
ns = lambda path: add_xml_namespace(path, namespace)
...
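
Note that `root.index('}')` raises `ValueError` when the root element has no namespace. A slightly more defensive variant might look like the sketch below; the `namespace_of` helper and the two sample documents are hypothetical, and the standard library parser is used so the example runs without lxml:

```python
import xml.etree.ElementTree as ET


def namespace_of(element):
    """Return the namespace URI of the element's tag, or '' if it has none."""
    tag = element.tag
    if tag.startswith("{"):
        # Tags of namespaced elements have the form '{uri}localname'.
        return tag[1:tag.index("}")]
    return ""


namespaced = ET.fromstring('<root xmlns="http://www.domain.org/path/to/xml"/>')
plain = ET.fromstring("<root/>")

found_namespace = namespace_of(namespaced)
missing_namespace = namespace_of(plain)
```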

Additional comment: There is a little work involved here, but work is necessary when dealing with XML. That's not a python issue, it's an XML issue.
