python lxml: how do I use a prefix like wp: in element names?


Question

I need to build an XML file whose elements have special names. This is my current code:

from lxml import etree
import lxml
from lxml.builder import E

wp = E.wp

tmp = wp("title")

print(etree.tostring(tmp))

The current output is:

b'<wp>title</wp>'

I want it to be:

b'<wp:title>title</wp:title>'

How can I create elements with a name like wp:title?

Recommended answer

You confused the namespace prefix wp with the tag name. The namespace prefix is a document-local name for a namespace URI. wp:title requires a parser to look for an xmlns:wp="..." attribute to find the namespace itself (usually a URL, but any globally unique string would do), either on the tag itself or on a parent tag. This connects tags to a unique value without making tag names too verbose to type out or read.
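To make the distinction concrete, here is a minimal sketch that builds the element with plain etree.Element rather than a builder. The URI http://example.com/ns/wp is made up purely for illustration; it is not taken from the question. It shows that lxml keeps the full namespace URI in the tag name and treats the prefix as a separate, document-local label:

```python
from lxml import etree

# Hypothetical namespace URI, used only for this illustration.
WP_NS = "http://example.com/ns/wp"

# lxml names namespaced tags as "{namespace-uri}local-name".
# The nsmap tells it which prefix to write for that URI.
title = etree.Element("{%s}title" % WP_NS, nsmap={"wp": WP_NS})
title.text = "title"

print(title.tag)     # the qualified name, URI in braces
print(title.prefix)  # the prefix looked up from nsmap
print(etree.tostring(title, encoding="unicode"))
```

The serialized form carries both the wp: prefix and the xmlns:wp declaration that binds it to the URI.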

You need to provide the namespace and, optionally, the namespace mapping (mapping short names to full namespace names) to the element maker object. The default E object provided doesn't have a namespace or namespace map set. I'm going to assume here that wp is the http://wordpress.org/export/1.2/ WordPress namespace, as that seems the most likely, although it could also be that you are trying to send Windows Phone notifications.

Instead of using the default E element maker, create your own ElementMaker instance and pass it a namespace argument to tell lxml what URL the element belongs to. To get the right prefix on your element names, you also need to give it an nsmap dictionary that maps prefixes to URLs:

from lxml.builder import ElementMaker

namespaces = {"wp": "http://wordpress.org/export/1.2/"}
E = ElementMaker(namespace=namespaces["wp"], nsmap=namespaces)

title = E.title("Value of the wp:title tag")

This produces a tag with both the correct prefix and the xmlns:wp attribute:

>>> from lxml import etree
>>> from lxml.builder import ElementMaker
>>> namespaces = {"wp": "http://wordpress.org/export/1.2/"}
>>> E = ElementMaker(namespace=namespaces["wp"], nsmap=namespaces)
>>> title = E.title("Value of the wp:title tag")
>>> etree.tostring(title, encoding="unicode")
'<wp:title xmlns:wp="http://wordpress.org/export/1.2/">Value of the wp:title tag</wp:title>'

You can omit the nsmap value, but then you'd want to have such a map on a parent element of the document. In that case, you probably want to make separate ElementMaker objects for each namespace you need to support, and you put the nsmap namespace mapping on the outer-most element. When writing out the document, lxml then uses the short names throughout.
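As a rough sketch of what omitting nsmap means in practice, the snippet below builds the same element twice: once on its own, and once inside a parent that declares the wp prefix. The names Bare and Root are illustrative only. Without any mapping lxml has to generate a prefix of its own; inside the parent it reuses the declared one:

```python
from lxml import etree
from lxml.builder import ElementMaker

WP_NS = "http://wordpress.org/export/1.2/"

# No nsmap: lxml knows the namespace but has to invent a prefix for it.
Bare = ElementMaker(namespace=WP_NS)
standalone = Bare.title("standalone")
print(etree.tostring(standalone, encoding="unicode"))

# The parent declares the prefix, so the child is written as wp:title.
Root = ElementMaker(nsmap={"wp": WP_NS})
doc = Root.channel(Bare.title("inside a parent"))
print(etree.tostring(doc, encoding="unicode"))
```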

For example, creating a WordPress WXR format document would require a number of namespaces:

from lxml.builder import ElementMaker

namespaces = {
    "excerpt": "https://wordpress.org/export/1.2/excerpt/",
    "content": "http://purl.org/rss/1.0/modules/content/",
    "wfw": "http://wellformedweb.org/CommentAPI/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "wp": "https://wordpress.org/export/1.2/",
}

RootElement = ElementMaker(nsmap=namespaces)
ExcerptElement = ElementMaker(namespace=namespaces["excerpt"])
ContentElement = ElementMaker(namespace=namespaces["content"])
CommentElement = ElementMaker(namespace=namespaces["wfw"])
DublinCoreElement = ElementMaker(namespace=namespaces["dc"])
ExportElement = ElementMaker(namespace=namespaces["wp"])

You would then use:

doc = RootElement.rss(
    RootElement.channel(
        ExportElement.wxr_version("1.2"),
        # etc. ...
    ),
    version="2.0"
)

When pretty-printed with etree.tostring(doc, pretty_print=True, encoding="unicode"), this produces:

<rss xmlns:excerpt="https://wordpress.org/export/1.2/excerpt/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:wp="https://wordpress.org/export/1.2/" version="2.0">
  <channel>
    <wp:wxr_version>1.2</wp:wxr_version>
  </channel>
</rss>

Note how only the root <rss> element has xmlns attributes, and how the <wp:wxr_version> tag uses the right prefix even though we only gave it the namespace URI.
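The same prefix-to-URI mapping works when reading the tree back. The following is a small sketch, not part of the original answer, that rebuilds a cut-down version of the document (only the wp namespace, with the shortened names Root and Export) and looks the element up both by prefix and by its full name:

```python
from lxml import etree
from lxml.builder import ElementMaker

namespaces = {"wp": "https://wordpress.org/export/1.2/"}

Root = ElementMaker(nsmap=namespaces)
Export = ElementMaker(namespace=namespaces["wp"])
doc = Root.rss(Root.channel(Export.wxr_version("1.2")), version="2.0")

# Lookup by prefix: the namespaces dict resolves "wp" to the URI.
by_prefix = doc.find("channel/wp:wxr_version", namespaces=namespaces)

# Lookup by full name: the URI is spelled out in braces, no prefix needed.
by_uri = doc.find("channel/{https://wordpress.org/export/1.2/}wxr_version")

print(by_prefix.tag)
print(by_prefix.text, by_uri.text)
```

Both lookups reach the same element, because the prefix is only shorthand for the URI.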

To give a different example, if you are building a Windows Phone tile notification, it'd be simpler. After all, there is just a single namespace to use:

from lxml.builder import ElementMaker

namespaces = {"wp": "WPNotification"}
E = ElementMaker(namespace=namespaces["wp"], nsmap=namespaces)

notification = E.Notification(
    E.Tile(
        E.BackgroundImage("https://example.com/someimage.png"),
        E.Count("42"),
        E.Title("The notification title"),
        # ...
    )
)

This produces:

<wp:Notification xmlns:wp="WPNotification">
  <wp:Tile>
    <wp:BackgroundImage>https://example.com/someimage.png</wp:BackgroundImage>
    <wp:Count>42</wp:Count>
    <wp:Title>The notification title</wp:Title>
  </wp:Tile>
</wp:Notification>

Only the outer-most element, <wp:Notification>, now has the xmlns:wp attribute. All other elements only need to include the wp: prefix.

Note that the prefix used is entirely up to you, and is even optional. It is the namespace URI that is the real key to uniquely identifying elements across different XML documents. If you used E = ElementMaker(namespace="WPNotification", nsmap={None: "WPNotification"}) instead, and so produced a top-level element with <Notification xmlns="WPNotification">, you would still have a perfectly legal XML document that, according to the XML standard, has the exact same meaning.
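A short sketch of that equivalence, using a trimmed-down notification and the illustrative names Prefixed and Default: the two trees serialize differently, but every element has the same fully qualified tag name.

```python
from lxml import etree
from lxml.builder import ElementMaker

NS = "WPNotification"

# Same namespace, once bound to the wp prefix and once as the default namespace.
Prefixed = ElementMaker(namespace=NS, nsmap={"wp": NS})
Default = ElementMaker(namespace=NS, nsmap={None: NS})

a = Prefixed.Notification(Prefixed.Tile(Prefixed.Count("42")))
b = Default.Notification(Default.Tile(Default.Count("42")))

print(etree.tostring(a, encoding="unicode"))
print(etree.tostring(b, encoding="unicode"))

# The qualified names ("{WPNotification}Notification", ...) are identical.
tags_a = [el.tag for el in a.iter()]
tags_b = [el.tag for el in b.iter()]
print(tags_a == tags_b)
```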
