xml.etree.ElementTree 获取节点深度 [英] xml.etree.ElementTree get node depth

查看:52
本文介绍了xml.etree.ElementTree 获取节点深度的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

XML:

<?xml version="1.0"?>
<pages>
    <page>
        <url>http://example.com/Labs</url>
        <title>Labs</title>
        <subpages>
            <page>
                <url>http://example.com/Labs/Email</url>
                <title>Email</title>
                <subpages>
                    <page/>
                    <url>http://example.com/Labs/Email/How_to</url>
                    <title>How-To</title>
                </subpages>
            </page>
            <page>
                <url>http://example.com/Labs/Social</url>
                <title>Social</title>
            </page>
        </subpages>
    </page>
    <page>
        <url>http://example.com/Tests</url>
        <title>Tests</title>
        <subpages>
            <page>
                <url>http://example.com/Tests/Email</url>
                <title>Email</title>
                <subpages>
                    <page/>
                    <url>http://example.com/Tests/Email/How_to</url>
                    <title>How-To</title>
                </subpages>
            </page>
            <page>
                <url>http://example.com/Tests/Social</url>
                <title>Social</title>
            </page>
        </subpages>
    </page>
</pages>

代码:

// rexml is the XML string read from a URL
from xml.etree import ElementTree as ET
tree = ET.fromstring(rexml)
for node in tree.iter('page'):
    for url in node.iterfind('url'):
        print url.text
    for title in node.iterfind('title'):
        print title.text.encode("utf-8")
    print '-' * 30

输出:

http://example.com/article1
Article1
------------------------------
http://example.com/article1/subarticle1
SubArticle1
------------------------------
http://example.com/article2
Article2
------------------------------
http://example.com/article3
Article3
------------------------------

Xml 表示站点地图的树状结构.

The Xml represents a tree like structure of a sitemap.

我整天都在文档和谷歌上翻来覆去,无法弄清楚如何获取条目的节点深度.

I have been up and down the docs and Google all day and can't figure it out hot to get the node depth of entries.

我使用了子容器的计数,但这仅适用于第一个父容器,然后它中断了,因为我无法弄清楚如何重置.但这可能只是一个骇人听闻的想法.

I used counting of the children container but that only works for the first parent and then it breaks as I can't figure it out how to reset. But this is probably just a hackish idea.

所需的输出:

0
http://example.com/article1
Article1
------------------------------
1
http://example.com/article1/subarticle1
SubArticle1
------------------------------
0
http://example.com/article2
Article2
------------------------------
0
http://example.com/article3
Article3
------------------------------

推荐答案

Used lxml.html.

import lxml.html

rexml = ...

def depth(node):
    d = 0
    while node is not None:
        d += 1
        node = node.getparent()
    return d

tree = lxml.html.fromstring(rexml)
for node in tree.iter('page'):
    print depth(node)
    for url in node.iterfind('url'):
        print url.text
    for title in node.iterfind('title'):
        print title.text.encode("utf-8")
    print '-' * 30

这篇关于xml.etree.ElementTree 获取节点深度的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆