XML parsing with Python using recursion. Problem with return value


Problem description

I am somewhat new to Python and programming in general so I apologize. By the way, thanks in advance.

I am parsing an XML document (specifically KML, which is used in Google Earth) using Python 2.5, cElementTree and expat. I am trying to pull out all the text from the 'name', 'description' and 'coordinates' nodes inside each 'placemark' node for each geometry type (i.e. polylines, polygon, point), but I want to keep the geometry types separate. For example, I want only the 'name', 'description' and 'coordinates' text for every placemark that is part of a 'polygon' (i.e. it has a 'polygon' node). I will need to do this for 'polylines' and 'points' also. I have figured out a way to do this, but the code is long and verbose and specific to each geometry type, which leads me to my question.

Ideally, I would like to use the same code for each geometry type, but the problem is that each geometry type has a different node structure (i.e. different node names and number of nested nodes). So for proof of concept I thought this would be a good opportunity to use/learn recursion to drill down the node tree of 'placemark' node and get the information I was looking for. I have looked at the many posts on Python recursion and am still having problems with implementing the solutions provided.

The sample xml for a 'placemark' node is:

 <Placemark>
    <name>testPolygon</name>
    <description>polygon text</description>
    <styleUrl>#msn_ylw-pushpin</styleUrl>
    <Polygon>
            <tessellate>1</tessellate>
            <outerBoundaryIs>
                    <LinearRing>
                            <coordinates>
                                    -81.4065,31.5072,0 -81.41269,31.45992,0 -81.34490,31.459696,0 
                            </coordinates>
                    </LinearRing>
            </outerBoundaryIs>
    </Polygon>
 </Placemark>

The recursive function I am using is:

def getCoords( child, searchNode ):

    # Get children of node
    children = child.getchildren()

    # If node has one or more child
    if len( children ) >= 1 :

        # Loop through all the children
        for child in children:

            # call to recursion function
            getCoords( child, searchNode )

    # If does not have children and is the 'searchNode'
    elif len( children ) == 0 and child.tag == searchNode:

        # Return the text inside the node. This is where it is not working    
        # Other posts recommended returning the function like 
        # return getCoords(child, searchNode), but I am getting an unending loop
        return child.text

    # Do nothing if node doesn't have children and does not match 'searchNode'    
    else: 

        print 'node does not have children and is not what we are looking for'

I call the recursive function as follows:

searchNode = 'coordinates'

# loop through all 'Placemark nodes' in document
for mark in placemark:

    # Get children of 'Placemark' node
    children = mark.getchildren() 

    # Loop through children nodes
    for child in children:

        # if a 'Polygon' node is found
        if child.tag == 'Polygon':

            # call recursion function
            getCoords( child, searchNode)

I realize that at least part of my problem is the return value. Other posts recommended returning the function, which I interpreted to mean 'return getCoords(child, searchNode)', but then I get an unending loop. Also, I realize this could be posted on the GIS site, but I think this is more of a general programming question. Any ideas?

Recommended answer

With recursion you want to pay attention to your base cases, and your recursive cases. Whatever your base cases happen to be, if you expect to be able to collect information from your recursion, they have to return data that your recursive cases can (and more importantly do) use. Similarly you need to make sure the data your recursive cases return can be used by each other.

First identify your base and recursive cases. The base cases are the "leaf" nodes, with no children. In a base case you want to just return some data, and not call the recursive function again. This is what allows you to get "back up the stack" as they say, and prevent infinite recursion. The recursive cases will require you to save the data collected from a series of recursive calls, which is almost what you're doing in your for loop.

I notice that you have

# Recursive case: node has one or more child
if len( children ) >= 1 :
    # Loop through all the children
    for child in children:
        # call to recursion function
        getCoords( child, searchNode )

but what are you doing with the results of your getCoords calls?
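
To see the effect in isolation, here is a minimal sketch of the same function shape. It is written for Python 3 with xml.etree.ElementTree, which is an assumption (the question uses Python 2.5 and cElementTree), and the name get_coords_original and the trimmed sample XML are only illustrative. The recursive branch calls itself but never returns anything, so the top-level call evaluates to None even though the 'coordinates' leaf is reached.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the Polygon node from the question (illustrative)
SAMPLE = """<Polygon>
  <outerBoundaryIs>
    <LinearRing>
      <coordinates>-81.4065,31.5072,0 -81.41269,31.45992,0</coordinates>
    </LinearRing>
  </outerBoundaryIs>
</Polygon>"""


def get_coords_original(node, search_tag):
    children = list(node)  # Python 3 stand-in for getchildren()
    if len(children) >= 1:
        for child in children:
            # The value returned by the recursive call is discarded here
            get_coords_original(child, search_tag)
        # No return statement, so this branch implicitly returns None
    elif node.tag == search_tag:
        return node.text
    # A non-matching leaf also falls through and returns None


polygon = ET.fromstring(SAMPLE)
result = get_coords_original(polygon, "coordinates")
print(result)
```

The leaf does return its text, but only to the call one level up, which drops it.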

You either want to save the results in some sort of data structure that you can return at the end of your for loop, or, if you're not interested in saving the results themselves, just print the result in base case 1 (the successful search) when you reach it instead of returning it. As it stands, base case 1 just returns its value up the stack to a caller that isn't doing anything with the result! So try:

# If node has one or more child
if len( children ) >= 1 :
    # Data structure for your results
    coords = []
    # Loop through all the children
    for child in children:
        # call to recursion function
        result = getCoords( child, searchNode )
        # Add your new results together
        coords.extend(result)
    # Give the next instance up the stack your results!
    return coords

Now since your results are in a list and you're using the extend() method you've got to make your base cases return lists as well!

# Base case 1: does not have children and is the 'searchNode'
elif len( children ) == 0 and child.tag == searchNode:
    # Return the text from the node, inside a list
    return [child.text]
# Base case 2: doesn't have children and does not match 'searchNode'
else:
    # Return empty list so your extend() function knows what to do with the result
    return []
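
Put together, the recursive case and the two base cases might look like the sketch below. It assumes Python 3 and xml.etree.ElementTree rather than the Python 2.5 / cElementTree setup in the question, so list(node) stands in for getchildren(). The names get_coords and search_tag, the shortened sample XML, and the whitespace stripping are illustrative additions, not part of the original code.

```python
import xml.etree.ElementTree as ET


def get_coords(node, search_tag):
    """Return a list with the text of every leaf named search_tag under node."""
    children = list(node)  # Python 3 stand-in for getchildren()

    # Recursive case: collect whatever each child's subtree returns
    if children:
        found = []
        for child in children:
            found.extend(get_coords(child, search_tag))
        return found

    # Base case 1: a leaf with the tag we are looking for
    if node.tag == search_tag:
        # strip() is an extra convenience, not in the original answer
        return [node.text.strip() if node.text else ""]

    # Base case 2: a leaf we do not care about
    return []


polygon = ET.fromstring(
    "<Polygon><tessellate>1</tessellate><outerBoundaryIs><LinearRing>"
    "<coordinates> -81.4065,31.5072,0 -81.41269,31.45992,0 </coordinates>"
    "</LinearRing></outerBoundaryIs></Polygon>"
)
coords = get_coords(polygon, "coordinates")
print(coords)
```

Every branch returns a list, so extend() always receives something it can consume, and the caller at the top gets one flat list.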

This should just give you a single list in the end, which you'll probably want to store in a variable. I've just printed the results here:

searchNode = 'coordinates'
# loop through all 'Placemark nodes' in document
for mark in placemark:
    # Get children of 'Placemark' node
    children = mark.getchildren()
    # getchildren() returns a list (possibly empty), never None, so this
    # check is only a guard; iterating an empty list is harmless anyway
    if children:
        # Loop through children nodes
        for child in children:
            # if a 'Polygon' node is found
            if child.tag == 'Polygon':
                # call recursion function and print (or save) result
                print getCoords( child, searchNode)
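
For the broader goal in the question (name, description and coordinates kept separate per geometry type), a rough sketch that stores the results instead of printing them could look like this. It is an alternative to the hand-written recursion, not the method from the answer: it relies on Element.iter(), which walks a subtree for you but is not available in Python 2.5 (getiterator() is the older spelling). It assumes Python 3, assumes the tag names Point, LineString and Polygon for the three geometry types, and assumes the tags carry no XML namespace, as in the sample above; real KML files usually declare one, in which case each tag needs a '{namespace-uri}' prefix. The function name placemarks_by_geometry and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Small illustrative document; not taken from a real KML file
KML = """<Document>
  <Placemark>
    <name>testPolygon</name>
    <description>polygon text</description>
    <Polygon><outerBoundaryIs><LinearRing>
      <coordinates>-81.4065,31.5072,0 -81.41269,31.45992,0</coordinates>
    </LinearRing></outerBoundaryIs></Polygon>
  </Placemark>
  <Placemark>
    <name>testPoint</name>
    <description>point text</description>
    <Point><coordinates>-81.4,31.5,0</coordinates></Point>
  </Placemark>
</Document>"""

# Assumed geometry tag names (no namespace prefix)
GEOMETRY_TAGS = ("Point", "LineString", "Polygon")


def placemarks_by_geometry(root):
    """Group (name, description, coordinates) tuples by geometry tag."""
    grouped = {tag: [] for tag in GEOMETRY_TAGS}
    for mark in root.iter("Placemark"):
        for tag in GEOMETRY_TAGS:
            geometry = mark.find(tag)  # direct child of the Placemark
            if geometry is None:
                continue
            # iter() walks the whole subtree, so nesting depth does not matter
            coords = [c.text.strip() for c in geometry.iter("coordinates") if c.text]
            grouped[tag].append(
                (mark.findtext("name"), mark.findtext("description"), coords)
            )
    return grouped


grouped = placemarks_by_geometry(ET.fromstring(KML))
print(grouped["Polygon"])
print(grouped["Point"])
```

The same loop handles every geometry type because the differing node structures are hidden behind iter(); the recursive getCoords from the answer could be swapped in for the list comprehension if the point is to practice recursion.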
