Extract parent and child node from Python tree representation

Question

[Tree('ROOT', [Tree('S', [Tree('INTJ', [Tree('UH', ['Hello'])]), Tree(',', [',']), Tree('NP', [Tree('PRP$', ['My']), Tree('NN', ['name'])]), Tree('VP', [Tree('VBZ', ['is']), Tree('ADJP', [Tree('JJ', ['Melroy'])])]), Tree('.', ['.'])])]), Tree('ROOT', [Tree('SBARQ', [Tree('WHNP', [Tree('WP', ['What'])]), Tree('SQ', [Tree('VBZ', ['is']), Tree('NP', [Tree('PRP$', ['your']), Tree('NN', ['name'])])]), Tree('.', ['?'])])])]

I have many of these strings available in Python, which are actually tree representations. I want to extract the parent and child node for every word, e.g. for 'Hello' I want (INTJ, UH), and for 'My' it is (NP, PRP$).

This is the result I'd like:

(INTJ, UH) , (NP, PRP$), (NP, NN) , (VP, VBZ) , (VP , VPZ) , (ADJP, JJ) , (WHNP, WP), (SQ, VBZ), (NP, PRP$), (NP, NN)

How can I do that?

Recommended answer

Your string is obviously the representation of a list of Tree objects. It would be much better if you had access to, or could reconstruct in some other way, that list – if not, the most straightforward way to create a data structure you can work with is eval() (with all the usual caveats about calling eval() on user-supplied data).

Since you don't say anything about your Tree class, I'll write a simple one that suffices for the purposes of this question:

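class Tree:
    """Minimal tree node: a label plus a list of sub-trees or leaf strings."""

    def __init__(self, name, branches):
        self.name = name          # node label, e.g. 'NP'
        self.branches = branches  # child Tree objects or words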

Now we can recreate your data structure:

data = eval("""[Tree('ROOT', [Tree('S', [Tree('INTJ', [Tree('UH', ['Hello'])]), Tree(',', [',']), Tree('NP', [Tree('PRP$', ['My']), Tree('NN', ['name'])]), Tree('VP', [Tree('VBZ', ['is']), Tree('ADJP', [Tree('JJ', ['Melroy'])])]), Tree('.', ['.'])])]), Tree('ROOT', [Tree('SBARQ', [Tree('WHNP', [Tree('WP', ['What'])]), Tree('SQ', [Tree('VBZ', ['is']), Tree('NP', [Tree('PRP$', ['your']), Tree('NN', ['name'])])]), Tree('.', ['?'])])])]""")
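If calling eval() on these strings is a concern, one alternative is to parse the string with the standard-library ast module and build the objects by hand, so that nothing in the input is executed. The following is only a sketch under the assumption that the strings contain nothing but Tree(...) calls, lists, and string literals; the names build and parse_trees are made up for this illustration.

```python
import ast


class Tree:
    """Minimal stand-in for the Tree class, as defined in the answer."""

    def __init__(self, name, branches):
        self.name = name
        self.branches = branches


def build(node):
    """Convert an AST node into Tree objects, lists, and strings only."""
    if isinstance(node, ast.Call):
        # Accept only plain calls of the form Tree(<name>, <branches>).
        if not (isinstance(node.func, ast.Name) and node.func.id == "Tree"):
            raise ValueError("unexpected call in tree string")
        if node.keywords or len(node.args) != 2:
            raise ValueError("Tree() expects exactly two positional arguments")
        name, branches = (build(arg) for arg in node.args)
        return Tree(name, branches)
    if isinstance(node, ast.List):
        return [build(element) for element in node.elts]
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    raise ValueError("unsupported syntax in tree string")


def parse_trees(text):
    """Parse the repr of a list of Tree objects without executing it."""
    return build(ast.parse(text, mode="eval").body)


trees = parse_trees("[Tree('NP', [Tree('PRP$', ['My']), Tree('NN', ['name'])])]")
print(trees[0].name, [branch.name for branch in trees[0].branches])
# NP ['PRP$', 'NN']
```

Anything other than the expected shapes (for example a call to a different function) raises ValueError instead of being run.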

Once we have that, we can write a function that produces the list of 2-tuples you want:

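def tails(items, path=()):
    """Yield the last two Tree names above each non-punctuation leaf."""
    for item in items:
        if isinstance(item, Tree):
            if item.name in {".", ","}:  # ignore punctuation
                continue
            # Descend into the subtree, extending the path with this node's name.
            yield from tails(item.branches, path + (item.name,))
        else:
            # A leaf (a word): yield the names of the two nearest enclosing Trees.
            yield path[-2:]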

This function descends recursively into the tree, yielding the last two Tree names each time it hits an appropriate leaf node.

Example usage:

>>> list(tails(data))
[('INTJ', 'UH'), ('NP', 'PRP$'), ('NP', 'NN'), ('VP', 'VBZ'), ('ADJP', 'JJ'), ('WHNP', 'WP'), ('SQ', 'VBZ'), ('NP', 'PRP$'), ('NP', 'NN')]
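Since the question asks for the pair belonging to each word, it may help to keep the word next to its tags. A possible variant of tails is sketched below; the name tagged_words and the small sample list are made up for this illustration, and the Tree class is repeated so the snippet runs on its own.

```python
class Tree:
    """Minimal stand-in for the Tree class, as defined in the answer."""

    def __init__(self, name, branches):
        self.name = name
        self.branches = branches


def tagged_words(items, path=()):
    """Yield (word, (parent, child)) for every non-punctuation leaf."""
    for item in items:
        if isinstance(item, Tree):
            if item.name in {".", ","}:  # ignore punctuation
                continue
            yield from tagged_words(item.branches, path + (item.name,))
        else:
            # item is the word itself; path[-2:] are its two nearest labels.
            yield item, path[-2:]


sample = [Tree('NP', [Tree('PRP$', ['My']), Tree('NN', ['name'])])]
print(list(tagged_words(sample)))
# [('My', ('NP', 'PRP$')), ('name', ('NP', 'NN'))]
```

Passing the result to dict() would give a word-to-tags lookup, although repeated words such as 'name' would then overwrite each other.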
