如何在存储在DataFrame中的树中查找叶子 [英] How to find leaves in a tree stored in a DataFrame

查看:92
本文介绍了如何在存储在DataFrame中的树中查找叶子的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

以前,我发布了此链接.

我想知道是否有人可以帮助仅检索叶子(如果不平衡的树),而不必检索N级的节点.

I'd like to know if someone could help on retrieving only the leaves (in case of unbalanced tree) and not necessarily the nodes of level N.

例如,给定下面的树(请注意:此树与链接上的树有些不同)

For instance, given the below tree (Note: this tree is a bit different than the one on the link)

child|   level|  parent|
   40|       1|      40|
   21|       2|      40|
   18|       2|      40|
   27|       2|      40|
   37|       3|      21|
    2|       3|      18|
   85|       3|      21|
   19|       3|      21|
   14|       4|      37|
   58|       4|       2|
   47|       4|      37|
   34|       4|      85|
   45|       4|      18|
   32|       4|       2|
   47|       4|      85|
   88|       4|      85|
   12|       4|      37|

如果我请求孩子40的所有叶子,我将获取叶子:所有4个孩子,其中19个(没有4个孩子)和27个(因为节点停止了2个孩子).

If I request all the leaves of the child 40, I retrieve as leaves : all the level 4 children with 19 (it doesn’t have level 4 children) and also 27 (because the node stopped level 2).

对于18岁的孩子,年龄分别为58岁和32岁.

For the child 18, it will be 58 and 32.

推荐答案

如果我理解正确,则需要叶子,即不是父母的孩子. 您可以通过以下方式获得它们:

If I understood correctly, you want the leaves, i.e. the children that aren't parents. You can get them by doing:

set(df['child']) - set(df['parent'])

如果您愿意使用networkx,则可以使用很多现有功能:

edit:

If you are willing to use networkx, you can use a lot of existing functionality:

import matplotlib.pyplot as plt
import networkx as nx

# create directed graph from dataframe:
G=nx.from_pandas_edgelist(df, source='parent', target='child', create_using=nx.DiGraph())

#visualise
nx.draw_networkx(G,with_labels=True)

#nitpicking here: your tree isn't a tree: 47 has two parents


# you can find leaves with this function:
def find_leaves(G, node):

    # list all descendants of the node, as well as the node
    d = list(nx.descendants(G, node))+[node]

    # create a subgraph with only these nodes and find the leaves. 
    H = G.subgraph(d)
    return [a for a in H.nodes if H.out_degree(a)==0 and H.in_degree(a)==1]

find_leaves(G, 18)

输出:

[45, 32, 58]

如果您不想使用networkx,则可以执行以下操作:

edit 2:

If you don't want to use networkx, you can do the following:

#create edgelist from dataframe:
edges = []
for ix, row in df.iterrows():
    edges.append((row['parent'], row['child']))

# recursive function that starts at start_node and returns nodes that 
# have no children:

def find_children(edges, start_node):

    # find edges that have the start node as the parent
    starting_edges = [(p,c) for p,c in edges if p == start_node]
    leaves = []

    # if node has children, loop through the children
    if starting_edges:
        for p, c in starting_edges:
            leaves += find_children(edges, c)

    # if the node has no children, store in list and return.
    else:
        leaves.append(start_node)

    return leaves


#testing:
find_children(edges, 18)

#output:
[58,32,45]

find_children(edges, 85)

#output:
[34, 47, 88]

这篇关于如何在存储在DataFrame中的树中查找叶子的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆