如何在存储在DataFrame中的树中查找叶子 [英] How to find leaves in a tree stored in a DataFrame
问题描述
以前,我发布了此链接.
我想知道是否有人可以帮助仅检索叶子(如果不平衡的树),而不必检索N级的节点.
I'd like to know if someone could help on retrieving only the leaves (in case of unbalanced tree) and not necessarily the nodes of level N.
例如,给定下面的树(请注意:此树与链接上的树有些不同)
For instance, given the below tree (Note: this tree is a bit different than the one on the link)
child| level| parent|
40| 1| 40|
21| 2| 40|
18| 2| 40|
27| 2| 40|
37| 3| 21|
2| 3| 18|
85| 3| 21|
19| 3| 21|
14| 4| 37|
58| 4| 2|
47| 4| 37|
34| 4| 85|
45| 4| 18|
32| 4| 2|
47| 4| 85|
88| 4| 85|
12| 4| 37|
如果我请求孩子40的所有叶子,我将获取叶子:所有4个孩子,其中19个(没有4个孩子)和27个(因为节点停止了2个孩子).
If I request all the leaves of the child 40, I retrieve as leaves : all the level 4 children with 19 (it doesn’t have level 4 children) and also 27 (because the node stopped level 2).
对于18岁的孩子,年龄分别为58岁和32岁.
For the child 18, it will be 58 and 32.
推荐答案
如果我理解正确,则需要叶子,即不是父母的孩子. 您可以通过以下方式获得它们:
If I understood correctly, you want the leaves, i.e. the children that aren't parents. You can get them by doing:
set(df['child']) - set(df['parent'])
如果您愿意使用networkx
,则可以使用很多现有功能:
edit:
If you are willing to use networkx
, you can use a lot of existing functionality:
import matplotlib.pyplot as plt
import networkx as nx
# create directed graph from dataframe:
G=nx.from_pandas_edgelist(df, source='parent', target='child', create_using=nx.DiGraph())
#visualise
nx.draw_networkx(G,with_labels=True)
#nitpicking here: your tree isn't a tree: 47 has two parents
# you can find leaves with this function:
def find_leaves(G, node):
# list all descendants of the node, as well as the node
d = list(nx.descendants(G, node))+[node]
# create a subgraph with only these nodes and find the leaves.
H = G.subgraph(d)
return [a for a in H.nodes if H.out_degree(a)==0 and H.in_degree(a)==1]
find_leaves(G, 18)
输出:
[45, 32, 58]
如果您不想使用networkx
,则可以执行以下操作:
edit 2:
If you don't want to use networkx
, you can do the following:
#create edgelist from dataframe:
edges = []
for ix, row in df.iterrows():
edges.append((row['parent'], row['child']))
# recursive function that starts at start_node and returns nodes that
# have no children:
def find_children(edges, start_node):
# find edges that have the start node as the parent
starting_edges = [(p,c) for p,c in edges if p == start_node]
leaves = []
# if node has children, loop through the children
if starting_edges:
for p, c in starting_edges:
leaves += find_children(edges, c)
# if the node has no children, store in list and return.
else:
leaves.append(start_node)
return leaves
#testing:
find_children(edges, 18)
#output:
[58,32,45]
find_children(edges, 85)
#output:
[34, 47, 88]
这篇关于如何在存储在DataFrame中的树中查找叶子的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!