在数据框中查找节点的更简便方法 [英] Easier way to find nodes in a dataframe

查看:63
本文介绍了在数据框中查找节点的更简便方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给定一个父级,我想获得其节点,级别为N(原始数据帧要大得多)

Given a parent, I would like to get its nodes, of level N (original dataframe is way larger)

child|   level|  parent|
   40|       1|      40|
   21|       2|      40|
   18|       2|      40|
   37|       3|      21|
    2|       3|      18|
   85|       3|      21|
   14|       4|      37|
   58|       4|       2|
   47|       4|      37|
   34|       4|      85|
   45|       4|      18|
   32|       4|       2|
   47|       4|      85|
   88|       4|      85|
   12|       4|      37|

我做什么:

def get_children(x, tree, lst):
    for nod in tree[tree['parent'] == x]['child']:
        get_children(nod, tree, lst)
        lst.append(nod)

然后过滤N级的所有节点.

and then I filter all nodes of level N.

我想要另一种方法,因为当数据帧较大时,递归调用太多.

I would like another way because there are too much recursion calls when the dataframe is larger.

推荐答案

使用辅助数据结构,您可以分两个步骤解决此任务:

Using an auxiliary data structure you could solve this task in two steps:

  1. 存储每个节点的所有前辈
  2. 对于第N级的每个节点,检查节点x是否在前任列表中

没有进一步的假设(例如,邻接列表是有序的,并且较高级别的节点不能是较低级别节点的父级),则您将需要在列表中循环至少两次(一次创建父级索引,然后再一次)每次查询).但是,如果可以做出上述假设,那么一个循环就足够了.

Without further assumptions (e.g. that the adjacency list is ordered, and higher level nodes cannot be parents of lower level ones), you would need to cycle through the list at least twice (once to create the parents index, then once more for each lookup). But if the aforementioned assumptions can be made, a single loop would be sufficient.

# determine all predecessors of each node in tree
def index_parents(tree):
  parents = {}
  for i, n in tree.iterrows():
    child = int(n['child']) 
    parent = int(n['parent'])
    # if this child node is seen for the first time, initialize parents
    if child not in parents: 
      parents[child] = set()
    # add current parent to predecessor list
    parents[child].update(set([parent]))
    # add predecessors of parent to predecessor list
    parents[child].update(parents[parent])

  return parents

# check all nodes at given level if they contain x as predecessor
def get_children(x, tree, level, parents):
  x_children = []
  for i, n in tree.iterrows():
    child = int(n['child'])
    if n['level'] == level and x in parents[child]:
      # x is in predecessors of child, add it to the result
      x_children.append(child)

  return x_children

为您提供测试数据的测试:

A test given you test data:

>>> parents = index_parents(tree)
>>> get_children(21, tree, 3, parents)                                                                                    
>>> [37, 85]

这似乎有些倒退,但是避免了递归.

It may seem a bit backwards, but it avoids the recursion.

这篇关于在数据框中查找节点的更简便方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆