有没有比标准的“递归"更快的方法从 python 中的树状结构中获取子树? [英] Is there a faster way to get subtrees from tree like structures in python than the standard "recursive"?

查看:17
本文介绍了有没有比标准的“递归"更快的方法从 python 中的树状结构中获取子树?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设以下数据结构具有三个 numpy 数组 (id, parent_id)(根元素的 parent_id 为 -1):

Let's assume the following data structur with three numpy arrays (id, parent_id) (parent_id of the root element is -1):

import numpy as np
class MyStructure(object):
  def __init__(self):
    """
    Default structure for now:

          1
         / \
        2   3
           / \
          4   5
    """
    self.ids = np.array([1,2,3,4,5])
    self.parent_ids = np.array([-1, 1, 1, 3, 3])

  def id_successors(self, idOfInterest):
    """
    Return logical index.
    """
    return self.parent_ids == idOfInterest

  def subtree(self, newRootElement):
    """
    Return logical index pointing to elements of the subtree.
    """
    init_vector = np.zeros(len(self.ids), bool)
    init_vector[np.where(self.ids==newRootElement)[0]] = 1
    if sum(self.id_successors(newRootElement))==0:
      return init_vector
    else:
      subtree_vec = init_vector
      for sucs in self.ids[self.id_successors(newRootElement)==1]:
        subtree_vec += self.subtree(sucs)
      return subtree_vec

对于许多 id (>1000) 来说,这个获取真的很慢.有没有更快的方法来实现它?

This get's really slow for many ids (>1000). Is there a faster way to implement that?

推荐答案

我认为伤害你的不是递归本身,而是每一步的大量非常广泛的操作(对所有元素).考虑:

I think it's not the recursion as such that's hurting you, but the multitude of very wide operations (over all elements) for every step. Consider:

init_vector[np.where(self.ids==newRootElement)[0]] = 1

扫描所有元素,计算每个匹配元素的索引,然后只使用第一个元素的索引.此特定操作可用作列表、元组和数组的方法索引 - 并且速度更快.如果 ID 是唯一的,则 init_vector 无论如何都只是 ids==newRootElement.

That runs a scan through all elements, calculates the index of every matching element, then uses only the index of the first one. This particular operation is available as the method index for lists, tuples, and arrays - and faster there. If IDs are unique, init_vector is simply ids==newRootElement anyway.

if sum(self.id_successors(newRootElement))==0:

再次对每个元素进行线性扫描,然后对整个数组进行归约,只是为了检查是否存在匹配项.将 any 用于此类操作,但我们甚至不需要对所有元素进行检查 - 如果 newRootElement 不在 self.parent_ids 中"可以完成这项工作,但这不是必需的,因为在空列表上执行 for 循环是完全有效的.

Again a linear scan of every element, then a reduction on the whole array, just to check if any matches are there. Use any for this type of operation, but once again we don't even need to do the check on all elements - "if newRootElement not in self.parent_ids" does the job, but it's not necessary as it's perfectly valid to do a for loop over an empty list.

最后是最后一个循环:

for sucs in self.ids[self.id_successors(newRootElement)==1]:

这一次,重复了一个 id_successors 调用,然后将结果与 1 进行了不必要的比较.只有在那之后才会进行递归,确保对每个分支重复上述所有操作(对于不同的 newRootElement).

This time, an id_successors call is repeated, and then the result is compared to 1 needlessly. Only after that comes the recursion, making sure all the above operations are repeated (for different newRootElement) for each branch.

整个代码是一个单向树的反向遍历.我们有父母,需要孩子.如果我们要进行诸如 numpy 之类的广泛操作,我们最好让它们计数 - 因此我们唯一关心的操作是为每个父级构建一个子级列表.用一次迭代就可以做到这一点:

The whole code is a reversed traversal of a unidirectional tree. We have parents and need children. If we're to do wide operations such as numpy is designed for, we'd best make them count - and thus the only operation we care about is building a list of children per parent. That's not very hard to do with one iteration:

import collections
children=collections.defaultdict(list)
for i,p in zip(ids,parent_ids):
  children[p].append(i)

def subtree(i):
  return i, map(subtree, children[i])

您需要的确切结构将取决于更多因素,例如树更改的频率、大小、分支的数量以及您需要请求的子树的大小和数量.例如,上面的字典+列表结构并不是非常节省内存.您的示例也已排序,这可以使操作更容易.

The exact structure you need will depend on more factors, such as how often the tree changes, how large it is, how much it branches, and how large and many subtrees you need to request. The dictionary+list structure above isn't terribly memory efficient, for instance. Your example is also sorted, which could make the operation even easier.

这篇关于有没有比标准的“递归"更快的方法从 python 中的树状结构中获取子树?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆