为什么在算法中使用子树大小来选择二叉树中的随机节点? [英] Why subtree size is used in algorithm to select random node in a binary tree?

查看:49
本文介绍了为什么在算法中使用子树大小来选择二叉树中的随机节点?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我偶然发现了几种从二叉树中选择随机节点的算法实现,它们都使用子树大小属性.但是,我不明白为什么知道子树大小会有所帮助.这是实现 AB.

I stumbled upon several implementations of algorithm to select a random node from a binary tree, and they all use subtree size property. However, I don't understand why knowing subtree size helps. Here's implementation A and B.

大致思路可以用这个伪代码来描述:

The general idea can be described in this pseudocode:

Node getRandomNode() {
    Random seed = new Random;
    int random = random.nextInt(this.size);

    if (this.left.size == random)
        return this;
  
    if (this.left.size > random) {
        return this.left.getRandomNode();
    } else {
        return this.right.getRandomNode();
    }
}

我假设 getRandomNode() 方法总是在树的根上调用.

I assume that the getRandomNode() method is always called on the root of the tree.

据我所知,选择节点的随机性完全取决于 random 整数的随机性.在某些情况下,如果 random 重复自身,则 getRandomNode() 将返回相同的节点.

As far as I see the randomality of choosing a node depends solely on the randomality of random integer. In certain cases if random repeats itself then getRandomNode() will return the same node.

我也不明白算法如何在完整的二叉树中工作.每个节点子树的大小将始终是偶数.因此,如果 random 是偶数,则算法在此行上将不会匹配:

Also I don't understand how the algorithm can work in a complete binary tree. The size of each node subtree will always be an even number. So if random is an even number the algorithm will not have a match on this line:

if (this.left.size == random)
        return this;

并且仅当 random 是偶数时才会产生随机数,这意味着该算法不是随机的.

and will only produce a random number if random is an even number, which means that the algorithm is not random.

我错过了什么吗?

如果我必须实现该算法,我只需存储一个额外的链表,其中每个节点都有一个索引(如 Java 中的 ArrayList),然后从列表中返回索引等于 a 的节点从 random.nextInt() 返回的随机数.

If I had to implement the algorithm I would just store an additional linked list where each node would have an index (like ArrayList in Java) and then return the node from the list whose index equals a random number returned from random.nextInt().

推荐答案

感谢您在此处概述您的思考过程.让我们一步一步地看一遍你在这里所说的内容.

Thanks for outlining your thought process here. Let's go through what you're saying here one step at a time.

我假设 getRandomNode() 方法总是在树的根上调用.

I assume that the getRandomNode() method is always called on the root of the tree.

它总是在 some 子树的根上调用,但它不一定是整个树的根.例如,考虑这个简单的树:

It is always called on the root of some subtree, but it's not necessarily going to be the overall tree root. For example, consider this simple tree:

                  A
                 / \
                B   C
                   / \
                  D   E

这里,第一个调用将是 A.getRandomNode(),它将从以 A 为根的子树中随机选择一个节点.之后,我们有 1/5 的机会询问对于左子树中的一个随机节点,我们有 3/5 的机会从右子树中请求一个随机节点,1/5 的机会我们停止并返回 A.让我们说,为了争论,这里生成的随机数是 4,我们决定查看右子树.这意味着下一个调用是 C.getRandomNode(),它要求从这棵树中挑选一个随机节点:

Here, the first call will be to A.getRandomNode(), which will pick a random node out of the subtree rooted at A. After that, there's a 1/5 chance that we ask for a random node out of the left subtree, a 3/5 chance that we ask for a random node out of the right subtree, and a 1/5 chance that we stop and return A. Let's say, for the sake of argument, that the random number generated here is 4 and that we decide to look in the right subtree. That means that the next call is to C.getRandomNode(), which asks to pick a random node out of this tree:

             C
            / \
           D   E

在这里,我们将生成一个随机数,使得有 1/3 的机会从左子树中请求一个随机节点,1/3 的机会从右子树中请求一个随机节点,而 1/3 以 C 停止的机会.为了论证起见,假设我们生成随机数 0 并向左走.这会调用 D.getRandomNode(),它会从这个子树中请求一个随机节点:

Here, we'll generate a random number such that there's a 1/3 chance of asking for a random node from the left subtree, a 1/3 chance of asking for a random node from the right subtree, and a 1/3 chance of stopping with C. For the sake of argument, suppose we generate the random number 0 and go to the left. That calls D.getRandomNode(), which asks for a random node out of this subtree:

               D

此调用将始终返回 D,因为这是唯一的选择.

This call will always return D, since that's the only option.

希望这能让您对算法的工作原理有更多了解.在每个时间点,我们都希望有平等的机会选择任何元素.因此,我们决定是停止、向左还是向右,根据每个选项中的节点数对我们的选择进行加权.

Hopefully this gives you a bit more of a sense of how the algorithm works. At each point in time, we want to have an equal chance of choosing any element. Therefore, we decide whether to stop, go left, or go right, weighting our choice based on the number of nodes in each of the option.

据我所知,选择节点的随机性完全取决于随机整数的随机性.在某些情况下,如果随机重复自身,则 getRandomNode() 将返回相同的节点.

As far as I see the randomality of choosing a node depends solely on the randomality of random integer. In certain cases if random repeats itself then getRandomNode() will return the same node.

您说得对,这里唯一的随机性来源是选择了哪个随机整数.这是有道理的,因为树本身不是随机的.

You're correct that the only source of randomness here is which random integer gets chosen. And that would make sense, since the tree itself isn't random.

话虽如此,您在此处展示的实现在沿着树向下走的每一步都进行了单独的随机计算.因此,多个不同的随机选择都必须走相同的路,才能得到与 reuslt 相同的节点.

That being said, the implementation that you've shown here makes a separate random calculation at each step in walking down the tree. Therefore, multiple different random choices all have to go the same way in order to get the same node as the reuslt.

我也不明白算法如何在完整的二叉树中工作.每个节点子树的大小将始终是偶数.因此,如果 random 是偶数,则算法在此行上将不匹配:

Also I don't understand how the algorithm can work in a complete binary tree. The size of each node subtree will always be an even number. So if random is an even number the algorithm will not have a match on this line:

if (this.left.size == random)
     return this;

并且只有当 random 是偶数时才会产生一个随机数,这意味着该算法不是随机的.

and will only produce a random number if random is an even number, which means that the algorithm is not random.

你是对的,如果我们只生成一个随机数并在整个过程中使用这个数字,那么是的,你会遇到这样的问题.然而,这不是算法的工作方式.相反,每次我们对 getRandomNode() 进行递归调用时,我们都会选择一个新的随机数并使用它来进行路由.因此,无论我们选择的第一个数字是偶数还是奇数,我们最终都会到达一个唯一选择是返回当前节点的位置,此时我们返回一些东西.

You're correct that if we just generated a single random number and used that number throughout the process, then yes, you'd run into trouble with something like this. However, that's not how the algorithm works. Instead, each time we make a recursive call to getRandomNode(), we pick a new random number and use that to do our routing. Therefore, regardless of whether the first number we picked is even or odd, we'll eventually get to a spot where the only option is to return the current node, at which point we return something.

如果我必须实现该算法,我只需存储一个额外的链表,其中每个节点都有一个索引(如 Java 中的 ArrayList),然后从列表中返回索引等于 a 的节点random.nextInt().

If I had to implement the algorithm I would just store an additional linked list where each node would have an index (like ArrayList in Java) and then return the node from the list whose index equals a random number returned from random.nextInt().

你绝对可以这样解决问题.如果您使用的树从未改变 - 从未添加或删除任何内容 - 那么这是一个完全合理的策略.

You absolutely could solve the problem this way. If the tree you're working with never changes - nothing is ever added or removed - then this is a perfectly reasonable strategy.

标记子树大小的方法之所以有用,是因为它可以让您从树中随机采样,即使正在添加或删除节点.具体来说,您可以相当快速地调整存储在每个节点中的数字以响应节点的插入或删除,这比必须在 ArrayList 中重建或混洗元素要快得多,因为要改变形状的树.查看订单统计树,了解有关如何执行此操作的更多详细信息.

The reason that the approach of tagging subtree sizes is useful is that it lets you sample from the tree randomly even if nodes are being added or removed. Specifically, you can fairly quickly adjust the numbers stored in each node in response to an insertion or removal of a node, much faster than having to rebuild or shuffle elements around in an ArrayList were the tree to change shape. Check out the order statistic tree for more details about how to do this.

这篇关于为什么在算法中使用子树大小来选择二叉树中的随机节点?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆