如何从二叉搜索树中均匀地随机返回一个节点? [英] How to return a node, uniformly at random, from a binary search tree?

查看:193
本文介绍了如何从二叉搜索树中均匀地随机返回一个节点?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给定一个BST(可能平衡或可能不平衡),如何随机地均匀返回任何节点?一个约束是您不能使用外部索引数据结构。您必须以一种遍历树的方式,使每个节点都有被访问的平等机会。

Given a BST (may or may not be balanced) how can one return "any" node uniformly at random? A constraint is you cannot use an external indexing data structure. You must traverse the tree in such a manner that every node has an equal chance of being visited.

这个问题使我困惑了很长时间。如果我们确实可以使用外部哈希表/指针,则可以对它们进行随机化并返回相应的节点。但是,我的同事提出了一个相当复杂的问题,其中没有其他数据结构可以使用。

This question has me perplexed for quite a while. If we can indeed use an external hashtable/pointers we could just randomize on those and return the corresponding node. However, my colleague has put forth a rather complex variant of the question, where no additional data structures can be used.


  • 一个简单的随机走法,一次50/50的机会进行左/右操作不起作用,因为返回节点的概率接近树的底部小得多(概率复合)

  • 即使随机生成深度 d 并最多遍历 d 个节点以返回该节点(如果是叶子则停止)不会生成均匀分布。

  • A simple random walk with a 50-50 chance of either going L/R doesn't work since the probability of returning a node near the bottom of the tree is much smaller(probabilities compound)
  • Even randomly generating a depth d and traversing at most d nodes to return the node (or stopping if it's a leaf) doesn't generate a uniform distribution.

更新:您也无法进行有序遍历,也无法将结果存储在数组中。

Update: You cannot have an inorder walk and store the result in an array either.

如何实现这种遍历?

推荐答案

以任何顺序行走树,并保持以下值:

Walk the tree in any order, keeping the following values:


  • N :看到的节点数

已选定:当前选定的节点。

最初, N 为0,选择的 None 。访问节点包含以下内容:

Initially, N is 0 and selected is None. Visiting a node consists of the following:


  1. 增量 N

生成一个在 [0,N)范围内的随机整数。

Generate a random integer in the range [0, N).

如果所选的随机整数为0,则将 selected 设置为当前节点。

If the random integer selected is 0, set selected to the current Node.

请注意,选择的值 N 在步行过程中进行修改。这意味着它们既是访问者函数的输入值,也是输出值。

Note that the values N and selected need to be modified during the walk. That means that they are both input and output values to the visitor function.

在遍历结束时, N 将是树中的节点数,而 selected 将是以均等概率选择的随机节点(假设您有一个好的随机数生成器)。

At the end of the walk, N will be the number of nodes in the tree, and selected will be a random node selected with uniform probability (assuming you have a good random number generator).

此算法不限于BST。它可以在任何形状的树上工作。特别是,它将在未知长度的对象的简单线性序列上工作,对应于众所周知的随机选择算法,该算法将遍历对象,用概率小于<$ c $的新访问对象替换选定的随机对象c> 1 / N 其中, N 是迄今为止看到的对象数。

This algorithm is not restricted to BSTs. It will work on any tree of any shape. In particular, it will work on a simple linear sequence of objects of unknown length, corresponding to the well-known random selection algorithm which is to iterate over the objects, replacing the selected random object with the newly visited one with probability 1/N where N is the number of objects seen to date.

如果您跟踪访问的节点,它也可以在任何连接的图上使用。

If you keep track of visited nodes, it will also work on any connected graph.

如果您有一棵非常大的树(或图),可能分布在多个树上对于服务器和/或存储设备,您可以使用此算法的不同表示形式,该算法提供一定程度的并行性(并且还避免了保持全局遍历结构或将值传递到该遍历的需求。)

If you have a very large tree (or graph), perhaps spread over a number of servers and/or storage devices, you can use a different presentation of this algorithm, which provides for a certain level of parallelism (and also prevents the need to keep a global walk structure or pass values into the walk).

我们假设每个节点服务器都可以直接访问 k 对象,并且可以间接访问某些已知数量的子级服务器。该算法允许有多余的孩子,但是假设网络通信(几乎)是完美的;处理网络拆分不在此答案的范围内。我们还假定每个查询都有一个关联的唯一查询号,这使我们能够处理一些网络工件。该查询没有其他信息(除了要响应的服务器以外),并且预计将返回一个由一个计数和一个随机选择的节点组成的元组。

We assume that each node-server has direct access to k objects and indirect access to some known number of children servers. The algorithm allows for redundant children, but assumes that network communication is (almost) perfect; dealing with network splits is outside of the scope of this answer. We also assume that every query has an associated unique query number, which allows us to deal with some networking artifacts. The query has no other information (other than the server to respond to), and is expected to return a tuple consisting of a count and a randomly-selected node.

何时节点服务器收到ID为 q 的查询,它将执行以下操作:

When a node-server receives a query with id q, it does the following:


  1. 如果先前已响应查询 q ,则立即返回< 0,空>

count 设置为 k 选择从它可以直接访问的 k 对象中随机选择的对象。

Set count to k and selected to a randomly-selected object from the k objects it has direct access to.

对于每个子服务器,发送查询(具有相同的查询ID)

For each child server, send the query (with the same query-id)

对于每个返回的响应(不返回)不管响应按什么顺序出现)

For each response returned (it doesn't matter what order the responses come in):

a。将 response.count 添加到 count

b。以概率 response.count / count ,将 selected 替换为 response.selected

b. With probability response.count / count, replace selected with response.selected

所有子级服务器都响应后,返回< count,selected>

When all children servers have responded, return <count, selected>

这篇关于如何从二叉搜索树中均匀地随机返回一个节点?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆