查找中值从增长的集 [英] Find median value from a growing set

查看:122
本文介绍了查找中值从增长的集的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在接受采访时偶然发现了一个有趣的算法问题。我给我的答案,但不知道是否有更好的主意。所以我欢迎大家写一些关于他/她的想法。

I came across an interesting algorithm question in an interview. I gave my answer but not sure whether there is any better idea. So I welcome everyone to write something about his/her ideas.

您有一个空集。现在元件被放入集合逐一。我们假设所有的元素是整数,它们是不同的(根据集的定义,我们不考虑两个元件具有相同的值)。

You have an empty set. Now elements are put into the set one by one. We assume all the elements are integers and they are distinct (according to the definition of set, we don't consider two elements with the same value).

每一个新元素被添加到设定的时间,该组的中间值是问。中间值被定义中的相同数学:在已排序列表的中间元件。这里,特别是,当集的大小为偶数,假定组大小= 2 * X,中值元素是集的第x个元素。

Every time a new element is added to the set, the set's median value is asked. The median value is defined the same as in math: the middle element in a sorted list. Here, specially, when the size of set is even, assuming size of set = 2*x, the median element is the x-th element of the set.

一个例子: 先从一个空集, 12时加入,中位数为12, 7时加入,中位数为7, 8时加入,中位数为8, 11时添加,中位数为8, 5时添加,中位数为8, 16时加入,中位数为8, ......

An example: Start with an empty set, when 12 is added, the median is 12, when 7 is added, the median is 7, when 8 is added, the median is 8, when 11 is added, the median is 8, when 5 is added, the median is 8, when 16 is added, the median is 8, ...

注意,首先,将添加元素设定一个接一个和第二,我们不知道的元素将要被添加

Notice that, first, elements are added to set one by one and second, we don't know the elements going to be added.

我的回答。

由于它是一个关于寻找位数的问题,整理是必要的。最简单的解决方法是使用一个普通阵列,并保持该阵列排序。当一个新的元素来临时,使用二进制搜索找到的元素(log_n)的位置和元素添加到阵列中。由于它是一个正常的数组,以便移位阵列的其余部分是必要的,它的时间复杂度是n。当插入的元素,我们可以立即获得中位数,使用实例的时间。

Since it is a question about finding median, sorting is needed. The easiest solution is to use a normal array and keep the array sorted. When a new element comes, use binary search to find the position for the element (log_n) and add the element to the array. Since it is a normal array so shifting the rest of the array is needed, whose time complexity is n. When the element is inserted, we can immediately get the median, using instance time.

最坏的时间复杂度为:log_n + N + 1

The WORST time complexity is: log_n + n + 1.

另一种解决方案是使用链接列表。之所以使用链接列表是去除的移位阵列的需要。但找到新元素的位置,需要一个线性搜索。添加元素将即时时间,然后我们需要经过半个阵列,它总是占据N / 2时,找到中位数。

Another solution is to use link list. The reason for using link list is to remove the need of shifting the array. But finding the location of the new element requires a linear search. Adding the element takes instant time and then we need to find the median by going through half of the array, which always takes n/2 time.

最坏的时间复杂度为:N + 1 + N / 2

The WORST time complexity is: n + 1 + n/2.

第三种解决方案是使用二进制搜索树。使用一棵树,我们避免移阵列。但使用二叉搜索树,找到中位数是不是非常有吸引力。所以我改变二进制搜索树的方式,它始终是左子树和右子树是平衡的情况。这意味着,在任何时间,任一左子树和右子树具有节点的数目相同或右子树具有一个节点比在左子树更多。换句话说,就保证了在任何时候,该根元素是中位数。当然这需要改变在树的构建方式。技术细节类似于旋转红黑树。

The third solution is to use a binary search tree. Using a tree, we avoid shifting array. But using the binary search tree to find the median is not very attractive. So I change the binary search tree in a way that it is always the case that the left subtree and the right subtree are balanced. This means that at any time, either the left subtree and the right subtree have the same number of nodes or the right subtree has one node more than in the left subtree. In other words, it is ensured that at any time, the root element is the median. Of course this requires changes in the way the tree is built. The technical detail is similar to rotating a red-black tree.

如果树保持正常,可以确保在最坏时间复杂度为O(n)。

If the tree is maintained properly, it is ensured that the WORST time complexity is O(n).

于是三种算法都是线性的集合的大小。如果没有子线性算法存在,三种算法可以被认为是最佳的解决方案。因为它们不会彼此不同了,最好是最容易实现的,这是第二个,使用链接列表

So the three algorithms are all linear to the size of the set. If no sub-linear algorithm exists, the three algorithms can be thought as the optimal solutions. Since they don't differ from each other much, the best is the easiest to implement, which is the second one, using link list.

所以,我真的不知道是,会不会有一子线性算法对于这个问题,如果是这样那将是什么样。任何想法家伙?

So what I really wonder is, will there be a sub-linear algorithm for this problem and if so what will it be like. Any ideas guys?

史蒂夫。

推荐答案

您复杂的分析是混乱的。假设n项达尔增加;我们要输出N中位数(其中流中的第i个是第i项的中值)有效地流

Your complexity analysis is confusing. Let's say that n items total are added; we want to output the stream of n medians (where the ith in the stream is the median of the first i items) efficiently.

我相信这可以在O完成(N * LG N)使用两个优先级队列(如二进制或者斐波那契堆)的时间;一个队列为当前中值(因此最大的元素是位于顶部)以下项目,而另一个用于在它上面的项目(在此堆,最小的是在底部)。请注意,在斐波纳契(等)堆,插入是O(1)摊销;它只是弹出一个元素是O(LG N)。

I believe this can be done in O(n*lg n) time using two priority queues (e.g. binary or fibonacci heap); one queue for the items below the current median (so the largest element is at the top), and the other for items above it (in this heap, the smallest is at the bottom). Note that in fibonacci (and other) heaps, insertion is O(1) amortized; it's only popping an element that's O(lg n).

这将被称为网上中位选择算法,虽然<一href="http://en.wikipedia.org/wiki/Selection%5Falgorithm#Online%5Fselection%5Falgorithm">Wikipedia只有在线的最小/最大选择了会谈。这里有一个近似算法和的确定性和近似在线位数选择下界(下界表示没有更快的算法是可能的!)

This would be called an "online median selection" algorithm, although Wikipedia only talks about online min/max selection. Here's an approximate algorithm, and a lower bound on deterministic and approximate online median selection (a lower bound means no faster algorithm is possible!)

如果有少数比较N个可能的值,你也许可以打破基于比较的下限,就像你可以进行排序。

If there are a small number of possible values compared to n, you can probably break the comparison-based lower bound just like you can for sorting.

这篇关于查找中值从增长的集的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆