C++ Efficiently Calculating a Running Median

Question

Those of you who have read my previous questions know that I have been working on understanding and implementing quicksort and quickselect, as well as some other basic algorithms.

Quickselect is used to calculate the kth smallest element in an unsorted list, and this concept can also be used to find the median in an unsorted list.
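
For reference, here is a minimal quickselect sketch using a Lomuto partition. The names `partition_lomuto`, `quickselect`, and `median_of` are hypothetical. The sketch assumes a non-empty list and a 0-based k. For an even-length list, the median is taken as the mean of the two middle elements.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: Lomuto partition of v[lo..hi] around the pivot v[hi].
// Returns the pivot's final index.
std::size_t partition_lomuto(std::vector<int>& v, std::size_t lo, std::size_t hi) {
    int pivot = v[hi];
    std::size_t store = lo;
    for (std::size_t i = lo; i < hi; ++i) {
        if (v[i] < pivot) {
            std::swap(v[i], v[store]);
            ++store;
        }
    }
    std::swap(v[store], v[hi]);
    return store;
}

// Returns the k-th smallest element (0-based). Works on its own copy of v.
int quickselect(std::vector<int> v, std::size_t k) {
    assert(!v.empty() && k < v.size());
    std::size_t lo = 0, hi = v.size() - 1;
    while (lo < hi) {                 // invariant: k lies in [lo, hi]
        std::size_t p = partition_lomuto(v, lo, hi);
        if (p == k) return v[p];
        if (p < k) lo = p + 1;
        else hi = p - 1;
    }
    return v[lo];
}

// Median of a non-empty list: the middle element, or the mean of the two middle ones.
double median_of(const std::vector<int>& v) {
    std::size_t n = v.size();
    if (n % 2 == 1) return quickselect(v, n / 2);
    return (static_cast<double>(quickselect(v, n / 2 - 1)) + quickselect(v, n / 2)) / 2.0;
}
```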

This time, I need help devising an efficient technique to calculate a running median. Quickselect isn't a good choice here, because it has to restart from scratch every time the list changes and so can't take advantage of previous calculations. I'm looking for a different algorithm, possibly a similar one, that is more efficient for running medians.
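
To make the cost concrete, a baseline that recomputes the median from scratch after every new value might look like the sketch below. The name `naive_running_medians` is hypothetical. Instead of a hand-written quickselect, it uses `std::nth_element`, which is typically a quickselect variant. Each step copies and partitions the whole prefix, so processing n values takes O(n²) time overall on average.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical baseline: after each new value, recompute the median of
// everything seen so far. Nothing is reused between steps.
std::vector<double> naive_running_medians(const std::vector<int>& input) {
    std::vector<int> seen;
    std::vector<double> medians;
    for (int x : input) {
        seen.push_back(x);
        std::vector<int> work = seen;  // fresh copy every step
        std::size_t n = work.size();
        std::size_t mid = n / 2;
        std::nth_element(work.begin(), work.begin() + mid, work.end());
        double m = work[mid];
        if (n % 2 == 0) {
            // After nth_element, every element before mid is <= work[mid],
            // so the lower middle value is the maximum of that range.
            int lower = *std::max_element(work.begin(), work.begin() + mid);
            m = (m + lower) / 2.0;
        }
        medians.push_back(m);
    }
    return medians;
}
```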

Recommended Answer

The streaming median is computed using two heaps. All the numbers less than or equal to the current median are in the left heap, which is arranged so that the maximum number is at the root of the heap. All the numbers greater than or equal to the current median are in the right heap, which is arranged so that the minimum number is at the root of the heap. Note that numbers equal to the current median can be in either heap. The count of numbers in the two heaps never differs by more than 1.
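
In C++, one way to express this layout is with two `std::priority_queue` instances: the default ordering gives a max-heap, and `std::greater` gives a min-heap. The sketch below assumes `double` values. The aliases `MaxHeap` and `MinHeap` and the function `median_from_heaps` are hypothetical names. The function reads the median off the roots, assuming the invariants above already hold.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Left heap: values <= median, largest at the root (priority_queue's default).
using MaxHeap = std::priority_queue<double>;
// Right heap: values >= median, smallest at the root.
using MinHeap = std::priority_queue<double, std::vector<double>, std::greater<double>>;

// Assumes: not both heaps empty, every left value <= every right value,
// and the sizes differ by at most 1.
double median_from_heaps(const MaxHeap& left, const MinHeap& right) {
    assert(!(left.empty() && right.empty()));
    if (left.size() > right.size()) return left.top();
    if (right.size() > left.size()) return right.top();
    return (left.top() + right.top()) / 2.0;
}
```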

When the process begins, the two heaps are empty. The first number in the input sequence is added to one of the heaps (it doesn't matter which) and returned as the first streaming median. The second number in the input sequence is then added to the other heap. If the root of the right heap is less than the root of the left heap, the two heaps are swapped; with one element each, this simply exchanges the two numbers. The average of the two numbers is returned as the second streaming median.
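
The two start-up steps can be sketched as below. The names `start_first` and `start_second` are hypothetical, and the first value is assumed to go into the left heap. A max-heap and a min-heap have different C++ types, so the "swap" is done by exchanging their single elements rather than swapping the containers.

```cpp
#include <functional>
#include <queue>
#include <vector>

using MaxHeap = std::priority_queue<double>;
using MinHeap = std::priority_queue<double, std::vector<double>, std::greater<double>>;

// Step 1 (hypothetical helper): the first value goes into the left heap
// and is itself the median.
double start_first(MaxHeap& left, double x) {
    left.push(x);
    return x;
}

// Step 2 (hypothetical helper): the second value goes into the other heap.
// If the roots are out of order, exchange the two single elements so that
// left.top() <= right.top(), then return their average.
double start_second(MaxHeap& left, MinHeap& right, double y) {
    right.push(y);
    if (right.top() < left.top()) {
        double a = left.top();
        double b = right.top();
        left.pop();
        right.pop();
        left.push(b);
        right.push(a);
    }
    return (left.top() + right.top()) / 2.0;
}
```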

Then the main algorithm begins. Each subsequent number in the input sequence is compared to the current median. It is added to the left heap if it is less than the current median, or to the right heap if it is greater. If the input number equals the current median, it is added to whichever heap has the smaller count, or to either heap arbitrarily if the counts are equal. If that causes the counts of the two heaps to differ by more than 1, the root of the larger heap is removed and inserted into the smaller heap. The current median is then the root of the larger heap if the counts differ, or the average of the two roots if the heaps are the same size.
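
Putting the pieces together, a minimal sketch of the whole procedure follows. The class name `RunningMedian` is hypothetical, and values are assumed to be `double`. In this sketch the general comparison against the current median also handles the second value: if it lands in the same heap as the first, the rebalance step moves the larger one across. The resulting heaps match the start-up step described above, so no separate swap is coded.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Hypothetical class: two-heap streaming median as described above.
class RunningMedian {
public:
    // Adds x and returns the median of all values added so far.
    double add(double x) {
        if (left_.empty() && right_.empty()) {
            left_.push(x);                           // first value
        } else if (x < median_) {
            left_.push(x);
        } else if (x > median_) {
            right_.push(x);
        } else if (left_.size() <= right_.size()) {  // x == median_: smaller heap,
            left_.push(x);                           // left on a tie
        } else {
            right_.push(x);
        }
        rebalance();
        median_ = compute();
        return median_;
    }

private:
    // Move one root across if the counts differ by more than 1.
    void rebalance() {
        if (left_.size() > right_.size() + 1) {
            right_.push(left_.top());
            left_.pop();
        } else if (right_.size() > left_.size() + 1) {
            left_.push(right_.top());
            right_.pop();
        }
    }

    // Root of the larger heap, or the mean of both roots when sizes match.
    double compute() const {
        if (left_.size() > right_.size()) return left_.top();
        if (right_.size() > left_.size()) return right_.top();
        return (left_.top() + right_.top()) / 2.0;
    }

    std::priority_queue<double> left_;  // max-heap: values <= median
    std::priority_queue<double, std::vector<double>, std::greater<double>> right_;  // min-heap: values >= median
    double median_ = 0.0;
};
```

Each `add` does a constant number of `push`/`pop` operations, which are logarithmic in the heap size. A new value therefore costs O(log n), compared with re-running quickselect over the whole list.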

Code in Scheme and Python is available on my blog.
