Complexity of finding the median using 2 heaps


Question


One way of finding the median of a given set of n numbers is to distribute them between 2 heaps: a max-heap containing the lower ceil(n/2) numbers and a min-heap containing the rest. If the heaps are maintained in this way, the median is the max of the first heap (together with the min of the second heap if n is even). Here's my C++ code that does this:

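#include <functional>
#include <iostream>
#include <queue>
#include <vector>
using namespace std;

int main() {
    priority_queue<int, vector<int> > left;                 // max-heap: lower half
    priority_queue<int, vector<int>, greater<int> > right;  // min-heap: upper half
    int n, a;
    cin >> n;  // n = number of items
    for (int i = 0; i < n; i++) {
        cin >> a;
        if (left.empty())
            left.push(a);
        else if (left.size() <= right.size()) {
            // The heaps are the same size, so left has to grow by one.
            if (a <= right.top())
                left.push(a);
            else {
                left.push(right.top());
                right.pop();
                right.push(a);
            }
        }
        else {
            // left holds one more item than right, so right has to grow by one.
            if (a >= left.top())
                right.push(a);
            else {
                right.push(left.top());
                left.pop();
                left.push(a);
            }
        }
    }
}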
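For reference, here is a minimal sketch of how the median could be read back once the heaps are filled. The names `MaxHeap`, `MinHeap`, `insert_value` and `median` are made up for this illustration, and averaging the two middle values for even n is an assumption (the question only says the min of the second heap is used "along with" the max of the first). It also assumes at least one value has been inserted.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Hypothetical aliases for the two heaps used in the question.
using MaxHeap = std::priority_queue<int>;  // lower half
using MinHeap = std::priority_queue<int, std::vector<int>, std::greater<int>>;  // upper half

// Same insertion logic as the loop body above, wrapped in a function.
// Keeps left.size() equal to right.size() or to right.size() + 1.
void insert_value(MaxHeap& left, MinHeap& right, int a) {
    if (left.empty()) {
        left.push(a);
    } else if (left.size() <= right.size()) {
        if (a <= right.top()) {
            left.push(a);
        } else {
            left.push(right.top());
            right.pop();
            right.push(a);
        }
    } else {
        if (a >= left.top()) {
            right.push(a);
        } else {
            right.push(left.top());
            left.pop();
            left.push(a);
        }
    }
}

// Assumption: for an even count the median is the mean of the two middle
// values. Precondition: at least one value has been inserted.
double median(const MaxHeap& left, const MinHeap& right) {
    if (left.size() > right.size())
        return left.top();  // odd count: the middle value sits on top of left
    return (static_cast<double>(left.top()) + right.top()) / 2.0;
}
```

Reading the median is O(1) once the heaps exist; the question below is only about the cost of filling them.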


We know that the heapify operation has linear complexity. Does this mean that if we insert the numbers one by one into the two heaps, as in the code above, we find the median in linear time?

Recommended answer


The linear-time bound for heapify is the cost of building a heap from an unsorted array as a batch operation, not the cost of building a heap by inserting values one at a time.
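The difference can be made visible by counting comparisons. The sketch below is only an illustration: the function names are invented here, and it uses `std::make_heap` for the batch build and repeated `std::push_heap` for the one-at-a-time build, both on a max-heap fed with increasing values.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Returns 0, 1, ..., n-1. For a max-heap this ordering makes every
// one-at-a-time insertion climb all the way to the root.
std::vector<int> increasing_values(int n) {
    std::vector<int> v(n);
    std::iota(v.begin(), v.end(), 0);
    return v;
}

// Comparisons used to heapify the whole array in one batch.
long long batch_build_comparisons(std::vector<int> v) {
    long long count = 0;
    std::make_heap(v.begin(), v.end(),
                   [&count](int x, int y) { ++count; return x < y; });
    return count;
}

// Comparisons used when the same values are inserted one at a time.
long long incremental_build_comparisons(const std::vector<int>& v) {
    long long count = 0;
    auto cmp = [&count](int x, int y) { ++count; return x < y; };
    std::vector<int> heap;
    heap.reserve(v.size());
    for (int x : v) {
        heap.push_back(x);
        std::push_heap(heap.begin(), heap.end(), cmp);
    }
    return count;
}
```

`std::make_heap` is specified to use at most 3n comparisons, whereas on this input the insertion loop does roughly log2(n) comparisons per value.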


Consider a max heap into which you are inserting a stream of values in increasing order. Each new value is larger than everything already in the heap, so after being placed at the bottom it sifts all the way up to the top. Consider just the last half of the values inserted. By this time the heap has very nearly its full height, which is about log(n), so each value moves up about log(n) slots, and the cost of inserting those n/2 values is proportional to n log(n).

If I present a stream of values in increasing order to your median-finding algorithm, one of the things it has to do is exactly this: every second value moves right.top() into left, and that value is larger than everything already in left, so the max heap is built from a stream of values in increasing order. The cost of the median finding is therefore proportional to n log(n) in the worst case. In fact, the min heap is going to be doing a lot of deletes as well as insertions, but each of those costs at most O(log(n)), so I think the overall complexity is still O(n log(n)).
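A rough way to check this empirically is to run the insertion logic from the question on increasing input and count the comparisons made inside the heaps. This is a sketch under assumptions: `Counting` and `comparisons_on_increasing_input` are names invented for the illustration, and only comparisons performed by the heaps are counted, not the `a <= right.top()` tests in the loop.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Comparator wrapper that counts every comparison a heap performs.
template <class Cmp>
struct Counting {
    long long* count;
    bool operator()(int x, int y) const {
        ++*count;
        return Cmp()(x, y);
    }
};

// Feeds 0, 1, ..., n-1 through the two-heap insertion logic from the
// question and returns the number of comparisons made inside the heaps.
long long comparisons_on_increasing_input(int n) {
    long long count = 0;
    using MaxCmp = Counting<std::less<int>>;
    using MinCmp = Counting<std::greater<int>>;
    std::priority_queue<int, std::vector<int>, MaxCmp> left(MaxCmp{&count});
    std::priority_queue<int, std::vector<int>, MinCmp> right(MinCmp{&count});
    for (int a = 0; a < n; ++a) {
        if (left.empty()) {
            left.push(a);
        } else if (left.size() <= right.size()) {
            if (a <= right.top()) {
                left.push(a);
            } else {
                left.push(right.top());  // new maximum of left: climbs to the top
                right.pop();
                right.push(a);
            }
        } else {
            if (a >= left.top()) {
                right.push(a);
            } else {
                right.push(left.top());
                left.pop();
                left.push(a);
            }
        }
    }
    return count;
}
```

If the algorithm were linear, the count divided by n would stay roughly constant as n grows. On increasing input it keeps growing with log(n) instead.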
