给定大小的每组中大量数字的中位数 [英] Median of large amount of numbers for each sets of given size

查看:111
本文介绍了给定大小的每组中大量数字的中位数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

问题是:

在固定大小(k)的数字中查找大数据(n个数字)的中位数

find median of a large data(n numbers) in fixed size (k) of numbers

我的想法是:
维持2个堆,最大堆用于小于当前中位数的数字,最小堆用于大于当前中位数的数字.
主要概念是查找一个堆中先前集合的第一个元素(取决于它是<或>当前中位数),然后将其替换为我们遇到的新元素.
现在进行修改,例如| size(heap1)-size(heap2)| = 1或0,因为中位数是平均值.如果size1!= size2则为top元素的大小,否则为size> other大小的堆的top元素.

What i thought is :
maintain 2 heaps , maximum heap for numbers less than current median and minimum heap for numbers greater than current median .
The main concept is to FIND the 1st element of previous set in one of the heap (depending on it is < or > current median), and replace it with the new element we encounter .
Now modify such as to make |size(heap1) - size(heap2)| = 1 or 0 because median is avg. of top element if size1 != size2 else the top element of the heap with size > size of other .

我面临的问题是时间复杂度增加,因为查找,该元素花费O(n)时间,因此总计O(n*k),所以我无法达到所需的复杂度O(n*logk)(如问题源中所要求的那样.

The problem i am facing is the time complexity increases because finding the element takes O(n) time so total O(n*k), so i am not able to achieve the desired complexity O(n*logk) (as was required in source of the question).

如何在不使用额外空间的情况下减少它?

How should it be reduced , without using extra space ?

edit:输入:1 4 3 5 6 2,k = 4
中位数:
从1 4 3 5 =(4 + 3)/2
从4 3 5 6 =(4 + 5)/2 从3 5 6 2 =(3 + 5)/2

edit : input : 1 4 3 5 6 2 , k=4
median :
from 1 4 3 5 = (4+3)/2
from 4 3 5 6 = (4+5)/2 from 3 5 6 2= (3+5)/2

推荐答案

您可以使用订单统计树,这是一个带有一些附加信息的BST,它允许在具有 n的树中以O(log n )的时间在O(log n )时间中查找中位数,分位数和其他订单统计信息元素.

You can solve this problem using an order-statistic tree, which is a BST with some additional information that allows finding medians, quantiles and other order statistics in O(log n) time in a tree with n elements.

首先,使用第一个 k 元素构建OST.然后,在一个循环中:

First, construct an OST with the first k elements. Then, in a loop:

  1. 查找并报告中间值.
  2. 删除插入到树中的第一个元素(您可以找到数组中的哪个元素).
  3. 插入数组中的下一个元素.

如果树是自平衡的,那么这些步骤中的每个步骤都需要O(log k ),因为我们保持不变,即树永远不会增长到超过 k 的大小.还提供O( k )辅助空间.预处理需要O( k log k )时间,而循环将 n + 1- k 次重复总时间为O( n log k ).

Each of these steps takes O(log k) if the tree is self-balancing, because we maintain the invariant that the tree never grows beyond size k, which also gives O(k) auxiliary space. The preprocessing takes O(k log k) time while the loop repeats n + 1 - k times for a total time of O(n log k).

这篇关于给定大小的每组中大量数字的中位数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆