在沿着长的数据序列的固定大小的移动窗口中找到中值 [英] find median in a fixed-size moving window along a long sequence of data

查看:150
本文介绍了在沿着长的数据序列的固定大小的移动窗口中找到中值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给定一个数据序列(它可能有重复),一个固定大小的移动
窗口,在每次迭代时从数据
序列的开始移动窗口,这样
(1)从窗口中移除最旧的数据元素并且将新的数据
元素推入窗口
(2)在每次移动时找到窗口内的数据的中值。

Given a sequence of data (it may have duplicates), a fixed-sized moving window, move the window at each iteration from the start of the data sequence, such that (1) the oldest data element is removed from the window and a new data element is pushed into the window (2) find the median of the data inside the window at each moving.

以下帖子没有帮助。

有效地找到随机序列的中值

joining data based on a moving time window in R

My想法:

使用2堆来保存中位数。在窗口中,在第一次迭代中对窗口
中的数据进行排序,最小堆保存较大的部分,最大堆
保存较小的部分。如果窗口具有奇数个数据,则最大堆
返回中值,否则
两个堆的顶部元素的算术平均值是中值。

Use 2 heaps to hold median. In side the window, sort the data in the window in the first iteration, the min heap holds the larger part and the max heap holds the smaller part. If the window has odd number of data, the max heap returns the median otherwise the arithmetic mean of the top elements of the two heaps is the median.

将新数据推入窗口时,从堆的一个
中删除最旧的数据,并将新数据与最大值进行比较, min堆这样
就决定要放哪个堆的数据。然后,找到中值只是
喜欢在第一次迭代。

When a new data is pushed in to the window, remove the oldest data from one of the heap and compare the new data with the top of max and min heap so that to decide which heap the data to be put. Then, find the median just like in the first iteration.

但是,如何在堆中找到数据元素是一个问题。 Heap是一个二进制
树而不是二叉搜索树。

But, how to find a data element in a heap is a problem. Heap is a binary tree not a binary search tree.

可以使用O(n)或O(n * lg m)解决它,其中m是窗口大小,
空间:O )?

Is it possible to solve it with O(n) or O(n * lg m) where m is the window size and space: O(1) ?

任何帮助都非常感激。

Any help is really appreciated.

感谢

推荐答案

只需将窗口保持为两个 std :: set ,一个用于下半部分,一个用于上半部分。插入一个新元素花费O(lg m),找到和删除一个旧元素的成本相同。使用您在问题中描述的方法确定中值使成本O(1)。

Just maintain your window as two std::sets, one for the lower half, one for the upper half. Insertion of a new element costs O(lg m), finding and removal of an old element costs the same. Determining the median using the method you described in your question costs O(1).

当您将窗口滑过序列时,在每次迭代中, (O(lg m)),插入新项目(O(lg m))并计算中值(O(1)),得到总共O(n lg m)。

As you slide the window over your sequence, in each iteration you remove the item falling out of the window (O(lg m)), insert the new item (O(lg m)) and compute the median (O(1)), resulting in a total of O(n lg m).

这个解决方案使用空间O(m),当然,但我不认为你可以不存储窗口的内容。

This solution uses space O(m), of course but I don't think you can get away without storing the window's contents.

这篇关于在沿着长的数据序列的固定大小的移动窗口中找到中值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆