快速计算最小,最大的和平均值来电号码的 [英] Fast calculation of min, max, and average of incoming numbers

查看:154
本文介绍了快速计算最小,最大的和平均值来电号码的的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

节目接收大约50,000号每一秒。

Program is receiving approximately 50,000 numbers every second.

在任何时候,我需要计算最小值,最大值和平均值的值(数字)的是抵达最后一秒(关于向特定的时刻)。

At ANY given moment, I need to calculate minimum, maximum and average of the values (numbers) that arrived in the last second (regarding to given moment).

有没有办法做到这一点,而无需使用数组或列表(缓冲区)来存储到达数和计算结果?

Is there a way to do this without using array or list (buffer) to store arriving numbers and to calculate results?

如果我需要使用缓冲,这将是实现这一目标的有效途径?

If I need to use buffer, what would be the efficient way to achieve this?

(注意,从缓冲器号码也必须有效不时除去)

(Note that numbers from buffer also must be efficiently removed from time to time)

推荐答案

下面是一个算法,将在一定程度努力挽救在特定情况下的效率:

Here is an algorithm that will somewhat work to save efficiency in certain cases:

  1. 随着事件的进来,完全缓冲它们,并计算运行计数最高(微不足道)。

  1. As events come in, buffer them completely, and calculate a running sum, count, min, max (trivial).

在对平均最多<请求/ code>从缓冲的背面制成,循环并开始删除较旧超过一秒的值。从计数,当您去减。

When a request for average, min, or max is made, loop through from the back of the buffer and start removing values older than one second. Subtract from sum and count as you go.

  • 如果该值均大于你可以保持你的。如果值低于最高,你可以让你的最高。在这种情况下,你有平均最高高效地更新。

  • If the values are all above min you can keep your min. If the values are below max, you can keep your max. In this scenario, you have average, min, and max updated efficiently.

如果该值低于或以上最高,你会通过需要循环其余的阵列,并重新计算。

If the values are below min or above max, you'll need to loop through the rest of the array and recalculate it.

执行第二步一次,第二次左右也使缓冲区没有得到过满。这code,可以在每一个缓冲区插入执行还,或是其他地方是有道理的。

Do step two once a second or so also so that the buffer doesn't get too full. This code could be performed on every buffer insert also, or wherever made sense.

最适合做这种工作是一种​​循环缓冲区,以避免内存分配和GC的方式获得。它应当足够大以覆盖最坏的情况下为每秒消息大小

Best structure for this kind of work is a circular buffer, to avoid memory allocations and GC getting in the way. It should be large enough to cover the worst case scenario for message size per second.

更新

根据不同的使用场景一件事做的是运行以上,但在10×100ms的数据块,而不是1×1000ms的一块算法。也就是说,保持运行的最小值,最大值,总和与那些10块计数。然后,当你到达一个无效的情况下,你一般只需要浏览最新100ms的数据或快速穿过其他9块的最小值和最大值。

Depending on the usage scenario one other thing to do would be to run the algorithm above but in 10 x 100ms chunks rather than 1 x 1000ms piece. That is, keep the running min, max, sum and count on those 10 chunks. Then when you reach an 'invalidation' scenario, you generally only need to look through the latest 100ms of data or a quick pass through the min and max of the other 9 chunks.

@ ja72提供了一个很好的主意,以节省寻找最小和最大值,如果他们是无效的:

@ja72 provided a great idea to save on finding the min and max values if they are invalidated:

而不是保持敏/最大值x_min,x_max保持,而不是它们的位置在x [I]数组i_min和最大电流指标。然后发现他们可能是微不足道的,有时,但认为是最后值保持最小值和最大值,整个列表需要进行扫描,以建立新的限制。

Instead of keeping the min/max values x_min, x_max keep instead the index of where they are located in the x[i] array with i_min and i_max. Then finding them can be trivial sometimes, but when the last value considered holds the min and max, the entire list needs to be scanned to establish the new limits.

山姆持有人曾在评论另一个好主意 -​​ 保持平行排列,就是始终排序,这可以让你罗布泊号关闭顶部或底部找到新的最小值和最大值更容易。然而,在这里插入速度被泄露了一点(需要保持按顺序)。

Sam Holder had another good idea in the comments - keep a parallel array that is always sorted, this lets you lop numbers off the top or bottom to find new minimums and maximums easier. However, insert speed here is compromised a little (needs to remain in order).

最后,正确的选择将取决于计划的使用特性。多久值读取VS它们插入频率如何?

Ultimately, the right choice will depend on the usage characteristics of the program. How often will values be read vs how often they are inserted?

这篇关于快速计算最小,最大的和平均值来电号码的的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆