Fast algorithm for repeated calculation of percentile?

Views: 167 | Posted: 2015/11/30 14:14:09 | Tags: algorithm, optimization, data-structures, percentile


Question

In an algorithm I have to calculate the 75th percentile of a data set whenever I add a value. Right now I am doing this:

1. Get value x
2. Insert x at the back of an already sorted array
3. Swap x down until the array is sorted
4. Read the element at position array[array.size * 3/4]

Step 3 is O(n) and the rest are O(1), but this is still quite slow, especially as the array grows. Is there any way to optimize this?
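For reference, the current approach could be sketched roughly like this. The helper name insert_and_read_p75 and the int element type are assumptions for illustration, and the sketch uses std::upper_bound plus vector::insert rather than a hand-written swap-down loop. Finding the slot is O(log n), but shifting the tail keeps the insertion O(n), which is the cost the question is about.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not from the question): inserts x into an ascending
// vector and returns the element at index size * 3 / 4.
int insert_and_read_p75(std::vector<int>& sorted, int x) {
    // Precondition: the vector is already sorted in ascending order.
    assert(std::is_sorted(sorted.begin(), sorted.end()));

    // Binary search locates the insertion slot in O(log n)...
    auto pos = std::upper_bound(sorted.begin(), sorted.end(), x);

    // ...but insert() shifts every later element, so this step is O(n).
    sorted.insert(pos, x);

    // Integer division gives floor(size * 3 / 4), as in the question.
    return sorted[sorted.size() * 3 / 4];
}
```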

UPDATE

Thanks, Nikita! Since I am using C++, this is the easiest solution to implement. Here is the code:

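#include <algorithm>   // std::push_heap, std::pop_heap
#include <functional>  // std::less, std::greater
#include <vector>

template<class T>
class IterativePercentile {
public:
  /// Percentile has to be in the half-open range [0, 1)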
  IterativePercentile(double percentile)
    : _percentile(percentile)
  { }

  // Adds a number in O(log(n))
  void add(const T& x) {
    if (_lower.empty() || x <= _lower.front()) {
      _lower.push_back(x);
      std::push_heap(_lower.begin(), _lower.end(), std::less<T>());
    } else {
      _upper.push_back(x);
      std::push_heap(_upper.begin(), _upper.end(), std::greater<T>());
    }

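    // Target size of _lower, chosen so that its top is the element at sorted index floor(n * percentile)
    unsigned size_lower = (unsigned)((_lower.size() + _upper.size()) * _percentile) + 1;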
    if (_lower.size() > size_lower) {
      // lower to upper
      std::pop_heap(_lower.begin(), _lower.end(), std::less<T>());
      _upper.push_back(_lower.back());
      std::push_heap(_upper.begin(), _upper.end(), std::greater<T>());
      _lower.pop_back();
    } else if (_lower.size() < size_lower) {
      // upper to lower
      std::pop_heap(_upper.begin(), _upper.end(), std::greater<T>());
      _lower.push_back(_upper.back());
      std::push_heap(_lower.begin(), _lower.end(), std::less<T>());
      _upper.pop_back();
    }
  }

  /// Access the percentile in O(1); add() must have been called at least once
  const T& get() const {
    return _lower.front();
  }

  void clear() {
    _lower.clear();
    _upper.clear();
  }

private:
  double _percentile;
  std::vector<T> _lower;
  std::vector<T> _upper;
};

Solution

You can do it with two heaps. I'm not sure whether there's a less 'contrived' solution, but this one gives O(log n) time per insertion, and heaps are included in the standard libraries of most programming languages.

The first heap (heap A) contains the smallest 75% of the elements; the other heap (heap B) contains the rest (the largest 25%). A keeps its largest element on top, and B keeps its smallest on top.

Adding an element.

Check whether the new element x is <= max(A). If it is, add it to heap A; otherwise, add it to heap B.
If we added x to heap A and A became too big (it holds more than 75% of the elements), remove the largest element from A (O(log n)) and add it to heap B (also O(log n)).
Do the same in the other direction if heap B became too big.
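The insertion steps above could be sketched with std::priority_queue as follows. The struct name TwoHeaps, the member names lower and upper (for heaps A and B), and the int element type are assumptions for illustration. The sizing rule size(A) = floor(n * 3/4) is one possible choice, borrowed from the edit note at the end of this answer; with that rule the percentile is read from the top of B.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Illustrative sketch: `lower` plays the role of heap A, `upper` of heap B.
struct TwoHeaps {
    // Max-heap holding the smallest elements.
    std::priority_queue<int> lower;
    // Min-heap holding the remaining (largest) elements.
    std::priority_queue<int, std::vector<int>, std::greater<int>> upper;

    void add(int x) {
        // Route x by comparing it with max(A); an empty A sends x to B.
        if (!lower.empty() && x <= lower.top()) {
            lower.push(x);
        } else {
            upper.push(x);
        }

        // Assumed sizing rule: A holds floor(n * 3 / 4) elements.
        const std::size_t n = lower.size() + upper.size();
        const std::size_t target = n * 3 / 4;

        // The two sizes drift by at most one per insertion,
        // so a single move restores the balance.
        if (lower.size() > target) {
            upper.push(lower.top());  // A too big: move max(A) to B
            lower.pop();
        } else if (lower.size() < target) {
            lower.push(upper.top());  // B too big: move min(B) to A
            upper.pop();
        }
    }

    // With the rule above, min(B) is the element at sorted index n * 3 / 4.
    int p75() const {
        assert(!upper.empty());  // requires at least one add()
        return upper.top();
    }
};
```

Each add() performs a constant number of heap pushes and pops, so it runs in O(log n), and p75() only peeks at the top of a heap.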

Finding the "0.75 median"

Just take the largest element of A (or the smallest of B). This takes O(log n) or O(1) time, depending on the heap implementation.

Edit
As Dolphin noted, we need to specify precisely how big each heap should be for every n (if we want a precise answer). For example, if size(A) = floor(n * 0.75) and size(B) is the rest, then for every n > 0, array[array.size * 3/4] = min(B).
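A short index check shows why this sizing works, assuming 0-based indexing into the ascending sorted array and integer division in array.size * 3/4 (as in C++):

```latex
% Sizes of the two heaps for n elements in total:
\[
|A| = \left\lfloor \tfrac{3n}{4} \right\rfloor, \qquad
|B| = n - \left\lfloor \tfrac{3n}{4} \right\rfloor \ge 1 \quad (n \ge 1).
\]
% A occupies sorted positions 0, ..., |A| - 1, so position |A| is the
% smallest element of B:
\[
\mathrm{array}\!\left[\left\lfloor \tfrac{3n}{4} \right\rfloor\right]
  = \mathrm{array}\bigl[\,|A|\,\bigr]
  = \min(B).
\]
% Example: n = 5 gives |A| = 3 and |B| = 2, and the index read is 3.
```

B is never empty for n > 0, so min(B) always exists, whereas A is empty when n = 1.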
