Implementation of a "hits in last [second/minute/hour]" data structure

Question

I think this is a fairly common question, but I can't seem to find an answer by googling around (maybe there's a more precise name for the problem that I don't know?).

You need to implement a structure with a hit() method used to report a hit, plus hitsInLastSecond|Minute|Hour methods. You have a timer with, say, nanosecond accuracy. How do you implement this efficiently?

My thought was something like this (in pseudo-C++):

class HitCounter {
  void hit() {
    hits_at[now()] = ++last_count;
  }

  int hitsInLastSecond() {
    auto before_count = hits_at.lower_bound(now() - 1 * second);
    if (before_count == hits_at.end()) { return last_count; }
    return last_count - before_count->second;
  }

  // etc for Minute, Hour

  map<time_point, int> hits_at;
  int last_count = 0;
};

Does this work? Is it good? Is something better?

Update: Added pruning and switched to a deque as per comments:

class HitCounter {
  void hit() {
    hits.push_back(make_pair(now(), ++last_count));
  }

  int hitsInLastSecond() {
    auto before = lower_bound(hits.begin(), hits.end(), make_pair(now() - 1 * second, -1));
    if (before == hits.end()) { return last_count; }
    return last_count - before->second;
  }

  // etc for Minute, Hour

  void prune() {
    auto old = upper_bound(hits.begin(), hits.end(), make_pair(now() - 1 * hour, -1));
    if (old != hits.end()) {
      hits.erase(hits.begin(), old);
    }
  }

  deque<pair<time_point, int>> hits;
  int last_count = 0;
};
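
For comparison, here is a minimal runnable sketch of the deque idea. It is not the asker's exact design: it stores only timestamps (the position in the sorted deque already gives the count, so the cumulative counter is dropped), it takes the current time as a parameter so it can be exercised without a real clock, and it returns 0 rather than last_count when no hit falls inside the window, since lower_bound returning end() means every stored hit is older than the cutoff. The names hitsInLast and prune, and the one-hour retention, are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>

class HitCounter {
public:
    using Clock = std::chrono::steady_clock;
    using TimePoint = Clock::time_point;

    // Record one hit at time t. Timestamps must be non-decreasing,
    // which keeps the deque sorted.
    void hit(TimePoint t) {
        assert(hits_.empty() || hits_.back() <= t);
        prune(t);
        hits_.push_back(t);
    }

    // Number of stored hits with a timestamp in (now - window, now].
    std::size_t hitsInLast(Clock::duration window, TimePoint now) const {
        const TimePoint cutoff = now - window;
        // Binary search for the first hit strictly newer than the cutoff.
        auto first = std::upper_bound(hits_.begin(), hits_.end(), cutoff);
        return static_cast<std::size_t>(hits_.end() - first);
    }

private:
    // Drop hits that are an hour old or older; no query looks further back.
    void prune(TimePoint now) {
        const TimePoint cutoff = now - std::chrono::hours(1);
        while (!hits_.empty() && hits_.front() <= cutoff) {
            hits_.pop_front();
        }
    }

    std::deque<TimePoint> hits_;
};
```

Memory still grows with the number of hits in the last hour, which is the cost the bucketed approach in the answer avoids.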

Solution

What you are describing is called a histogram.

Using a hash, if you intend nanosecond accuracy, will eat up much of your CPU. You probably want a ring buffer for storing the data.

Use std::chrono to get the timing precision you require. Frankly, though, hits per second seems like the finest granularity you need, and if you are looking at the overall big picture, the exact precision is unlikely to matter much.
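
As a small illustration of the std::chrono point, the sketch below turns two time points into a count of whole sample intervals, which is all a per-second ring buffer needs to know. The helper name wholeTicksBetween and the one-second default are assumptions for this example, not part of the answer.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: number of whole `tick` intervals between two time points.
// Dividing one duration by another yields a plain integer count (truncated).
inline std::int64_t wholeTicksBetween(Clock::time_point earlier,
                                      Clock::time_point later,
                                      Clock::duration tick = std::chrono::seconds(1)) {
    assert(tick > Clock::duration::zero());
    return (later - earlier) / tick;
}
```

In real use the two arguments would come from Clock::now(), and the result would say how many times to advance the ring before recording the next hit.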

This is a partial, introductory sample of how you might go about it:

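#include <algorithm>
#include <array>
#include <cstddef>

template<std::size_t RingSize>
class Histogram
{
    std::array<std::size_t, RingSize> m_ringBuffer;  // one counter per sample interval
    std::size_t m_total;     // sum of all buckets
    std::size_t m_position;  // bucket currently being filled
public:
    // m_position must start at 0 as well, otherwise the first access is undefined.
    Histogram() : m_total(0), m_position(0)
    {
        std::fill_n(m_ringBuffer.begin(), RingSize, 0);
    }

    // Count one hit in the current interval.
    void addHit()
    {
        ++m_ringBuffer[m_position];
        ++m_total;
    }

    // Move to the next interval, discarding the data stored there one lap ago.
    void incrementPosition()
    {
        if (++m_position >= RingSize)
            m_position = 0;
        m_total -= m_ringBuffer[m_position];
        m_ringBuffer[m_position] = 0;
    }

    double runningAverage() const
    {
        return (double)m_total / (double)RingSize;
    }

    std::size_t runningTotal() const { return m_total; }
};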

Histogram<60> secondsHisto;
Histogram<60> minutesHisto;
Histogram<24> hoursHisto;
Histogram<7> weeksHisto;

This is a naive implementation. It assumes you call incrementPosition() once per second, and that every RingSize ticks you carry runningTotal() over from one histogram to the next (so every 60 s, add secondsHisto.runningTotal() to minutesHisto).
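
The carry-over step might be sketched as follows. Ring is a compact stand-in for the Histogram class with an extra add(n) so a whole total can be written into a coarser ring; that method, the HitStats struct and tickSecond are assumptions for illustration, not part of the sample above. The finer total is read before its ring advances, because advancing clears the bucket that is about to be reused.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

template <std::size_t RingSize>
class Ring {
    std::array<std::size_t, RingSize> m_buckets{};  // value-initialised to zero
    std::size_t m_total = 0;
    std::size_t m_position = 0;
public:
    // Add n hits to the bucket currently being filled.
    void add(std::size_t n = 1) { m_buckets[m_position] += n; m_total += n; }

    // Move to the next bucket and drop what it held one full lap ago.
    void advance() {
        m_position = (m_position + 1) % RingSize;
        assert(m_total >= m_buckets[m_position]);
        m_total -= m_buckets[m_position];
        m_buckets[m_position] = 0;
    }

    std::size_t runningTotal() const { return m_total; }
};

struct HitStats {
    Ring<60> seconds;  // one bucket per second
    Ring<60> minutes;  // one bucket per completed minute
    Ring<24> hours;    // one bucket per completed hour
    std::size_t ticks = 0;

    void hit() { seconds.add(); }

    // Call once per second.
    void tickSecond() {
        ++ticks;
        if (ticks % 60 == 0) {
            // A minute just completed: store its total in a fresh minute bucket.
            minutes.advance();
            minutes.add(seconds.runningTotal());
            if (ticks % 3600 == 0) {
                // An hour just completed: same carry, one level up.
                hours.advance();
                hours.add(minutes.runningTotal());
            }
        }
        seconds.advance();
    }
};
```

With this layout the coarser rings only ever contain completed intervals, so minutes.runningTotal() lags the live count by up to a minute.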

Hopefully it will be a useful introductory place to start from.

If you want to track a longer histogram of hits per second, you can do that with this model by increasing the ring size and adding a second total that tracks the last N ring buffer entries, so that m_subTotal = sum(m_ringBuffer[m_position - N .. m_position]), similar to the way m_total works.

size_t m_10sTotal;

...

void addHit()
{
    ++m_ringBuffer[m_position];
    ++m_total;
    ++m_10sTotal;
}

void incrementPosition()
{
    // subtract data from >10 sample intervals ago (assumes RingSize > 10).
    m_10sTotal -= m_ringBuffer[(m_position + RingSize - 10) % RingSize];
    // for the naive total, do the subtraction after we
    // advance position, since it will coincide with the
    // location of the value RingSize ago.
    if (++m_position >= RingSize)
        m_position = 0;
    m_total -= m_ringBuffer[m_position];
    // clear the bucket before reusing it, as in the first version.
    m_ringBuffer[m_position] = 0;
}
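
Since the fragment above leaves out the rest of the class, here is a self-contained sketch of the same sub-total idea that can be compiled and checked. The class name WindowedHistogram, the Window template parameter and the windowTotal accessor are assumptions for illustration. As in the fragment, the sub-total covers the bucket being filled plus the Window buckets before it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

template <std::size_t RingSize, std::size_t Window>
class WindowedHistogram {
    static_assert(Window < RingSize, "window must be shorter than the ring");
    std::array<std::size_t, RingSize> m_ringBuffer{};  // zero-initialised
    std::size_t m_total = 0;     // sum over the whole ring
    std::size_t m_subTotal = 0;  // current bucket plus the Window before it
    std::size_t m_position = 0;
public:
    void addHit() {
        ++m_ringBuffer[m_position];
        ++m_total;
        ++m_subTotal;
    }

    void incrementPosition() {
        // The bucket Window steps behind the current one leaves the window
        // as soon as the position advances.
        const std::size_t oldest =
            m_ringBuffer[(m_position + RingSize - Window) % RingSize];
        assert(m_subTotal >= oldest);
        m_subTotal -= oldest;
        // The bucket we move into still holds data from one lap ago.
        m_position = (m_position + 1) % RingSize;
        m_total -= m_ringBuffer[m_position];
        m_ringBuffer[m_position] = 0;
    }

    std::size_t runningTotal() const { return m_total; }
    std::size_t windowTotal() const { return m_subTotal; }
};
```

Both totals are maintained in constant time per call, so queries never have to walk the ring.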

You don't have to make the histograms these sizes; this is simply a naive scraping model. There are various alternatives, such as incrementing each histogram at the same time:

secondsHisto.addHit();
minutesHisto.addHit();
hoursHisto.addHit();
weeksHisto.addHit();

Each rolls over independently, so all have current values. Size each histogram to cover as far back as you want data at that granularity.
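
A rough sketch of this alternative, under the assumption that a single once-per-second tick drives all rings: every hit is written to every ring, and each ring advances at its own period. Ring repeats the interface of the Histogram class in compact form; IndependentStats, tickSecond and the hitsInLast... accessors are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

template <std::size_t RingSize>
class Ring {
    std::array<std::size_t, RingSize> m_buckets{};  // zero-initialised
    std::size_t m_total = 0;
    std::size_t m_position = 0;
public:
    void addHit() { ++m_buckets[m_position]; ++m_total; }

    void incrementPosition() {
        m_position = (m_position + 1) % RingSize;
        assert(m_total >= m_buckets[m_position]);
        m_total -= m_buckets[m_position];
        m_buckets[m_position] = 0;
    }

    std::size_t runningTotal() const { return m_total; }
};

class IndependentStats {
    Ring<60> m_perSecond;  // one bucket per second, covers about a minute
    Ring<60> m_perMinute;  // one bucket per minute, covers about an hour
    Ring<24> m_perHour;    // one bucket per hour, covers about a day
    std::size_t m_seconds = 0;
public:
    // Every hit is counted at every granularity straight away.
    void hit() {
        m_perSecond.addHit();
        m_perMinute.addHit();
        m_perHour.addHit();
    }

    // Call once per second; each ring rolls over at its own rate.
    void tickSecond() {
        ++m_seconds;
        m_perSecond.incrementPosition();
        if (m_seconds % 60 == 0) m_perMinute.incrementPosition();
        if (m_seconds % 3600 == 0) m_perHour.incrementPosition();
    }

    std::size_t hitsInLastMinute() const { return m_perSecond.runningTotal(); }
    std::size_t hitsInLastHour() const { return m_perMinute.runningTotal(); }
    std::size_t hitsInLastDay() const { return m_perHour.runningTotal(); }
};
```

Each total is accurate to within one bucket of its own ring, because the bucket being filled is only partly elapsed.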
