How to compare event count value with previous time interval event

Question

I would like to know whether I can compare the total event count for the current one-hour interval with the total event count for the previous one-hour interval, and have Riemann send an email if the current hour's count is lower than the previous hour's.

I am not sure whether we can store the value and compare it with the current event value, because I learned that events expire due to the TTL option in Riemann.

Please correct me if I am wrong, and suggest some reference code to achieve this in Riemann.

Thanks in advance.

Recommended answer

It sounds like you want the rate of change of the count over an hour, and then to decide whether that rate is negative. One way to do this is just as you describe:

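(fold-interval-metric 3600 folds/count      ; one count event per hour
  (fixed-event-window 2                     ; batch as [previous current]
    ;; folds/difference is first minus rest, i.e. previous - current,
    ;; so a positive metric means the count went down.
    (smap folds/difference
      (where (pos? (:metric event))
        email))))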
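
One thing to watch with the windowing above: as far as I understand its documentation, fixed-event-window emits non-overlapping batches, so with a window of 2 only every other hour boundary gets compared. Here is a plain-Clojure sketch of what that can miss. It does not use Riemann at all, and hourly-counts, drops, fixed-pairs and sliding-pairs are made-up names and data for illustration only:

```clojure
;; Hypothetical hourly counts, oldest first; not real data.
(def hourly-counts [120 130 90 95 60])

(defn drops
  "Returns the [previous current] pairs in which current < previous."
  [pairs]
  (filter (fn [[prev cur]] (< cur prev)) pairs))

;; Non-overlapping batches of two: hours 1-2, then hours 3-4.
(def fixed-pairs (partition 2 hourly-counts))

;; Sliding pairs: every hour is compared with the one before it.
(def sliding-pairs (partition 2 1 hourly-counts))

(drops fixed-pairs)   ; empty: neither batch contains a drop
(drops sliding-pairs) ; finds the 130 -> 90 and 95 -> 60 drops
```

If that matters to you, a sliding window such as moving-event-window 2 may be a better fit than fixed-event-window 2, though I have not tested that variant here.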

This makes sense. If you use the built-in derivative-over-time function ddt and graph it, you may find that you can spot these problems over much shorter timescales. If your success rate falls to zero at minute three of an hour, 57 minutes is a long time for the computer to wait before it calls a human for help. If the rate of change over a 15-minute period approaches negative infinity, it's very likely that your service just stopped.

I'm fond of wrapping ddt in the exponentially weighted moving average, ewma, so that spikes don't set off the alarms. I have had an extremely low false-positive rate with this pattern:

(ewma 30 (ddt ...your stuff here...))
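
To see why the smoothing helps, here is a rough plain-Clojure sketch. It is not Riemann's implementation: ewma-seq and ddt-seq are hypothetical helpers, the data is invented, and the smoothing uses a plain factor r that ignores the time between samples. It assumes events flow through ewma first and then into ddt, as nested streams do:

```clojure
(defn ewma-seq
  "Exponentially weighted moving average of xs with smoothing factor r
  (0 < r <= 1), seeded with the first value. This simplification
  ignores the spacing between samples."
  [r xs]
  (reductions (fn [avg x] (+ (* r x) (* (- 1 r) avg)))
              (first xs)
              (rest xs)))

(defn ddt-seq
  "Rate of change between consecutive samples, given matching times."
  [times values]
  (map (fn [[t0 t1] [v0 v1]] (/ (- v1 v0) (- t1 t0)))
       (partition 2 1 times)
       (partition 2 1 values)))

;; Hypothetical metric with a one-sample spike; not real data.
(def times  [0 1 2 3 4])
(def values [10.0 10.0 50.0 10.0 10.0])

(def raw-slopes      (ddt-seq times values))
;; => (0.0 40.0 -40.0 0.0)
(def smoothed-slopes (ddt-seq times (ewma-seq 0.5 values)))
;; => (0.0 20.0 -10.0 -5.0)
```

The single spike produces a slope of -40 on the raw values but only -10 after smoothing, so a threshold on the smoothed slope is less likely to fire on one odd sample.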

I often want to combine ewma, ddt, and project:

 (pipe ↲ (splitp = service
               "service:input" (ewma 30 ↲)
               "service:output" (ewma 30 ↲)
               bit-bucket) ;; throw out other services here
     (project [(service "service:input")
               (service "service:output")]
              (smap folds/quotient-sloppy
                    (with :service "service-ratio-rate-of-change"
                          (ddt ...your streams here...)))))

If requests are infrequent, you will need to play with the interval in all of these examples to ensure that the alarms don't go off between events. You may also need to set the :ttl on the events high enough that they don't expire while you are aggregating them.

ps: the ↲ can be any symbol(s) you want; I just chose that Unicode character.

pps: a false-positive rate of one alarm per quarter should be reasonable if you consider these things carefully.
