increase() in Prometheus sometimes doubles values: how to avoid?
Problem description
I've found that for some graphs I get doubled values from Prometheus where there should be just ones:
The query I use:
increase(signups_count[4m])
The scrape interval is set to the recommended maximum of 2 minutes.
If I query the actual data stored:
curl -gs 'localhost:9090/api/v1/query?query=(signups_count[1h])'
"values":[
[1515721365.194, "579"],
[1515721485.194, "579"],
[1515721605.194, "580"],
[1515721725.194, "580"],
[1515721845.194, "580"],
[1515721965.194, "580"],
[1515722085.194, "580"],
[1515722205.194, "581"],
[1515722325.194, "581"],
[1515722445.194, "581"],
[1515722565.194, "581"]
],
I see that there were just two increases. And indeed, if I query for these times, I see the expected result:
curl -gs 'localhost:9090/api/v1/query_range?step=4m&query=increase(signups_count[4m])&start=1515721965.194&end=1515722565.194'
"values": [
[1515721965.194, "0"],
[1515722205.194, "1"],
[1515722445.194, "0"]
],
But Grafana (and the Prometheus GUI) tends to set a different step in queries, which gives a very unexpected result for someone unfamiliar with the internal workings of Prometheus.
curl -gs 'localhost:9090/api/v1/query_range?step=15&query=increase(signups_count[4m])&start=1515721965.194&end=1515722565.194'
... skip ...
[1515722190.194, "0"],
[1515722205.194, "1"],
[1515722220.194, "2"],
[1515722235.194, "2"],
... skip ...
Knowing that increase() is just syntactic sugar for a specific use case of the rate() function, I guess this is how it is supposed to work given the circumstances.
How can I avoid such situations? How do I make Prometheus/Grafana show me ones for ones, and twos for twos, most of the time? Other than by increasing the scrape interval (that will be my last resort).
I understand that Prometheus isn't an exact sort of tool, so I am fine with getting a good number most of the time rather than all of the time.
What else am I missing?
Recommended answer
This is known as aliasing and is a fundamental problem in signal processing. You can improve this a bit by increasing your sample rate; a 4m range is a bit short with a 2m scrape interval. Try a 10m range.
Here, for example, the query executed at 1515722220 only sees the 580@1515722085.194 and 581@1515722205.194 samples. That's an increase of 1 over 2 minutes, which extrapolated over 4 minutes is an increase of 2 - which is as expected.
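The arithmetic above can be reproduced with a small sketch. This is a simplified model of the extrapolation, not Prometheus's actual implementation: the function name simplified_increase is hypothetical, timestamps are converted to integer milliseconds, the window is assumed to be closed on both ends (newer Prometheus versions may treat the left boundary differently), and counter resets and the clamping near a zero counter value are ignored.

```python
# Samples from the question, as (timestamp in ms, counter value).
SAMPLES = [
    (1515721365194, 579.0),
    (1515721485194, 579.0),
    (1515721605194, 580.0),
    (1515721725194, 580.0),
    (1515721845194, 580.0),
    (1515721965194, 580.0),
    (1515722085194, 580.0),
    (1515722205194, 581.0),
    (1515722325194, 581.0),
    (1515722445194, 581.0),
    (1515722565194, 581.0),
]


def simplified_increase(samples, eval_ms, range_ms):
    """Approximate increase(metric[range]) evaluated at eval_ms.

    Simplified sketch: no counter-reset handling, no zero-point clamping.
    Returns None when fewer than two samples fall inside the window.
    """
    start_ms = eval_ms - range_ms
    # Assumption: window is closed on both ends.
    window = [(t, v) for t, v in samples if start_ms <= t <= eval_ms]
    if len(window) < 2:
        return None

    first_t, first_v = window[0]
    last_t, last_v = window[-1]
    sampled = last_t - first_t                 # time actually covered by samples
    avg_gap = sampled / (len(window) - 1)      # average spacing between samples
    threshold = avg_gap * 1.1

    gap_to_start = first_t - start_ms
    gap_to_end = eval_ms - last_t

    # Extend the covered time to the window edges when the first/last sample
    # is close enough to them; otherwise extend by half an average gap.
    covered = sampled
    covered += gap_to_start if gap_to_start < threshold else avg_gap / 2
    covered += gap_to_end if gap_to_end < threshold else avg_gap / 2

    return (last_v - first_v) * covered / sampled


if __name__ == "__main__":
    four_minutes = 4 * 60 * 1000
    ten_minutes = 10 * 60 * 1000
    for ts in (1515722190194, 1515722205194, 1515722220194, 1515722235194):
        print(ts, simplified_increase(SAMPLES, ts, four_minutes))
    print("10m range:", simplified_increase(SAMPLES, 1515722220194, ten_minutes))
```

Under these assumptions, the four evaluation times from the step=15 query give 0, 1, 2 and 2 with a 4m range, matching the output in the question. With a 10m range the same evaluation at 1515722220.194 sees five samples spanning 8 minutes, so the increase of 1 is only scaled by 10/8 to 1.25 rather than doubled.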
Any metrics-based monitoring system will have similar artifacts; if you want 100% accuracy, you need logs.