Do I understand Prometheus's rate vs increase functions correctly?

Problem description

I have read the Prometheus documentation carefully, but it's still a bit unclear to me, so I am here to confirm my understanding.

(Please note that, to keep the examples as simple as possible, I have used one second for both the scrape interval and the time range, even if that is not possible in practice.)

Suppose we scrape a counter every second and the counter's value is 30 right now. We have the following time series for it:

second   counter_value    increase calculated by hand (ICH from now on)
1             1                    1
2             3                    2
3             6                    3
4             7                    1
5            10                    3
6            14                    4
7            17                    3
8            21                    4
9            25                    4
10           30                    5
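
As a quick check of the ICH column, here is a minimal Python sketch. It assumes, as the table does, that the counter was 0 before second 1; `counter_values` and `increase_by_hand` are names made up for this illustration.

```python
# Counter values scraped at seconds 1..10 (from the table above).
counter_values = [1, 3, 6, 7, 10, 14, 17, 21, 25, 30]


def increase_by_hand(values, initial=0):
    """Return the per-step differences of a counter.

    The first step is measured against `initial`, which is an assumption:
    the table treats the counter as having been 0 before second 1.
    """
    previous = initial
    deltas = []
    for value in values:
        deltas.append(value - previous)
        previous = value
    return deltas


ich = increase_by_hand(counter_values)
print(ich)
```

The differences add up to the final counter value (30), which is the property the rest of the question relies on.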

We want to run some queries on this dataset.

1. rate()
The official documentation states:
"rate(v range-vector) : calculates the per-second average rate of increase of the time series in the range vector."

In layman's terms, does this mean that we get the increase for every second, and that the value for a given second is the average increment over the given range?

Here is what I mean:
rate(counter[1s]): will match ICH, because the average is calculated from a single value.
rate(counter[2s]): will take the average of the increments over 2 seconds and distribute it among those seconds.
So in the first 2 seconds we got a total increment of 3, which means the average is 1.5/sec. Final result:

second result
1       1.5
2       1.5
3        2
4        2
5       3.5
6       3.5
7       3.5
8       3.5
9       4.5
10      4.5

rate(counter[5s]): will take the average of the increments over 5 seconds and distribute it among those seconds.
The same as for [2s], but we calculate the average from the total increment over 5 seconds. Final result:

second result
1        2
2        2
3        2
4        2
5        2
6        4
7        4
8        4
9        4
10       4

So the larger the time range, the smoother the result. And the sum of these increases will match the actual counter.

2. increase()
The official documentation states:
"increase(v range-vector) : calculates the increase in the time series in the range vector."

For me this means it won't distribute the average among the seconds, but will instead show the single increment for the given range (with extrapolation).
increase(counter[1s]): As I see it, this will match the ICH and the rate for 1s, simply because the total range and rate's base granularity match.
increase(counter[2s]): The first 2 seconds gave us a total increment of 3, so second 2 will get the value 3, and so on...

second result
1        3*
2        3
3        4*
4        4
5        7*
6        7
7        7*
8        7
9        9*
10       9

*In my terms, these are the extrapolated values that cover every second.

Do I understand this correctly, or am I far off?

Solution

In an ideal world (where your samples' timestamps are exactly on the second and your rule evaluation happens exactly on the second) rate(counter[1s]) would return exactly your ICH value and rate(counter[5s]) would return the average of that ICH and the previous 4. Except the ICH at second 1 is 0, not 1, because no one knows when your counter was zero: maybe it incremented right there, maybe it got incremented yesterday, and stayed at 1 since then. (This is the reason why you won't see an increase the first time a counter appears with a value of 1 -- because your code just created and incremented it.)

increase(counter[5s]) is exactly rate(counter[5s]) * 5 (and increase(counter[2s]) is exactly rate(counter[2s]) * 2).
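
To make the ideal-world case concrete, here is a minimal Python sketch under the assumptions stated above (samples exactly on the second, evaluation exactly on the second). The names `samples`, `ideal_rate` and `ideal_increase` are made up for this illustration and are not part of Prometheus. Returning `None` when there is no earlier sample is a choice of the sketch, reflecting the point that nothing is known about the counter before second 1.

```python
# Counter values at seconds 1..10, taken from the question's table.
samples = {1: 1, 2: 3, 3: 6, 4: 7, 5: 10, 6: 14, 7: 17, 8: 21, 9: 25, 10: 30}


def ideal_rate(samples, now, window):
    """Average per-second increase over the last `window` seconds.

    Ideal case only: it needs a sample exactly at `now` and another one
    exactly at `now - window`, and it ignores counter resets.
    """
    start = now - window
    if start not in samples or now not in samples:
        return None  # nothing to compare against, so no result
    return (samples[now] - samples[start]) / window


def ideal_increase(samples, now, window):
    """In this model, increase is simply rate multiplied by the window."""
    rate = ideal_rate(samples, now, window)
    return None if rate is None else rate * window


print(ideal_rate(samples, 4, 1))       # matches the ICH at second 4
print(ideal_rate(samples, 10, 5))      # average of the last five ICH values
print(ideal_increase(samples, 10, 5))  # the same value times 5
```

At second 10 with a 5s window, the sketch gives a rate of 4.0 (the average of ICH values 4, 3, 4, 4, 5) and an increase of 20.0.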

Now what happens in the real world is that your samples are not collected exactly every second on the second and rule evaluation doesn't happen exactly on the second either. So if you have a bunch of samples that are (more or less) 1 second apart and you use Prometheus' rate(counter[1s]), you'll get no output. That's because what Prometheus does is it takes all the samples in the 1 second range [now() - 1s, now()] (which would be a single sample in the vast majority of cases), tries to compute a rate and fails.
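
A rough way to see this is a Python sketch with made-up, slightly jittered scrape timestamps. `samples`, `samples_in_window` and `can_compute_rate` are hypothetical names for this illustration, and the closed range is a simplification of how Prometheus selects samples.

```python
# Hypothetical scrape timestamps (in seconds) with a little jitter,
# paired with the counter values from the question.
samples = [(1.02, 1), (2.05, 3), (2.98, 6), (4.03, 7), (5.01, 10), (6.04, 14)]


def samples_in_window(samples, now, window):
    """Select the samples whose timestamp falls in [now - window, now]."""
    return [(t, v) for (t, v) in samples if now - window <= t <= now]


def can_compute_rate(samples, now, window):
    """A rate needs at least two samples to take a difference."""
    return len(samples_in_window(samples, now, window)) >= 2


now = 5.5  # an evaluation time that is not aligned with any scrape
print(len(samples_in_window(samples, now, 1)), can_compute_rate(samples, now, 1))
print(len(samples_in_window(samples, now, 5)), can_compute_rate(samples, now, 5))
```

With these timestamps the 1s window holds a single sample, so no rate can be computed, while the 5s window holds five.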

If you query rate(counter[5s]), on the other hand, Prometheus will pick all the samples in the range [now() - 5s, now()] (5 samples, covering approximately 4 seconds on average, say [t1, v1], [t2, v2], [t3, v3], [t4, v4], [t5, v5]) and (assuming your counter doesn't reset within the interval) will return (v5 - v1) / (t5 - t1). I.e. it actually computes the rate of increase over ~4s rather than 5s.

increase(counter[5s]) will return (v5 - v1) / (t5 - t1) * 5, so the rate of increase over ~4 seconds, extrapolated to 5 seconds.
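
The two formulas above can be sketched in Python as follows. This is only the simplified model described in this answer: it ignores counter resets and any further details of Prometheus's actual extrapolation. The timestamps are made up, and `simple_rate` and `simple_increase` are hypothetical names.

```python
# Hypothetical samples inside a 5s window: (timestamp in seconds, counter value).
window_samples = [(1.02, 1), (2.05, 3), (2.98, 6), (4.03, 7), (5.01, 10)]


def simple_rate(window_samples):
    """(v_last - v_first) / (t_last - t_first), assuming no counter reset."""
    (t_first, v_first) = window_samples[0]
    (t_last, v_last) = window_samples[-1]
    return (v_last - v_first) / (t_last - t_first)


def simple_increase(window_samples, window):
    """The rate over the sampled span, extrapolated to the full window."""
    return simple_rate(window_samples) * window


rate = simple_rate(window_samples)
increase = simple_increase(window_samples, 5)
print(rate, increase)
```

Here the counter actually grew by 9 over about 3.99 seconds, so the extrapolated 5s increase is larger than 9 and is not an integer.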

Due to the samples not being exactly spaced, both rate and increase will often return floating point values for integer counters (which makes obvious sense for rate, but not so much for increase).
