How to correctly use Prometheus Histogram from the Java client to track size rather than latency?

Question

I have an API that processes collections. The execution time of this API is related to the collection size (the larger the collection, the longer it takes).

I am researching how to do this with Prometheus, but I am unsure whether I am doing things correctly (the documentation is a bit lacking in this area).

The first thing I did was define a Summary metric to measure the execution time of the API. I am using the canonical rate(sum)/rate(count), as explained here.
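For readers unfamiliar with the rate(sum)/rate(count) idiom: the elapsed time cancels out of the ratio, so it reduces to the increase in the `_sum` series divided by the increase in the `_count` series over the same window. The following is a minimal sketch of that arithmetic only, assuming two scrapes with no counter reset in between; the class `AverageFromCounters` and its `average` method are hypothetical names, not part of Prometheus or its Java client.

```java
// Hypothetical illustration class; not part of the Prometheus Java client.
final class AverageFromCounters {

    // Average observed value between two scrapes: delta(sum) / delta(count).
    // Assumes the counters were not reset between the scrapes.
    static double average(double sumBefore, double countBefore,
                          double sumAfter, double countAfter) {
        double deltaCount = countAfter - countBefore;
        if (deltaCount <= 0) {
            return Double.NaN; // no observations in the window
        }
        return (sumAfter - sumBefore) / deltaCount;
    }

    public static void main(String[] args) {
        // 3 new observations that added 6.0 seconds in total.
        System.out.println(average(10.0, 5, 16.0, 8)); // prints 2.0
    }
}
```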

Now, since I know that the latency may be affected by the size of the input, I also want to overlay the request size on the average execution time. Since I don't want to measure each possible size, I figured I'd use a histogram, like so:

Histogram histogram = Histogram.build().buckets(10, 30, 50)
        .name("BULK_REQUEST_SIZE")
        .help("histogram of bulk sizes to correlate with duration")
        .labelNames("method", "entity")
        .register();

Note: the term 'size' does not relate to the size in bytes but to the length of the collection that needs to be processed: 2 items, 5 items, 50 items...

In the execution I do (simplified):

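@PUT
void process(Collection<Entity> entitiesToProcess, String entityName) {
   // Time the whole call; the Summary is assumed to be registered elsewhere.
   Summary.Timer t = summary.labels("PUT_BULK", entityName).startTimer();

      // process...

   t.observeDuration();
   // Record the number of items in the collection, not a size in bytes.
   histogram.labels("PUT_BULK", entityName).observe(entitiesToProcess.size());
}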

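Questions:

  • Later, when I look at BULK_REQUEST_SIZE_bucket in Grafana, I see that all the buckets have the same value, so clearly I am doing something wrong.
  • Is there a more canonical way to do this?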

Recommended answer

Your code is correct (though bulk_request_size_bytes would be a better metric name).

The problem is likely that you have suboptimal buckets, as 10, 30 and 50 bytes are pretty small for most request sizes. I'd try larger bucket sizes that cover more typical values.
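One way to see how unsuitable buckets can produce the symptom from the question: Prometheus exposes histogram buckets cumulatively, so each `_bucket` series counts every observation less than or equal to its `le` bound, and the final `+Inf` bucket counts everything. The sketch below imitates that counting with the standard library only. It is not the Java client's implementation; the class `CumulativeBuckets`, its `count` method, and the sample observations are hypothetical and chosen purely for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the Prometheus Java client.
final class CumulativeBuckets {

    // Returns one cumulative count per upper bound, plus a final +Inf bucket.
    // upperBounds is assumed to be sorted in ascending order.
    static long[] count(double[] upperBounds, double[] observations) {
        long[] counts = new long[upperBounds.length + 1];
        for (double value : observations) {
            for (int i = 0; i < upperBounds.length; i++) {
                if (value <= upperBounds[i]) {
                    counts[i]++; // every bucket whose bound covers the value
                }
            }
            counts[upperBounds.length]++; // the +Inf bucket counts everything
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] bounds = {10, 30, 50}; // the bounds used in the question

        // All observations fit in the smallest bucket: every count is equal.
        System.out.println(Arrays.toString(
                count(bounds, new double[] {2, 5, 7, 9}))); // [4, 4, 4, 4]

        // Observations spread across the bounds: the counts differ.
        System.out.println(Arrays.toString(
                count(bounds, new double[] {2, 20, 40, 80}))); // [1, 2, 3, 4]
    }
}
```

If nearly all observed values fall below the smallest bound, or above the largest one, the finite buckets end up with identical counts. Picking bounds that bracket the values actually observed is what makes the buckets informative.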
