Getting accurate graphite stats_counts

Problem description

We have an etsy/statsd node application running that flushes stats to carbon/whisper every 10 seconds. If you send 100 increments (counts) in the first 10 seconds, graphite displays them properly, like:

localhost:3000/render?from=-20min&target=stats_counts.test.count&format=json

[{"target": "stats_counts.test.count", "datapoints": [
 [0.0, 1372951380], [0.0, 1372951440], ... 
 [0.0, 1372952460], [100.0, 1372952520]]}]

However, 10 seconds later this number falls to 0, null, and/or 33.3. Eventually it settles at a value 1/6th of the initial number of increments, in this case 16.6.

/opt/graphite/conf/storage-schemas.conf is:

[sixty_secs_for_1_days_then_15m_for_a_month]
pattern = .*
retentions = 10s:10m,1m:1d,15m:30d

I would like to get accurate counts. Is Graphite perhaps averaging the data over the 60-second windows rather than summing it? Using the integral function after some time has passed obviously gives:

localhost:3000/render?from=-20min&target=integral(stats_counts.test.count)&format=json

[{"target": "stats_counts.test.count", "datapoints": [
 [0.0, 1372951380], [16.6, 1372951440], ... 
 [16.6, 1372952460], [16.6, 1372952520]]}]

Recommended answer

Graphite manages the retention of data using a combination of the settings stored in storage-schemas.conf and storage-aggregation.conf. I see that your retention policy (the snippet from your storage-schemas.conf) is telling Graphite to store only 1 data point for its highest resolution (e.g. 10s:10m), and that it should manage the aggregation of those data points as the data ages and moves into the older intervals (with the lower resolutions defined, e.g. 1m:1d). In your case, the data crosses into the next retention interval at 10 minutes, and after 10 minutes the data will roll up according to the settings in storage-aggregation.conf.
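
To make the boundary concrete, here is a quick back-of-the-envelope sketch (plain Python, not part of the original answer) of how many points each archive holds and why a -20min query leaves the highest-resolution archive:

# Archives implied by: retentions = 10s:10m,1m:1d,15m:30d
# (seconds_per_point, retention_seconds) pairs
archives = [(10, 10 * 60), (60, 24 * 3600), (15 * 60, 30 * 24 * 3600)]

for step, retention in archives:
    print(f"{step}s per point for {retention}s -> {retention // step} points")
# 10s archive: 60 points, 1m archive: 1440 points, 15m archive: 2880 points

# A -20min query spans 1200s, which exceeds the 600s covered by the 10s
# archive, so Graphite answers it from the 1m archive instead.
query_span = 20 * 60
step, _ = next((s, r) for s, r in archives if r >= query_span)
print(f"a -20min query is served at {step}s resolution")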

Aggregation/Downsampling

Aggregation/downsampling happens when data ages and falls into a time interval that has lower-resolution retention specified. In your case, you'll have been storing 1 data point for each 10-second interval, but once that data is over 10 minutes old, Graphite will store the data as 1 data point for a 1-minute interval. This means you must tell Graphite how it should take the 10-second data points (of which you have 6 per minute) and aggregate them into 1 data point for the entire minute. Should it average? Should it sum? Depending on the type of data (e.g. timing, counter), this can make a big difference, as you hinted at in your post, and as the sketch below shows.
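
This is exactly where your 16.6 comes from: the minute containing your flush holds one 10-second point of 100 and five points of 0. A minimal illustration (plain Python, values taken from the question):

# One minute of 10-second datapoints: one statsd flush of 100, five flushes of 0
points = [100, 0, 0, 0, 0, 0]

print(sum(points) / len(points))  # average -> 16.666..., the value you observed
print(sum(points))                # sum     -> 100, the count you actually want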

By default, Graphite will average data as it aggregates it into lower-resolution data. Using average to perform the aggregation makes sense when applied to timer (and even gauge) data. That said, you are dealing with counters, so you'll want to sum.

For example, in storage-aggregation.conf:

[count]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum
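
One caveat worth adding (not in the original answer): both storage-schemas.conf and storage-aggregation.conf are only consulted when a whisper file is first created, so existing .wsp files keep their old aggregation method. You can inspect a file with whisper-info.py and rewrite it in place with whisper-resize.py (the path below assumes a default /opt/graphite install):

whisper-info.py /opt/graphite/storage/whisper/stats_counts/test/count.wsp
whisper-resize.py /opt/graphite/storage/whisper/stats_counts/test/count.wsp 10s:10m 1m:1d 15m:30d --aggregationMethod=sum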

UI (and raw data) aggregation/downsampling

It is also important to understand how the aggregated/downsampled data is represented when viewing a graph or looking at raw (JSON) data for different time periods, as the data retention schema thresholds directly impact the graphs. In your case, you are querying render?from=-20min, which crosses your 10s:10m boundary.

Graphite will display (and perform realtime downsampling of) data according to the lowest-resolution precision defined. Stated another way: if you graph data that spans one or more retention intervals, you will get rollups accordingly. An example will help (assuming retentions = 10s:10m,1m:1d,15m:30d):

Any graph with data no older than the last 10 minutes will display 10-second aggregations. When you cross the 10-minute threshold, you will begin seeing 1-minute count data rolled up according to the policy set in storage-aggregation.conf.
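
For example, using the same render endpoint as in the question, comparing a query inside the 10-minute window with one beyond it should make the rollup visible:

localhost:3000/render?from=-9min&target=stats_counts.test.count&format=json
localhost:3000/render?from=-20min&target=stats_counts.test.count&format=json

The first stays within the 10s:10m archive and returns raw 10-second points; the second crosses the boundary and returns 1-minute rollups.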

Summary/tl;dr

Because you are graphing/querying 20 minutes' worth of data (e.g. render?from=-20min), you are definitely falling into a lower-precision storage setting (i.e. 10s:10m,1m:1d,15m:30d), which means that aggregation is occurring according to your aggregation policy. You should confirm that you are using sum for the correct pattern in the storage-aggregation.conf file. Additionally, you can shorten the graph/query time range to less than 10 minutes, which would avoid the dynamic rollup.
