Having trouble getting accurate numbers from graphite


Problem description

I have an application that publishes a number of stats to graphite via statsd. One of the stats simply sends a stat increment to statsd every time a message is received by the service. I need to display a graph that shows the relative traffic over time for this stat. Generally speaking, I should be able to display a graph that refreshes every, say, 10 seconds and displays how many messages were received in those 10 seconds, as well as the history for a given period of time. However, no matter how I format my API query I cannot seem to get accurate data. I've read a number of articles, including this one:

http://code.hootsuite.com/accurate-counting-with-graphite-and-statsd/

That seems to give some good insight but is still not quite giving me what I need. This is the closest I have come:

integral(hitcount(stats.recieved, "10seconds"))

However, I don't like the cumulative result of this, and when I run it I get statistics that come nowhere near what I see in my logs for messages received. I am OK with accepting some packet loss, but we're talking about orders of magnitude. I know I am doing something wrong; I'm just hoping someone can give me some insight as to what.

Recommended answer

A couple of things to check/try:

Configure Graphite for Statsd

Check to make sure that you've used the retention schema and aggregation settings in Graphite that match how Statsd will be sending data (i.e. it sends one data point per 10 second flush interval).
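
For reference, a configuration along the lines of statsd's documented recommendation (docs/graphite.md in the statsd repo) looks roughly like this; treat it as a sketch and adjust paths and patterns for your own install:

# /opt/graphite/conf/storage-schemas.conf
[stats]
pattern = ^stats.*
retentions = 10s:6h,1min:7d,10min:5y

# /opt/graphite/conf/storage-aggregation.conf (abridged)
[min]
pattern = \.lower$
xFilesFactor = 0.1
aggregationMethod = min

[max]
pattern = \.upper$
xFilesFactor = 0.1
aggregationMethod = max

[count]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum

[default_average]
pattern = .*
xFilesFactor = 0.3
aggregationMethod = average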

Run a single Statsd aggregator

Be sure you are only running one instance of Statsd, as running multiple statsd daemons will cause metrics to be dropped (Graphite will be configured to store only one data point for its highest precision of 10s:6h).
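
A quick sanity check on a Linux host (assuming pgrep from procps is available; -f matches against the full command line, -a prints it) is to list every process mentioning statsd and confirm there is exactly one daemon:

pgrep -fa statsd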

Limit the time range in the UI or URL API to less than 6 hours

When displaying graphs with data that crosses over the 6 hour threshold (e.g. from now to 7 hours ago), you will begin seeing 1 minute worth of aggregated count data for the displayed graph (if you've configured Graphite for statsd with retentions = 10s:6h,1min:7d,10min:5y). Rollups will occur based on the oldest data point in the time range (e.g. now till 7+ days = you'll get 10 min rollups).
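
For example, a render API call that stays inside the 10s retention window might look like this (the host is a placeholder; the metric name is taken from the question):

http://graphite.example.com/render?target=hitcount(stats.recieved,"10seconds")&from=-5h&format=json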

If you send sparse or "bursty" data and display older time ranges (triggering aggregation)

Confirm that your xFilesFactor is low enough that aggregation produces non-null values even with a high rate of nulls. For example, 100 requests in the first 10 seconds and none for the remaining 50 seconds of a minute would be stored as 100, null, null, null, null, null, which would be summed up to null when the data ages if the xFilesFactor is higher than 1/6. Using the statsd-recommended graphite configuration handles this, but it is good to know about... as this can give the appearance of lost data.
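
As a toy illustration of that arithmetic, here is a short Python sketch of the xFilesFactor rule (this is not whisper's actual code, just the behaviour described above):

# One minute of 10s slots rolls up into a single 1min point.
def rollup(slots, x_files_factor, aggregate=sum):
    known = [v for v in slots if v is not None]
    # The aggregate is stored only if enough of the slots are non-null.
    if len(known) / len(slots) < x_files_factor:
        return None
    return aggregate(known)

minute = [100, None, None, None, None, None]  # 100 hits, then 50s of silence
print(rollup(minute, 0.0))  # 100  -> the count survives
print(rollup(minute, 0.5))  # None -> gives the appearance of lost data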

Applying schema or aggregation changes

If you changed the graphite schema or aggregation settings after any metrics were stored (in whisper = graphite's storage) you'll need to either delete the .wsp files for the metric (graphite will recreate them) or run whisper-resize.py.
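
A hypothetical invocation, assuming the default whisper storage path and the retentions mentioned above (substitute the actual path of your metric's .wsp file):

whisper-resize.py /opt/graphite/storage/whisper/stats/recieved.wsp 10s:6h 1min:7d 10min:5y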

Verify the settings

You can verify the settings against some whisper data by running whisper-info.py on a .wsp file. Find the .wsp file for one of your metrics in /graphite/storage/whisper/ and run: whisper-info.py my_metric_data.wsp. The output of whisper-info.py should tell you more about how the storage settings are working.
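
For example (the path is a placeholder for one of your own metrics):

whisper-info.py /opt/graphite/storage/whisper/stats/recieved.wsp

In the output, the fields worth checking are aggregationMethod (sum for count data), xFilesFactor, and the secondsPerPoint of the first archive (10, if one point is stored per 10 second flush).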

You should ensure that Graphite is set to store one data point per 10 second interval for metrics coming from StatsD. You should make sure that Graphite is summing (not averaging) count data coming from Statsd. Both of these can be handled by using the recommended Statsd configuration settings. Don't run more than one Statsd aggregator. When using the UI, limit the data returned to less than 6 hours, OR understand which rollup you are viewing when looking at data that crosses retention thresholds. Lastly, make sure the settings have taken effect (if you've already been sending metrics).
