Fetching a data point in Cassandra based on statistics


Problem description

I'm testing out Cassandra (2.0) as a possible replacement for storing our time-series data.

I made a simple table and dumped some of our data into it:

CREATE TABLE DataRaw(
  channelId int,
  sampleTime timestamp,
  value double,
  PRIMARY KEY (channelId, sampleTime)
) WITH CLUSTERING ORDER BY (sampleTime ASC);

I can quite easily perform the most used queries like first value, last value (current) and get statistics via max, min, count, avg etc.

But I also need to fetch not only the max value in a range, but also the sampleTime at which that value occurs.

For the given data:

sampleTime          value
2015-08-28 00:00    10
2015-08-28 01:00    15
2015-08-28 02:00    13

I'd like the query to return 2015-08-28 01:00 and 15.

I tried something like this:

select sampletime, value
from dataraw
where channelid = 865
  and sampletime >= '2014-01-01 00:00'
  and sampleTime < '2014-01-02 00:00'
  and value = (
    select max(value)
    from dataraw
    where channelid = 865
      and sampletime >= '2014-01-01 00:00'
      and sampleTime < '2014-01-02 00:00'
  );

which would work in "normal" SQL, but Cassandra does not like it.

Any ideas?

Solution

Axibase Time-Series Database supports MIN_VALUE_TIME and MAX_VALUE_TIME aggregators.

  • MIN_VALUE_TIME returns time in milliseconds when the MIN value was first reached within the period.
  • MAX_VALUE_TIME returns time in milliseconds when the MAX value was first reached within the period.

Multiple aggregators can be combined within the same API request so you can fetch both MAX and MAX_VALUE_TIME in one go.
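
To make the semantics above concrete, here is a minimal Python sketch of what a MAX plus MAX_VALUE_TIME pair computes over one series. It is not ATSD's implementation or API: the function name max_with_time and the (time, value) tuple layout are hypothetical, and it assumes the samples arrive in ascending time order, as they would from the DataRaw table's clustering order. The sample rows are the ones from the question.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple

# Hypothetical row layout: (sampleTime, value)
Sample = Tuple[datetime, float]


def max_with_time(samples: Iterable[Sample]) -> Optional[Sample]:
    """Return (time, value) for the first sample that reaches the maximum.

    Assumes samples are ordered by ascending time. Returns None for an
    empty input.
    """
    best: Optional[Sample] = None
    for sample_time, value in samples:
        # Strict '>' keeps the earliest timestamp when the maximum repeats.
        if best is None or value > best[1]:
            best = (sample_time, value)
    return best


rows = [
    (datetime(2015, 8, 28, 0, 0), 10.0),
    (datetime(2015, 8, 28, 1, 0), 15.0),
    (datetime(2015, 8, 28, 2, 0), 13.0),
]

result = max_with_time(rows)
print(result)  # (datetime.datetime(2015, 8, 28, 1, 0), 15.0)
```

The same single pass could also be run client-side over the rows returned by a plain range query, at the cost of transferring every row in the range.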

As for the back-end, ATSD uses HBase for raw storage.

Disclosure: I work for Axibase.

UPDATE 1: Examples of how these aggregators can be represented. Typically you would show the timestamps alongside the MIN and MAX values respectively. This answers the question of what the maximum was and when it was reached.

UPDATE 2: SQL Console

SELECT entity,
  MAX(value),
  date_format(MAX_VALUE_TIME(value), 'yyyy-MM-dd HH:mm:ss') AS "Max Value Time"
FROM cpu_busy
WHERE time > current_hour
GROUP BY entity
