Sum aggregation for each column in Cassandra

Problem Description

I have a data model like the one below:

CREATE TABLE appstat.nodedata (
    nodeip text,
    timestamp timestamp,
    flashmode text,
    physicalusage int,
    readbw int,
    readiops int,
    totalcapacity int,
    writebw int,
    writeiops int,
    writelatency int,
    PRIMARY KEY (nodeip, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);

where nodeip is the partition key and timestamp is the clustering key (sorted in descending order so that the latest row comes first).
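
To make this layout concrete, here is a minimal in-memory Python sketch (not Cassandra code; the `insert` and `latest` helpers are hypothetical) of how the rows of one `nodeip` partition are ordered newest first, which is why `LIMIT` returns the most recent readings.

```python
from datetime import datetime, timezone

# Hypothetical in-memory stand-in for appstat.nodedata:
# partition key (nodeip) -> {clustering key (timestamp) -> row}.
partitions: dict[str, dict[datetime, dict]] = {}


def insert(row: dict) -> None:
    # Writing the same (nodeip, timestamp) again overwrites the row, like a Cassandra upsert.
    partitions.setdefault(row["nodeip"], {})[row["timestamp"]] = row


def latest(nodeip: str, limit: int = 1) -> list[dict]:
    # Mirrors: SELECT * FROM nodedata WHERE nodeip = ? LIMIT <limit>
    rows = partitions.get(nodeip, {})
    # Newest first, as with CLUSTERING ORDER BY (timestamp DESC).
    newest_first = sorted(rows, reverse=True)
    return [rows[ts] for ts in newest_first[:limit]]


insert({"nodeip": "172.30.56.60",
        "timestamp": datetime(2017, 12, 8, 6, 12, 7, 161000, tzinfo=timezone.utc),
        "readbw": 6})
insert({"nodeip": "172.30.56.60",
        "timestamp": datetime(2017, 12, 8, 6, 13, 7, 161000, tzinfo=timezone.utc),
        "readbw": 57})
print(latest("172.30.56.60")[0]["readbw"])  # 57
```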

Sample data in this table:

SELECT * from nodedata WHERE nodeip = '172.30.56.60' LIMIT 2;

 nodeip       | timestamp                       | flashmode | physicalusage | readbw | readiops | totalcapacity | writebw | writeiops | writelatency
--------------+---------------------------------+-----------+---------------+--------+----------+---------------+---------+-----------+--------------
 172.30.56.60 | 2017-12-08 06:13:07.161000+0000 |       yes |            34 |     57 |       19 |            27 |       8 |        89 |           57
 172.30.56.60 | 2017-12-08 06:12:07.161000+0000 |       yes |            70 |      6 |       43 |            88 |      79 |        83 |           89

This works as expected: whenever I need statistics for a node, I can get the data using the partition key, as in the query above.

The above logic is similar to my previous question (Aggregation in Cassandra across partitions), but the expectation here is different.

I have values for each column (readbw, latency, etc.) populated every minute on all 4 nodes.

Now, if I need the max value of a column (for example, readbw), I can get it with the following query:

SELECT max(readbw) FROM nodedata WHERE nodeip IN ('172.30.56.60','172.30.56.61','172.30.56.62','172.30.56.63') AND timestamp < 1512652272989 AND timestamp > 1512537899000;
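
The same result can be obtained without `IN` by running one single-partition query per node and taking the max of the per-node maxima on the client. The sketch below assumes that approach; the driver call is replaced by a hypothetical `fetch` callback (and made-up sample data) so it runs without a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

# Hypothetical callback standing in for one single-partition driver call:
#   SELECT max(readbw) FROM nodedata
#   WHERE nodeip = ? AND timestamp > ? AND timestamp < ?
PartitionMax = Callable[[str, int, int], Optional[int]]


def max_across_nodes(fetch: PartitionMax, nodes: list[str],
                     start_ms: int, end_ms: int) -> Optional[int]:
    # Issue one query per partition concurrently instead of a single IN query.
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        per_node = list(pool.map(lambda ip: fetch(ip, start_ms, end_ms), nodes))
    # The max of per-partition maxima equals the max over all partitions;
    # None means that partition had no rows in the time range.
    found = [v for v in per_node if v is not None]
    return max(found) if found else None


# Made-up sample data: nodeip -> [(timestamp_ms, readbw)].
SAMPLE = {
    "172.30.56.60": [(1512600000000, 57), (1512600060000, 6)],
    "172.30.56.61": [(1512600000000, 91)],
    "172.30.56.62": [],
    "172.30.56.63": [(1512700000000, 99)],  # outside the range queried below
}


def fake_fetch(ip: str, start_ms: int, end_ms: int) -> Optional[int]:
    vals = [bw for ts, bw in SAMPLE.get(ip, []) if start_ms < ts < end_ms]
    return max(vals) if vals else None


print(max_across_nodes(fake_fetch, list(SAMPLE), 1512537899000, 1512652272989))  # 91
```

This still requires knowing the list of nodes; it only replaces the `IN`. The usual argument for separate queries is that a token-aware driver can send each one to a replica that owns the partition, instead of one coordinator gathering every partition, but whether that is faster depends on the cluster.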

1) First question: is there a way to compute the max of a column (readbw) across all nodes without using an IN query?
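
Another way to avoid `IN` is to also write each reading into a second table partitioned by time rather than by node, so that one partition holds every node's readings for a period. The table name `nodedata_by_day` and its schema (shown only as a comment) are hypothetical; the Python below simulates that table in memory.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Optional

# Hypothetical companion table (not part of the original model):
#   CREATE TABLE appstat.nodedata_by_day (
#       day text, timestamp timestamp, nodeip text, readbw int,
#       PRIMARY KEY ((day), timestamp, nodeip));
# In-memory stand-in: day -> [(timestamp_ms, nodeip, readbw)].
by_day: dict[str, list[tuple[int, str, int]]] = defaultdict(list)


def day_bucket(ts_ms: int) -> str:
    """Partition key for a reading: its UTC calendar day."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")


def insert(nodeip: str, ts_ms: int, readbw: int) -> None:
    by_day[day_bucket(ts_ms)].append((ts_ms, nodeip, readbw))


def max_readbw(day: str, start_ms: int, end_ms: int) -> Optional[int]:
    # Mirrors a single-partition query with no IN on nodeip:
    #   SELECT max(readbw) FROM nodedata_by_day
    #   WHERE day = ? AND timestamp > ? AND timestamp < ?
    vals = [bw for ts, _, bw in by_day.get(day, []) if start_ms < ts < end_ms]
    return max(vals) if vals else None


insert("172.30.56.60", 1512600000000, 57)
insert("172.30.56.61", 1512600060000, 91)
print(max_readbw("2017-12-06", 1512537899000, 1512652272989))  # 91
```

A time range that spans several days needs one query per day bucket, so the bucket size trades partition size against the number of queries. With 4 nodes reporting once a minute, a daily partition holds 4 × 1440 = 5,760 rows.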

2) Second question: is there a way in Cassandra to aggregate the data and store it in another table whenever I insert it on Node 1, Node 2, Node 3, or Node 4, so that I can read the aggregated value of each column from that aggregate table?
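
Cassandra does not maintain aggregates on write by itself: materialized views cannot contain aggregates, and counter columns only support increments and decrements, so they can keep a sum or a count but not a max. A common workaround is to update a rollup table from the application in the same write path. A minimal in-memory sketch, assuming a hypothetical per-minute rollup across all nodes; in a real deployment the `rollups` update would become a write to a second table, and the max would need a read-before-write.

```python
from dataclasses import dataclass, field

# Numeric columns of appstat.nodedata to aggregate.
METRICS = ("physicalusage", "readbw", "readiops", "totalcapacity",
           "writebw", "writeiops", "writelatency")


@dataclass
class Rollup:
    """Hypothetical per-minute aggregate row (one per minute across all nodes)."""
    count: int = 0
    sums: dict = field(default_factory=lambda: {m: 0 for m in METRICS})
    maxes: dict = field(default_factory=dict)

    def add(self, row: dict) -> None:
        self.count += 1
        for m in METRICS:
            self.sums[m] += row[m]
            self.maxes[m] = max(self.maxes.get(m, row[m]), row[m])


raw_rows: list[dict] = []        # stands in for appstat.nodedata
rollups: dict[int, Rollup] = {}  # minute start (epoch ms) -> aggregate


def write(row: dict) -> None:
    # 1) the normal INSERT into nodedata
    raw_rows.append(row)
    # 2) update the rollup for the minute this reading falls into
    minute = row["timestamp_ms"] // 60_000 * 60_000
    rollups.setdefault(minute, Rollup()).add(row)


# Made-up readings from two nodes in the same minute.
write({"nodeip": "172.30.56.60", "timestamp_ms": 1512713587161, "physicalusage": 34,
       "readbw": 57, "readiops": 19, "totalcapacity": 27, "writebw": 8,
       "writeiops": 89, "writelatency": 57})
write({"nodeip": "172.30.56.61", "timestamp_ms": 1512713590000, "physicalusage": 70,
       "readbw": 6, "readiops": 43, "totalcapacity": 88, "writebw": 79,
       "writeiops": 83, "writelatency": 89})
(agg,) = rollups.values()
print(agg.count, agg.sums["readbw"], agg.maxes["writelatency"])  # 2 63 89
```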

If any of my points are unclear, please let me know.

Thanks,
Harry

Recommended Answer

If you are using DSE (DataStax Enterprise) Cassandra, you can enable Spark and write the aggregation queries there.
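
Whatever engine runs it (Spark on DSE or a plain client-side job), such a query boils down to selecting the rows in a time range and folding each numeric column. A plain-Python sketch of that per-column computation, assuming the rows have already been read; the scan itself (for example through the Spark Cassandra connector) is not shown, and `aggregate_columns` is a hypothetical helper.

```python
from typing import Iterable

# Numeric columns of appstat.nodedata to aggregate.
METRICS = ("physicalusage", "readbw", "readiops", "totalcapacity",
           "writebw", "writeiops", "writelatency")


def aggregate_columns(rows: Iterable[dict], start_ms: int, end_ms: int) -> dict:
    """Sum, max and average of each metric over rows with start_ms < timestamp_ms < end_ms."""
    selected = [r for r in rows if start_ms < r["timestamp_ms"] < end_ms]
    result = {}
    for m in METRICS:
        values = [r[m] for r in selected]
        result[m] = {
            "sum": sum(values),
            "max": max(values) if values else None,
            "avg": sum(values) / len(values) if values else None,
        }
    return result


# The two sample rows from the question (node 172.30.56.60).
rows = [
    {"timestamp_ms": 1512713587161, "physicalusage": 34, "readbw": 57, "readiops": 19,
     "totalcapacity": 27, "writebw": 8, "writeiops": 89, "writelatency": 57},
    {"timestamp_ms": 1512713527161, "physicalusage": 70, "readbw": 6, "readiops": 43,
     "totalcapacity": 88, "writebw": 79, "writeiops": 83, "writelatency": 89},
]
stats = aggregate_columns(rows, 1512713500000, 1512713600000)
print(stats["readbw"])  # {'sum': 63, 'max': 57, 'avg': 31.5}
```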
