CQL with a wide row - how to get most recent set?

Question

How would I write the CQL to get the most recent set of data from each row?

I'm investigating transitioning from MSSQL to Cassandra and am starting to grasp the concepts. Lots of research has helped tremendously, but I haven't found an answer to this (I know there must be a way):

CREATE TABLE WideData (
 ID text,
 Updated timestamp,
 Title text,
 ReportData text,
 PRIMARY KEY (ID, Updated)
) WITH CLUSTERING ORDER BY (Updated DESC);

INSERT INTO WideData (ID, Updated, Title, ReportData) VALUES ('aaa', toTimestamp(now()), 'Title', 'Blah blah blah blah');
INSERT INTO WideData (ID, Updated, Title, ReportData) VALUES ('bbb', toTimestamp(now()), 'Title', 'Blah blah blah blah');

wait 1 minute:

INSERT INTO WideData (ID, Updated, Title, ReportData) VALUES ('bbb', toTimestamp(now()), 'Title 2', 'Blah blah blah blah');

wait 3 minutes:

INSERT INTO WideData (ID, Updated, Title, ReportData) VALUES ('aaa', toTimestamp(now()), 'Title 2', 'Blah blah blah blah');

wait 5 minutes:

INSERT INTO WideData (ID, Updated, Title, ReportData) VALUES ('aaa', toTimestamp(now()), 'Title 3', 'Blah blah blah blah');

How would I write the CQL to get the most recent set of data from each row?

SELECT ID, Title FROM WideData - gives me 5 rows, as it pivots the data for me.

Essentially I want the results for (SELECT ID, Title FROM WideData WHERE .....) to be:

ID   Title
aaa  Title 3
bbb  Title 2

Also, is there a way to get a count of the number of data sets in a wide row?

Essentially the equivalent of T-SQL: SELECT ID, Count(*) FROM Table GROUP BY ID

ID   Count
aaa  3
bbb  2

Thanks

Also, any references to learn more about these types of queries would also be appreciated.

Solution

With your current data model, you can only query the most-recent row by partition key. In your case, that is ID.

SELECT ID, Title FROM WideData WHERE ID='aaa' LIMIT 1

Since you declared the clustering order on Updated as DESC, the row with the most recent Updated timestamp is returned first.

Given your desired results, I'll go ahead and assume that you do not want to query each partition key individually. Cassandra only maintains CQL result set order within a partition key. Also, Cassandra does not support aggregation. So there really is no way to get the "most recent" row for all of your IDs at once, nor is there a way to get a report of how many updates each ID has.

With Cassandra data modeling, you need to build your tables to suit your queries. Query "planning" is not really a strong point of Cassandra (as you are finding out). To get the most recent update by ID, you would need to build an additional query table designed to store only the most recent update for each ID. Likewise, to get the count of updates for each ID, you could create an additional query table using counter columns to suit that query.
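As a rough sketch of what those two query tables might look like: the table names LatestWideData and WideDataCounts and the column UpdateCount are made up for illustration, and toTimestamp(now()) assumes a Cassandra version that provides it (2.2 or later, as far as I know). Your application would have to write to these tables alongside every insert into WideData.

```cql
-- Hypothetical lookup table: one row per ID. Each new INSERT for the
-- same ID overwrites the previous row, so only the latest update remains.
CREATE TABLE LatestWideData (
    ID text PRIMARY KEY,
    Updated timestamp,
    Title text
);

-- Hypothetical counter table: one counter per ID. In a counter table,
-- every column outside the primary key must be a counter.
CREATE TABLE WideDataCounts (
    ID text PRIMARY KEY,
    UpdateCount counter
);

-- Issued by the application together with each insert into WideData.
-- Ideally the Updated value is the same one written to WideData.
INSERT INTO LatestWideData (ID, Updated, Title)
VALUES ('aaa', toTimestamp(now()), 'Title 3');

UPDATE WideDataCounts SET UpdateCount = UpdateCount + 1 WHERE ID = 'aaa';

-- The two reports then become plain reads of the query tables.
SELECT ID, Title FROM LatestWideData;
SELECT ID, UpdateCount FROM WideDataCounts;
```

Keep in mind that counter updates are not idempotent, so a retried write can skew the count, and that nothing keeps the three tables in sync except your application code.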

tl;dr

In Cassandra, denormalization and redundant data storage are key. For some applications, you might have one table for each query you need to support...and that's ok.
