How do I sort data by the last update date in Cassandra?
Question
I need advice on how to design a table in Cassandra correctly. I need to get a sorted list of all the books, ordered by the date of the last update. Each time a particular book is purchased, the number_of_buyers column is updated, and I also need to update the value of the updated_at column. The problem is that updated_at is a clustering key, which makes it part of the primary key, and values in primary key columns cannot be updated.
create table books (
book_id uuid,
created_at timestamp,
updated_at timestamp,
book_name varchar,
book_author varchar,
number_of_buyers int,
primary key (book_id, updated_at)
) with clustering order by (updated_at desc);
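To make the limitation concrete, here is a minimal sketch against the books table above. The UUID and timestamps are made-up illustration values. The first statement is rejected because updated_at is part of the primary key. The batch shows one possible workaround (delete the old row and insert a new one), which is an assumption on my part rather than something the original post proposes.

```cql
-- Rejected by Cassandra: updated_at is a clustering column and cannot appear in SET.
UPDATE books SET updated_at = '2020-10-05 14:29:33+0000'
WHERE book_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

-- Possible workaround: replace the row instead of updating it.
-- Both statements target the same partition (same book_id).
BEGIN BATCH
  -- Remove the row stored under the old timestamp (hypothetical value).
  DELETE FROM books
  WHERE book_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2
    AND updated_at = '2020-10-01 09:00:00+0000';
  -- Re-insert every column under the new timestamp.
  INSERT INTO books (book_id, updated_at, created_at, book_name, book_author, number_of_buyers)
  VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a803e2, '2020-10-05 14:29:33+0000',
          '2020-09-01 08:00:00+0000', 'Mastering Apache Cassandra 3.x',
          'Aaron Ploetz, Teja Malepati', 53);
APPLY BATCH;
```

Note that this requires knowing the previous updated_at value and rewriting every column. It also does not give a sorted list across books, because each book_id lives in its own partition.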
Another example:
create table chat_rooms (
chat_room_id uuid,
created_at timestamp,
updated_at timestamp,
last_message_content varchar,
last_message_author varchar,
unread_messages_number int,
primary key (chat_room_id, updated_at)
) with clustering order by (updated_at desc);
Each chat room has a latest message, and this information changes constantly. Whenever it changes, I want to move that chat room to the top of the list. This is classic behavior in many messengers.
Recommended answer
You will definitely need to partition on something different. The trick is finding the right balance between query flexibility (your obvious need here) and avoiding unbounded partition growth.
For the books table, is it possible to partition on something like category? For example: horror, fantasy, graphic novel, non-fiction, instructional, and so on.
CREATE TABLE book_events (
book_id uuid,
created_at timestamp,
updated_at timestamp,
book_name varchar,
book_author varchar,
number_of_buyers int,
category text,
PRIMARY KEY (category, book_name, updated_at, book_id)
) WITH CLUSTERING ORDER BY (book_name ASC,updated_at DESC,book_id ASC);
For the PRIMARY KEY definition, we can partition on category, then cluster on book_name and updated_at, with book_id on the end (for uniqueness). Then INSERT a new row for each sale event. When querying (after inserting a few rows), use the MAX aggregate on updated_at together with a GROUP BY clause on book_name. Because rows within each book_name are clustered by updated_at in descending order, the non-aggregated columns (book_author, number_of_buyers) are taken from the first row of each group, which is the most recent one.
SELECT book_name, book_author, number_of_buyers, MAX(updated_at) FROM book_events
WHERE category='Computers & Technology' GROUP BY book_name;
book_name | book_author | number_of_buyers | system.max(updated_at)
---------------------------------+------------------------------------------------------------+------------------+---------------------------------
Mastering Apache Cassandra 3.x | Aaron Ploetz, Teja Malepati | 52 | 2020-10-05 14:29:33.134000+0000
Seven NoSQL Databases in a Week | Aaron Ploetz, Devram Kandhare, Brian Wu, Sudarshan Kadambi | 163 | 2020-10-05 14:29:33.142000+0000
(2 rows)
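The per-sale INSERT described above might look like the following sketch. The UUID, the created_at value, and the buyer count are hypothetical. Since number_of_buyers is a plain int rather than a counter column, this assumes the application computes the new total before writing.

```cql
-- One new row per sale event; earlier rows for the same book are left in place.
INSERT INTO book_events (category, book_name, updated_at, book_id,
                         created_at, book_author, number_of_buyers)
VALUES ('Computers & Technology',
        'Mastering Apache Cassandra 3.x',
        toTimestamp(now()),                      -- time of this sale, set by the coordinator
        5b6962dd-3f90-4c93-8f61-eabfa4a803e2,    -- hypothetical, stable per book
        '2020-09-01 08:00:00+0000',              -- hypothetical original creation time
        'Aaron Ploetz, Teja Malepati',
        53);                                     -- new total, computed by the application
```

Rows within a partition come back in clustering order, so the GROUP BY query returns books ordered by book_name. If the list must be ordered by the latest update, the application would need to sort the grouped result on the MAX(updated_at) column itself.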
The only other consideration is what to do with the obsolete sale rows. You could delete them as you go, depending on the write frequency. A better option would be to consider the cadence of sales and apply a TTL.
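As a rough sketch of both cleanup options, assuming a 30-day retention window (2,592,000 seconds) chosen purely for illustration, and Cassandra 3.0 or later for the range delete:

```cql
-- Option 1: per-write TTL; this sale row expires 30 days after it is written.
INSERT INTO book_events (category, book_name, updated_at, book_id, book_author, number_of_buyers)
VALUES ('Computers & Technology', 'Mastering Apache Cassandra 3.x', toTimestamp(now()),
        5b6962dd-3f90-4c93-8f61-eabfa4a803e2,    -- hypothetical UUID
        'Aaron Ploetz, Teja Malepati', 53)
USING TTL 2592000;

-- Option 2: a table-wide default TTL applied to every subsequent write.
ALTER TABLE book_events WITH default_time_to_live = 2592000;

-- Option 3: delete obsolete rows explicitly with a range on the clustering column.
DELETE FROM book_events
WHERE category = 'Computers & Technology'
  AND book_name = 'Mastering Apache Cassandra 3.x'
  AND updated_at < '2020-10-01 00:00:00+0000';   -- hypothetical cutoff
```

One caveat with a TTL: if a book has no sales within the TTL window, all of its rows expire and it drops out of the list, so the window should be chosen with the slowest-selling books in mind.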
This solution is definitely not complete as-is, but I hope it points you in the right direction.