What data structure should I use to mimic "order by counter" in Cassandra?


Problem description

Let's say I currently have a table like this

create table comment_counters (
    contentid uuid,
    commentid uuid,
    ...
    liked counter,
    PRIMARY KEY (contentid, commentid)
);

The purpose of this table is to track the comments and the number of times individual comments have been "liked".

What I would like to do is get the top comments (let's say the top 20) for each piece of content, determined by their number of likes in this table.

I know there's no way to order by counters, so what I would like to know is: are there any other ways to do this in Cassandra, for instance by restructuring my tables or tracking more/different information, or am I left with no choice but to do this in an RDBMS?

Sorting on the client is not really an option I would like to consider at this stage.

Recommended answer

Unfortunately there's no way to do this type of aggregation using plain Cassandra queries. The best option for this kind of data analysis would be to use an external tool such as Spark. Using Spark you can run periodic jobs that read and aggregate all counters from the comment_counters table and then write the results (such as the top 20 comments) to a different table that you can query directly. See here to get started with Cassandra and Spark.
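
As an illustration of that approach, here is a minimal sketch of such a periodic job using the spark-cassandra-connector DataFrame API. The keyspace name (mykeyspace) and the results table (top_comments) are illustrative assumptions, not part of the original question:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object TopCommentsJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("top-comments-aggregation")
      .config("spark.cassandra.connection.host", "127.0.0.1") // adjust to your cluster
      .getOrCreate()

    // Read the full comment_counters table into a DataFrame.
    val counters = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "mykeyspace", "table" -> "comment_counters"))
      .load()

    // Rank comments within each content by their "liked" counter and keep the top 20.
    val byLikes = Window.partitionBy("contentid").orderBy(col("liked").desc)
    val top20 = counters
      .withColumn("rank", row_number().over(byLikes))
      .filter(col("rank") <= 20)
      .select("contentid", "commentid", "liked", "rank")

    // Write the results to a pre-created table that reads can query directly,
    // e.g. (illustrative schema):
    //   CREATE TABLE top_comments (
    //       contentid uuid,
    //       rank int,
    //       commentid uuid,
    //       liked bigint,
    //       PRIMARY KEY (contentid, rank)
    //   );
    top20.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "mykeyspace", "table" -> "top_comments"))
      .mode("append")
      .save()

    spark.stop()
  }
}

With the results table keyed as PRIMARY KEY (contentid, rank), the application can then fetch the precomputed ranking with a single query such as SELECT commentid, liked FROM top_comments WHERE contentid = ? LIMIT 20;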
