Why Cassandra COUNT(*) on a specific partition takes really long on relatively small datasets


Problem description


I have a table defined like:

Keyspace:

CREATE KEYSPACE messages WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;

Table:

CREATE TABLE messages.textmessages (
    categoryid int,
    date timestamp,
    messageid timeuuid,
    message text,
    userid int,
    PRIMARY KEY ((categoryid, date), messageid)
) WITH CLUSTERING ORDER BY (messageid ASC);


The goal is to have a wide-row time-series store such that categoryid and date (beginning of day) constitute my partition key and messageid provides the clustering. This enables me to run queries like:

SELECT * FROM messages.textmessages WHERE categoryid=2 AND date='2019-05-14 00:00:00.000+0300' AND messageId > maxTimeuuid('2019-05-14 00:00:00.000+0300') AND messageId < minTimeuuid('2019-05-15 00:00:00.000+0300')


to get the messages for a given day; it works well and it's fast!

Question


I need to be able to count the messages in a given day by substituting SELECT COUNT(*) for the SELECT * above. This takes very long even with a little under 100K entries in the column family; it actually times out in cqlsh.



Why would this query take so long even when:

SELECT COUNT(*) FROM messages.textmessages WHERE categoryid=2 AND date='2019-05-14 00:00:00.000+0300' AND messageId > maxTimeuuid('2019-05-14 00:00:00.000+0300') AND messageId < minTimeuuid('2019-05-15 00:00:00.000+0300')

  1. The count is over a specific partition with fewer than 100K records
  2. I am running only a single Cassandra node, on a high-performance MacBook Pro
  3. There are no active writes/reads on the instance, and there are fewer than 20 partitions on the dev laptop

Recommended answer

This is understandably caused by a common pitfall: overlooking Cassandra's "everything-is-a-write" model, which is why tombstones happen.


When executing a scan, within or across a partition, we need to keep the tombstones seen in memory so we can return them to the coordinator, which will use them to make sure other replicas also know about the deleted rows. With workloads that generate a lot of tombstones, this can cause performance problems and even exhaust the server heap.


Thanks to @JimWartnick's suggestion about possible tombstone-related latency: this was caused by the overwhelming number of tombstones generated by my inserts, which had NULL fields. I did not expect those to create tombstones, nor did I expect tombstones to matter so much for query performance, especially for COUNT.
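One way to confirm that tombstones are the culprit (assuming cqlsh access to the node; the counts shown in the comment are illustrative, not from the original post) is to enable request tracing and rerun the count:

```sql
cqlsh> TRACING ON;
cqlsh> SELECT COUNT(*) FROM messages.textmessages
   ... WHERE categoryid=2 AND date='2019-05-14 00:00:00.000+0300';

-- The trace output then includes read lines such as:
--   Read 98000 live rows and 196000 tombstone cells
```

A high ratio of tombstone cells to live rows in the trace, or in nodetool tablestats, points to tombstone scanning rather than raw data volume as the source of the latency.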

Solution

  1. Use the default unset value for absent fields, or omit them from inserts/updates entirely
  2. Be aware of the following fact, from Common Problems with Cassandra Tombstones by Alla Babkina:


One common misconception is that tombstones only appear when the client issues DELETE statements to Cassandra. Some developers assume it is safe to choose a mode of operation that relies on Cassandra being completely tombstone free. In reality there are many other things that cause tombstones apart from issuing DELETE statements: inserting null values, inserting collections, and expiring data with TTL are all common sources of tombstones.
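As a minimal sketch against the textmessages table above (the timestamp values are illustrative): writing an explicit null creates a cell tombstone, while simply omitting the column from the insert does not:

```sql
-- Writing an explicit null creates a tombstone for the userid cell:
INSERT INTO messages.textmessages (categoryid, date, messageid, message, userid)
VALUES (2, '2019-05-14 00:00:00.000+0300', now(), 'hello', null);

-- Omitting the column writes nothing for userid, so no tombstone is created:
INSERT INTO messages.textmessages (categoryid, date, messageid, message)
VALUES (2, '2019-05-14 00:00:00.000+0300', now(), 'hello');
```

Prepared statements behave the same way: binding null produces a tombstone, whereas leaving a parameter unset (where the driver supports unset values) leaves the cell untouched.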
