Cassandra CQL Select count with LIMIT


Question

I created a simple table:

CREATE TABLE test (
  "type" varchar,
  "value" varchar,
  PRIMARY KEY(type,value)
);

I inserted 5 rows into it:

INSERT INTO test(type,value) VALUES('test','tag1');
INSERT INTO test(type,value) VALUES('test','tag2');
INSERT INTO test(type,value) VALUES('test','tag3');
INSERT INTO test(type,value) VALUES('test','tag4');
INSERT INTO test(type,value) VALUES('test','tag5');

I ran SELECT * FROM test LIMIT 3 and it worked as expected:

 type | value
------+------
 test |  tag1
 test |  tag2
 test |  tag3

When I ran SELECT COUNT(*) FROM test LIMIT 3, it produced:

 count
-------
     5

Shouldn't it say 3?

The DataStax documentation seems to suggest that specifying a LIMIT overrides the default of 10,000. Why does it not work in this case? If it matters, I'm on Cassandra 2.2.5 and ran all the queries through cqlsh.

Update: Both the Java driver and cqlsh have been tested, and both show that LIMIT does not work as described in the documentation. If any DataStax employees are reading, your input would be greatly appreciated.

Recommended answer

My first reaction was that a row count always returns exactly one row in its result set, stating the number of rows found. So any LIMIT of 1 or more would have no effect.

But as @light correctly pointed out, the documentation states that LIMIT should apply to count(*), and with good reason. According to this blog post, Cassandra cannot draw on any metadata to come up with the number of rows; it has to inspect every partition (on every node) to get the number. It is therefore a very expensive operation.

However, contrary to the documentation, when querying C* 2.2.4 with cqlsh or with the Java driver (v3.0.0), the LIMIT clause has no effect on the reported number of rows. Neither does cqlsh's default limit of 10,000 rows, nor does a LIMIT greater than 10,000 when the table holds more than 10,000 rows.

The documentation and the implementation seem to be out of sync, though I cannot say which one is incorrect.

Edit

The ticket referenced by @Abhishek Anand concludes that the documentation is wrong, not the behavior. So specifying a limit of 1 will still count all your rows, and that is the intended behavior.
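To make the difference concrete, here is a minimal in-memory sketch in Python. It does not talk to Cassandra and is not the server's implementation. It only models the two possible orderings discussed above: counting first and then applying LIMIT to the one-row result set (the observed behavior), versus capping the rows before counting them (what the question expected). The names ROWS, count_then_limit, and limit_then_count are hypothetical and exist only for this illustration.

```python
from itertools import islice

# Hypothetical in-memory stand-in for the five rows inserted into `test`.
ROWS = [("test", f"tag{i}") for i in range(1, 6)]


def count_then_limit(rows, limit):
    """Model of the observed behavior: aggregate all rows first, then
    apply LIMIT to the result set, which holds a single count row."""
    result_set = [(sum(1 for _ in rows),)]
    return result_set[:limit]


def limit_then_count(rows, limit):
    """Model of what the question expected: cap the rows at `limit`
    before counting them."""
    return [(sum(1 for _ in islice(rows, limit)),)]


print(count_then_limit(ROWS, 3))  # [(5,)]
print(limit_then_count(ROWS, 3))  # [(3,)]
```

Under this model, a capped count would have to be produced on the client: fetch the rows with a plain SELECT ... LIMIT n and count what comes back, rather than combining COUNT(*) with LIMIT.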
