Cassandra CQL Select count with LIMIT


Problem description

I created a simple table:

CREATE TABLE test (
  "type" varchar,
  "value" varchar,
  PRIMARY KEY(type,value)
);

I inserted 5 rows into it:

INSERT INTO test(type,value) VALUES('test','tag1');
INSERT INTO test(type,value) VALUES('test','tag2');
INSERT INTO test(type,value) VALUES('test','tag3');
INSERT INTO test(type,value) VALUES('test','tag4');
INSERT INTO test(type,value) VALUES('test','tag5');

I ran SELECT * FROM test LIMIT 3 and it worked as expected:

 type | value
------+------
 test |  tag1
 test |  tag2
 test |  tag3

When I ran SELECT COUNT(*) FROM test LIMIT 3, it produced:

 count
-------
     5

Shouldn't it say 3?

The Datastax documentation seems to suggest that specifying a LIMIT will override the default of 10,000. Why does it not work in this case? If it matters, I'm on Cassandra 2.2.5 and ran all the queries through cqlsh.

Update: Both the Java driver and cqlsh have been tested and show that LIMIT indeed does not work as described in the documentation. If there are any Datastax employees reading, your input would be greatly appreciated.

Solution

My first reaction was that a row count always returns exactly one row in its result set, stating the number of rows found. So any LIMIT of 1 or more would have no effect.

But as @light correctly pointed out, the documentation states that the LIMIT should apply to a count(*). And with good reason too. According to this blog post, Cassandra cannot draw on any metadata to come up with the number of rows, but has to inspect every partition (on every node) to arrive at the number. It is thus a very expensive operation.

However, contrary to the documentation, when querying C* 2.2.4 with cqlsh or with the Java driver (v3.0.0), the LIMIT clause has no effect on the reported number of rows. Neither does cqlsh's default limit of 10'000 rows, nor does a LIMIT greater than 10'000 when there are more than 10'000 rows.

The documentation and implementation seem to be out of sync, though I cannot say which one is incorrect.

EDIT

The ticket referenced by @Abhishek Anand concludes that the documentation is wrong, not the behavior. So specifying a limit of 1 will still count all your rows, and that is the desired behavior.
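
To make the order of operations concrete, here is a minimal pure-Python sketch of the behavior described above. It is a toy model, not Cassandra or driver code: the function names (select_rows, select_count, capped_count) are made up for illustration, and the last one is only an assumed client-side workaround for when a capped count is really what you want.

```python
# Toy model of the behaviour discussed in this thread; NOT Cassandra code.
# All function names below are hypothetical and exist only for illustration.

# The five rows inserted in the question.
ROWS = [("test", f"tag{i}") for i in range(1, 6)]


def select_rows(rows, limit=None):
    """Model of: SELECT * FROM test LIMIT n.

    LIMIT truncates the rows that are returned.
    """
    return rows if limit is None else rows[:limit]


def select_count(rows, limit=None):
    """Model of: SELECT COUNT(*) FROM test LIMIT n.

    The aggregate runs over every matched row first and yields a
    single result row; LIMIT is then applied to that one-row result
    set, so any limit of 1 or more leaves the count unchanged.
    """
    result_set = [(len(rows),)]
    return result_set if limit is None else result_set[:limit]


def capped_count(rows, cap):
    """Assumed workaround: fetch at most `cap` rows, count on the client."""
    return len(select_rows(rows, limit=cap))


print(select_rows(ROWS, limit=3))   # three rows, as in the question
print(select_count(ROWS, limit=3))  # one result row holding the full count
print(select_count(ROWS, limit=1))  # same result: LIMIT 1 still counts all rows
print(capped_count(ROWS, 3))        # count capped by limiting the rows first
```

In this model the count is computed before the limit is applied, which matches what the question observed: five rows are counted even with LIMIT 3.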
