Wrong count(*) with cassandra-cql

Question

I tried to create some users for my testing. I created users in a loop from 0..100000 using the cassandra-cql gem for Ruby on Rails, then counted the users in my database, and the result was only 10000 users. If I create 9000, everything works fine. At first I thought the users didn't exist, but I checked with the Apollo WebUI for Cassandra and could find the user with the id 100000 and the users below it. Why does this happen?

I know I should use a counter column to provide the number of users in my application, but I want to know whether this is a bug or a mistake on my part.

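def self.create_users
  # Insert test users with uids "0" through "19000".
  (0..19000).each do |f|
    @@db.execute("INSERT INTO users (uid, first_name, last_name, email) VALUES (?,?,?,?)", f.to_s, "first_name", "last_name", "email")
  end
end

def self.count_users
  # Count all rows in users; the query carries no explicit LIMIT.
  count = @@db.execute("SELECT count(*) FROM users")
  count.fetch do |c|
    return c[0] # first column of the single result row
  end
end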


Recommended answer

CQL operations limit both the number of rows and the number of columns that will be returned to the user. By default that limit is 10,000. Because the count(*) operation actually has to read every row in order to produce the count, it is also subject to the default limit of 10,000 rows, which is why your count stops at 10,000 even though more users exist. You could increase the limit for the query (although I don't recommend it):

SELECT count(*) FROM users limit 20000;
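If you do raise the limit from Ruby, a small helper can build the statement and flag results that may have been cut off. This is only a sketch: count_cql and possibly_truncated? are hypothetical helper names, not part of the cassandra-cql gem, and the identifier check is a deliberately strict assumption.

```ruby
# Hypothetical helpers (not part of the cassandra-cql gem).

# Build a count query with an explicit LIMIT.
# The limit is coerced to an Integer and the table name is checked
# against a strict pattern so nothing unexpected ends up in the CQL.
def count_cql(table, limit)
  limit = Integer(limit)
  raise ArgumentError, "limit must be positive" unless limit > 0
  unless table.to_s =~ /\A[A-Za-z_][A-Za-z0-9_]*\z/
    raise ArgumentError, "unexpected table name: #{table.inspect}"
  end
  "SELECT count(*) FROM #{table} LIMIT #{limit}"
end

# A count that equals the limit may have been truncated by it,
# so it should not be trusted as the real number of rows.
def possibly_truncated?(count, limit)
  count >= limit
end
```

With the @@db handle from the question this could be used as @@db.execute(count_cql("users", 20000)). A returned count of exactly 20000 would then be a hint to raise the limit again or to stop counting this way.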

Note that this is an expensive operation especially when you have a lot of rows. You should anticipate this type of query could take a long time for any medium or large size dataset. If at all possible you should denormalize this count into a counter or some other form that will not require fetching all the rows in your column family.
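As a rough illustration of the counter approach, the sketch below bumps a counter whenever a user is inserted and reads that counter instead of scanning users. The user_counts column family and its name and total columns are assumptions made for this example (total is assumed to be a counter column); db stands for any object that responds to execute(cql, *bind_values), like the @@db handle in the question.

```ruby
# Sketch only: user_counts, name and total are hypothetical names.
# total is assumed to be a counter column keyed by name.
module UserCounter
  INSERT_CQL    = "INSERT INTO users (uid, first_name, last_name, email) VALUES (?,?,?,?)"
  INCREMENT_CQL = "UPDATE user_counts SET total = total + 1 WHERE name = ?"
  READ_CQL      = "SELECT total FROM user_counts WHERE name = ?"

  # Insert one user, then increment the counter in the same code path.
  def self.create_user(db, uid, first_name, last_name, email)
    db.execute(INSERT_CQL, uid.to_s, first_name, last_name, email)
    db.execute(INCREMENT_CQL, "users")
  end

  # Read the stored counter; falls back to 0 if no counter row exists yet.
  def self.count_users(db)
    db.execute(READ_CQL, "users").fetch { |row| return row[0] }
    0
  end
end
```

The two statements are not atomic, and an INSERT with an existing uid overwrites that row while the counter still goes up, so a count kept this way can drift from the real number of rows.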
