Cassandra:使用where子句包含大于或小于(<和>)的查询 [英] Cassandra: Query with where clause containing greather- or lesser-than (< and >)

查看:291
本文介绍了Cassandra:使用where子句包含大于或小于(<和>)的查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用Cassandra 1.1.2,我正在尝试将RDBMS应用程序转换为Cassandra。在我的RDBMS应用程序中,有一个名为table1的表:

I'm using Cassandra 1.1.2 I'm trying to convert a RDBMS application to Cassandra. In my RDBMS application I have following table called table1:

| Col1 | Col2 | Col3 | Col4 |




  1. Col1:字符串(主键)

  2. Col2:字符串(主键)

  3. Col3:Bigint(索引)

  4. Col4:Bigint

  1. Col1: String (primary key)
  2. Col2: String (primary key)
  3. Col3: Bigint (index)
  4. Col4: Bigint

此表计数超过2亿条记录。最常用的查询是这样的:

This table counts over 200 million records. Mostly used query is something like:

Select * from table where col3 < 100 and col3 > 50;

在Cassandra中,我使用以下语句创建表:

In Cassandra I used following statement to create the table:

create table table1 (primary_key varchar, col1 varchar, 
col2 varchar, col3 bigint, col4 bigint, primary key (primary_key));

create index on table1(col3);

我将主键更改为额外的列(我在应用程序内部计算了键)。
导入一些记录后,我尝试执行以下cql:

I changed the primary key to an extra column (I calculate the key inside my application). After importing a few records I tried to execute following cql:

select * from table1 where col3 < 100 and col3 > 50;

结果是:

Bad Request: No indexed columns present in by-columns clause with Equal operator



查询从table1中选择col1,col2,col3,col4,其中col3 = 67 有效

Google表示无法执行这种操作查询。那正确吗?有任何建议如何创建这样的查询?

Google said there is no way to execute that kind of queries. Is that right? Any advice how to create such a query?

推荐答案

Cassandra索引实际上并不支持顺序访问。请参见 http://www.datastax.com/docs/1.1/ddl/indexes快速快速地说明它们的用处。但不要失望;使用Cassandra(以及许多其他NoSQL系统)的更经典的方法是去规范化,去规范化,去规范化。

Cassandra indexes don't actually support sequential access; see http://www.datastax.com/docs/1.1/ddl/indexes for a good quick explanation of where they are useful. But don't despair; the more classical way of using Cassandra (and many other NoSQL systems) is to denormalize, denormalize, denormalize.

可能是个好主意在这种情况下,您可以使用经典的存储桶范围模式,该模式可以让您使用推荐的RandomPartitioner,并使行在群集中分布得很好,同时仍然允许顺序访问值。这种情况下的想法是,您将进行第二次动态列族映射(存储和排序) col3 值,返回到相关的 primary_key 值。例如,如果您的 col3 值的范围是0到10 ^ 9,并且分布相当均匀,则您可能希望将它们放入1000个范围为10 ^ 6的存储桶中(最佳粒度取决于您所需的查询种类,您拥有的数据种类,查询往返时间等)。 cql3的示例架构:

It may be a good idea in your case to use the classic bucket-range pattern, which lets you use the recommended RandomPartitioner and keep your rows well distributed around your cluster, while still allowing sequential access to your values. The idea in this case is that you would make a second dynamic columnfamily mapping (bucketed and ordered) col3 values back to the related primary_key values. As an example, if your col3 values range from 0 to 10^9 and are fairly evenly distributed, you might want to put them in 1000 buckets of range 10^6 each (the best level of granularity will depend on the sort of queries you need, the sort of data you have, query round-trip time, etc). Example schema for cql3:

CREATE TABLE indexotron (
    rangestart int,
    col3val int,
    table1key varchar,
    PRIMARY KEY (rangestart, col3val, table1key)
);

插入 table1 时,应插入 indexotron 中的相应行,其中 rangestart = int(col3val / 1000000)。然后,当您需要使用col3> X枚举 table1 中的所有行时,您需要查询多达1000个 indexotron ,但其中的所有 col3val 都将进行排序。示例查询查找所有 table1.col3<的所有 table1.primary_key 值。 4021

When inserting into table1, you should insert a corresponding row in indexotron, with rangestart = int(col3val / 1000000). Then when you need to enumerate all rows in table1 with col3 > X, you need to query up to 1000 buckets of indexotron, but all the col3vals within will be sorted. Example query to find all table1.primary_key values for which table1.col3 < 4021:

SELECT * FROM indexotron WHERE rangestart = 0 ORDER BY col3val;
SELECT * FROM indexotron WHERE rangestart = 1000 ORDER BY col3val;
SELECT * FROM indexotron WHERE rangestart = 2000 ORDER BY col3val;
SELECT * FROM indexotron WHERE rangestart = 3000 ORDER BY col3val;
SELECT * FROM indexotron WHERE rangestart = 4000 AND col3val < 4021 ORDER BY col3val;

这篇关于Cassandra:使用where子句包含大于或小于(&lt;和&gt;)的查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆