CQL SELECT 大于对索引非键列的查询 [英] CQL SELECT greater-than query on indexed non-key column

查看：16 发布时间：2021/12/31 17:47:00 indexing cassandra

本文介绍了CQL SELECT 大于对索引非键列的查询的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

在原始问题之后添加了一个案例来描述问题.

我希望查询不属于我的键的列.如果我理解正确，我需要在该列上定义二级索引.但是，我希望使用大于条件(而不仅仅是相等条件)并且这似乎仍然不受支持.

我错过了什么吗?您将如何解决这个问题?

我想要的设置:

Cassandra 1.1.6CQL3创建表表1(KeyA int,KeyB 整数，值一个整数，主键(KeyA、KeyB));在表 1 上创建索引(值 A)；SELECT * FROM Table1 WHERE ValueA >3000；

由于在 Cassandra 1.1.6 中仍然不支持在具有复合键的 ColumnFamilies 上定义二级索引，因此我必须解决删除其中一个键的临时解决方案，但我仍然遇到与非相等条件相同的问题.

还有其他方法可以解决这个问题吗?

感谢您的宝贵时间.

<小时>

编辑 1

这是一个可以解释问题的案例.正如 rs-atl 所指出的，这可能是数据模型问题.假设我在 stackoverflow 上保留了所有用户的列族.对于每个用户，我保留了一批统计数据(声誉、NumOfAnswers、NumOfVotes...所有这些都是整数).我想查询这些统计信息以获取相关用户.

CREATE TABLE UserStats(用户 ID 整数，声誉，NumOfAnswers 整数，...很多统计......NumOfVotes 整数，主键(用户 ID));

现在我有兴趣根据这些统计信息对用户 ID 进行切片.我想要所有声誉超过 10K 的用户，我想要所有回答少于 5 的用户，等等.

希望有帮助.再次感谢.

解决方案

在 CQL 中，一旦为所有列创建了索引(即二级索引)，您就可以在所有列上应用 WHERE 子句).否则，您将收到以下错误:

错误请求:在带有 Equal 运算符的 by-columns 子句中不存在索引列

不幸的是，即使有二级索引，由于性能问题.

<块引用>

问:为什么总是需要至少有一个 EQ 比较二级索引?

A:二级指标的不等式总是在内存中完成，因此在另一个二级索引上没有至少一个 EQ您将加载数据库中的每一行，其中包含大量数据库不是一个好主意.因此，通过要求至少有一个 EQ(二级)索引，您希望限制需要读入的行集内存到可管理的大小.(虽然显然你仍然可以得到也遇到了麻烦).

因此，基本上，如果您除了 EQ 比较之外还有任何其他内容，它会加载否则匹配"您的查询的所有行，并一次检查它们是否匹配.默认情况下不允许这样做，因为它可能很慢".(本质上，索引仅索引为了相等"，而不是像 < 和 > 关系数据库上的索引那样的任何东西).

需要注意的一点是，如果您在二级索引上有多个非 EQ 条件，您还需要在查询中包含 ALLOW FILTERING 关键字，否则您将得到

无法执行此查询，因为它可能涉及数据过滤，因此可能具有不可预测的性能.如果您想在性能不可预测的情况下执行此查询，请使用 ALLOW FILTERING

一种简单的解决方法是将一个虚拟列附加到您的表中，其中所有行在该列上都具有相同的值.因此，在这种情况下，您可以仅对所需的列执行范围查询.请注意，在 NoSQL 数据库上执行此类查询可能会减慢系统速度/使系统陷入困境.

<小时>

示例

cqlsh:demo>desc table table1;创建表 table1 (钥匙国际，键位整数，虚拟值整数，值一个整数，主键(keya，keyb)) ....cqlsh:演示>从表1中选择*；钥匙 |键 |虚拟值|值a------+------+--------------+--------1 |2 |0 |34 |5 |0 |67 |8 |0 |9

在 ValueA 和 DummyValue 上创建二级索引:

cqlsh:demo>在 table1 (valuea) 上创建索引 table1_valuea；cqlsh:演示>在 table1 (dummyvalue) 上创建索引 table1_valueb；

使用 DummyValue=0 对 ValueA 执行范围查询:

cqlsh:demo>select * from table1 where dummyvalue = 0 and valuea >5 允许过滤；钥匙 |键 |虚拟值|值a------+------+--------------+--------4 |5 |0 |67 |8 |0 |9

EDIT1: added a case to describe the problem after the original question.

I wish to query on a column which is not part of my key. If I understand correctly, I need to define a secondary index on that column. However, I wish to use a greater than condition (not just equality condition) and that still seems unsupported.

Am I missing something? How would you address this issue?

My desired Setup:

Cassandra 1.1.6
CQL3

CREATE TABLE Table1(
             KeyA int,
             KeyB int,
             ValueA int,
             PRIMARY KEY (KeyA, KeyB)
           );

CREATE INDEX ON Table1 (ValueA);

SELECT * FROM Table1 WHERE ValueA > 3000;

Since defining a secondary index on ColumnFamilies with Composite Keys is still not supported in Cassandra 1.1.6 I have to settle on a temporary solution of dropping one of the keys but I still have the same problem with non equality conditions.

Is there another way to address this?

Thank you for your time.

Relevant sources: http://cassandra.apache.org/doc/cql3/CQL.html#selectStmt http://www.datastax.com/docs/1.1/ddl/indexes

EDIT1

Here's a case that will explain the problem. As rs-atl noted, it might be a data model problem. Let's say I keep a column family of all the users on stackoverflow. for each user I keep a batch of stats (Reputation, NumOfAnswers, NumOfVotes... all of them are int). I want to query on those stats to get the relevant users.

CREATE TABLE UserStats(
             UserID int,
             Reputation int,
             NumOfAnswers int,
             .
             .
             .
             A lot of stats...
             .
             .
             .
             NumOfVotes int,
             PRIMARY KEY (UserID)
           );

Now I'm interested in slicing UserID's based on those stats. I want all the users with over 10K reputation, I want all the users with less than 5 answers, etc. etc.

I hope that helps. Thanks again.

解决方案

In CQL, you are able to apply the WHERE clause on all columns once you have created indices for them (i.e., secondary index). Otherwise, you will get the following error:

Bad Request: No indexed columns present in by-columns clause with Equal operator

Unfortunately, even with secondary indices, the WHERE clause are required to have at least one EQ on an secondary index by CQL due to performance issue.

Q: Why is it necessary to always have at least one EQ comparison on secondary indices?

A: Inequalities on secondary indices are always done in memory, so without at least one EQ on another secondary index you will be loading every row in the database, which with a massive database isn't a good idea. So by requiring at least one EQ on an (secondary) index, you hopefully limit the set of rows that need to be read into memory to a manageable size. (Although obviously you can still get into trouble with that as well).

So basically if you have anything besides an EQ comparison, it loads all rows "that elsewise match" your query, and checks if they match, one at a time. Which is not allowed by default since it "could be slow." (In essence, indexes only index "for equality" not for anything else like < and > which indexes on a relational database would).

One thing to note is that if you have more than one non EQ conditions on secondary indices, you also need to include the ALLOW FILTERING key word in your query, or else you'll get

Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING

One simple way to work-around is to append a dummy column to your table where all row have the same value on that column. So in this case you are able to perform ranged query on just your desired column. Do realize that these kind of queries on a NoSQL database may be slow/bog down a system.

Example

cqlsh:demo> desc table table1;

CREATE TABLE table1 (
  keya int,
  keyb int,
  dummyvalue int,
  valuea int,
  PRIMARY KEY (keya, keyb)
) ....

cqlsh:demo> select * from Table1;

 keya | keyb | dummyvalue | valuea
------+------+------------+--------
    1 |    2 |          0 |      3
    4 |    5 |          0 |      6
    7 |    8 |          0 |      9

Create secondary indices on ValueA and DummyValue:

cqlsh:demo> create index table1_valuea on table1 (valuea);
cqlsh:demo> create index table1_valueb on table1 (dummyvalue);

Perform ranged query on ValueA with DummyValue=0:

cqlsh:demo> select * from table1 where dummyvalue = 0 and valuea > 5 allow filtering;

 keya | keyb | dummyvalue | valuea
------+------+------------+--------
    4 |    5 |          0 |      6
    7 |    8 |          0 |      9

这篇关于CQL SELECT 大于对索引非键列的查询的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

CQL SELECT 大于对索引非键列的查询 [英] CQL SELECT greater-than query on indexed non-key column

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

CQL SELECT 大于对索引非键列的查询 [英] CQL SELECT greater-than query on indexed non-key column

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭