Cassandra NOT EQUAL Operator


Question

A question for all the Cassandra experts out there.

I have a column family with about a million records.

I would like to query these records in a way that lets me perform a not-equal-to kind of operation.

I Googled this, and it seems I have to use some sort of MapReduce.

Can somebody tell me what options are available in this regard?

Recommended answer

I can suggest a few approaches.

1) If you have a limited number of values that you would like to test for inequality, consider modeling them as boolean columns (e.g., a column isEqualToUnitedStates holding true or false).

2) Otherwise, consider emulating the unsupported query != X by running two separate queries, < X and > X, and combining their results on the client side. This assumes the column in question supports range queries.

3) If your schema cannot support either type of query above, you may have to resort to writing custom routines that do client-side filtering and construct the not-equal set dynamically. This works if you can first narrow the search space down to manageable proportions, so that running the query without the not-equal condition is relatively cheap.

So let's say you're interested in all purchases by a particular customer, for every product type except Widget. An ideal query could look something like SELECT * FROM purchases WHERE customer = 'Bob' AND item != 'Widget'; Of course, you cannot run this, but in this case you should be able to run SELECT * FROM purchases WHERE customer = 'Bob' without wasting too many resources, and then filter on item != 'Widget' in the client application.
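As a rough illustration of the client-side filtering step, here is a minimal Python sketch. It assumes the rows returned by the customer = 'Bob' query are available as dict-like objects; the sample data, the qty column, and the helper name exclude_item are made up for the example and are not part of any Cassandra driver.

```python
from typing import Any, Iterable, Iterator, Mapping

Row = Mapping[str, Any]


def exclude_item(rows: Iterable[Row], excluded: str) -> Iterator[Row]:
    """Yield only the rows whose 'item' column differs from `excluded`."""
    for row in rows:
        if row["item"] != excluded:
            yield row


# Hypothetical stand-in for the result of:
#   SELECT * FROM purchases WHERE customer = 'Bob'
bobs_purchases = [
    {"customer": "Bob", "item": "Widget", "qty": 2},
    {"customer": "Bob", "item": "Gadget", "qty": 1},
    {"customer": "Bob", "item": "Gizmo", "qty": 5},
]

# Apply the not-equal condition in the application instead of in the query.
non_widgets = list(exclude_item(bobs_purchases, "Widget"))
print([row["item"] for row in non_widgets])
```

Because the filter is a generator, it can consume a paged result set lazily, so the client never has to hold all of the customer's purchases in memory at once.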

4) Finally, if there is no way to restrict the data in a meaningful way before doing the scan (that is, querying without the not-equal check would return too many rows to handle comfortably), you may have to resort to MapReduce. This means running a distributed job that scans all rows of the table across the cluster. Such jobs will obviously run a lot slower than native queries, and they are quite complex to set up. If you want to go this way, look into Cassandra Hadoop integration.
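To make the shape of such a job a little more concrete, here is a small in-memory Python sketch of the map and reduce steps. It is not Hadoop code and does not talk to Cassandra: the chunks list is a hypothetical stand-in for the table split into pieces that would each be scanned by a separate worker, and map_chunk and merge are illustrative names.

```python
from functools import reduce
from typing import Any, Iterable, List, Mapping

Row = Mapping[str, Any]


def map_chunk(chunk: Iterable[Row], column: str, excluded: Any) -> List[Row]:
    """Map step: scan one chunk and keep rows where column != excluded."""
    return [row for row in chunk if row[column] != excluded]


def merge(left: List[Row], right: List[Row]) -> List[Row]:
    """Reduce step: concatenate the partial results of two chunks."""
    return left + right


# Hypothetical stand-in for a table split into independently scanned chunks.
chunks = [
    [{"customer": "Bob", "item": "Widget"}, {"customer": "Ann", "item": "Gadget"}],
    [{"customer": "Cid", "item": "Widget"}],
    [{"customer": "Dee", "item": "Gizmo"}],
]

# In a real distributed job each chunk would be processed on a different node;
# here the map step simply runs sequentially.
partials = [map_chunk(chunk, "item", "Widget") for chunk in chunks]
not_widgets = reduce(merge, partials, [])
print([row["customer"] for row in not_widgets])
```

The point of the structure is that every chunk is filtered independently, so the full-table scan can be spread across the cluster and only the surviving rows need to be merged.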
