Cassandra NOT EQUAL Operator
Question
Question to all Cassandra experts out there.
I have a column family with about a million records.
I would like to query these records in a way that lets me perform a not-equal-to (!=) operation.
I searched for this on Google, and it seems I have to use some sort of MapReduce.
Can somebody tell me what options are available?
Answer
I can suggest a few approaches.
1) If you have a limited number of values that you want to test for inequality, consider modeling each of them as a boolean column (e.g., a column isEqualToUnitedStates set to true or false). A test such as "!= United States" then becomes the equality test isEqualToUnitedStates = false.
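A minimal sketch of option 1, assuming the flag is computed once at write time. The row type and the names country and is_equal_to_united_states are hypothetical, and the rows live in memory only to show the idea. In a real CQL table the flag would still need to be part of the primary key or carry a secondary index before it could appear in a WHERE clause.

```python
from dataclasses import dataclass

# Hypothetical row shape; 'country' and 'is_equal_to_united_states' are
# illustrative names, not taken from any real schema.
@dataclass
class UserRow:
    user_id: int
    country: str
    is_equal_to_united_states: bool

def make_row(user_id: int, country: str) -> UserRow:
    """Compute the boolean flag at write time, so that a later
    "country != 'United States'" read becomes an equality test on the flag."""
    return UserRow(user_id, country, country == "United States")

def not_united_states(rows):
    # In-memory stand-in for: WHERE is_equal_to_united_states = false
    return [r for r in rows if r.is_equal_to_united_states is False]

rows = [make_row(1, "United States"), make_row(2, "Canada"), make_row(3, "Mexico")]
print([r.user_id for r in not_united_states(rows)])  # [2, 3]
```

The trade-off is one extra column per value you want to exclude, which is why this only pays off when that set of values is small and known in advance.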
2) Otherwise, consider emulating the unsupported query != X by combining the results of two separate queries, < X and > X, on the client side.
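The client-side merge in option 2 can be sketched as follows, assuming the column you compare against is sorted (for example, a clustering column within one partition). A sorted Python list stands in for the partition, and bisect simulates the two range queries; the CQL in the comments uses hypothetical names key and col.

```python
from bisect import bisect_left, bisect_right

# Stand-in for a partition whose rows are sorted by a clustering column.
# In Cassandra the two halves would come from two range queries, e.g.
#   ... WHERE key = ? AND col < X   and   ... WHERE key = ? AND col > X
def query_less_than(sorted_values, x):
    return sorted_values[:bisect_left(sorted_values, x)]

def query_greater_than(sorted_values, x):
    return sorted_values[bisect_right(sorted_values, x):]

def query_not_equal(sorted_values, x):
    """Emulate != x by concatenating the results of < x and > x on the client."""
    return query_less_than(sorted_values, x) + query_greater_than(sorted_values, x)

values = [1, 3, 5, 5, 7, 9]
print(query_not_equal(values, 5))  # [1, 3, 7, 9]
```

Because the two ranges never overlap, plain concatenation yields no duplicates and keeps the original sort order.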
3) If your schema cannot support either type of query above, you may have to resort to writing custom routines that do client-side filtering and build the not-equal set dynamically. This works if you can first narrow your search space to a manageable size, so that running the query without the not-equal condition is relatively cheap.
For example, say you're interested in all of a particular customer's purchases of every product type except Widget. The ideal query would look something like SELECT * FROM purchases WHERE customer = 'Bob' AND item != 'Widget';
You cannot run this, of course, but you should be able to run SELECT * FROM purchases WHERE customer = 'Bob' without wasting too many resources, and then apply the filter item != 'Widget' in the client application.
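A minimal sketch of the client-side filter from option 3. The sample rows are hard-coded dicts standing in for the result of SELECT * FROM purchases WHERE customer = 'Bob'; with the DataStax Python driver and its dict_factory row factory, session.execute(...) would yield rows of a similar shape. The helper name exclude_item is hypothetical.

```python
from typing import Iterable, Iterator, Mapping

# Each row is modeled as a plain mapping, as if returned by
#   SELECT * FROM purchases WHERE customer = 'Bob'
def exclude_item(rows: Iterable[Mapping[str, str]], excluded: str) -> Iterator[Mapping[str, str]]:
    """Apply the unsupported `item != excluded` predicate on the client."""
    for row in rows:
        if row["item"] != excluded:
            yield row

bob_rows = [
    {"customer": "Bob", "item": "Widget"},
    {"customer": "Bob", "item": "Gadget"},
    {"customer": "Bob", "item": "Sprocket"},
]
print([r["item"] for r in exclude_item(bob_rows, "Widget")])  # ['Gadget', 'Sprocket']
```

Writing the filter as a generator means it never holds more than one row at a time, which suits drivers that page through large result sets lazily.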
4) Finally, if there is no way to restrict the data meaningfully before doing the scan (the query without the inequality check would return too many rows to handle comfortably), you may have to resort to MapReduce. This means running a distributed job that scans all rows in the table across the cluster. Such jobs run much slower than native queries and are fairly complex to set up. If you want to go this way, look into Cassandra's Hadoop integration.
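The shape of such a job can be sketched in plain Python, assuming each list below stands for the rows of one token range (one input split). This is only a single-process simulation of the map and reduce phases, not Cassandra's Hadoop integration; the data and function names are hypothetical.

```python
from functools import reduce

# Hypothetical data: each inner list stands for the rows held by one
# token range. A real job would read these splits from the cluster.
splits = [
    [{"customer": "Bob", "item": "Widget"}, {"customer": "Ann", "item": "Gadget"}],
    [{"customer": "Cy", "item": "Sprocket"}],
    [{"customer": "Dee", "item": "Widget"}],
]

def map_split(rows, excluded):
    """Map phase: scan one split and keep rows whose item != excluded."""
    return [row for row in rows if row["item"] != excluded]

def reduce_results(acc, partial):
    """Reduce phase: merge the partial results produced by each mapper."""
    return acc + partial

def full_scan_not_equal(splits, excluded):
    # In a real job each map_split call would run in parallel on the node
    # that owns the split; here they simply run one after another.
    partials = [map_split(split, excluded) for split in splits]
    return reduce(reduce_results, partials, [])

print([r["customer"] for r in full_scan_not_equal(splits, "Widget")])  # ['Ann', 'Cy']
```

Every row in every split is read, which is why this approach costs far more than the targeted queries in options 1 to 3.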