What is the difference between a secondary index and an inverted index in Cassandra?


Question

When I read about these two, I thought both described the same approach. I googled but found nothing. Is the difference in the implementation? Does Cassandra maintain a secondary index itself, whereas an inverted index has to be implemented by me?

Which is the faster way to search?

Recommended answer

The main difference is that secondary indexes in Cassandra are not distributed in the same way a manual inverted index would be. With the built-in secondary indexes, each node indexes the data it stores locally (using the LocalPartitioner). With manual indexing, the index entries are distributed independently of the nodes that store the values.

This means that, with the built-in indexes, every query must go to every node, whereas with a manual inverted index you go to just one node (plus replicas) to look up the value you are searching for. One advantage of storing the index locally is that it can be updated atomically with the data. (Since Cassandra 1.2, atomic batches can be used to get this with a manual index as well, although they are a bit slower.)
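The routing difference can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: the cluster size, the `owner` hash function, and the `users` data set are all hypothetical, replicas are ignored, and none of this is Cassandra's actual code.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 6  # hypothetical cluster size


def owner(key: str) -> int:
    """Toy stand-in for a partitioner: hash a key to one of NUM_NODES nodes."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


# Hypothetical data set: user id -> country.
users = {f"user{i}": ("NL" if i % 3 == 0 else "DE") for i in range(30)}

# Built-in style: the index entry lives on the node that owns the row.
local_index = [defaultdict(set) for _ in range(NUM_NODES)]
# Manual inverted index: the entry lives on the node that owns the indexed value.
inverted_index = [defaultdict(set) for _ in range(NUM_NODES)]

for user_id, country in users.items():
    local_index[owner(user_id)][country].add(user_id)
    inverted_index[owner(country)][country].add(user_id)


def query_local(country: str):
    """Ask every node, because any of them may hold matching rows."""
    hits = set()
    for node in local_index:
        hits |= node.get(country, set())
    return hits, NUM_NODES


def query_inverted(country: str):
    """Ask only the node that owns the index partition for this value."""
    node = inverted_index[owner(country)]
    return set(node.get(country, set())), 1


local_hits, local_contacted = query_local("NL")
inverted_hits, inverted_contacted = query_inverted("NL")
print(local_contacted, inverted_contacted, local_hits == inverted_hits)
```

Both lookups return the same set of user ids; what differs is how many nodes had to be asked.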

This is why Cassandra's built-in indexes are not recommended for really high-cardinality data. If a lookup hits every node but returns only one or two results, it is inefficient, and a manual inverted index will do better. If a lookup returns many results, you will need to visit every node anyway, so the built-in indexes work well.
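A rough back-of-the-envelope calculation makes the cardinality argument concrete. It assumes, purely for illustration, that matching rows are spread uniformly at random over the nodes, that the manual approach costs one index lookup plus a visit to each node owning a matching row, and that replicas are ignored. The function names and cluster size are hypothetical.

```python
def expected_nodes_with_matches(num_nodes: int, matching_rows: int) -> float:
    """Expected number of distinct nodes owning at least one matching row,
    assuming rows are placed uniformly at random."""
    return num_nodes * (1 - (1 - 1 / num_nodes) ** matching_rows)


def nodes_touched_builtin(num_nodes: int, matching_rows: int) -> float:
    """Built-in index: every node is queried, however few rows match."""
    return float(num_nodes)


def nodes_touched_manual(num_nodes: int, matching_rows: int) -> float:
    """Manual inverted index: one index lookup, then the nodes holding the rows."""
    return 1 + expected_nodes_with_matches(num_nodes, matching_rows)


NUM_NODES = 100  # hypothetical cluster size
for matching_rows in (2, 10_000):
    builtin = nodes_touched_builtin(NUM_NODES, matching_rows)
    manual = nodes_touched_manual(NUM_NODES, matching_rows)
    print(f"{matching_rows} rows: built-in {builtin:.2f} nodes, manual {manual:.2f} nodes")
```

With only a couple of matching rows the manual index touches a handful of nodes instead of all of them; with many matching rows both approaches end up visiting essentially the whole cluster.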

A further advantage of Cassandra's built-in indexing is that the indexes are updated lazily, so you don't need to do a read on every update. (See CASSANDRA-2897.) This can be a significant speed improvement for indexed tables with high write throughput.
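The idea of lazy index maintenance can be illustrated with a toy comparison. This is a simplified sketch, not Cassandra's implementation: the `EagerIndex` and `LazyIndex` classes are hypothetical, and the assumption here is that stale index entries are filtered out and purged when a query encounters them.

```python
from collections import defaultdict


class EagerIndex:
    """Manual-style maintenance: read the old value so its index entry can be removed."""

    def __init__(self):
        self.rows = {}
        self.index = defaultdict(set)
        self.reads_on_write = 0

    def put(self, key, value):
        self.reads_on_write += 1  # read-before-write to find the old entry
        old = self.rows.get(key)
        if old is not None:
            self.index[old].discard(key)
        self.rows[key] = value
        self.index[value].add(key)

    def lookup(self, value):
        return set(self.index[value])


class LazyIndex:
    """Lazy maintenance: writes never read; stale entries are dropped at query time."""

    def __init__(self):
        self.rows = {}
        self.index = defaultdict(set)
        self.reads_on_write = 0

    def put(self, key, value):
        self.rows[key] = value
        self.index[value].add(key)  # the old entry, if any, is left behind

    def lookup(self, value):
        # Keep only entries whose row still carries the indexed value.
        live = {k for k in self.index[value] if self.rows.get(k) == value}
        self.index[value] = live  # purge the stale entries just discovered
        return set(live)


eager, lazy = EagerIndex(), LazyIndex()
for idx in (eager, lazy):
    idx.put("u1", "NL")
    idx.put("u2", "NL")
    idx.put("u1", "DE")  # u1 moves from NL to DE

print(eager.lookup("NL"), lazy.lookup("NL"))
print(eager.reads_on_write, lazy.reads_on_write)
```

Both variants answer the lookup identically, but only the eager one pays a read on each write; the lazy one defers that cost to query time.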
