What is the difference between a secondary index and an inverted index in Cassandra?

Question

When I read about these two, I thought they described the same approach, and searching online turned up nothing. Is the difference in the implementation? Does Cassandra maintain secondary indexes itself, while I have to implement an inverted index myself?

Also, which one is faster for searching?

Recommended answer

The main difference is that Cassandra's secondary indexes are not distributed the way a manual inverted index would be. With the built-in secondary indexes, each node indexes only the data it stores locally (using the LocalPartitioner). With manual indexing, the index is distributed independently of the nodes that store the indexed values.
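
To make the placement difference concrete, here is a minimal in-memory sketch. It is not Cassandra code. It assumes a hypothetical 4-node cluster with no replication and a toy `owner()` function (an MD5 hash modulo the node count) standing in for the token ring. The names `NODES`, `users`, `local_index` and `inverted_index` are all illustrative. Built-in-style entries are stored on the node that holds the row. Manual-index entries are stored on the node that owns the indexed *value*, because that value is the index table's partition key.

```python
import hashlib
from collections import defaultdict

NODES = ["node1", "node2", "node3", "node4"]  # hypothetical cluster, no replication

def owner(partition_key: str) -> str:
    """Toy stand-in for the token ring: hash the partition key onto a node."""
    token = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return NODES[token % len(NODES)]

# Hypothetical base table: user_id (partition key) -> city (indexed column)
users = {"alice": "paris", "bob": "paris", "carol": "tokyo", "dave": "paris", "erin": "oslo"}

# Built-in style: each node indexes only the rows it stores locally.
local_index = {n: defaultdict(set) for n in NODES}
# Manual style: an index table partitioned by the indexed value itself.
inverted_index = {n: defaultdict(set) for n in NODES}

for user_id, city in users.items():
    local_index[owner(user_id)][city].add(user_id)   # stored next to the row
    inverted_index[owner(city)][city].add(user_id)   # stored where the value hashes to

# Every "paris" entry of the manual index sits on one node...
print([n for n in NODES if "paris" in inverted_index[n]])
# ...while the local "paris" entries are spread over whichever nodes hold those rows.
print([n for n in NODES if "paris" in local_index[n]])
```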

This means that, with the built-in indexes, every query must be sent to every node. With a manual inverted index, you only need to go to one node (plus its replicas) to look up the value. One advantage of storing the index locally is that it can be updated atomically with the data. (Since Cassandra 1.2, atomic batches can achieve the same for a manual index, although they are somewhat slower.)
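
The fan-out difference can be sketched by counting which nodes a lookup has to touch. This is a toy model under stated assumptions. The cluster has a hypothetical 6 nodes and replication factor 3. Replicas are placed loosely in the style of SimpleStrategy, as the owner plus the next nodes around the ring. Following the answer's simplification, a built-in index query simply contacts every node. The function names are hypothetical.

```python
import hashlib

NODES = [f"node{i}" for i in range(1, 7)]  # hypothetical 6-node cluster
RF = 3                                      # hypothetical replication factor

def replicas(partition_key: str) -> list[str]:
    """Toy placement: the owning node plus the next RF-1 nodes around the ring."""
    start = int(hashlib.md5(partition_key.encode()).hexdigest(), 16) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(RF)]

def contacted_by_builtin_index_query() -> set[str]:
    # Each node indexes only its own rows, so in this toy model the
    # coordinator has to ask all of them.
    return set(NODES)

def contacted_by_manual_index_query(value: str) -> set[str]:
    # The index partition keyed by `value` exists only on its replicas.
    # This returns that replica set; a read at consistency level ONE
    # would actually query just one of these nodes.
    return set(replicas(value))

print(len(contacted_by_builtin_index_query()))        # 6
print(len(contacted_by_manual_index_query("paris")))  # 3
```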

This is why Cassandra's built-in indexes are not recommended for very high-cardinality data. If you query every node but get back only one or two results, the query is inefficient, and a manual inverted index will perform better. If your lookup returns many results, you would need to look on every node anyway, so the built-in indexes work well.
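
The cardinality trade-off can be illustrated with a deliberately crude request-count model. It is an assumption for illustration, not a measurement. A built-in index query costs one request per node, and each node returns its local matches. A manual inverted index costs one read of the index partition plus one read per matching row to fetch that row by its key. Replication, paging and row sizes are ignored. The cluster size and column names are hypothetical.

```python
def builtin_requests(num_nodes: int) -> int:
    """Built-in index: ask every node once; each returns its local matches."""
    return num_nodes

def manual_requests(matches: int) -> int:
    """Manual inverted index: read the index partition once, then fetch
    each matching row by its primary key (one request per row)."""
    return 1 + matches

N = 12  # hypothetical cluster size
for label, matches in [("unique email", 1), ("country", 5_000)]:
    b, m = builtin_requests(N), manual_requests(matches)
    better = "manual" if m < b else "built-in"
    print(f"{label}: built-in={b} manual={m} -> {better}")
```

With a nearly unique value, asking all 12 nodes to return a single row is wasteful. With a value that matches thousands of rows, the per-row fetches of the manual approach outnumber the fixed per-node cost of the built-in index.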

A further advantage of Cassandra's built-in indexes is that they are updated lazily, so no read is needed on every update (see CASSANDRA-2897). This can be a significant speed improvement for indexed tables with high write throughput.
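
The read-before-write difference can be shown with two small in-memory classes. This is only a toy model of the idea behind lazy index updates, not Cassandra's actual implementation. The class and attribute names are hypothetical.

- **Exact index.** Keeping a manually maintained index exact means reading the old value on every update, so that the stale entry can be deleted.
- **Lazy index.** A lazy index only adds the new entry on write. Stale entries are dropped at query time, when each candidate is checked against its base row, which the query reads anyway.

```python
from collections import defaultdict

class ManualIndex:
    """Exact index: every update first reads the old value (read-before-write)."""
    def __init__(self):
        self.rows: dict[str, str] = {}
        self.index: dict[str, set[str]] = defaultdict(set)
        self.reads_on_write = 0

    def update(self, key: str, value: str) -> None:
        old = self.rows.get(key)          # the read-before-write
        self.reads_on_write += 1
        if old is not None:
            self.index[old].discard(key)  # remove the now-stale entry
        self.rows[key] = value
        self.index[value].add(key)

    def query(self, value: str) -> set[str]:
        return set(self.index[value])

class LazyIndex:
    """Lazy index: writes never read; stale entries are filtered on query."""
    def __init__(self):
        self.rows: dict[str, str] = {}
        self.index: dict[str, set[str]] = defaultdict(set)
        self.reads_on_write = 0           # stays 0: no read on the write path

    def update(self, key: str, value: str) -> None:
        self.rows[key] = value
        self.index[value].add(key)        # any old entry is simply left behind

    def query(self, value: str) -> set[str]:
        # Check each candidate against its base row and drop stale entries.
        hits = {k for k in self.index[value] if self.rows.get(k) == value}
        self.index[value] = hits
        return hits

for idx in (ManualIndex(), LazyIndex()):
    idx.update("alice", "paris")
    idx.update("alice", "tokyo")  # alice moves; her "paris" entry becomes stale
    print(type(idx).__name__, idx.query("paris"), idx.query("tokyo"), idx.reads_on_write)
# ManualIndex set() {'alice'} 2
# LazyIndex set() {'alice'} 0
```

Both classes return the same query results. The lazy version moves the cleanup work from the write path to the read path, which pays off when writes greatly outnumber indexed queries.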
