Why are secondary indexes less efficient in Cassandra?

Question

I read in the Cassandra documentation that creating a secondary index is less efficient because, in the worst case, a query on it needs to touch all nodes to find the data for that non-key column.

But my doubt is this: even if we do not create a secondary index, Cassandra will still have to touch all nodes (in the worst case) to find where the row with that non-key column value resides.

Note: Yes, I understand that if the cardinality is high, the secondary index may end up storing entries for almost every row, which is bad in terms of storage. But I want to know how not creating a secondary index is more efficient than creating one.

Recommended answer

Secondary indexes should be used only in specific cases: for example, when you use them together with a condition on the partition key column, when the data has the right cardinality, and so on.

For example, if we have the following table:

create table test.test (
  pk int,
  c1 int,
  val1 int,
  val2 int,
  primary key(pk, c1));

and you created a secondary index on the column val2, then the following query will be very efficient:

select * from test.test where pk = 123 and val2 = 10;

because you restricted the execution of the query to only the nodes that are replicas for the partition with pk = 123.
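
To make the routing argument concrete, here is a minimal Python sketch of a token ring. It is a simplified, hypothetical model rather than Cassandra's code: `Ring`, `token_for`, and `replicas_for` are illustrative names, MD5 stands in for Cassandra's default Murmur3 partitioner, each node owns a single token (no vnodes), and replicas are placed SimpleStrategy-style on the next nodes clockwise. It shows that once `pk` is known, the set of candidate nodes shrinks to the replication factor, however large the cluster is.

```python
import bisect
import hashlib

RING_SIZE = 2**64  # hypothetical unsigned token space; real Murmur3 tokens are signed 64-bit


def token_for(partition_key: int) -> int:
    """Hash a partition key onto the ring (MD5 is only a stand-in for Murmur3)."""
    digest = hashlib.md5(str(partition_key).encode()).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    def __init__(self, num_nodes: int, replication_factor: int):
        self.rf = replication_factor
        self.nodes = [f"node{i}" for i in range(num_nodes)]
        # One evenly spaced token per node; a node with token T owns (previous token, T].
        step = RING_SIZE // num_nodes
        self.tokens = [(i + 1) * step for i in range(num_nodes)]

    def replicas_for(self, partition_key: int) -> list:
        """First node whose token is >= the key's token, plus the next rf - 1 nodes clockwise."""
        t = token_for(partition_key)
        # bisect_left returns len(tokens) past the last token, so wrap to node 0.
        start = bisect.bisect_left(self.tokens, t) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.rf)]


ring = Ring(num_nodes=6, replication_factor=3)
print(len(ring.replicas_for(123)), "of", len(ring.nodes), "nodes")  # prints: 3 of 6 nodes
```

Growing the hypothetical cluster to 60 nodes leaves `replicas_for(123)` at three entries, which is why the `pk = 123` restriction keeps the query cheap.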

But if you run:

select * from test.test where val2 = 10;

then Cassandra needs to go to every node and ask for data there. This is much slower, and it puts pressure on the coordinating node.
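
The difference between the two queries can be sketched with a toy cluster in Python. This is a hypothetical model, not Cassandra's implementation: `Cluster`, `insert`, and `query_val2` are illustrative names, `pk % num_nodes` stands in for the partitioner, and each node keeps a local index on `val2` covering only the rows it stores itself, which is how Cassandra's built-in secondary indexes are organized. Real Cassandra scans token ranges and chooses replicas according to the consistency level, so treat the "ask every node" branch as the worst case described above rather than an exact request count.

```python
from collections import defaultdict
from typing import Optional


class Cluster:
    """Toy cluster in which every node keeps a local index: val2 -> {(pk, c1)}."""

    def __init__(self, num_nodes: int, rf: int):
        self.num_nodes = num_nodes
        self.rf = rf
        self.local_index = [defaultdict(set) for _ in range(num_nodes)]

    def replicas(self, pk: int) -> list:
        # Hypothetical placement: pk % num_nodes stands in for the real partitioner.
        first = pk % self.num_nodes
        return [(first + i) % self.num_nodes for i in range(self.rf)]

    def insert(self, pk: int, c1: int, val2: int) -> None:
        # Each replica indexes only its own copy of the row.
        for node in self.replicas(pk):
            self.local_index[node][val2].add((pk, c1))

    def query_val2(self, val2: int, pk: Optional[int] = None):
        """Return (matching primary keys, number of nodes the coordinator contacted)."""
        if pk is not None:
            # WHERE pk = ? AND val2 = ?: only this partition's replicas can hold matches.
            targets = self.replicas(pk)
        else:
            # WHERE val2 = ?: any node may hold matches, so the coordinator asks all of them.
            targets = list(range(self.num_nodes))
        found = set()
        for node in targets:
            hits = self.local_index[node].get(val2, set())
            found |= {key for key in hits if pk is None or key[0] == pk}
        return found, len(targets)


cluster = Cluster(num_nodes=6, rf=3)
for pk, c1, val2 in [(123, 1, 10), (123, 2, 20), (7, 1, 10), (42, 5, 10)]:
    cluster.insert(pk, c1, val2)

print(cluster.query_val2(10, pk=123))  # ({(123, 1)}, 3)
keys, contacted = cluster.query_val2(10)
print(sorted(keys), contacted)  # [(7, 1), (42, 5), (123, 1)] 6
```

The second element of each result carries the point: with `pk` the fan-out is bounded by the replication factor, while without it the coordinator contacts a number of nodes that grows with the cluster and must merge all the partial results itself.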

Standard secondary indexes have other limitations, such as supporting only searches for exact values, and problems when the column has very low or very high cardinality. SASI indexes are better from a design standpoint, although they are still experimental and have implementation problems.

You can find more details in the following blog post.
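
The cardinality problems mentioned above can be put into rough numbers. The Python function below is a back-of-the-envelope sketch, not a Cassandra API: `lookup_cost` is a hypothetical helper, and it assumes values are spread uniformly and that a query without a partition key contacts every node. With low cardinality each lookup drags back a huge share of the table; with high cardinality the coordinator still contacts every node just to find a single row.

```python
def lookup_cost(total_rows: int, distinct_values: int, num_nodes: int, rf: int) -> dict:
    """Rough cost of `WHERE indexed_col = ?` with no partition key (uniform data assumed)."""
    matching_rows = total_rows / distinct_values  # rows the query returns
    # Each matching row is stored on rf nodes, and those copies spread evenly over
    # the cluster, so this is how many entries each node's local index holds for the value.
    index_entries_per_node = matching_rows * rf / num_nodes
    return {
        "nodes_contacted": num_nodes,  # worst case: the whole cluster
        "matching_rows": matching_rows,
        "index_entries_per_node": index_entries_per_node,
    }


# Hypothetical cluster: 10 million rows, 6 nodes, replication factor 3.
low = lookup_cost(10_000_000, distinct_values=2, num_nodes=6, rf=3)  # e.g. a boolean column
high = lookup_cost(10_000_000, distinct_values=10_000_000, num_nodes=6, rf=3)  # e.g. a unique column
print(low)   # {'nodes_contacted': 6, 'matching_rows': 5000000.0, 'index_entries_per_node': 2500000.0}
print(high)  # {'nodes_contacted': 6, 'matching_rows': 1.0, 'index_entries_per_node': 0.5}
```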

DataStax has other implementations in its commercial offering:

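  • DSE Search, based on Apache Solr, which gives you a lot of flexibility (full-text search, range queries, etc.)
  • A new implementation called SSTable Attached Indexes (SAI): they are currently marked as beta, but they offer more flexibility than standard secondary indexes and have less overhead than DSE Search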
