What is the role of the Bloom filter in Cassandra?


Question

In two different pages of the Cassandra documentation, I found:

Link 1:

A structure stored in memory that checks if row data exists in the memtable before accessing SSTables on disk

and

Link 2:

Cassandra checks the Bloom filter to discover which SSTables are likely to have the request partition data.

My question is: are both of the above statements correct? If so, are Bloom filters maintained separately for memtables and SSTables? Thanks in advance.

Solution

A Bloom filter is a generic data structure used to check whether an element is present in a set. Its algorithm is designed to be extremely fast, at the cost of occasionally returning false positives.
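To make the definition concrete, here is a minimal Python sketch of a generic Bloom filter: a bit array of `num_bits` bits, with `num_hashes` bit positions per element derived from a single SHA-256 digest by double hashing. The class and parameter names are illustrative assumptions, and Cassandra's own implementation differs in its hashing and storage details. The key property is that `might_contain` can wrongly return `True` (a false positive) but never wrongly returns `False`.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: num_bits bits, num_hashes bit positions per element."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # bit array packed into bytes

    def _positions(self, item: str):
        # Double hashing: position_i = (h1 + i * h2) mod num_bits,
        # with h1 and h2 taken from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 odd so it is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit the item hashes to.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False: the item was definitely never added.
        # True: the item was probably added (may be a false positive).
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bf = BloomFilter(num_bits=1024, num_hashes=7)
for key in ["alice", "bob", "carol"]:
    bf.add(key)
print(bf.might_contain("alice"))  # True: added keys are never reported missing
```

An absent key such as `"dave"` will usually get `False`, but `True` is possible if all of its bit positions happen to have been set by other keys.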

Cassandra uses Bloom filters to test whether each SSTable is likely to contain the requested partition key, without actually having to read its contents (thus avoiding expensive I/O operations).

If a Bloom filter returns false for a given partition key, it is absolutely certain that the partition key is not present in the corresponding SSTable; if it returns true, however, the SSTable is only likely to contain the partition key. When this happens, Cassandra falls back to more sophisticated techniques to determine whether it actually needs to read that SSTable. Note that Bloom filters are consulted on most reads but are written only when an SSTable is created (for example, when a memtable is flushed to disk). You can read more about Cassandra's read path here.
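How often a filter returns a false positive depends on how many bits and hash functions it uses per key, and Cassandra lets you trade memory for accuracy per table through the `bloom_filter_fp_chance` table option. The sketch below applies the textbook sizing formulas for n keys and a target false-positive probability p: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hash functions, and p ≈ (1 - e^(-kn/m))^k. The function names are hypothetical, and Cassandra's internal sizing may differ in detail.

```python
import math
from typing import Tuple


def bloom_parameters(expected_items: int, fp_chance: float) -> Tuple[int, int]:
    """Return (num_bits, num_hashes) for a target false-positive probability."""
    # Textbook optimum: m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2
    num_bits = math.ceil(-expected_items * math.log(fp_chance) / (math.log(2) ** 2))
    num_hashes = max(1, round(num_bits / expected_items * math.log(2)))
    return num_bits, num_hashes


def false_positive_rate(num_bits: int, num_hashes: int, items: int) -> float:
    """Approximate probability that an absent key is reported as present."""
    # p ≈ (1 - e^(-k * n / m))^k
    return (1.0 - math.exp(-num_hashes * items / num_bits)) ** num_hashes


m, k = bloom_parameters(expected_items=1_000_000, fp_chance=0.01)
print(m, k, false_positive_rate(m, k, 1_000_000))
```

For one million keys at a 1% target this works out to roughly 9.6 bits per key and 7 hash functions; lowering the target probability costs more memory per SSTable.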

Back to your questions:

1) The first statement ("A structure stored in memory that checks if row data exists in the memtable before accessing SSTables on disk") is, IMHO, not accurate: Bloom filters are indeed built when a memtable is flushed to disk, but they do not reference the memtable.

2) Bloom filters are maintained per SSTable, i.e. each SSTable on disk has a corresponding Bloom filter in memory.
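Both points can be illustrated with a toy model: the memtable is searched directly with no Bloom filter, and every flush produces a new SSTable that carries its own filter. `Table`, `SSTable`, `flush`, `read` and `disk_reads` are hypothetical names, not Cassandra APIs, and the read path is heavily simplified. Real Cassandra merges data from the memtable and all matching SSTables by timestamp, and after a positive filter check it consults further structures such as the partition key cache and partition index, instead of returning the first hit as done here.

```python
import hashlib
from typing import Dict, List, Optional


class BloomFilter:
    """Compact Bloom filter: bit array plus k double-hashed positions."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 7) -> None:
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> List[int]:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class SSTable:
    """Hypothetical immutable SSTable whose Bloom filter is built once, when it is written."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self.rows = dict(rows)  # stands in for the data on disk
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


class Table:
    """Hypothetical table: one memtable plus a list of SSTables (oldest first)."""

    def __init__(self) -> None:
        self.memtable: Dict[str, str] = {}
        self.sstables: List[SSTable] = []
        self.disk_reads = 0  # number of SSTables actually opened by reads

    def write(self, key: str, value: str) -> None:
        self.memtable[key] = value  # no Bloom filter is involved on the memtable

    def flush(self) -> None:
        # A flush creates a new SSTable together with its own Bloom filter.
        self.sstables.append(SSTable(self.memtable))
        self.memtable = {}

    def read(self, key: str) -> Optional[str]:
        if key in self.memtable:  # the memtable is searched directly
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest SSTable first
            if not sstable.bloom.might_contain(key):
                continue  # definitely absent: skip this SSTable without reading it
            self.disk_reads += 1
            if key in sstable.rows:
                return sstable.rows[key]
        return None


table = Table()
table.write("k1", "v1")
table.flush()
table.write("k2", "v2")
table.flush()
table.write("k3", "v3")  # still only in the memtable
print(table.read("k3"), table.disk_reads)  # v3 0
print(table.read("k1"))  # v1
```

Reading `k3` never opens an SSTable because the key is still in the memtable. Reading `k1` opens the older SSTable, and opens the newer one too only if that SSTable's filter returns a false positive for `k1`.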
