Sorted String Table (SSTable) or B+ Tree for a Database Index?


Problem description


Using two databases to illustrate this example: CouchDB and Cassandra.

CouchDB

CouchDB uses a B+ tree for document indexes (with a clever modification to work in its append-only environment). More specifically, as documents are modified (insert/update/delete), they are appended to the running database file, and right after each document comes a full leaf -> node path from the B+ tree covering all the nodes affected by the updated revision.

These piecemeal index revisions are inlined right alongside the modifications, such that the full index is the union of the most recent index modifications appended at the end of the file and additional pieces further back in the data file that are still relevant and haven't been modified yet.

Searching the B+ tree is O(log n).
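To make the append-only idea concrete, here is a minimal sketch of path copying into an append-only log. It is not CouchDB's actual file format: it swaps the B+ tree for a plain (unbalanced) binary search tree to stay short, so O(log n) search is not guaranteed here, and the names `log`, `append`, `put`, `insert`, and `lookup` are hypothetical.

```python
from typing import Optional

# Append-only "file": records are only ever added at the end, never changed.
log = []

def append(record) -> int:
    """Append a record and return its offset in the log."""
    log.append(record)
    return len(log) - 1

def insert(root: Optional[int], key, doc_off: int) -> int:
    """Copy only the nodes on the path to `key`; return the new root offset.

    A node is (key, doc_offset, left_offset, right_offset). Children are
    appended before their parent, so the new root is always the last record.
    """
    if root is None:
        return append((key, doc_off, None, None))
    k, d, left, right = log[root]
    if key == k:
        return append((k, doc_off, left, right))
    if key < k:
        return append((k, d, insert(left, key, doc_off), right))
    return append((k, d, left, insert(right, key, doc_off)))

def put(root: Optional[int], key, value) -> int:
    """Append the document first, then the rewritten index path after it."""
    doc_off = append(("doc", value))
    return insert(root, key, doc_off)

def lookup(root: Optional[int], key):
    """Walk the tree from a given root offset; older roots stay readable."""
    while root is not None:
        k, d, left, right = log[root]
        if key == k:
            return log[d]
        root = left if key < k else right
    return None
```

Untouched subtrees are shared by offset with earlier revisions, which is why the full index is a union of recent records at the end of the file and older records further back.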

Cassandra

Cassandra keeps record keys sorted, in memory, in tables (let's think of them as arrays for this question) and writes them out as separate sorted-string tables from time to time.

We can think of the collection of all of these tables as the "index" (from what I understand).

Cassandra is required to compact/combine these sorted-string tables from time to time, creating a more complete file representation of the index.

Searching a sorted array is O(log n).
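The flush, search, and compact cycle described above can be sketched roughly as follows. This is a toy model, not Cassandra's implementation: the class `TinyLSM` and its `memtable_limit` parameter are hypothetical, the in-memory table is a dict that is sorted only at flush time, and deletes (tombstones) are ignored.

```python
import bisect

class TinyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable = {}      # in-memory writes, not yet on "disk"
        self.sstables = []      # oldest first; each is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as one immutable sorted table."""
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        """Check memory first, then binary-search each table, newest first."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            # (key,) sorts just before any (key, value) pair with the same key.
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

    def compact(self):
        """Merge all tables into one; newer values overwrite older ones."""
        merged = {}
        for table in self.sstables:   # oldest first, so later updates win
            merged.update(table)
        self.sstables = [sorted(merged.items())] if merged else []
```

Each individual table lookup is a binary search, but note that `get` may have to repeat it once per table until the tables are compacted.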

Question

Assuming a similar level of complexity between maintaining partial B+ tree chunks in CouchDB and partial sorted-string indices in Cassandra, and given that both provide O(log n) search time, which one do you think would make a better representation of a database index, and why?

I am specifically curious whether there is an implementation detail about one that makes it particularly attractive over the other, or whether they are both a wash and you just pick whichever data structure you prefer to work with or that makes more sense to the developer.

Thank you for the thoughts.

Solution

When comparing a BTree index to an SSTable index, you should consider the write complexity:

  • When writing randomly to a copy-on-write BTree, you will incur random reads (to copy the leaf node and its path). So while the writes may be sequential on disk, for datasets larger than RAM these random reads will quickly become the bottleneck. For an SSTable-like index, no such read occurs on write - there are only sequential writes.

  • You should also consider that, in the worst case, every update to a BTree could incur log_b N IOs - that is, you could end up writing 3 or 4 blocks for every key. If the key size is much smaller than the block size, this is extremely expensive. For an SSTable-like index, each write IO will contain as many fresh keys as it can, so the IO cost per key is more like 1/B, where B is the number of keys that fit in a block.

In practice, this makes SSTable-like indexes thousands of times faster (for random writes) than BTrees.
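The per-key write costs from the two points above can be compared with some back-of-envelope arithmetic. The figures below (10^9 keys, a fanout of 1000, 100 keys per block) are illustrative assumptions rather than measurements, and the helper names are hypothetical. The sketch counts block writes only, so it ignores both the random-versus-sequential IO gap and the cost of later merges.

```python
def btree_blocks_per_update(n_keys: int, fanout: int) -> int:
    """Worst case for a copy-on-write BTree: rewrite one block per level.

    The height is the smallest h with fanout**h >= n_keys (about log_b N).
    """
    height, capacity = 1, fanout
    while capacity < n_keys:
        capacity *= fanout
        height += 1
    return height

def sstable_blocks_per_update(keys_per_block: int) -> float:
    """Each written block is full of fresh keys, so a key costs 1/B blocks."""
    return 1 / keys_per_block

# Illustrative assumptions only.
N, FANOUT, KEYS_PER_BLOCK = 10**9, 1000, 100

btree_cost = btree_blocks_per_update(N, FANOUT)
sstable_cost = sstable_blocks_per_update(KEYS_PER_BLOCK)
print(f"BTree: {btree_cost} blocks/key, SSTable-like: {sstable_cost} blocks/key")
print(f"ratio: {btree_cost / sstable_cost:.0f}x")
```

The ratio scales with the number of keys per block, so small keys in large blocks widen the gap further.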

When considering implementation details, we have found it a lot easier to implement SSTable-like indexes (almost) lock-free, whereas locking strategies for BTrees have become quite complicated.

You should also reconsider your read costs. You are correct that a BTree is O(log_b N) random IOs for random point reads, but an SSTable-like index is actually O(#sstables . log_b N). Without a decent merge scheme, #sstables is proportional to N. There are various tricks to get around this (using Bloom filters, for instance), but these don't help with small, random range queries. This is what we found with Cassandra:

http://www.acunu.com/blogs/richard-low/cassandra-under-heavy-write-load-part-ii/
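As a rough illustration of the read-cost point, the sketch below gives each table a small Bloom filter. A point read can skip tables whose filter rules the key out, while a range read has no single key to test and must consult every table. All names here (`BloomFilter`, `build`, `point_read`, `range_read`) and the filter parameters are hypothetical, and this is not how Cassandra or Castle implement their read paths.

```python
import bisect
import hashlib

class BloomFilter:
    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        # False means definitely absent; True may be a false positive.
        return all((self.bits >> p) & 1 for p in self._positions(key))

def build(keys):
    """Return one 'sstable': a sorted key list plus its Bloom filter."""
    sorted_keys = sorted(keys)
    bloom = BloomFilter()
    for k in sorted_keys:
        bloom.add(k)
    return sorted_keys, bloom

def point_read(tables, key):
    """Return (found, tables_searched), skipping tables the filter excludes."""
    searched = 0
    for keys, bloom in reversed(tables):  # newest table first
        if not bloom.might_contain(key):
            continue
        searched += 1
        i = bisect.bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            return True, searched
    return False, searched

def range_read(tables, lo, hi):
    """Return (keys in [lo, hi], tables_searched); every table is searched."""
    found = set()
    for keys, _ in tables:
        found.update(keys[bisect.bisect_left(keys, lo):bisect.bisect_right(keys, hi)])
    return sorted(found), len(tables)
```

With many small tables, `range_read` pays one search per table even when the range matches only a handful of keys, which is the case the Bloom filter cannot help with.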

This is why Castle, our (GPL) storage engine, does merges slightly differently, and can achieve much better (O(log^2 N)) range query performance with a slight trade-off in write performance (O(log^2 N / B)). In practice we find it to be quicker than Cassandra's SSTable index for writes as well.

If you want to know more about this, I've given a talk about how it works:
