Reliable and efficient key-value database for Linux?


Question

I need a fast, reliable and memory-efficient key-value database for Linux. My keys are about 128 bytes, and the maximum value size can be 128K or 256K. The database subsystem shouldn't use more than about 1 MB of RAM. The total database size is 20G (!), but only a small random fraction of the data is accessed at a time. If necessary, I can move some data blobs out of the database (to regular files), so that the database size drops to 2 GB at most. The database must survive a system crash without losing any data that was not recently modified. I'll have about 100 times more reads than writes. It is a plus if it can use a block device (without a filesystem) as storage. I don't need client-server functionality, just a library. I need Python bindings (but I can implement them if they are not available).
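The option of moving large blobs out to regular files can be made crash-tolerant with the usual write-to-temp, fsync, rename pattern. The following is only a minimal sketch under stated assumptions: the helper names (`blob_path`, `put_blob`, `get_blob`) and the SHA-256 directory layout are hypothetical choices for illustration, not part of any of the libraries discussed here, and the directory fsync assumes Linux.

```python
import hashlib
import os
import tempfile


def blob_path(root, key):
    """Map a key (bytes) to a file path under root (hypothetical layout)."""
    digest = hashlib.sha256(key).hexdigest()
    # Two-level fan-out keeps individual directories small.
    return os.path.join(root, digest[:2], digest[2:])


def put_blob(root, key, value):
    """Store value so that a crash leaves either the old or the new file."""
    path = blob_path(root, key)
    directory = os.path.dirname(path)
    os.makedirs(directory, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(value)
            f.flush()
            os.fsync(f.fileno())  # force the data to disk before renaming
        os.replace(tmp, path)  # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    # Also persist the directory entry created by the rename (Linux).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def get_blob(root, key):
    """Return the stored bytes, or None if the key has no blob."""
    try:
        with open(blob_path(root, key), "rb") as f:
            return f.read()
    except FileNotFoundError:
        return None
```

With this layout the key-value database itself would only need to hold the small values, while blobs that are not being rewritten at the moment of a crash stay untouched on disk.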

Which solutions should I consider, and which one do you recommend?

Candidates I know of which could work:


  • Tokyo Cabinet (Python bindings are pytc, see also pytc example code, supports hashes and B+trees, transaction log files and more, the size of the bucket array is fixed at database creation time; the writer must close the file to give others a chance; lots of small writes with reopening the file for each of them are very slow; the Tyrant server can help with the lots of small writes; speed comparison between Tokyo Cabinet, Tokyo Tyrant and Berkeley DB)
  • VSDB (safe even on NFS, without locking; what about barriers?; updates are very slow, but not as slow as in cdb; last version in 2003)
  • BerkeleyDB (provides crash recovery; provides transactions; the bsddb Python module provides bindings)
  • Samba's TDB (with transactions and Python bindings, some users experienced corruption, sometimes mmap()s the whole file, the repack operation sometimes doubles the file size, produces mysterious failures if the database is larger than 2G (even on 64-bit systems), cluster implementation (CTDB) also available; file grows too large after lots of modifications; file becomes too slow after lots of hash contention; no built-in way to rebuild the file; very fast parallel updates by locking individual hash buckets)
  • aodbm (append-only so survives a system crash, with Python bindings)
  • hamsterdb (with Python bindings)
  • C-tree (mature, versatile commercial solution with high performance, has a free edition with reduced functionality)
  • the old TDB (from 2001)
  • bitcask (log-structured, written in Erlang)
  • various other DBM implementations (such as GDBM, NDBM, QDBM, Perl's SDBM or Ruby's; they probably don't have proper crash recovery)

I wouldn't use these:


  • MemcacheDB (client-server, uses BerkeleyDB as a backend)
  • cdb (needs to regenerate the whole database upon each write)
  • http://www.wildsparx.com/apbcdb/ (ditto)
  • Redis (keeps the whole database in memory)
  • SQLite (it becomes very slow without periodic vacuuming, see autocompletion in the location bar in Firefox 3.0, even though versions 3.1 and later of SQLite allow auto_vacuuming; beware: small write transactions can be very slow; beware: if a busy process is doing many transactions, other processes starve, and they may never get the lock)
  • MongoDB (too heavy-weight, treats values as objects with internal structure)
  • Firebird (SQL-based RDBMS, too heavy-weight)
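The SQLite caveats in the list above (vacuuming and slow small write transactions) can be illustrated with the standard-library `sqlite3` module. This is a rough sketch rather than a recommendation: the table name `kv` and the helper names are made up for the example. It enables `auto_vacuum` before the table is created and groups many writes into one transaction, so that the cost of a commit is paid once per batch instead of once per key.

```python
import sqlite3


def open_store(path):
    """Open (or create) a one-table key-value store in SQLite."""
    conn = sqlite3.connect(path)
    # auto_vacuum only takes effect if set before the first table exists.
    conn.execute("PRAGMA auto_vacuum = FULL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kv (key BLOB PRIMARY KEY, value BLOB NOT NULL)"
    )
    conn.commit()
    return conn


def put_many(conn, items):
    """Write all (key, value) pairs in a single transaction (one commit)."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", items
        )


def get(conn, key):
    """Return the value stored for key, or None if it is absent."""
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return None if row is None else row[0]
```

Batching does not help with the lock starvation mentioned above; it only reduces the per-write commit overhead.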

FYI, a recent article about key-value databases in the Linux magazine.

FYI, a list of older software.

FYI, a speed comparison of MemcacheDB, Redis and Tokyo Cabinet Tyrant.

Related questions on StackOverflow:

  • Key Value Database For Windows?
  • Is there a business proven cloud store / Key=>Value Database? (Open Source)

Recommended answer

I've had good luck with the Tokyo Cabinet/pytc solution. It's very fast (a bit faster than using the shelve module with anydbm in my implementation), both for reading and writing (though I too do far more reading). The problem for me was the spartan documentation on the Python bindings, but there's enough example code around to figure out how to do what you need to do. Additionally, Tokyo Cabinet is quite easy to install (as are the Python bindings), doesn't require a server (as you mention) and seems to be supported (stable, but no longer under active development). You can open files in read-only mode, allowing concurrent access, or in read/write mode, preventing other processes from accessing the database.
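For reference, the standard-library baseline mentioned above (anydbm, which is named `dbm` in Python 3) has a similar split between a read/write open and a read-only open. The sketch below uses only the stdlib and does not show pytc itself. The function name `write_then_read` is made up for the example, and Tokyo Cabinet's locking behaviour is not reproduced here.

```python
import dbm


def write_then_read(path, items):
    """Store bytes key/value pairs, then read them back read-only."""
    # "c" opens for reading and writing, creating the file if needed.
    with dbm.open(path, "c") as db:
        for key, value in items.items():
            db[key] = value
    # "r" opens the existing database read-only.
    with dbm.open(path, "r") as db:
        return {key: db[key] for key in items}
```

Which backend `dbm.open` picks depends on what is installed, so timings from this baseline are only comparable on the same machine.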

I was looking at various options over the summer, and the advice I got then was this: try out the different options and see what works best for you. It'd be nice if there were simply a "best" option, but everyone is looking for slightly different features and is willing to make different trade-offs. You know best.
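One way to try the options out is a small timing harness that replays the read-heavy access pattern from the question against anything with a dict-like interface. This is only a sketch: the function name `benchmark`, the 1 KB value size and the default parameters are arbitrary assumptions, and a real test would use values and database sizes closer to the ones in the question.

```python
import random
import time


def benchmark(store, n_keys=1000, n_ops=10000, reads_per_write=100, seed=0):
    """Time a random read-heavy workload on a dict-like store (bytes keys)."""
    rng = random.Random(seed)
    keys = [b"%0128d" % i for i in range(n_keys)]  # 128-byte keys
    for key in keys:
        store[key] = b"x" * 1024
    reads = writes = 0
    start = time.perf_counter()
    for i in range(n_ops):
        key = rng.choice(keys)
        if i % (reads_per_write + 1) == 0:
            store[key] = b"y" * 1024  # one write per reads_per_write reads
            writes += 1
        else:
            _ = store[key]
            reads += 1
    elapsed = time.perf_counter() - start
    return {"reads": reads, "writes": writes, "seconds": elapsed}
```

Passing a plain `dict` gives an in-memory upper bound, and passing an object returned by `dbm.open` or by another binding with item access gives a number to compare against it.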

(That said, it'd be useful to others if you shared what ended up working the best for you, and why you chose that solution over others!)
