Java Fast Data Storage & Retrieval


Question

I need to store records in persistent storage and retrieve them on demand. The requirements are as follows:


  1. Extremely fast retrieval and insertion.
  2. Each record has a unique key. This key is used to retrieve the record.
  3. The stored data must be persistent, i.e. it must be available after a JVM restart.
  4. A separate process moves stale records to an RDBMS once a day.

What do you think? I cannot use a standard database because of latency issues. In-memory databases like HSQLDB/H2 have performance constraints. Moreover, the records are simple string objects and do not warrant SQL. I am thinking of some kind of flat-file-based solution. Any ideas? Any open source projects? I am sure someone must have solved this problem before.

Recommended answer

There are lots of diverse tools and methods, but I think none of them can shine on all of the requirements.

For low latency, you can only rely on in-memory data access - disks are physically too slow (and so are SSDs). If the data does not fit in the memory of a single machine, we have to distribute it across several nodes that together provide enough memory.
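Distributing data across nodes needs some rule that maps each key to a node. The following is a minimal sketch of the simplest such rule, hash partitioning. The class name `ShardRouter` and the node names are hypothetical and not part of the original answer. Distributed stores usually use more elaborate schemes such as consistent hashing, because with a plain modulo most keys move when a node is added or removed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: maps each record key to one of several nodes. */
class ShardRouter {
    private final List<String> nodes;

    ShardRouter(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = new ArrayList<>(nodes); // defensive copy
    }

    /** Returns the node responsible for the given key. */
    String nodeFor(String key) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        int index = Math.floorMod(key.hashCode(), nodes.size());
        return nodes.get(index);
    }
}
```

The same key always resolves to the same node, so reads and writes for a record can be sent to that node's in-memory store.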

For persistence, we have to write our data to disk after all. Assuming optimal organization, this can be done as a background activity that does not affect latency. However, for reliability (failover, HA or whatever), disk operations cannot be totally independent of the access methods: when modifying data, we have to wait for the disk to make sure our operation will not disappear. Concurrency also adds some complexity and latency.
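The trade-off between background writes and waiting for the disk can be sketched with an in-memory map backed by an append-only log file. This is a minimal illustration under stated assumptions, not a production design: the class `AppendLogStore` and its `syncOnWrite` flag are hypothetical names, records are limited to what `DataOutputStream.writeUTF` accepts (65535 encoded bytes per string), and a torn final record is simply ignored on replay.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: in-memory map whose writes are appended to a log file. */
class AppendLogStore implements AutoCloseable {
    private final Map<String, String> index = new HashMap<>();
    private final FileOutputStream file;
    private final DataOutputStream out;
    private final boolean syncOnWrite;

    AppendLogStore(Path path, boolean syncOnWrite) {
        this.syncOnWrite = syncOnWrite;
        try {
            if (Files.exists(path)) {
                replay(path); // rebuild the map after a JVM restart
            }
            file = new FileOutputStream(path.toFile(), true); // append mode
            out = new DataOutputStream(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void replay(Path path) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(path)))) {
            while (true) {
                String key = in.readUTF();
                String value = in.readUTF();
                index.put(key, value); // later records overwrite earlier ones
            }
        } catch (EOFException endOfLog) {
            // End of the log (or an incomplete last record): stop replaying.
        }
    }

    synchronized void put(String key, String value) {
        try {
            out.writeUTF(key);
            out.writeUTF(value);
            out.flush();
            if (syncOnWrite) {
                // Wait for the disk: slower, but the write survives an OS crash.
                file.getFD().sync();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        index.put(key, value);
    }

    synchronized String get(String key) {
        return index.get(key); // reads never touch the disk
    }

    @Override
    public synchronized void close() {
        try {
            out.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With `syncOnWrite` set to `false`, `put` returns as soon as the bytes are handed to the operating system, so latency stays low but a machine crash may lose the most recent writes. With `true`, every `put` pays for a disk sync, which is the waiting cost described above.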

The data model is not a restriction here: most of the methods support access based on a unique key.

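We have to decide

  • whether the data fits in the memory of one machine, or we have to find a distributed solution,
  • whether concurrency is an issue, or there are no parallel operations,
  • whether reliability is strict and we cannot lose modifications, or we can live with the fact that an unplanned crash would result in data loss.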

Solutions might be


  • Self-implemented data structures using the standard Java library, files etc. may not be the best solution, because reliability and low latency require clever implementations and lots of testing.
  • Traditional RDBMSs have a flexible data model, durable, atomic and isolated operations, caching etc. - they actually know too much, and are mostly hard to distribute. That is why they are too slow if you cannot turn off the unwanted features, which is usually the case.
  • NoSQL and key-value stores are good alternatives. These terms are quite vague and cover lots of tools. Examples are
    • BerkeleyDB or Kyoto Cabinet as one-machine persistent key-value stores (using B-trees): can be used if the data set is small enough to fit in the memory of one machine,
    • Project Voldemort as a distributed key-value store: uses BerkeleyDB Java Edition inside, simple and distributed,
    • ScalienDB as a distributed key-value store: reliable, but not too slow for writes either,
    • MemcacheDB, Redis or other caching databases with persistence,
    • popular NoSQL systems like Cassandra, CouchDB, HBase etc.: used mainly for big data.

A list of NoSQL tools can be found e.g. here.

Voldemort's performance tests report sub-millisecond response times, and these can be achieved quite easily. However, we have to be careful with the hardware too (like the network properties mentioned above).
