What should I choose: MongoDB/Cassandra/Redis/CouchDB?


Problem Description



We're developing a really big project and I was wondering if anyone can give me some advice about which DB backend we should pick.

Our system is composed of 1100 electronic devices that send a signal to a central server, and the server then stores the signal info (each signal is about 35 bytes long). These devices will be sending about 3 signals per minute each, so if we do the numbers, that'll be 4,752,000 new records/day on the database, and a total of 142,560,000 new records/month.

We need a DB backend that is lightning fast and reliable. Of course we need to do some complex data mining on that DB. We're doing some research on MongoDB/Cassandra/Redis/CouchDB; however, their documentation websites are still in the early stages.

Any help? Ideas?

Thanks a lot!

Solution

Don't let the spatial scale (1000+ devices) mislead you as to the computational and/or storage scale. A few dozen 35-byte inserts per second is a trivial workload for any mainstream DBMS, even running on low-end hardware. Likewise, 142 million records per month is only on the order of 1~10 gigabytes of storage per month, without any compression, including indices.
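As a quick back-of-envelope check of those figures (assuming the numbers from the question: 1100 devices, 3 signals per minute, 35 bytes per signal), a rough Python sketch like this reproduces them:

```python
# Back-of-envelope check of the workload figures discussed above.
# Assumes the numbers given in the question: 1100 devices,
# 3 signals per minute per device, 35 bytes per signal.

DEVICES = 1100
SIGNALS_PER_MIN = 3
BYTES_PER_SIGNAL = 35

inserts_per_second = DEVICES * SIGNALS_PER_MIN / 60            # ~55 inserts/s
records_per_day = DEVICES * SIGNALS_PER_MIN * 60 * 24          # 4,752,000
records_per_month = records_per_day * 30                       # 142,560,000

raw_gib_per_month = records_per_month * BYTES_PER_SIGNAL / 1024**3   # ~4.6 GiB raw
raw_gib_per_year = raw_gib_per_month * 12                            # ~56 GiB raw, before indices

print(f"{inserts_per_second:.0f} inserts/s, "
      f"{records_per_month:,} records/month, "
      f"~{raw_gib_per_month:.1f} GiB/month raw, "
      f"~{raw_gib_per_year:.0f} GiB/year raw")
```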

In your question comment, you said:

"It's all about reliability, scalability and speed. It's very important that the solution scales easily (MongoDB autosharding?) just throwing in more nodes, and the speed is also very important

Reliability? Any mainstream DBMS can guarantee this (assuming you mean it's not going to corrupt your data, and it's not going to crash--see my discussion of the CAP theorem at the bottom of this answer). Speed? Even with a single machine, 10~100 times this workload should not be a problem. Scalability? At the current rate, a full year's data, uncompressed, even fully indexed, would easily fit within 100 gigabytes of disk space (likewise, we've already established the insert rate is not an issue).

As such, I don't see any clear need for an exotic solution like NoSQL, or even a distributed database--a plain, old relational database such as MySQL would be just fine. If you're worried about failover, just set up a backup server in a master-slave configuration. If we're talking 100s or 1000s of times the current scale, just horizontally partition a few instances based on the ID of the data-gathering device (i.e. {partition index} = {device id} modulo {number of partitions}).
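To make that partitioning rule concrete, here is a minimal, illustrative Python sketch of the routing; the shard count, names, and connection strings are placeholders, not part of any real setup:

```python
# Minimal sketch of device-ID based horizontal partitioning:
# {partition index} = {device id} modulo {number of partitions}.
# The DSNs below are hypothetical placeholders.

NUM_PARTITIONS = 4
PARTITION_DSNS = [f"mysql://db-shard-{i}.internal/signals" for i in range(NUM_PARTITIONS)]

def partition_for(device_id: int) -> str:
    """Return the DSN of the shard that stores this device's signals."""
    return PARTITION_DSNS[device_id % NUM_PARTITIONS]

# Signals from a given device always land on the same shard.
print(partition_for(1042))   # mysql://db-shard-2.internal/signals
```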

Bear in mind that leaving the safe and comfy confines of the relational database world means abandoning both its representational model and its rich toolset. This will make your "complex datamining" much more difficult--you don't just need to put data into the database, you also need to get it out.

All of that being said, MongoDB and CouchDB are uncommonly simple to deploy and work with. They're also very fun, and will make you more attractive to any number of people (not just programmers--executives, too!).

The common wisdom is that, of the three NoSQL solutions you suggested, Cassandra is the best for high insert volume (of course, relatively speaking, I don't think you have high insert volume--this was designed to be used by Facebook); this is countered by being more difficult to work with. So unless you have some strange requirements you didn't mention, I would recommend against it, for your use case.

If you're positively set on a NoSQL deployment, you might want to consider the CAP theorem. This will help you decide between MongoDB and CouchDB. Here's a good link: http://blog.nahurst.com/visual-guide-to-nosql-systems. It all comes down to what you mean by "reliability": MongoDB trades availability for consistency, whereas CouchDB trades consistency for availability. (Cassandra allows you to finesse this tradeoff, per query, by specifying how many servers must be written/read for a write/read to succeed; UPDATE: Now, so can CouchDB, with BigCouch! Very exciting...)
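If it helps, the tunable-consistency idea behind those per-query settings boils down to a simple overlap condition between read and write replica sets; the following illustrative Python snippet (not any database's actual API) captures it:

```python
# Illustrative sketch of tunable consistency: with N replicas, requiring
# W acknowledgements per write and R replicas per read gives strong
# consistency whenever R + W > N, because the read set and the write set
# must then overlap in at least one replica.

def is_strongly_consistent(n_replicas: int, write_acks: int, read_acks: int) -> bool:
    return read_acks + write_acks > n_replicas

print(is_strongly_consistent(3, 2, 2))  # True  -> quorum writes + quorum reads
print(is_strongly_consistent(3, 1, 1))  # False -> fast, but reads may be stale
```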

Best of luck in your project.
