What should I choose: MongoDB/Cassandra/Redis/CouchDB?


Question


We're developing a really big project and I was wondering if anyone can give me some advice about which DB backend we should pick.

Our system is composed of 1100 electronic devices that send signals to a central server, which then stores the signal info (each signal is about 35 bytes long). Each device will be sending about 3 signals per minute, so if we do the numbers, that'll be 4,752,000 new records/day in the database, and a total of 142,560,000 new records/month.

We need a DB backend that is lightning fast and reliable. Of course, we also need to do some complex data mining on that DB. We're doing some research on MongoDB/Cassandra/Redis/CouchDB; however, the documentation websites are still in their early stages.

Any help? Ideas?

Thanks a lot!

Solution

Don't let the spatial scale (1000+ devices) mislead you as to the computational and/or storage scale. 1100 devices at 3 signals per minute is 55 inserts per second, and a few dozen 35-byte inserts per second is a trivial workload for any mainstream DBMS, even running on low-end hardware. Likewise, 142 million records per month is only on the order of 1 to 10 gigabytes of storage per month, without any compression, including indices.
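To make the back-of-the-envelope numbers easy to re-check, here is a minimal Python sketch. The device count, signal rate, and signal size come from the question; the 30-day month is an assumption inferred from the question's monthly total, and the result covers raw payload only (row overhead and indexes are not modeled).

```python
# Figures taken from the question; everything else is derived from them.
DEVICES = 1100
SIGNALS_PER_DEVICE_PER_MINUTE = 3
SIGNAL_BYTES = 35
DAYS_PER_MONTH = 30  # assumption: matches the question's monthly total


def workload_estimate():
    """Return insert rate, record counts, and raw payload size for the stated load."""
    per_minute = DEVICES * SIGNALS_PER_DEVICE_PER_MINUTE
    per_second = per_minute / 60
    per_day = per_minute * 60 * 24
    per_month = per_day * DAYS_PER_MONTH
    raw_bytes_per_month = per_month * SIGNAL_BYTES
    return {
        "inserts_per_second": per_second,
        "records_per_day": per_day,
        "records_per_month": per_month,
        "raw_gb_per_month": raw_bytes_per_month / 1e9,  # decimal gigabytes
    }


if __name__ == "__main__":
    for name, value in workload_estimate().items():
        print(f"{name}: {value:,.2f}")
```

The raw payload comes to roughly 5 GB per month, which sits inside the 1 to 10 GB range quoted above before any per-row or index overhead is added.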

In your question comment, you said:

"It's all about reliability, scalability and speed. It's very important that the solution scales easily (MongoDB autosharding?) just throwing in more nodes, and the speed is also very important."

Reliability? Any mainstream DBMS can guarantee this (assuming you mean it's not going to corrupt your data, and it's not going to crash--see my discussion of the CAP theorem at the bottom of this answer). Speed? Even with a single machine, 10 to 100 times this workload should not be a problem. Scalability? At the current rate, a full year's data, uncompressed, even fully indexed, would easily fit within 100 gigabytes of disk space (and, as established above, the insert rate is not an issue).

As such, I don't see any clear need for an exotic solution like NoSQL, or even a distributed database--a plain, old relational database such as MySQL would be just fine. If you're worried about failover, just set up a backup server in a master-slave configuration. If we're talking 100s or 1000s of times the current scale, just horizontally partition across a few instances based on the ID of the data-gathering device (i.e. {partition index} = {device id} modulo {number of partitions}).
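The modulo rule can be sketched in a few lines of Python. This is only an illustration: it assumes device IDs are non-negative integers, and the connection strings in `PARTITIONS` are hypothetical placeholders, not real hosts.

```python
def partition_index(device_id: int, num_partitions: int) -> int:
    """Map a device ID to a partition: {device id} modulo {number of partitions}."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return device_id % num_partitions


# Hypothetical connection strings, one per database instance.
PARTITIONS = [
    "mysql://db0.example.internal/signals",
    "mysql://db1.example.internal/signals",
    "mysql://db2.example.internal/signals",
    "mysql://db3.example.internal/signals",
]


def partition_for(device_id: int) -> str:
    """Return the instance that stores all signals for the given device."""
    return PARTITIONS[partition_index(device_id, len(PARTITIONS))]


if __name__ == "__main__":
    for device_id in (0, 6, 1101):
        print(device_id, "->", partition_for(device_id))
```

One thing to keep in mind with plain modulo routing: changing the number of partitions remaps most devices to a different instance, so it is worth choosing the partition count with some headroom.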

Bear in mind that leaving the safe and comfy confines of the relational database world means abandoning both its representational model and its rich toolset. This will make your "complex data mining" much more difficult--you don't just need to put data into the database, you also need to get it out.
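As a small illustration of what "getting it out" looks like with the relational toolset, here is a sketch that uses Python's built-in `sqlite3` module as a stand-in for MySQL. The table name `signals`, its columns, and the sample rows are all made up for the example; the question does not describe a schema.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical signals table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE signals (
            device_id   INTEGER NOT NULL,
            received_at TEXT    NOT NULL,  -- ISO 8601 timestamp
            payload     BLOB    NOT NULL   -- the ~35-byte signal
        )
        """
    )
    conn.execute(
        "CREATE INDEX idx_signals_device_time ON signals (device_id, received_at)"
    )
    # Made-up sample rows, only to give the query something to aggregate.
    rows = [
        (1, "2010-05-01T00:00:00", b"\x00" * 35),
        (1, "2010-05-01T00:00:20", b"\x01" * 35),
        (2, "2010-05-01T00:00:05", b"\x02" * 35),
        (1, "2010-05-02T00:00:00", b"\x03" * 35),
    ]
    conn.executemany("INSERT INTO signals VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def signals_per_device_per_day(conn: sqlite3.Connection):
    """Count signals per device per calendar day with a single GROUP BY."""
    query = """
        SELECT device_id,
               substr(received_at, 1, 10) AS signal_day,
               COUNT(*) AS signal_count
        FROM signals
        GROUP BY device_id, substr(received_at, 1, 10)
        ORDER BY device_id, signal_day
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in signals_per_device_per_day(build_demo_db()):
        print(row)
```

Ad hoc aggregations like this are a one-line query in SQL, which is the kind of convenience the paragraph above is warning you about giving up.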

All of that being said, MongoDB and CouchDB are uncommonly simple to deploy and work with. They're also very fun, and will make you more attractive to any number of people (not just programmers--executives, too!).

The common wisdom is that, of the NoSQL solutions you suggested, Cassandra is the best for high insert volume (of course, relatively speaking, I don't think you have high insert volume--Cassandra was originally built for use at Facebook); the trade-off is that it is more difficult to work with. So unless you have some strange requirements you didn't mention, I would recommend against it for your use case.

If you're positively set on a NoSQL deployment, you might want to consider the CAP theorem. This will help you decide between MongoDB and CouchDB. Here's a good link: http://blog.nahurst.com/visual-guide-to-nosql-systems. It all comes down to what you mean by "reliability": MongoDB gives up some availability in favor of consistency, whereas CouchDB gives up some consistency in favor of availability. (Cassandra allows you to finesse this tradeoff, per query, by specifying how many servers must be written/read for a write/read to succeed; UPDATE: Now, so can CouchDB, with BigCouch! Very exciting...)
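The per-query tradeoff mentioned for Cassandra follows the usual quorum overlap rule for replicated stores: with N replicas, a read is guaranteed to touch at least one replica that holds the latest acknowledged write when W + R > N. The sketch below only models that arithmetic; it does not call any Cassandra driver, and the function names are illustrative.

```python
def quorum(replicas: int) -> int:
    """Smallest majority of a replica set, e.g. 2 out of 3."""
    if replicas <= 0:
        raise ValueError("replicas must be positive")
    return replicas // 2 + 1


def reads_see_latest_write(replicas: int, write_acks: int, read_acks: int) -> bool:
    """True when every read set must overlap every write set (W + R > N)."""
    for count in (write_acks, read_acks):
        if not 1 <= count <= replicas:
            raise ValueError("ack counts must be between 1 and the replica count")
    return write_acks + read_acks > replicas


if __name__ == "__main__":
    n = 3
    q = quorum(n)
    # Consistency-leaning: quorum writes and quorum reads overlap.
    print("QUORUM/QUORUM:", reads_see_latest_write(n, q, q))
    # Availability-leaning: one replica each way, reads may be stale.
    print("ONE/ONE:", reads_see_latest_write(n, 1, 1))
    # Write to all replicas, read from any one.
    print("ALL/ONE:", reads_see_latest_write(n, n, 1))
```

Lower ack counts keep the system answering when replicas are down at the cost of possibly stale reads, which is the same consistency-versus-availability choice described above, just made per request.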

Best of luck with your project.
