Switching from MySQL to Cassandra - Pros/Cons?

Problem description

For a bit of background - this question deals with a project running on a single small EC2 instance, which is about to migrate to a medium one. The main components are Django, MySQL and a large number of custom analysis tools written in Python and Java, which do the heavy lifting. The same machine is running Apache as well.

The data model looks like the following - a large amount of real time data comes in streamed from various networked sensors, and ideally, I'd like to establish a long-poll approach rather than the current poll every 15 minutes approach (a limitation of computing stats and writing into the database itself). Once the data comes in, I store the raw version in MySQL, let the analysis tools loose on this data, and store statistics in another few tables. All of this is rendered using Django.

Relational features I would need -

  • Order by [SliceRange in Cassandra's API seems to satisfy this]
  • Group by
  • Many-to-many relations between multiple tables [Cassandra SuperColumns seem to do well for one-to-many]
  • Sphinx on this gives me a nice full-text engine, so that's a necessity too. [On Cassandra, the Lucandra project seems to satisfy this need]
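
The first two items can be illustrated with a small in-memory model. It assumes a hypothetical layout of one row per sensor with Unix timestamps as column names, which is not a schema from the question. `WideRow` and `average_per_hour` are illustrative names, and the `start`/`finish`/`count` parameters only loosely mirror the fields of Cassandra's `SliceRange`. This is not the client API. The point of the sketch is that ordering comes for free from sorted column names, while GROUP BY has to be done in application code or in a batch job.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class WideRow:
    """In-memory model of one row whose columns stay sorted by name.

    Not a Cassandra client: it only mimics slice-by-column-name reads.
    """

    def __init__(self):
        self._names = []    # column names in sorted order (e.g. Unix timestamps)
        self._values = {}   # column name -> value

    def insert(self, name, value):
        if name not in self._values:
            self._names.insert(bisect_left(self._names, name), name)
        self._values[name] = value

    def get_slice(self, start=None, finish=None, reverse=False, count=100):
        """Up to `count` (name, value) pairs with start <= name <= finish.

        In this model start/finish are always the low/high bounds; results
        come back ascending, or descending when reverse=True.
        """
        lo = 0 if start is None else bisect_left(self._names, start)
        hi = len(self._names) if finish is None else bisect_right(self._names, finish)
        names = self._names[lo:hi]
        if reverse:
            names = names[::-1]
        return [(n, self._values[n]) for n in names[:count]]


def average_per_hour(pairs):
    """Client-side GROUP BY: mean reading per one-hour bucket of timestamps."""
    sums, counts = defaultdict(float), defaultdict(int)
    for ts, reading in pairs:
        bucket = ts - ts % 3600
        sums[bucket] += reading
        counts[bucket] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


row = WideRow()
for ts, reading in [(5400, 20.0), (0, 1.0), (3600, 10.0), (1800, 3.0)]:
    row.insert(ts, reading)
print(row.get_slice(start=1000, finish=4000))   # [(1800, 3.0), (3600, 10.0)]
print(row.get_slice(reverse=True, count=2))     # [(5400, 20.0), (3600, 10.0)]
print(average_per_hour(row.get_slice()))        # {0: 2.0, 3600: 15.0}
```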

My major problem is that data reads are extremely slow (and writes aren't that hot either). I don't want to throw a lot of money and hardware on it right now, and I'd prefer something that can scale easily with time. Vertically scaling MySQL is not trivial in that sense (or cheap).

So essentially, after having read a lot about NoSQL and experimented with things like MongoDB, Cassandra and Voldemort, my questions are:

  • On a medium EC2 instance, would I gain any benefits in reads/writes by shifting to something like Cassandra? This article (pdf) definitely seems to suggest that. Currently, I'd say a few hundred writes per minute would be the norm. For reads - since the data changes every 5 minutes or so, cache invalidation has to happen pretty quickly. At some point, it should be able to handle a large number of concurrent users as well. The app performance currently gets killed on MySQL doing some joins on large tables even if indexes are created - something on the order of 32k rows takes more than a minute to render. (This may be an artifact of EC2 virtualized I/O as well.) The tables are around 4-5 million rows each, and there are about 5 such tables.

Everyone talks about using Cassandra on multiple nodes, given the CAP theorem and eventual consistency. But, for a project that is just beginning to grow, does it make sense to deploy a single-node Cassandra server? Are there any caveats? For instance, can it replace MySQL as a backend for Django? [Is this recommended?]

If I do shift, I'm guessing I'll have to rewrite parts of the app to do a lot more "administrivia" since I'd have to do multiple lookups to fetch rows.
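
Here is a sketch of that extra "administrivia": a join done by hand, with plain dicts standing in for two column families. `sensors`, `sites` and their contents are hypothetical. The batched variant groups lookups by column family, which avoids one round trip per row (the classic N+1 pattern).

```python
# Hypothetical column families, modeled as dicts of row key -> columns.
sensors = {
    "sensor:1": {"name": "roof", "site_id": "site:a"},
    "sensor:2": {"name": "basement", "site_id": "site:b"},
}
sites = {
    "site:a": {"location": "building A"},
    "site:b": {"location": "building B"},
}


def sensor_with_site(sensor_key):
    """Emulate a two-table JOIN with two separate lookups (two round trips)."""
    sensor = sensors.get(sensor_key)
    if sensor is None:
        return None
    site = sites.get(sensor["site_id"], {})
    return {**sensor, **site}


def sensors_with_sites(sensor_keys):
    """Batched variant: one multi-get per column family instead of N+1 lookups."""
    found = {k: sensors[k] for k in sensor_keys if k in sensors}
    site_ids = {row["site_id"] for row in found.values()}
    site_rows = {sid: sites[sid] for sid in site_ids if sid in sites}
    return {k: {**row, **site_rows.get(row["site_id"], {})} for k, row in found.items()}
```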

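Is it better to use MySQL as a key-value store instead of a relational engine, and go with that? That way I could take advantage of a large set of stable APIs as well as a stable engine (and go relational as and when needed). (Post by Bret Taylor from FriendFeed on this - http://bret.appspot.com/entry/how-friendfeed-uses-mysql)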
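
This is a rough sketch of the key-value-on-MySQL pattern the linked post describes. Each entity is stored as an opaque compressed blob, and every queryable attribute gets its own index table that the application maintains. To keep it self-contained, it uses Python's `sqlite3` module as a stand-in for MySQL, and JSON instead of whatever serialization the post used. The table names, `put`/`get`/`find_by_sensor` and the `sensor_id` attribute are hypothetical.

```python
import json
import sqlite3
import uuid
import zlib

db = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
# Entity table: one opaque, zlib-compressed JSON blob per key.
db.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body BLOB NOT NULL)")
# Application-maintained index table for one queryable attribute.
db.execute("CREATE TABLE index_sensor_id (sensor_id TEXT, entity_id TEXT, "
           "PRIMARY KEY (sensor_id, entity_id))")


def put(entity):
    """Store an entity dict and update its index row in one transaction."""
    entity_id = entity.setdefault("id", uuid.uuid4().hex)
    body = zlib.compress(json.dumps(entity).encode("utf-8"))
    with db:  # commits on success, rolls back on error
        db.execute("INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
                   (entity_id, body))
        db.execute("INSERT OR IGNORE INTO index_sensor_id VALUES (?, ?)",
                   (entity["sensor_id"], entity_id))
    return entity_id


def get(entity_id):
    row = db.execute("SELECT body FROM entities WHERE id = ?", (entity_id,)).fetchone()
    return None if row is None else json.loads(zlib.decompress(row[0]))


def find_by_sensor(sensor_id):
    """Query via the index table, then fetch each entity by primary key."""
    ids = [r[0] for r in db.execute(
        "SELECT entity_id FROM index_sensor_id WHERE sensor_id = ?", (sensor_id,))]
    return [get(i) for i in ids]
```

With this layout, a new queryable attribute needs no ALTER TABLE on the large entity table, only a new index table that the application populates. The cost is that stale index rows (for example after an indexed attribute changes) must be handled in application code, which this sketch does not do.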

Any pointers would be much appreciated!

Thanks.

Recommended answer

Cassandra and the other distributed databases available today do not provide the kind of ad-hoc query support you are used to from SQL. This is because you can't distribute queries with joins performantly, so the emphasis is on denormalization instead.
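
Here is a minimal sketch of what "denormalization instead of joins" means in practice. Plain dicts stand in for column families, and the site/sensor layout and view names are hypothetical. Each write fans out to every view a page needs, so reads become single-key lookups. The cost is extra writes and keeping the copies consistent in application code.

```python
from collections import defaultdict

# Hypothetical query-shaped views; each answers one query without a join.
readings_by_sensor = defaultdict(dict)   # sensor_id -> {timestamp: value}
latest_by_site = defaultdict(dict)       # site_id -> {sensor_id: (timestamp, value)}


def record_reading(site_id, sensor_id, ts, value):
    """Write the same fact into every view that needs it."""
    readings_by_sensor[sensor_id][ts] = value
    current = latest_by_site[site_id].get(sensor_id)
    # Late-arriving older readings must not overwrite the latest one.
    if current is None or ts >= current[0]:
        latest_by_site[site_id][sensor_id] = (ts, value)


record_reading("site:a", "s1", 60, 1.0)
record_reading("site:a", "s1", 120, 2.0)
record_reading("site:a", "s1", 90, 1.5)   # arrives out of order
```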

However, Cassandra 0.6 (beta officially out tomorrow, but you can build from the 0.6 branch yourself if you're impatient) supports Hadoop map/reduce for analytics, which actually sounds like a good fit for you.
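
Hadoop's actual API is Java and is not shown here. As an illustration of why map/reduce fits the question's "compute statistics over raw readings" workload, this plain-Python sketch runs the same three phases (map, shuffle, reduce) in memory over hypothetical `(sensor_id, timestamp, value)` records.

```python
from collections import defaultdict
from functools import reduce


def map_phase(record):
    """Emit (key, value) pairs; here one pair per raw reading."""
    sensor_id, _ts, value = record
    yield sensor_id, (value, value, value, 1)   # (min, max, sum, count)


def combine(a, b):
    """Associative merge, so it could run in a combiner as well as a reducer."""
    return (min(a[0], b[0]), max(a[1], b[1]), a[2] + b[2], a[3] + b[3])


def run_job(records):
    shuffled = defaultdict(list)              # shuffle: group mapped values by key
    for record in records:
        for key, value in map_phase(record):
            shuffled[key].append(value)
    stats = {}
    for key, values in shuffled.items():      # reduce: fold each key's values
        lo, hi, total, n = reduce(combine, values)
        stats[key] = {"min": lo, "max": hi, "mean": total / n, "count": n}
    return stats
```

Because `combine` is associative, partial results from different machines can be merged in any grouping. That is what lets this kind of job spread across nodes instead of relying on one large join.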

Cassandra provides excellent support for adding new nodes painlessly, even to an initial group of one.
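
One reason adding nodes is cheap is that Cassandra places nodes on a token ring. With the RandomPartitioner, a key's position is derived from an MD5 hash of the key. This simplified model ignores replication and uses hand-picked node tokens, and `Ring`/`token` are hypothetical names. It shows that a new node takes over one contiguous range from its neighbour, and no other keys move.

```python
import bisect
import hashlib


def token(key):
    """Place a key on a 128-bit ring via MD5 (loosely RandomPartitioner-style)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self):
        self.tokens = []   # sorted node tokens
        self.nodes = {}    # node token -> node name

    def add_node(self, name, node_token):
        bisect.insort(self.tokens, node_token)
        self.nodes[node_token] = name

    def owner(self, key):
        """The first node token >= the key's token owns it, wrapping around."""
        i = bisect.bisect_left(self.tokens, token(key))
        return self.nodes[self.tokens[i % len(self.tokens)]]


ring = Ring()
ring.add_node("node1", 0)                 # a "group of one"
keys = [f"sensor:{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}
ring.add_node("node2", 2**127)            # halfway around this model's ring
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```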

That said, at a few hundred writes/minute you're going to be fine on MySQL for a long, long time. Cassandra is much better at being a key/value store (even better, key/columnfamily) but MySQL is much better at being a relational database. :)
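
To put "a few hundred writes/minute" in perspective, the conversion below uses sample rates, not measurements from the question:

```python
MINUTES_PER_DAY = 24 * 60


def write_load(writes_per_minute):
    """Convert a per-minute write rate into per-second and per-day figures."""
    return writes_per_minute / 60, writes_per_minute * MINUTES_PER_DAY


for rate in (100, 300, 500):   # sample values for "a few hundred writes per minute"
    per_sec, per_day = write_load(rate)
    print(f"{rate}/min -> {per_sec:.1f} writes/s, {per_day:,} rows/day")
# 100/min -> 1.7 writes/s, 144,000 rows/day
# 300/min -> 5.0 writes/s, 432,000 rows/day
# 500/min -> 8.3 writes/s, 720,000 rows/day
```

Single-digit writes per second is generally a light load for one MySQL server, which supports the answer's point. The rows/day figure matters more for the slow joins in the question: at these rates, tables in the 4-5 million row range accumulate within days to weeks.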

There is no Django support for Cassandra (or other NoSQL databases) yet. They are talking about doing something for the next version after 1.2, but based on talking to Django devs at PyCon, nobody is really sure what that will look like yet.
