Using a NoSQL database over MySQL

Question

I have a web application running on a Java stack (Struts 2 + Spring + Hibernate) with its data persisted in MySQL. I looked at NoSQL databases, and they certainly seem easier to reason about and work with than an RDBMS. It's a music streaming app that stores artist information and allows users to save playlists.

I am wondering whether there are any advantages (performance? hardware cost? simplified code? scalability?) to switching to a NoSQL DB (CouchDB? MongoDB? Cassandra?). What would I lose or gain by switching to a NoSQL database?

Please advise.

Recommended answer

The polite interpretation of "NoSQL" has become "Not Only SQL". If you have data that is truly relational, or if your functionality depends on things like joins and ACID guarantees, then you should store that data in a relational way. In this post, I'll explain how I use MySQL alongside two NoSQL data stores. Modern, web-scale data storage is all about understanding how to pick the best tool(s) for the job(s).

That said, NoSQL is really a reaction to the fact that the relational method and way of thinking has been applied to problems where it isn't actually a very good fit (typically huge tables with tens of millions of rows or more). Once tables get that large, the typical SQL "best practice" has been to manually shard the data: that is, putting records 1 through 10,000,000 in table A, 10,000,001 through 20,000,000 in table B, and so on. Then, typically in the application model layer, lookups are routed according to this scheme. This is what's called application-aware scaling. It's time-intensive and error-prone, but for scaling up while keeping MySQL as the store for those long tables, it has become a more or less standard MO. NoSQL represents, to me, the application-unaware alternative.
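To make "application-aware scaling" concrete, here is a minimal sketch of the range-based routing described above, as it might live in an application's model layer. The table names, the shard size constant, and the helper functions are all hypothetical; they are not from any particular codebase.

```python
# Hypothetical shard layout: each table holds a fixed range of record IDs.
ROWS_PER_SHARD = 10_000_000
SHARD_TABLES = ["playlists_a", "playlists_b", "playlists_c"]  # illustrative names


def shard_table_for(record_id):
    """Return the table that holds record_id under range-based sharding."""
    if record_id < 1:
        raise ValueError("record IDs start at 1")
    index = (record_id - 1) // ROWS_PER_SHARD
    if index >= len(SHARD_TABLES):
        raise ValueError(f"no shard provisioned for record {record_id}")
    return SHARD_TABLES[index]


def build_lookup(record_id):
    """Build a parameterized SELECT against the correct shard table."""
    # The table name comes from a fixed list, never from user input.
    table = shard_table_for(record_id)
    return f"SELECT * FROM {table} WHERE id = %s", (record_id,)
```

Every query path in the application has to go through something like `shard_table_for`, and adding a shard means touching application code. That coupling is the cost an auto-sharding store is meant to remove.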

Key-value

When I had a MySQL prototype start getting too big for its own good, I personally moved as much data as possible to the lightning-fast Membase, which outperforms Memcached and adds persistence. Membase is a distributed key-value store that scales more or less linearly as you add commodity servers to a cluster (Zynga uses it to handle a half-million ops per second, for instance). It's therefore a great fit for the cloud age of Amazon EC2, Joyent, etc.

It's well known that distributed key-value stores are the best way to get enormous, linear scale. The weakness of key-value is queryability and indexing. But even in the relational world, the best practice for scalability is to offload as much effort onto the application servers as possible, doing joins in memory on commodity app servers instead of asking the central RDB cluster to handle all of that logic. Since simple selects plus application logic are really the best way to achieve massive scale even on MySQL, the transition to something like Membase (or competitors like Riak) isn't really too bad.
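As a rough illustration of "simple selects plus application logic", the sketch below joins a playlist to its tracks and artists entirely in application memory, using only get-by-key lookups. A plain dict stands in for the key-value store, and the key naming scheme (`playlist:42`, `track:7`) and helper names are assumptions made for this example.

```python
# A plain dict stands in for a distributed key-value store such as Membase;
# a real client would expose similar get-by-key calls over the network.
kv_store = {
    "playlist:42": {"name": "Late Night", "track_ids": [7, 9]},
    "track:7": {"title": "So What", "artist_id": 1},
    "track:9": {"title": "Naima", "artist_id": 2},
    "artist:1": {"name": "Miles Davis"},
    "artist:2": {"name": "John Coltrane"},
}


def get_many(keys):
    """Fetch several keys at once, skipping any that are missing."""
    return {k: kv_store[k] for k in keys if k in kv_store}


def load_playlist(playlist_id):
    """Assemble a playlist view by joining tracks and artists in app memory."""
    playlist = kv_store[f"playlist:{playlist_id}"]
    tracks = get_many(f"track:{tid}" for tid in playlist["track_ids"])
    artists = get_many({f"artist:{t['artist_id']}" for t in tracks.values()})
    return {
        "name": playlist["name"],
        "tracks": [
            {"title": t["title"], "artist": artists[f"artist:{t['artist_id']}"]["name"]}
            for t in tracks.values()
        ],
    }
```

The store only ever answers "give me the value for this key"; the join logic runs on the app server, which is the tier that is cheapest to scale out.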

Document stores

Sometimes (though I would argue less often than many think) an application's design inherently requires secondary indices, range queryability, etc. The NoSQL approach to this is a document store like MongoDB. Like Membase, Mongo is very good in some areas where relational databases are particularly weak, like application-unaware scaling, auto-sharding, and maintaining flat response times even as dataset size balloons. It's significantly slower than Membase and a bit trickier to scale purely horizontally, but the benefit is that it's highly queryable. You can query on parameters and ranges in real time, or you can use Map/Reduce to perform complex batch operations on truly enormous data sets.
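To show what "querying on parameters and ranges" looks like, here is a small sketch of a MongoDB-style query document evaluated in plain Python. The operator names (`$gt`, `$gte`, `$lt`, `$lte`) are MongoDB's, but the `matches` function is only an illustration written for this example, not how MongoDB evaluates queries, and the sample events are made up. With a real driver, a query document of this shape would be passed to the collection's find call instead.

```python
import operator

# Comparison operators as they are named in MongoDB query documents.
OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(document, query):
    """Return True if document satisfies every condition in query."""
    for field, condition in query.items():
        if field not in document:
            return False
        value = document[field]
        if isinstance(condition, dict):
            # Range condition: every operator in the sub-document must hold.
            if not all(OPERATORS[op](value, bound) for op, bound in condition.items()):
                return False
        elif value != condition:
            return False
    return True


events = [
    {"user": "ana", "event": "play", "duration_s": 215},
    {"user": "ben", "event": "play", "duration_s": 30},
    {"user": "ana", "event": "skip", "duration_s": 12},
]

# Equality on one field plus a range on another.
query = {"event": "play", "duration_s": {"$gte": 60, "$lt": 600}}
long_plays = [e for e in events if matches(e, query)]
```

A pure key-value store cannot answer a question like this without scanning or maintaining its own index, which is the trade-off the paragraph above describes.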

On the same project I mentioned above, which uses Membase to serve tons of live player data, we use MongoDB to store analytics/metrics data, which is really where MongoDB shines.

Why keep SQL?

I touched briefly on the fact that "truly relational" information should stay in relational databases. As commenter Dan K. points out, I skipped the part where I discuss the disadvantages of leaving the RDBMS, or at least of leaving it entirely.

First, there's SQL itself. SQL is well known and has been an industry standard for a long time. Some "NoSQL" databases, like Google's App Engine Datastore (built on Bigtable), implement their own SQL-like language (Google's is called, cutely, GQL, for Google Query Language). MongoDB takes a fresh approach to the querying problem with its delightful JSON query objects. Still, SQL itself is a powerful tool for getting information out of data, which is often the whole point of a database to begin with.

The most important reason to stay with an RDBMS is ACID: Atomicity, Consistency, Isolation, Durability. I won't rehash the state of ACID in NoSQL, as it's well addressed in this post on SO. Suffice it to say, there's a rational reason Oracle's RDBMS has such a huge market that isn't going anywhere: some data needs pure ACID compliance. If your data does (and if it does, you're probably well aware of that fact), then so does your database. Keep that pH low!
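For readers who haven't relied on ACID directly, here is a minimal sketch of the "A" (atomicity) using Python's built-in `sqlite3` module rather than MySQL or Oracle. The `accounts` table, the `transfer` helper, and the balances are invented for the example; the point is only that two updates inside one transaction either both take effect or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount between accounts; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit above was undone too.
        return False


def balances(conn):
    """Return the current balances as a name -> balance dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

In the failing case the credit to the destination account has already been applied when the debit is rejected, yet the rollback leaves no trace of it. Getting the same guarantee across multiple keys in a store without transactions means building it yourself in application code.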

Edit: Check out Aaronaught's post here. He represents the business-to-business perspective far better than I could, in part because I've spent my entire career in the consumer space.
