Scaling solutions for MySQL (Replication, Clustering)

Question

At the startup I'm working at, we are now considering scaling solutions for our database. Things get somewhat confusing (for me at least) with MySQL, which has MySQL Cluster, replication, and MySQL Cluster replication (from version 5.1.6), which is an asynchronous version of MySQL Cluster. The MySQL manual explains some of the differences in its cluster FAQ, but it is hard to tell from it when to use one or the other.

I would appreciate any advice from people who are familiar with the differences between these solutions: what the pros and cons are, and when you would recommend each.

Answer

I've been doing A LOT of reading on the available options. I also got my hands on High Performance MySQL 2nd edition, which I highly recommend.

This is what I've managed to piece together:

Clustering

Clustering in the general sense is distributing load across many servers that appear to an outside application as one server.

MySQL NDB Cluster

MySQL NDB Cluster is a distributed, in-memory, shared-nothing storage engine with synchronous replication and automatic data partitioning (excuse me for borrowing literally from the High Performance book, but they put it very nicely there). It can be a high-performance solution for some applications, but web applications generally do not work well on it.

The major problem is that beyond very simple queries (that touch only one table), the cluster will generally have to search for data on several nodes, allowing network latency to creep in and significantly slow down completion time for queries. Since the application treats the cluster as one computer, it can't tell it which node to fetch the data from.

In addition, the in-memory requirement is not workable for many large databases.

Continuent Sequoia

This is another clustering solution for MySQL, which acts as middleware on top of the MySQL server. It offers synchronous replication, load balancing and failover. It also ensures that requests always get the data from the latest copy, automatically choosing a node that has the fresh data.

I've read some good things on it, and overall it sounds pretty promising.

Federation

Federation is similar to clustering, so I tucked it in here as well. MySQL offers federation via the federated storage engine. Similar to the NDB cluster solution, it works well with simple queries only - but it is even worse than the cluster for complicated ones (since network latency is much higher).

Replication and load balancing

MySQL has the built-in capability to create replicas of a database on different servers. This can be used for many things - splitting the load between servers, hot backups, creating test servers and failover.

The basic setup of replication involves one master server handling mostly writes and one or more slaves handling reads only. A more advanced variation is the master-master configuration, which allows you to scale writes as well by having several servers accept writes at the same time.

Each configuration has its pros and cons, but one problem they all share is replication lag - since MySQL replication is asynchronous, not all nodes have the freshest data at all times. This requires the application to be aware of the replication and incorporate replication-aware queries to work as expected. For some applications this might not be a problem, but if you always need the freshest data, things get somewhat complicated.
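One common way to make reads replication-aware is to send a session's reads to the master for a short window after that session writes. The sketch below is only an illustration of that idea: the class name, the `pin_seconds` window (an assumed upper bound on lag, not something MySQL reports here) and the `"master"`/`"replica"` labels are all hypothetical.

```python
import time


class LagAwareRouter:
    """Route a session's reads to the master shortly after it writes.

    Hypothetical helper: `pin_seconds` is an assumed bound on replication lag.
    """

    def __init__(self, pin_seconds=2.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds
        self.clock = clock
        self._last_write = {}  # session id -> time of that session's last write

    def record_write(self, session_id):
        """Remember when this session last modified data."""
        self._last_write[session_id] = self.clock()

    def target_for_read(self, session_id):
        """Return which kind of server should serve this session's next read."""
        last = self._last_write.get(session_id)
        if last is not None and self.clock() - last < self.pin_seconds:
            return "master"   # a replica may not have this session's write yet
        return "replica"      # slightly stale data is acceptable here
```

Other sessions keep reading from replicas, so only the user who just wrote pays the cost of hitting the master.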

Replication requires some load balancing to split the load between the nodes. This can be as simple as some modifications to the application code, or it can involve dedicated software and hardware solutions.
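As a rough idea of what the application-code variant can look like, here is a minimal sketch that sends `SELECT` statements to replicas in round-robin order and everything else to the master. The class and server names are made up for illustration, and the statement check is deliberately naive (for example, `SELECT ... FOR UPDATE` and reads inside a transaction would need to go to the master).

```python
import itertools


class ReadWriteSplitter:
    """Send plain SELECTs to replicas in rotation, everything else to the master."""

    def __init__(self, master, replicas):
        self.master = master
        # Fall back to the master if no replicas are configured.
        self._replicas = itertools.cycle(replicas or [master])

    def pick(self, sql):
        """Return the server name that should run this SQL statement."""
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT":
            return next(self._replicas)  # round-robin across read replicas
        return self.master               # writes, DDL and anything unknown
```

For example, `ReadWriteSplitter("db-master", ["db-replica-1", "db-replica-2"]).pick("SELECT 1")` would return the first replica, and the next `SELECT` would go to the second.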

Sharding and partitioning

Sharding is a commonly used approach to scaling databases. You split the data into smaller shards and spread them across different server nodes. This requires the application to be aware of how the data is split in order to work efficiently, as it needs to know where to find the information it needs.

There are abstraction frameworks available to help deal with data sharding, such as Hibernate Shards, an extension to the Hibernate ORM (which unfortunately is in Java; I'm using PHP). HiveDB is another such solution, which also supports shard rebalancing.
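The "knowing where to find the information" part usually boils down to a function that maps a sharding key to a node. A minimal sketch, assuming a fixed list of hypothetical shard names and a simple hash-modulo scheme:

```python
import zlib

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]  # hypothetical node names


def shard_for(key, shards=SHARDS):
    """Map a sharding key (e.g. a user id) to the node that stores its rows."""
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    digest = zlib.crc32(str(key).encode("utf-8"))
    return shards[digest % len(shards)]
```

Every query for a given user then goes to `shard_for(user_id)`. Note that with plain modulo, changing the number of shards remaps most keys, so moving data between shards needs separate handling.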

Others

Sphinx

Sphinx is a full-text search engine that can be used for far more than text searches. For many queries it is much faster than MySQL (especially for grouping and sorting), and it can query remote systems in parallel and aggregate the results - which makes it very useful in combination with sharding.

In general, Sphinx should be used alongside other scaling solutions to get more out of the available hardware and infrastructure. The downside is that, again, you need the application code to be aware of Sphinx to use it wisely.
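To make the "query in parallel and aggregate" idea concrete, here is a small scatter-gather sketch. It does not use the Sphinx API; each shard is represented by a hypothetical callable that returns `(score, row)` tuples already sorted by descending score, and the function merges them into one top-N list.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq
import itertools


def scatter_gather(shard_queries, limit):
    """Run one callable per shard in parallel and merge their sorted results."""
    if not shard_queries:
        return []
    with ThreadPoolExecutor(max_workers=len(shard_queries)) as pool:
        partials = list(pool.map(lambda query: query(), shard_queries))
    # Each partial result is assumed to be sorted by descending score.
    merged = heapq.merge(*partials, key=lambda row: row[0], reverse=True)
    return list(itertools.islice(merged, limit))
```

The total latency is roughly that of the slowest shard rather than the sum of all shards, which is what makes this pattern attractive with sharded data.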

Summary

Scaling solutions differ depending on the needs of the application. For us and for most web applications, I believe that replication (probably multi-master) is the way to go, with a load balancer distributing the load. Sharding of specific problem areas (huge tables) is also a must for being able to scale horizontally.

I'm also going to give Continuent Sequoia a shot and see if it can really do what it promises, since it would involve the least amount of changes to application code.
