MySQL分片方法? [英] MySQL sharding approaches?

查看:76
本文介绍了MySQL分片方法?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

分片MySQL表的最佳方法是什么. 我能想到的方法是:

What is the best approach for Sharding MySQL tables. The approaches I can think of are :

  1. 应用程序级别分片?
  2. 在MySQL代理层共享?
  3. 用于分片的中央查找服务器吗?

您知道这方面有什么有趣的项目或工具吗?

Do you know of any interesting projects or tools in this area?

推荐答案

最好的方法是将MySQL表分片,除非完全不可避免,否则不要这样做.

The best approach for sharding MySQL tables to not do it unless it is totally unavoidable to do it.

编写应用程序时,通常希望以最大化速度和开发者速度的方式进行编写.仅在必要时才对延迟(直到准备好答案的时间)或吞吐量(每时间单位的答案数)进行优化.

When you are writing an application, you usually want to do so in a way that maximizes velocity, developer speed. You optimize for latency (time until the answer is ready) or throughput (number of answers per time unit) only when necessary.

只有当所有这些分区的总和不再适合单个数据库服务器实例时,才对分区进行分区,然后将其分配给不同的主机(=分片)-原因是写入或读取.

You partition and then assign partitions to different hosts (= shard) only when the sum of all these partitions does no longer fit onto a single database server instance - the reason for that being either writes or reads.

写入情况可能是a)写入频率使该服务器磁盘永久过载,或者b)正在进行的写入过多,因此复制永久滞后于此复制层次结构中.

The write case is either a) the frequency of writes is overloading this servers disks permanently or b) there are too many writes going on so that replication permanently lags in this replication hierarchy.

分片的读取情况是数据的大小太大,以致其工作集不再适合内存,并且数据读取开始击中磁盘,而不是大部分时间从内存中获取数据.

The read case for sharding is when the size of the data is so large that the working set of it no longer fits into memory and data reads start hitting the disk instead of being served from memory most of the time.

仅当您必须进行分片时才能这样做.

Only when you have to shard you do it.

分片的那一刻,您将以多种方式为此付费:

The moment you shard, you are paying for that in multiple ways:

您的许多SQL不再是声明性的.

Much of your SQL is no longer declarative.

通常,在SQL中,您是在告诉数据库所需的数据,然后将其留给优化器以将其转换为数据访问程序.这是一件好事,因为它很灵活,而且因为编写这些数据访问程序是无聊的工作,会损害速度.

Normally, in SQL you are telling the database what data you want and leave it to the optimizer to turn that specification into a data access program. That is a good thing, because it is flexible, and because writing these data access programs is boring work that harms velocity.

在分片环境中,您可能正在将节点A上的表与节点B上的数据连接起来,或者您有一个大于节点的表,在节点A和B上并将其中的数据与节点B上的数据连接起来和C.您将开始手动编写基于应用程序端基于哈希的连接解析以解决该问题(或者您正在重新发明MySQL集群),这意味着您最终将获得大量不再声明性的SQL,但在其中表达了SQL功能.一种过程方法(例如,您在循环中使用SELECT语句).

With a sharded environment you are probably joining a table on node A against data on node B, or you have a table larger than a node, on nodes A and B and are joining data from it against data that is on node B and C. You are starting to write application side hash-based join resolutions manually in order to resolve that (or you are reinventing MySQL cluster), meaning you end up with a lot of SQL that no longer declarative, but is expressing SQL functionality in a procedural way (e.g. you are using SELECT statements in loops).

您将承受大量的网络延迟.

You are incurring a lot of network latency.

通常,SQL查询可以在本地解析,优化器会了解与本地磁盘访问相关的成本,并以使该成本降至最低的方式来解析该查询.

Normally, an SQL query can be resolved locally and the optimizer knows about the costs associated with local disk accesses and resolves the query in a way that minimizes the costs for that.

在分片环境中,查询可以通过在网络上对多个节点运行键值访问(希望具有批处理键访问,而不是每次往返都进行单个键查找)来解决,或者通过向前推动部分WHERE子句来解决到可以应用它们的节点(称为条件下推"),或者同时应用于这两者.

In a sharded environment, queries are resolved by either running key-value accesses across a network to multiple nodes (hopefully with batched key accesses and not individual key lookups per round trip) or by pushing parts of the WHERE clause onward to the nodes where they can be applied (that is called 'condition pushdown'), or both.

但是,即使在最好的情况下,这也会涉及到比本地情况更多的网络往返,并且更加复杂.尤其是由于MySQL优化器对网络延迟一无所知(好吧,MySQL群集正在慢慢变得更好,但是对于群集外的原始MySQL来说仍然是事实).

But even in the best of cases this involves many more network round trips that a local situation, and it is more complicated. Especially since the MySQL optimizer knows nothing about network latency at all (Ok, MySQL cluster is slowly getting better at that, but for vanilla MySQL outside of cluster that is still true).

您正在失去大量的SQL表达能力.

You are losing a lot of expressive power of SQL.

好吧,这可能不太重要,但是外键约束和其他SQL数据完整性机制无法跨越多个分片.

Ok, that is probably less important, but foreign key constraints and other SQL mechanisms for data integrity are incapable of spanning multiple shards.

MySQL没有API,该API允许按工作顺序进行异步查询.

MySQL has no API which allows asynchronous queries that is in working order.

当相同类型的数据驻留在多个节点上时(例如,节点A,B和C上的用户数据),通常需要针对所有这些节点解析水平查询(查找尚未登录的所有用户帐户" 90天或更长时间").数据访问时间随节点数线性增长,除非可以并行请求多个节点并在结果进入时对其进行汇总("Map-Reduce").

When data of the same type resides on multiple nodes (e.g. user data on nodes A, B and C), horizontal queries often need to be resolved against all of these nodes ("Find all user accounts that have not been logged in for 90 days or more"). Data access time grows linearly with the number of nodes, unless multiple nodes can be asked in parallel and the results aggregated as they come in ("Map-Reduce").

先决条件是异步通信API,对于MySQL来说,它尚不具备良好的工作状态.另一种选择是在子过程中进行大量分叉和建立联系,这正在访问一个季节传递过程中的烂醉如泥的世界.

The precondition for that is an asynchronous communication API, which does not exist for MySQL in a good working shape. The alternative is a lot of forking and connections in the child processes, which is visiting the world of suck on a season pass.

一旦开始分片,数据结构和网络拓扑将成为您应用程序的性能指标.为了合理地执行性能,您的应用程序需要意识到这些事情,这意味着实际上只有应用程序级分片才有意义.

Once you start sharding, data structure and network topology become visible as performance points to your application. In order to perform reasonably well, your application needs to be aware of these things, and that means that really only application level sharding makes sense.

问题是,如果您想自动分片(例如,通过对主键进行散列来确定哪行进入哪个节点),或者是否要以手动方式进行功能拆分(与xyz用户案例相关的表"转到该主数据库,而abc和def相关表转到该主数据库".

The question is more if you want to auto-shard (determining which row goes into which node by hashing primary keys for example) or if you want to split functionally in a manual way ("The tables related to the xyz user story go to this master, while abc and def related tables go to that master").

功能分片的优点是,如果操作正确,大多数时候大多数开发人员都看不到它,因为与用户故事相关的所有表都可以在本地使用.这样一来,他们仍然可以尽可能长地受益于声明式SQL,并且由于跨网络传输的次数保持最小,因此还可以减少网络延迟.

Functional sharding has the advantage that, if done right, it is invisible to most developers most of the time, because all tables related to their user story will be available locally. That allows them to still benefit from declarative SQL as long as possible, and will also incur less network latency because the number of cross-network transfers is kept minimal.

功能性分片的缺点是它不允许任何单个表都大于一个实例,并且需要设计人员的手动注意.

Functional sharding has the disadvantage that it does not allow for any single table to be larger than one instance, and it requires manual attention of a designer.

功能性分片的优点是,相对容易地对现有代码库进行了很多更改,但变化幅度不大. http://Booking.com 在过去的几年中已经完成了多次,并且对他们来说效果很好.

Functional sharding has the advantage that it is relatively easily done to an existing codebase with a number of changes that is not overly large. http://Booking.com has done it multiple times in the past years and it worked well for them.

说了这么多,看着你的问题,我相信你在问错问题,或者我完全误解了你的问题陈述.

Having said all that, looking at your question, I do believe that you are asking the wrong questions, or I am completely misunderstanding your problem statement.

这篇关于MySQL分片方法?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆