How to stop a deadlock on one node from crashing entire cluster?

Views: 142 | Published: 2020/5/6 3:29:59 | Tags: php mysql mysqli mariadb galera

Problem description

I'm running a 3x node Galera Cluster under MariaDB. The application is in PHP using the mysqli extension.

Very occasionally I get a deadlock on write. I'm working on improving my application to handle or avoid that kind of failure, but in the meantime I need the cluster to stay up when this happens.

The problem is that as soon as the deadlock occurs, not just one, but all three nodes in the cluster crash. The node where the deadlock originates suffers the "MySQL server has gone away" error and, once max_connect_errors is exceeded, starts refusing connections permanently, thus requiring a manual server restart.

What I don't get is why the other nodes go down too. They both start erroring with "WSREP has not yet prepared node for application use", which means the entire application crashes with no database nodes accepting connections.

How can I ensure that the rest of the cluster stays up when one node suffers a deadlock, however rare?

Update:

A month later and another deadlock causes a similar problem. Again, one node brings down everything.

The first connection gets a deadlock (at commit phase) so the application tries to replay the transaction. This hangs for almost a minute and fails again.

After the first connection fails to recover, all other connections start failing with (1205) "Lock wait timeout exceeded" rendering the entire cluster useless.

I should add that the application does not use explicit locks. However it got itself tied in a knot, it did so with just regular transactional queries.
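For reference, the replay behaviour described in the update can be sketched roughly as below. This is a minimal illustration, not the application's actual code: `runWithRetry`, its parameters, and the backoff values are hypothetical names chosen for the example. It assumes mysqli is configured to throw exceptions (via `mysqli_report`), so a failed statement surfaces as a `mysqli_sql_exception`, which extends `RuntimeException`. The error codes are the standard MySQL/MariaDB ones: 1213 for a deadlock and 1205 for "Lock wait timeout exceeded".

```php
<?php
declare(strict_types=1);

// Standard MySQL/MariaDB error codes.
const ER_LOCK_DEADLOCK = 1213;     // "Deadlock found when trying to get lock"
const ER_LOCK_WAIT_TIMEOUT = 1205; // "Lock wait timeout exceeded"

/**
 * Hypothetical helper: run $work and replay it a bounded number of times
 * when it fails with a deadlock or lock wait timeout.
 * $work must contain the whole transaction (begin, queries, commit).
 */
function runWithRetry(callable $work, int $maxAttempts = 3, int $baseDelayMs = 50)
{
    for ($attempt = 1; ; $attempt++) {
        try {
            return $work();
        } catch (RuntimeException $e) {
            $retryable = in_array(
                $e->getCode(),
                [ER_LOCK_DEADLOCK, ER_LOCK_WAIT_TIMEOUT],
                true
            );
            if (!$retryable || $attempt >= $maxAttempts) {
                throw $e; // give up and let the caller report the failure
            }
            // Linear backoff so the replay does not collide immediately.
            usleep($baseDelayMs * 1000 * $attempt);
        }
    }
}

// Usage sketch (assumes $db is a connected mysqli object; the callback body
// stands in for the application's real queries):
//
// runWithRetry(function () use ($db) {
//     $db->begin_transaction();
//     try {
//         // run the transaction's statements here
//         $db->commit();
//     } catch (RuntimeException $e) {
//         $db->rollback();
//         throw $e;
//     }
// });
```

Capping the number of attempts means a transaction that cannot succeed fails after a bounded number of replays rather than being retried indefinitely.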
