MySQL Debezium connector for RDS in production caused deadlocks

Problem description

We are building a data pipeline from MySQL on RDS to Elasticsearch to create search indexes, and for this we are using Debezium CDC with its MySQL source connector and an Elasticsearch sink connector.

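Because our MySQL database is on RDS, we had to grant the MySQL user the LOCK TABLES privilege on the two tables we want to capture with CDC, as mentioned in the docs. (RDS does not allow the global read lock that Debezium normally takes during its snapshot, so the connector falls back to table-level locks.)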

We also have various other MySQL users running transactions that may touch either of the two tables.

As soon as we connected the MySQL connector to our production database, a lock was created and our whole system went down. After realising this, we quickly stopped Kafka and removed the connector, but the locks were still increasing. The problem was only resolved after we stopped all new queries by shutting down our production code and manually killing the processes.

What could be the cause of this, and how can we prevent it?

Recommended answer

I'm only guessing because I don't know your query traffic. I would assume the locks you saw increasing were the backlog of queries that had been waiting for the table locks to be released.

I mean the following sequence is what I believe happened:

  1. Debezium acquires table locks on your two tables.
  2. The application is still working, and it is trying to execute queries that access those locked tables. The queries begin waiting for the locks to be released. They will wait for up to 1 year (31,536,000 seconds, the default lock_wait_timeout value).
  3. As you spend some minutes trying to figure out why your site is not responding, a large number of blocked queries accumulate, potentially as many as max_connections. Once all allowed connections are occupied by blocked queries, the application cannot connect to MySQL at all.
  4. Finally, you stop the Debezium process, which was still reading its initial snapshot of the data, and it releases its table locks.
  5. As soon as the table locks are released, the waiting queries can proceed.

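  • However, many of them also need to acquire locks of their own, if they are INSERT/UPDATE/DELETE/REPLACE, SELECT ... FOR UPDATE, or other locking statements.
  • Because so many of these queries are queued, they are more likely to request overlapping locks, which means they have to wait for each other to finish and release their locks.
  • In addition, with hundreds of queries executing at the same time, they overload system resources such as CPU, causing high system load that slows them all down. Each query therefore takes longer to finish, and any query blocked by another has to wait even longer.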
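The pile-up in steps 1 to 3 can be sketched as a toy simulation. This is an illustrative model, not MySQL internals: every arriving query is assumed to need one of the locked tables, each blocked query holds one connection, and the arrival rate and lock duration are hypothetical numbers. Only the defaults (lock_wait_timeout of 31,536,000 seconds and max_connections of 151 on stock MySQL; RDS sizes max_connections from instance memory) come from MySQL itself.

```python
from dataclasses import dataclass

# MySQL's default lock_wait_timeout: 31,536,000 seconds (365 days).
DEFAULT_LOCK_WAIT_TIMEOUT = 31_536_000


@dataclass
class LockStorm:
    blocked: int    # queries holding a connection while waiting for the table lock
    rejected: int   # connection attempts refused because max_connections was full
    timed_out: int  # queries that gave up after lock_wait_timeout


def simulate_table_lock(arrivals_per_sec: int, lock_seconds: int,
                        max_connections: int,
                        lock_wait_timeout: int = DEFAULT_LOCK_WAIT_TIMEOUT) -> LockStorm:
    """Toy model: every arriving query needs a table that stays locked for lock_seconds."""
    waiting = []  # arrival second of each blocked query
    rejected = timed_out = 0
    for now in range(lock_seconds):
        # Queries that have waited lock_wait_timeout seconds fail and free their connection.
        still_waiting = [t for t in waiting if now - t < lock_wait_timeout]
        timed_out += len(waiting) - len(still_waiting)
        waiting = still_waiting
        # A new query takes a connection if one is free; otherwise connecting fails.
        for _ in range(arrivals_per_sec):
            if len(waiting) < max_connections:
                waiting.append(now)
            else:
                rejected += 1
    return LockStorm(len(waiting), rejected, timed_out)


# Five minutes of a table lock at a hypothetical 50 queries/s, max_connections = 151.
print(simulate_table_lock(50, 300, 151))
# → LockStorm(blocked=151, rejected=14849, timed_out=0)

# Same storm with a 10-second lock_wait_timeout: blocked queries return errors
# after 10 s and free their connections, but new arrivals refill the pool.
print(simulate_table_lock(50, 300, 151, lock_wait_timeout=10))
```

With the one-year default, nothing times out within the few minutes of the incident, so every connection ends up held by a blocked query and all further connection attempts fail.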

Meanwhile the application is still trying to accept requests, and therefore is adding more queries to execute. They are also subject to the queueing and resource exhaustion.

Eventually you stop the application, which at least lets the queue of waiting queries drain gradually. As the system load goes down, MySQL processes the queries more efficiently and finishes them all fairly soon.
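Why the backlog only cleared once the application was stopped can be illustrated with another toy queueing model. The contention formula below is purely an assumption made for illustration: throughput is taken to fall as the backlog grows (queries competing for CPU and for overlapping row locks), and all numbers are hypothetical.

```python
def drain_time(backlog: float, arrivals_per_sec: float, base_capacity: float,
               contention_knee: float = 200.0, limit: int = 3600):
    """Seconds until the queued queries are gone, or None if not within `limit`.

    Hypothetical model: effective throughput shrinks as the backlog grows,
    because queued queries compete for CPU and for the same row locks.
    """
    for second in range(1, limit + 1):
        capacity = base_capacity * contention_knee / (contention_knee + backlog)
        backlog = max(0.0, backlog + arrivals_per_sec - capacity)
        if backlog == 0:
            return second
    return None


# Lock released with 151 queued queries while the app keeps sending 60 queries/s
# to a server that could normally handle 100/s: the backlog keeps growing.
print(drain_time(151, 60, 100))                         # None
# Same backlog after the application is stopped: it clears quickly.
print(drain_time(151, 0, 100))                          # 3
# Same load with contention effectively switched off: it also drains.
print(drain_time(151, 60, 100, contention_knee=1e12))   # 4
```

In this model, a load the server would handle easily when healthy never drains once contention has cut its throughput, which matches the observation that stopping new queries was what finally resolved the incident.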

The suggestion by the other answer to use a read replica for your Debezium snapshot is a good one. If your application can read from the master MySQL instance for a while, then no query will be blocked on the replica while Debezium has it locked. Eventually Debezium will finish reading all the data, and release the locks, and then go on to read only the binlog. Then the app can resume using the replica as a read instance.
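The read-replica suggestion can be sketched as a connector registration payload. This is a minimal sketch assuming Debezium 2.x property names and a replica that writes its own binlog (Debezium reads the binlog of the server it connects to). The hostname, credentials, connector name, server id and table names are hypothetical placeholders, and the required schema-history properties are omitted, so treat it as a starting point rather than a complete configuration.

```python
import json


def replica_snapshot_config(replica_host: str, user: str, password: str) -> dict:
    """Sketch of a Debezium MySQL source connector config aimed at a read replica."""
    return {
        "name": "mysql-search-source",  # hypothetical connector name
        "config": {
            "connector.class": "io.debezium.connector.mysql.MySqlConnector",
            # Connect to the replica instead of the primary, so the snapshot's
            # table locks only block queries that are sent to the replica.
            "database.hostname": replica_host,
            "database.port": "3306",
            "database.user": user,
            "database.password": password,
            # Must be unique among all clients replicating from this server.
            "database.server.id": "184054",
            # Debezium 2.x property; 1.x used "database.server.name" instead.
            "topic.prefix": "search",
            # Hypothetical schema/table names for the two captured tables.
            "table.include.list": "appdb.table_a,appdb.table_b",
            # Required schema-history settings are omitted from this sketch.
        },
    }


# Body you might POST to the Kafka Connect REST API (/connectors).
payload = replica_snapshot_config("mydb-replica.example.com", "debezium", "change-me")
print(json.dumps(payload, indent=2))
```

While the snapshot runs, the application would send its reads to the primary, as described above, so nothing queues up behind the replica's table locks.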

If your binlog uses GTID, you should be able to make a CDC tool like Debezium read the snapshot from the replica and then, once that's done, switch to the master to read the binlog. Without GTID it is trickier: the tool would have to know the binlog position on the master that corresponds to the snapshot on the replica.
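GTIDs make the switch safe because they are global: before resuming on the master, a tool can check that every transaction in the replica's GTID set at snapshot time was also executed on the master, then ask the master for everything after that set. File-and-position coordinates allow no such cross-server comparison. MySQL provides this check itself as the built-in GTID_SUBSET() function. The Python sketch below only mirrors that check to show the idea. It is not how Debezium implements the switch, it handles only the classic `uuid:interval` format, and the UUIDs and ranges are hypothetical.

```python
from __future__ import annotations


def _merge(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or adjacent intervals, e.g. 1-5 and 6-10 into 1-10."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def parse_gtid_set(text: str) -> dict[str, list[tuple[int, int]]]:
    """Parse a GTID set such as 'uuid:1-5:7,uuid2:1-3' into merged intervals per source."""
    raw: dict[str, list[tuple[int, int]]] = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        source_uuid, *intervals = part.split(":")
        for interval in intervals:
            start, _, end = interval.partition("-")
            raw.setdefault(source_uuid.lower(), []).append((int(start), int(end or start)))
    return {uuid: _merge(iv) for uuid, iv in raw.items()}


def is_gtid_subset(subset: str, superset: str) -> bool:
    """True if every transaction in `subset` also appears in `superset`."""
    sup = parse_gtid_set(superset)
    for uuid, intervals in parse_gtid_set(subset).items():
        for start, end in intervals:
            # Merged superset intervals are disjoint, so one of them must cover [start, end].
            if not any(s <= start and end <= e for s, e in sup.get(uuid, [])):
                return False
    return True


# Hypothetical GTID sets: the replica's gtid_executed when the snapshot finished,
# and the master's gtid_executed when the connector switches over.
replica_at_snapshot = "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-950"
master_now = "3E11FA47-71CA-11E1-9E33-C80AA9429562:1-1000"
print(is_gtid_subset(replica_at_snapshot, master_now))  # True: safe to resume on the master
```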
