Postgres在并发更新中陷入僵局 [英] Postgres deadlocks on concurrent upserts

查看:116
本文介绍了Postgres在并发更新中陷入僵局的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我们有一个应用程序,该应用程序从数据流中读取并将这些信息向上插入数据库。数据是发生在Google云端硬盘上的更改,这意味着影响同一对象的许多事件可能彼此之间非常接近。

We have an application which reads from a data stream and upserts that information into a database. The data is changes which occur on Google Drive which means that many events which impact the same objects can occur very close to each other.

在升级时,我们陷入僵局这些信息进入数据库,这就是日志中显示的内容。我重建并清理了查询的可读性:

We're running into deadlocks when upserting this information into the database, here is what comes out in the log. I have reconstructed and sanitised the query for readability:

ERROR:  deadlock detected
DETAIL:  Process 10586 waits for ShareLock on transaction 166892743; blocked by process 10597.
  Process 10597 waits for ShareLock on transaction 166892741; blocked by process 10586.
  Process 10586: 
          INSERT INTO documents
              (version, source, source_id, ingestion_date)
          VALUES
              (0, 'googledrive', 'alpha', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'beta', '2017-09-21T07:03:51.074Z')
              (0, 'googledrive', 'gamma', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'delta', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'epsilon', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'zeta', '2017-09-21T07:03:51.074Z')

          ON CONFLICT (source, source_id)
          DO UPDATE
          SET
              ingestion_date = EXCLUDED.ingestion_date,
              version = documents.version + 1

          RETURNING source_id, source, uid

  Process 10597: 
          INSERT INTO documents
              (version, source, source_id, ingestion_date)
          VALUES
              (0, 'googledrive', 'delta', '2017-09-21T07:03:51.167Z'),
              (0, 'googledrive', 'gamma', '2017-09-21T07:03:51.167Z')

          ON CONFLICT (source, source_id)
          DO UPDATE
          SET
              ingestion_date = EXCLUDED.ingestion_date,
              version = documents.version + 1

          RETURNING source_id, source, uid

HINT:  See server log for query details.
CONTEXT:  while locking tuple (3908269,11) in relation "documents"
STATEMENT:  
          INSERT INTO documents
              (version, source, source_id, ingestion_date)
          VALUES
              (0, 'googledrive', 'alpha', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'beta', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'gamma', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'delta', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'epsilon', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'zeta', '2017-09-21T07:03:51.074Z')

          ON CONFLICT (source, source_id)
          DO UPDATE
          SET
              ingestion_date = EXCLUDED.ingestion_date,
              version = documents.version + 1

          RETURNING source_id, source, uid

模式:

    Column      |            Type             |                             Modifiers
----------------+-----------------------------+-------------------------------------------------------------------
uid             | uuid                        | not null default gen_random_uuid()
date_created    | timestamp without time zone | not null default now()
sequence_id     | bigint                      | not null default nextval('documents__sequence_id__seq'::regclass)
version         | integer                     | not null default 0
source          | text                        | not null
source_id       | text                        | not null
ingestion_date  | timestamp without time zone | not null

Indexes:
    "documents__pkey" PRIMARY KEY, btree (uid)
    "documents__sequence_id__unique" UNIQUE CONSTRAINT, btree (sequence_id)
    "documents__source__source_id__deleted" UNIQUE, btree (source, source_id)
    "documents__ingestion_date__idx" btree (ingestion_date)
    "documents__source_id__source__idx" btree (source_id, source)

我怀疑问题是这样的:第一笔交易依次锁定了具有source_id alpha,beta,gamma的行,而第二笔交易则以相反的顺序锁定了具有source_id delta,gamma的行,并且死锁发生在他们都锁定了伽玛值和增量的时候,但是这里的时机非常紧张!

I suspect the problem is something like "the first transaction was locking rows with source_id alpha, beta, gamma in sequence, meanwhile the second transaction was locking rows with source_id delta, gamma in the reverse order, and the deadlock happened at the point they both locked gamma and delta", however the timing here is very tight!

对此,解决方案是什么?

What would the solution for this be? Sorting our values list by source_id?

推荐答案

我可以想到三种解决方案:

I can think of three solutions:


  1. 每个语句只插入一行,但是效率很低。

  1. You insert only one row per statement, but that's inefficient.

在插入之前对行进行排序

You sort the rows before inserting them.

如果遇到死锁或序列化错误,请重试事务。

You retry a transaction if it gets a deadlock or serialization error.

我希望第三个解决方案,除非错误经常发生。

I'd prefer the third solution unless the errors happen very often.

这篇关于Postgres在并发更新中陷入僵局的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆