SQL Server 2005中的原子UPSERT [英] Atomic UPSERT in SQL Server 2005
问题描述
在SQL Server 2005中执行原子"UPSERT"(存在的话更新,否则插入)的正确模式是什么?
What is the correct pattern for doing an atomic "UPSERT" (UPDATE where exists, INSERT otherwise) in SQL Server 2005?
我在SO上看到很多代码(例如,参见检查是否存在一行,否则插入)并使用以下两部分模式:
I see a lot of code on SO (e.g. see Check if a row exists, otherwise insert) with the following two-part pattern:
UPDATE ...
FROM ...
WHERE <condition>
-- race condition risk here
IF @@ROWCOUNT = 0
INSERT ...
或
IF (SELECT COUNT(*) FROM ... WHERE <condition>) = 0
-- race condition risk here
INSERT ...
ELSE
UPDATE ...
其中< condition>将是自然键的评估.上述方法似乎都不能很好地处理并发.如果我不能让两行具有相同的自然键,则上述所有情况似乎都存在在竞争条件场景中插入具有相同自然键的行的风险.
where < condition > will be an evaluation of natural keys. None of the above approaches seem to deal well with concurrency. If I cannot have two rows with the same natural key, it seems like all of the above risk inserting rows with the same natural keys in race condition scenarios.
我一直在使用以下方法,但很惊讶没有在人们的回应中看到它,所以我很想知道这是怎么回事:
I have been using the following approach but I'm surprised not to see it anywhere in people's responses so I'm wondering what is wrong with it:
INSERT INTO <table>
SELECT <natural keys>, <other stuff...>
FROM <table>
WHERE NOT EXISTS
-- race condition risk here?
( SELECT 1 FROM <table> WHERE <natural keys> )
UPDATE ...
WHERE <natural keys>
请注意,此处提到的竞争条件不同于先前代码中的竞争条件.在较早的代码中,问题是幻像读取(行在其他会话之间插入到UPDATE/IF之间或SELECT/INSERT之间).在上面的代码中,竞争条件与DELETE有关.在(WHERE NOT EXISTS)执行之后但在INSERT执行之前,是否可以将另一个行删除匹配的行?目前尚不清楚WHERE NOT EXISTS在哪里与UPDATE结合使用来锁定任何内容.
Note that the race condition mentioned here is a different one from the ones in the earlier code. In the earlier code, the issue was phantom reads (rows being inserted between the UPDATE/IF or between the SELECT/INSERT by another session). In the above code, the race condition has to do with DELETEs. Is it possible for a matching row to be deleted by another session AFTER the (WHERE NOT EXISTS) executes but before the INSERT executes? It's not clear where the WHERE NOT EXISTS puts a lock on anything in conjunction with the UPDATE.
这是原子的吗?我找不到在SQL Server文档中将其记录在何处.
Is this atomic? I can't locate where this would be documented in SQL Server documentation.
我意识到这可以通过事务来完成,但是我想我需要将事务级别设置为SERIALIZABLE以避免幻像读取问题?这样的常见问题肯定是过分杀伤力的吗?
I realise this could be done with transactions, but I think I would need to set the transaction level to SERIALIZABLE to avoid the phantom read problem? Surely that is overkill for such a common problem?
推荐答案
INSERT INTO <table>
SELECT <natural keys>, <other stuff...>
FROM <table>
WHERE NOT EXISTS
-- race condition risk here?
( SELECT 1 FROM <table> WHERE <natural keys> )
UPDATE ...
WHERE <natural keys>
- 在第一个INSERT中存在争用条件.密钥在内部查询SELECT期间可能不存在,但在INSERT时确实存在,从而导致密钥冲突.
- 在INSERT和UPDATE之间存在争用条件.在INSERT的内部查询中检查该键时,该键可能存在,但在UPDATE运行时已消失.
- 如果密钥可能丢失,请始终先插入.处理唯一约束违规,后退以更新.
- 如果密钥可能存在,请始终先进行更新.如果未找到行,则插入.处理可能的唯一约束违规,回退以更新.
对于第二种竞争情况,人们可能会争辩说,并发线程无论如何都会删除该键,因此它并不是真正的丢失更新.
For the second race condition one could argue that the key would have been deleted anyway by the concurrent thread, so it's not really a lost update.
最佳解决方案通常是尝试最可能的情况,并在错误失败时处理错误(当然,在事务内部):
The optimal solution is usually to try the most likely case, and handle the error if it fails (inside a transaction, of course):
除了正确性之外,此模式还对速度而言是最佳的:尝试插入和处理异常比进行虚假锁定更为有效.锁定意味着逻辑页面读取(可能意味着物理页面读取),而IO(甚至逻辑)比SEH昂贵.
Besides correctness, this pattern is also optimal for speed: is more efficient to try to insert and handle the exception than to do spurious lockups. Lockups mean logical page reads (which may mean physical page reads), and IO (even logical) is more expensive than SEH.
更新 @Peter
为什么没有一个陈述是原子的"?假设我们有一个简单的表:
Why isn't a single statement 'atomic'? Let's say we have a trivial table:
create table Test (id int primary key);
现在,如果我从两个线程运行一个循环中的单个语句,那将是原子的",正如您所说的,不存在任何竞争条件:
Now if I'd run this single statement from two threads, in a loop, it would be 'atomic', as you say, an no race condition can exist:
insert into Test (id)
select top (1) id
from Numbers n
where not exists (select id from Test where id = n.id);
在短短几秒钟内,就会发生主键冲突:
Yet in only a couple of seconds, a primary key violation occurs:
第2级第1线第4行2627消息
违反主键约束'PK__Test__24927208'.无法在对象'dbo.Test'中插入重复的密钥.
Msg 2627, Level 14, State 1, Line 4
Violation of PRIMARY KEY constraint 'PK__Test__24927208'. Cannot insert duplicate key in object 'dbo.Test'.
那是为什么?您是正确的,因为SQL查询计划将在DELETE ... FROM ... JOIN
,WITH cte AS (SELECT...FROM ) DELETE FROM cte
和其他许多情况下做正确的事情".但是在这些情况下有一个关键的区别:子查询"是指更新或删除操作的目标.对于这种情况,查询计划确实会使用适当的锁,实际上,在某些情况下,例如在实施队列
Why is that? You are correct in that SQL query plan will do the 'right thing' on DELETE ... FROM ... JOIN
, on WITH cte AS (SELECT...FROM ) DELETE FROM cte
and in many other cases. But there is a crucial difference in these cases: the 'subquery' refers to the target of an update or delete operation. For such cases the query plan will indeed use an appropriate lock, in fact I this behavior is critical on certain cases, like when implementing queues Using tables as Queues.
但是在原始问题以及我的示例中,查询优化器将子查询视为查询中的子查询,而不是某些需要特殊锁保护的特殊扫描更新"类型查询.结果是,并发的观察者可以将子查询查找的执行视为不同的操作,从而破坏了语句的原子"行为.除非采取特殊的预防措施,否则多个线程都可以尝试插入相同的值,这两个线程都确信已经检查过并且该值尚不存在.只有一个可以成功,另一个将违反PK. QED.
But in the original question, as well as in my example, the subquery is seen by the query optimizer just as a subquery in a query, not as some special 'scan for update' type query that needs special lock protection. The result is that the execution of the subquery lookup can be observed as a distinct operation by a concurent observerver, thus breaking the 'atomic' behavior of the statement. Unless special precaution is taken, multiple threads can attempt to insert the same value, both convinced they had checked and the value doesn't already exists. Only one can succeed, the other will hit the PK violation. QED.
这篇关于SQL Server 2005中的原子UPSERT的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!