使用T-SQL Merge语句时如何避免插入重复记录 [英] How to avoid inserting duplicate records when using a T-SQL Merge statement

查看:478
本文介绍了使用T-SQL Merge语句时如何避免插入重复记录的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图使用T-SQL的MERGE语句插入许多记录,但是当源表中有重复的记录时,我的查询无法插入.失败原因是:

I am attempting to insert many records using T-SQL's MERGE statement, but my query fails to INSERT when there are duplicate records in the source table. The failure is caused by:

  1. 目标表具有基于两列的主键
  2. 源表中可能包含重复记录,这些记录违反了目标表的主键约束(引发了违反主键约束")

我正在寻找一种方法来更改我的MERGE语句,以便它忽略源表中的重复记录和/或将尝试/捕获INSERT语句以捕获可能发生的异常(即,所有其他INSERT语句将运行不管可能发生的坏蛋有多少)-也许还有更好的方法来解决此问题?

I'm looking for a way to change my MERGE statement so that it either ignores duplicate records within the source table and/or will try/catch the INSERT statement to catch exceptions that may occur (i.e. all other INSERT statements will run regardless of the few bad eggs that may occur) - or, maybe, there's a better way to go about this problem?

这是我要解释的查询示例.下面的示例将100k条记录添加到临时表中,然后尝试将这些记录插入目标表中-

Here's a query example of what I'm trying to explain. The example below will add 100k records to a temp table and then will attempt to insert those records in the target table -

编辑 在我的原始文章中,我仅在示例表中包括两个字段,这些字段让SO朋友提供了DISTINCT解决方案,以避免MERGE语句中出现重复项.我应该提到,在我的实际问题中,表有15个字段,而在这15个字段中,两个字段是集群主键.因此DISTINCT关键字不起作用,因为我需要选择所有15个字段,并忽略基于其中两个字段的重复项.

EDIT In my original post I only included two fields in the example tables which gave way to SO friends to give a DISTINCT solution to avoid duplicates in the MERGE statement. I should have mentioned that in my real-world problem the tables have 15 fields and of those 15, two of the fields are a CLUSTERED PRIMARY KEY. So the DISTINCT keyword doesn't work because I need to SELECT all 15 fields and ignore duplicates based on two of the fields.

我已经更新了下面的查询,以包含更多字段col4.我需要在合并中包含col4,但只需要确保仅col2和col3是唯一的即可.

I have updated the query below to include one more field, col4. I need to include col4 in the MERGE, but I only need to make sure that ONLY col2 and col3 are unique.

-- Create the source table
CREATE TABLE #tmp (
col2 datetime NOT NULL,
col3 int NOT NULL,
col4 int
)
GO

-- Add a bunch of test data to the source table
-- For testing purposes, allow duplicate records to be added to this table
DECLARE @loopCount int = 100000
DECLARE @loopCounter int = 0
DECLARE @randDateOffset int
DECLARE @col2 datetime
DECLARE @col3 int
DECLARE @col4 int

WHILE (@loopCounter) < @loopCount
BEGIN
    SET @randDateOffset = RAND() * 100000
    SET @col2 = DATEADD(MI,@randDateOffset,GETDATE())
    SET @col3 = RAND() * 1000
    SET @col4 = RAND() * 10
    INSERT INTO #tmp
    (col2,col3,col4)
    VALUES
    (@col2,@col3,@col4);

    SET @loopCounter = @loopCounter + 1
END

-- Insert the source data into the target table
-- How do we make sure we don't attempt to INSERT a duplicate record? Or how can we 
-- catch exceptions? Or?
MERGE INTO dbo.tbl1 AS tbl
    USING (SELECT * FROM #tmp) AS src
    ON (tbl.col2 = src.col2 AND tbl.col3 = src.col3)
    WHEN NOT MATCHED THEN 
        INSERT (col2,col3,col4)
        VALUES (src.col2,src.col3,src.col4);
GO

推荐答案

已满足您的新规范.只插入最高的col4值:这次我使用group by来防止重复的行.

Solved to your new specification. Only inserting the highest value of col4: This time I used a group by to prevent duplicate rows.

MERGE INTO dbo.tbl1 AS tbl 
USING (SELECT col2,col3, max(col4) col4 FROM #tmp group by col2,col3) AS src 
ON (tbl.col2 = src.col2 AND tbl.col3 = src.col3) 
WHEN NOT MATCHED THEN  
    INSERT (col2,col3,col4) 
    VALUES (src.col2,src.col3,src.col4); 

这篇关于使用T-SQL Merge语句时如何避免插入重复记录的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆