SQL INSERT but avoid duplicates

Problem Description

I want to do some quick inserts into a table while avoiding duplicates. For argument's sake, let's call it MarketPrices. I've been experimenting with two ways of doing it, but I'm not sure how to benchmark which will be faster.

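-- Approach 1: single-statement INSERT ... EXCEPT.
-- The CROSS JOIN against (0, 1) covers both values of the bit column, so an
-- existing row suppresses the insert regardless of its stored IsMarketOpen.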
INSERT INTO MarketPrices (SecurityCode, BuyPrice, SellPrice, IsMarketOpen)
SELECT @SecurityCode, @BuyPrice,  @SellPrice, @IsMarketOpen
EXCEPT
SELECT SecurityCode, BuyPrice, SellPrice, j.bool as IsActive FROM MarketPrices
CROSS JOIN (SELECT 0 as bool UNION SELECT 1 as bool ) as j

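-- Approach 2: look up first, then insert only if no matching row was found.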
DECLARE @MktId int
SET @MktId = (SELECT SecurityId FROM MarketPrices 
              where SecurityCode = @SecurityCode 
              and BuyPrice=@BuyPrice 
              and SellPrice = @SellPrice)

IF (@MktId is NULL)  
BEGIN
    INSERT INTO MarketPrices (SecurityCode, BuyPrice, SellPrice, IsMarketOpen)
    VALUES
    (@SecurityCode,@BuyPrice, @SellPrice, @IsMarketOpen)
END

Assume that @whatever is an input parameter in the stored procedure.

I want to be able to insert a new record for every SecurityCode when the BuyPrice or SellPrice, or both, are different from every previous occurrence. I don't care about IsMarketOpen.

Is there anything glaringly stupid about either of the above approaches? Is one faster than the other?

Recommended Answer

EDIT: to prevent race conditions in concurrent environments, use WITH (UPDLOCK) in the correlated subquery or EXCEPT'd SELECT. The test script I wrote below doesn't require it, since it uses temporary tables that are only visible to the current connection, but in a real environment, operating against user tables, it would be necessary.
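
Applied to the question's table, a concurrency-safe conditional insert might look like the sketch below (illustrative only; the column names come from the question, and HOLDLOCK is added alongside UPDLOCK here to keep the key range locked through the insert, which goes slightly beyond what the note above strictly says):

-- Sketch: conditional insert with locking hints to close the check-then-insert race.
-- UPDLOCK takes update locks on the rows read; HOLDLOCK holds a key-range lock so
-- "no matching row" stays true until the statement completes.
INSERT INTO MarketPrices (SecurityCode, BuyPrice, SellPrice, IsMarketOpen)
SELECT @SecurityCode, @BuyPrice, @SellPrice, @IsMarketOpen
WHERE NOT EXISTS (
    SELECT * FROM MarketPrices WITH (UPDLOCK, HOLDLOCK)
    WHERE SecurityCode = @SecurityCode
      AND BuyPrice = @BuyPrice
      AND SellPrice = @SellPrice
)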

UPDLOCK is not necessary for MERGE.
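
If you go the MERGE route against the question's table, a sketch (same assumptions about which columns define a duplicate) could look like:

-- Sketch: MERGE inserts only when no row with this SecurityCode/BuyPrice/SellPrice exists.
MERGE MarketPrices AS t
USING (SELECT @SecurityCode, @BuyPrice, @SellPrice, @IsMarketOpen)
      AS s (SecurityCode, BuyPrice, SellPrice, IsMarketOpen)
   ON t.SecurityCode = s.SecurityCode
  AND t.BuyPrice     = s.BuyPrice
  AND t.SellPrice    = s.SellPrice
WHEN NOT MATCHED BY TARGET THEN
    INSERT (SecurityCode, BuyPrice, SellPrice, IsMarketOpen)
    VALUES (s.SecurityCode, s.BuyPrice, s.SellPrice, s.IsMarketOpen);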

Inspired by mcl's answer re: unique index & let the database throw an error, I decided to benchmark conditional inserts vs. try/catch.
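
For reference, that approach amounts to something like the sketch below (illustrative; the index name and the choice of key columns are assumptions based on the question, not part of mcl's answer):

-- Sketch: let a unique index reject duplicates and swallow the violation in the caller.
CREATE UNIQUE INDEX IX_MarketPrices_Price
    ON MarketPrices (SecurityCode, BuyPrice, SellPrice);

BEGIN TRY
    INSERT INTO MarketPrices (SecurityCode, BuyPrice, SellPrice, IsMarketOpen)
    VALUES (@SecurityCode, @BuyPrice, @SellPrice, @IsMarketOpen);
END TRY
BEGIN CATCH
    -- 2601/2627 are the duplicate-key errors; ignore those, surface anything else.
    IF ERROR_NUMBER() NOT IN (2601, 2627)
        RAISERROR('insert failed', 16, 1);
END CATCH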

The results appear to support the conditional insert over try/catch, but YMMV. It's a very simple scenario (one column, small table, etc), executed on one machine, etc.

Here are the results (SQL Server 2008, build 10.0.1600.2):

duplicates (short table)    
  try/catch:                14440 milliseconds / 100000 inserts
  conditional insert:        2983 milliseconds / 100000 inserts
  except:                    2966 milliseconds / 100000 inserts
  merge:                     2983 milliseconds / 100000 inserts

uniques
  try/catch:                 3920 milliseconds / 100000 inserts
  conditional insert:        3860 milliseconds / 100000 inserts
  except:                    3873 milliseconds / 100000 inserts
  merge:                     3890 milliseconds / 100000 inserts

  straight insert:           3173 milliseconds / 100000 inserts

duplicates (tall table)
  try/catch:                14436 milliseconds / 100000 inserts
  conditional insert:        3063 milliseconds / 100000 inserts
  except:                    3063 milliseconds / 100000 inserts
  merge:                     3030 milliseconds / 100000 inserts

Notice that even on unique inserts, there's slightly more overhead to try/catch than to a conditional insert. I wonder if this varies by version, CPU, number of cores, etc.

I did not benchmark the IF-style conditional inserts, just WHERE. I assume the IF variety would show more overhead, since a) you would have two statements, and b) you would need to wrap the two statements in a transaction and set the isolation level to serializable (!). If someone wants to test this, the temp table would need to be changed to a regular user table (SERIALIZABLE doesn't apply to local temp tables).
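
The pattern being described would look roughly like this sketch (illustrative only, using the question's column names; not code from the benchmark below):

-- Sketch: IF-style conditional insert made safe by a serializable transaction.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;

IF NOT EXISTS (SELECT * FROM MarketPrices
               WHERE SecurityCode = @SecurityCode
                 AND BuyPrice = @BuyPrice
                 AND SellPrice = @SellPrice)
BEGIN
    INSERT INTO MarketPrices (SecurityCode, BuyPrice, SellPrice, IsMarketOpen)
    VALUES (@SecurityCode, @BuyPrice, @SellPrice, @IsMarketOpen);
END

COMMIT TRANSACTION;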

Here is the script:

-- tested on SQL 2008.
-- to run on SQL 2005, comment out the statements using MERGE
set nocount on

if object_id('tempdb..#temp') is not null drop table #temp
create table #temp (col1 int primary key)
go

-------------------------------------------------------

-- duplicate insert test against a table w/ 1 record

-------------------------------------------------------

insert #temp values (1)
go

declare @x int, @y int, @now datetime, @duration int
select @x = 1, @y = 0, @now = getdate()
while @y < 100000 begin
  set @y = @y+1
  begin try 
    insert #temp select @x
  end try
  begin catch end catch
end
set @duration = datediff(ms,@now,getdate())
raiserror('duplicates (short table), try/catch: %i milliseconds / %i inserts',-1,-1,@duration,@y) with nowait
go

declare @x int, @y int, @now datetime, @duration int
select @x = 1, @y = 0, @now = getdate()
while @y < 100000 begin
  set @y = @y+1
  insert #temp select @x where not exists (select * from #temp where col1 = @x)
end
set @duration = datediff(ms,@now,getdate())
raiserror('duplicates (short table), conditional insert: %i milliseconds / %i inserts',-1,-1,@duration, @y) with nowait
go

declare @x int, @y int, @now datetime, @duration int
select @x = 1, @y = 0, @now = getdate()
while @y < 100000 begin
  set @y = @y+1
  insert #temp select @x except select col1 from #temp
end
set @duration = datediff(ms,@now,getdate())
raiserror('duplicates (short table), except: %i milliseconds / %i inserts',-1,-1,@duration, @y) with nowait
go

-- comment this batch out for SQL 2005
declare @x int, @y int, @now datetime, @duration int
select @x = 1, @y = 0, @now = getdate()
while @y < 100000 begin
  set @y = @y+1
  merge #temp t using (select @x) s (col1) on t.col1 = s.col1 when not matched by target then insert values (col1);
end
set @duration = datediff(ms,@now,getdate())
raiserror('duplicates (short table), merge: %i milliseconds / %i inserts',-1,-1,@duration, @y) with nowait
go

-------------------------------------------------------

-- unique insert test against an initially empty table

-------------------------------------------------------

truncate table #temp
declare @x int, @now datetime, @duration int
select @x = 0, @now = getdate()
while @x < 100000 begin
  set @x = @x+1
  insert #temp select @x
end
set @duration = datediff(ms,@now,getdate())
raiserror('uniques, straight insert: %i milliseconds / %i inserts',-1,-1,@duration, @x) with nowait
go

truncate table #temp
declare @x int, @now datetime, @duration int
select @x = 0, @now = getdate()
while @x < 100000 begin
  set @x = @x+1
  begin try 
    insert #temp select @x
  end try
  begin catch end catch
end
set @duration = datediff(ms,@now,getdate())
raiserror('uniques, try/catch: %i milliseconds / %i inserts',-1,-1,@duration, @x) with nowait
go

truncate table #temp
declare @x int, @now datetime, @duration int
select @x = 0, @now = getdate()
while @x < 100000 begin
  set @x = @x+1
  insert #temp select @x where not exists (select * from #temp where col1 = @x)
end
set @duration = datediff(ms,@now,getdate())
raiserror('uniques, conditional insert: %i milliseconds / %i inserts',-1,-1,@duration, @x) with nowait
go

truncate table #temp
declare @x int, @now datetime, @duration int
select @x = 0, @now = getdate()
while @x < 100000 begin
  set @x = @x+1
  insert #temp select @x except select col1 from #temp
end
set @duration = datediff(ms,@now,getdate())
raiserror('uniques, except: %i milliseconds / %i inserts',-1,-1,@duration, @x) with nowait
go

-- comment this batch out for SQL 2005
truncate table #temp
declare @x int, @now datetime, @duration int
select @x = 0, @now = getdate()  -- start at 0 so this test also performs 100000 inserts
while @x < 100000 begin
  set @x = @x+1
  merge #temp t using (select @x) s (col1) on t.col1 = s.col1 when not matched by target then insert values (col1);
end
set @duration = datediff(ms,@now,getdate())
raiserror('uniques, merge: %i milliseconds / %i inserts',-1,-1,@duration, @x) with nowait
go

-------------------------------------------------------

-- duplicate insert test against a table w/ 100000 records

-------------------------------------------------------

declare @x int, @y int, @now datetime, @duration int
select @x = 1, @y = 0, @now = getdate()
while @y < 100000 begin
  set @y = @y+1
  begin try 
    insert #temp select @x
  end try
  begin catch end catch
end
set @duration = datediff(ms,@now,getdate())
raiserror('duplicates (tall table), try/catch: %i milliseconds / %i inserts',-1,-1,@duration,@y) with nowait
go

declare @x int, @y int, @now datetime, @duration int
select @x = 1, @y = 0, @now = getdate()
while @y < 100000 begin
  set @y = @y+1
  insert #temp select @x where not exists (select * from #temp where col1 = @x)
end
set @duration = datediff(ms,@now,getdate())
raiserror('duplicates (tall table), conditional insert: %i milliseconds / %i inserts',-1,-1,@duration, @y) with nowait
go

declare @x int, @y int, @now datetime, @duration int
select @x = 1, @y = 0, @now = getdate()
while @y < 100000 begin
  set @y = @y+1
  insert #temp select @x except select col1 from #temp
end
set @duration = datediff(ms,@now,getdate())
raiserror('duplicates (tall table), except: %i milliseconds / %i inserts',-1,-1,@duration, @y) with nowait
go

-- comment this batch out for SQL 2005
declare @x int, @y int, @now datetime, @duration int
select @x = 1, @y = 0, @now = getdate()
while @y < 100000 begin
  set @y = @y+1
  merge #temp t using (select @x) s (col1) on t.col1 = s.col1 when not matched by target then insert values (col1);
end
set @duration = datediff(ms,@now,getdate())
raiserror('duplicates (tall table), merge: %i milliseconds / %i inserts',-1,-1,@duration, @y) with nowait
go
