SQL Server:合并性能 [英] SQL Server : MERGE performance

查看:99
本文介绍了SQL Server:合并性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个500万行的数据库表.聚簇索引是自动增量标识"列. PK是代码生成的256字节VARCHAR,它是URL的SHA256哈希,这是表上的非聚集索引.

I have a database table with 5 million rows. The clustered index is auto-increment identity column. There PK is a code generated 256 byte VARCHAR which is a SHA256 hash of a URL, this is a non-clustered index on the table.

表如下:

CREATE TABLE [dbo].[store_image](
    [imageSHAID] [nvarchar](256) NOT NULL,
    [imageGUID] [uniqueidentifier] NOT NULL,
    [imageURL] [nvarchar](2000) NOT NULL,
    [showCount] [bigint] NOT NULL,
    [imageURLIndex]  AS (CONVERT([nvarchar](450),[imageURL],(0))),
    [autoIncID] [bigint] IDENTITY(1,1) NOT NULL,
 CONSTRAINT [PK_imageSHAID] PRIMARY KEY NONCLUSTERED 
(
    [imageSHAID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

CREATE CLUSTERED INDEX [autoIncPK] ON [dbo].[store_image] 
(
    [autoIncID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

  • imageSHAID是图像URL的SHA256哈希,例如" http://blah.com/image1.jpg ",将其哈希为256个长度的varchar .

    • imageSHAID is a SHA256 hash of an image URL e.g. "http://blah.com/image1.jpg", it is hashed into a varchar of 256 length.

      imageGUID是代码生成的guid,其中我标识了图像(以后将用作索引,但现在我已将该列省略为索引)

      imageGUID is a code generated guid in which I identify the image (it will be used as an index later, but for now I have omitted this column as an index)

      imageURL是图像的完整URL(最多2000个字符)

      imageURL is the full URL of the image (up to 2000 characters)

      showCount是显示图像的次数,每次显示该特定图像时,该次数都会增加.

      showCount is the number of times the image is shown, this is incremented each time this particular image is shown.

      imageURLIndex是由450个字符限制的计算列,这使我可以在应选择的imageURL上进行文本搜索,它是可索引的(为简洁起见,再次省略了索引)

      imageURLIndex is a computed column limited by 450 characters, this allows me to do text searches on the imageURL should I choose to, it is indexable (again index is omitted for brevity)

      autoIncID是聚簇索引,应允许更快地插入数据.

      autoIncID is the clustered index, should allow faster inserting of data.

      我会定期从临时表合并到store_image表中.临时表的结构如下(非常类似于store_image表):

      Periodically I merge from a temp table into the store_image table. The temp table structure is as follows (very similar to the store_image table):

      CREATE TABLE [dbo].[store_image_temp](
          [imageSHAID] [nvarchar](256) NULL,
          [imageURL] [nvarchar](2000) NULL,
          [showCount] [bigint] NULL,
      ) ON [PRIMARY]
      
      GO
      

      运行合并过程时,我使用以下代码将DataTable写入临时表:

      When the merge process is run, I write a DataTable to the temp table using the following code:

      using (SqlBulkCopy bulk = new SqlBulkCopy(storeConn, SqlBulkCopyOptions.KeepIdentity | SqlBulkCopyOptions.KeepNulls, null))
      {
          bulk.DestinationTableName = "[dbo].[store_image_temp]";
          bulk.WriteToServer(imageTableUpsetDataTable);
      }
      

      然后我运行合并命令,通过基于imageSHAID从临时表进行合并来更新store_image表中的showCount.如果store_image表中当前不存在该图像,则创建它:

      I then run the merge command to update the showCount in the store_image table by merging from the temp table based on the imageSHAID. If the image doesn't currently exist in the store_image table, I create it:

      merge into store_image as Target using [dbo].[store_image_temp] as Source
      on Target.imageSHAID=Source.imageSHAID 
      when matched then update set 
      Target.showCount=Target.showCount+Source.showCount 
      when not matched then insert values (Source.imageSHAID,NEWID(), Source.imageURL, Source.showCount);
      

      我通常在任何一个合并过程中尝试将temp表中的2k-5k行合并到store_image表中.

      I'm typically trying to merge 2k-5k rows from the temp table to the store_image table at any one merge process.

      我曾经在SSD(仅连接SATA 1)上运行此数据库,并且运行速度非常快(不到200毫秒).我的SSD空间不足,因此将数据库交换到1TB 7200高速缓存旋转磁盘上,因为此后完成时间超过6-100秒(6000-100000MS).运行批量插入时,我可以看到磁盘活动约为1MB-2MB/秒,CPU使用率较低.

      I used to run this DB on a SSD (only SATA 1 connected) and it was very fast (under 200 ms). I ran out of room on the SSD so I swapped the DB to a 1TB 7200 cache spinning disk, since then completion times are over 6-100 seconds (6000 - 100000MS). When the bulk insert is running I can see disk activity of around 1MB-2MB/sec, low CPU usage.

      这是这种数据量的典型写入时间吗?对我来说似乎有点慢,是什么原因导致性能下降?当然,在为imageSHAID编制索引之后,我们应该期望比这更快的查找时间吗?

      Is this a typical write time for this amount of data? It seems a little slow to me, what is causing the slow performance? Surely with the imageSHAID being indexed we should expect quicker seek times than this?

      任何帮助将不胜感激.

      感谢您的时间.

      推荐答案

      MERGE中的UPDATE子句将更新showCount.这需要在聚集索引上进行键查找.

      Your UPDATE clause in the MERGE updates showCount. This requires a key lookup on the clustered index.

      但是,聚簇索引也被声明为非唯一.即使基础列是唯一的,这也将信息提供给优化器.

      However, the clustered index is also declared non-unique. This gives information to the optimiser even though the underlying column is unique.

      所以,我将进行这些更改

      So, I'd make these changes

      • 集群主键为autoIncID
      • imageSHAID上的当前PK是独立的唯一索引(不是约束),并为showCount添加INCLUDE.唯一约束不能包含INCLUDEs
      • the clustered primary key to be autoIncID
      • the current PK on imageSHAID to be a standalone unique index (not constraint) and add an INCLUDE for showCount. Unique constraints can't have INCLUDEs

      更多观察结果:

      • you don't need nvarchar for the hash or URL columns. These are not unicode.
      • A hash is also fixed length so can be char(64) (for SHA2-512).
      • The length of a column defines how much memory to assign to the query. See this for more: is there an advantage to varchar(500) over varchar(8000)?

      这篇关于SQL Server:合并性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆