What's the fastest way to bulk insert a lot of data in SQL Server (C# client)

Question

I am hitting some performance bottlenecks with my C# client inserting bulk data into a SQL Server 2005 database and I'm looking for ways in which to speed up the process.

I am already using the SqlClient.SqlBulkCopy (which is based on TDS) to speed up the data transfer across the wire which helped a lot, but I'm still looking for more.

I have a simple table that looks like this:

 CREATE TABLE [BulkData](
 [ContainerId] [int] NOT NULL,
 [BinId] [smallint] NOT NULL,
 [Sequence] [smallint] NOT NULL,
 [ItemId] [int] NOT NULL,
 [Left] [smallint] NOT NULL,
 [Top] [smallint] NOT NULL,
 [Right] [smallint] NOT NULL,
 [Bottom] [smallint] NOT NULL,
 CONSTRAINT [PKBulkData] PRIMARY KEY CLUSTERED 
 (
  [ContainerId] ASC,
  [BinId] ASC,
  [Sequence] ASC
))

I'm inserting data in chunks that average about 300 rows where ContainerId and BinId are constant in each chunk and the Sequence value is 0-n and the values are pre-sorted based on the primary key.

The %Disk time performance counter spends a lot of time at 100% so it is clear that disk IO is the main issue but the speeds I'm getting are several orders of magnitude below a raw file copy.

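Would I see any benefit if I:

  1. Drop the primary key while I am doing the inserts and recreate it later
  2. Insert into a temporary table with the same schema and periodically transfer the rows into the main table, to keep the table where insertions are happening small
  3. Anything else?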
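
Options 1 and 2 can be sketched in T-SQL roughly as follows. This is a minimal sketch, assuming the BulkData table above; the staging table name BulkData_Staging is hypothetical, and whether either option actually helps would have to be measured on the real workload.

```sql
-- Option 1: drop the clustered primary key before the load, recreate it afterwards.
ALTER TABLE [BulkData] DROP CONSTRAINT [PKBulkData];
GO
-- ... run the SqlBulkCopy load here ...
ALTER TABLE [BulkData] ADD CONSTRAINT [PKBulkData]
    PRIMARY KEY CLUSTERED ([ContainerId] ASC, [BinId] ASC, [Sequence] ASC);
GO

-- Option 2: load into a small staging table and move rows across periodically.
-- [BulkData_Staging] is a hypothetical name. SELECT ... INTO with a false
-- predicate copies the column definitions but no rows, constraints or indexes.
SELECT [ContainerId], [BinId], [Sequence], [ItemId],
       [Left], [Top], [Right], [Bottom]
INTO [BulkData_Staging]
FROM [BulkData]
WHERE 1 = 0;
GO

-- Periodic transfer. TABLOCKX keeps loaders out of the staging table until
-- the transaction commits, so no rows arrive between the copy and the truncate.
BEGIN TRANSACTION;
INSERT INTO [BulkData]
       ([ContainerId], [BinId], [Sequence], [ItemId],
        [Left], [Top], [Right], [Bottom])
SELECT [ContainerId], [BinId], [Sequence], [ItemId],
       [Left], [Top], [Right], [Bottom]
FROM [BulkData_Staging] WITH (TABLOCKX)
ORDER BY [ContainerId], [BinId], [Sequence];
TRUNCATE TABLE [BulkData_Staging];
COMMIT TRANSACTION;
GO
```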

-- Based on the responses I have gotten, let me clarify a little bit:

Portman: I'm using a clustered index because when the data is all imported I will need to access data sequentially in that order. I don't particularly need the index to be there while importing the data. Is there any advantage to having a nonclustered PK index while doing the inserts as opposed to dropping the constraint entirely for import?

Chopeen: The data is being generated remotely on many other machines (my SQL server can only handle about 10 currently, but I would love to be able to add more). It's not practical to run the entire process on the local machine because it would then have to process 50 times as much input data to generate the output.

Jason: I am not doing any concurrent queries against the table during the import process. I will try dropping the primary key and see if that helps.

Recommended answer

Here's how you can disable/enable indexes in SQL Server:

--Disable Index
ALTER INDEX [IX_Users_UserID] ON SalesDB.Users DISABLE
GO
--Enable Index
ALTER INDEX [IX_Users_UserID] ON SalesDB.Users REBUILD
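
Applied to the table from the question, the same pattern might look like the sketch below. The index name IX_BulkData_ItemId is hypothetical (the question defines only the clustered primary key) and is created here purely for illustration. Note that this approach suits nonclustered indexes: disabling a clustered index makes the table's data inaccessible until the index is rebuilt, so a clustered primary key would have to be dropped and recreated rather than disabled.

```sql
-- Hypothetical nonclustered index, created only so there is something to disable.
CREATE NONCLUSTERED INDEX [IX_BulkData_ItemId] ON [BulkData] ([ItemId]);
GO

-- Before the load: stop maintaining the nonclustered index.
ALTER INDEX [IX_BulkData_ItemId] ON [BulkData] DISABLE;
GO

-- ... run the bulk load here ...

-- After the load: rebuilding the index re-enables it.
ALTER INDEX [IX_BulkData_ItemId] ON [BulkData] REBUILD;
GO
```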

Here are some resources to help you find a solution:

Some bulk loading speed comparisons

Use SqlBulkCopy to quickly load data from your client to SQL Server

Optimizing Bulk Copy Performance

Definitely look into NOCHECK and TABLOCK options:

Table Hints (Transact-SQL)

INSERT (Transact-SQL)
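
As a rough sketch of what those two options can look like in T-SQL, assuming the BulkData table from the question and a hypothetical source table named BulkData_Staging with the same columns: NOCHECK is shown in its ALTER TABLE form, which disables FOREIGN KEY and CHECK constraints only (the question's table has none, so those statements are illustrative), and TABLOCK is shown as a table hint on the INSERT target. From the C# side, SqlBulkCopyOptions.TableLock is the corresponding SqlBulkCopy option.

```sql
-- Stop validating FOREIGN KEY and CHECK constraints during the load.
-- This does not affect PRIMARY KEY or UNIQUE constraints.
ALTER TABLE [BulkData] NOCHECK CONSTRAINT ALL;
GO

-- Take a single table-level lock for the insert instead of many row/page locks.
-- [BulkData_Staging] is a hypothetical source table.
INSERT INTO [BulkData] WITH (TABLOCK)
       ([ContainerId], [BinId], [Sequence], [ItemId],
        [Left], [Top], [Right], [Bottom])
SELECT [ContainerId], [BinId], [Sequence], [ItemId],
       [Left], [Top], [Right], [Bottom]
FROM [BulkData_Staging];
GO

-- Re-enable the constraints and validate the existing rows against them.
ALTER TABLE [BulkData] WITH CHECK CHECK CONSTRAINT ALL;
GO
```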
