What's the fastest way to bulk insert a lot of data in SQL Server (C# client)

Question

I am hitting some performance bottlenecks with my C# client inserting bulk data into a SQL Server 2005 database and I'm looking for ways in which to speed up the process.

I am already using SqlClient.SqlBulkCopy (which is based on TDS) to speed up the data transfer across the wire. That helped a lot, but I'm still looking for more.

I have a simple table that looks like this:

 CREATE TABLE [BulkData](
 [ContainerId] [int] NOT NULL,
 [BinId] [smallint] NOT NULL,
 [Sequence] [smallint] NOT NULL,
 [ItemId] [int] NOT NULL,
 [Left] [smallint] NOT NULL,
 [Top] [smallint] NOT NULL,
 [Right] [smallint] NOT NULL,
 [Bottom] [smallint] NOT NULL,
 CONSTRAINT [PKBulkData] PRIMARY KEY CLUSTERED 
 (
  [ContainerId] ASC,
  [BinId] ASC,
  [Sequence] ASC
))

I'm inserting data in chunks that average about 300 rows. ContainerId and BinId are constant within each chunk, Sequence runs from 0 to n, and the rows are pre-sorted on the primary key.
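
To make the chunk shape concrete, here is a minimal C# sketch of one chunk as a System.Data.DataTable ready for SqlBulkCopy.WriteToServer. It assumes the client holds each generated item in memory; the BulkItem struct and the ChunkBuilder.BuildChunkTable helper are hypothetical names, not part of the question's code. The columns are added in the same order as the [BulkData] table, so SqlBulkCopy can map them by ordinal without explicit ColumnMappings.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical in-memory representation of one generated item (not from the question).
public readonly record struct BulkItem(int ItemId, short Left, short Top, short Right, short Bottom);

public static class ChunkBuilder
{
    // Builds a DataTable whose columns mirror [BulkData] in the same ordinal order.
    public static DataTable BuildChunkTable(int containerId, short binId, IReadOnlyList<BulkItem> items)
    {
        var table = new DataTable("BulkData");
        table.Columns.Add("ContainerId", typeof(int));
        table.Columns.Add("BinId", typeof(short));
        table.Columns.Add("Sequence", typeof(short));
        table.Columns.Add("ItemId", typeof(int));
        table.Columns.Add("Left", typeof(short));
        table.Columns.Add("Top", typeof(short));
        table.Columns.Add("Right", typeof(short));
        table.Columns.Add("Bottom", typeof(short));

        // ContainerId and BinId are fixed for the chunk and Sequence is the loop index,
        // so rows are produced already sorted on (ContainerId, BinId, Sequence).
        for (int i = 0; i < items.Count; i++)
        {
            BulkItem it = items[i];
            short sequence = checked((short)i); // throws if a chunk exceeds the smallint range
            table.Rows.Add(containerId, binId, sequence, it.ItemId, it.Left, it.Top, it.Right, it.Bottom);
        }
        return table;
    }
}
```

The checked cast guards the smallint Sequence column: a chunk longer than 32,767 rows raises an OverflowException on the client instead of failing later on the server.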

The % Disk Time performance counter sits at 100% much of the time, so disk I/O is clearly the main bottleneck, yet the throughput I'm getting is several orders of magnitude below that of a raw file copy.

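Would it help if I:


  1. Drop the primary key while doing the inserts and recreate it afterwards?

  2. Insert into a temporary table with the same schema and periodically transfer its contents to the main table, keeping the table that receives the inserts small?

  3. Something else?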
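
Option 2 could look roughly like the following sketch: the client bulk-copies into a hypothetical staging table dbo.BulkData_Staging (same columns, no indexes) and periodically runs a flush that moves the rows into dbo.BulkData and empties the staging table. The staging table name and the StagingFlush class are assumptions for illustration; whether this beats loading dbo.BulkData directly is exactly what the question asks, so treat it as something to measure rather than a known win.

```csharp
using System.Data.SqlClient; // On .NET Core/.NET 5+, reference the System.Data.SqlClient
                             // (or Microsoft.Data.SqlClient) NuGet package.

public static class StagingFlush
{
    // Hypothetical staging table with the same columns as dbo.BulkData and no indexes.
    public const string StagingTable = "dbo.BulkData_Staging";

    // Copies staged rows into the main table in clustered-key order, then empties the staging table.
    public const string FlushSql =
        "INSERT INTO dbo.BulkData ([ContainerId], [BinId], [Sequence], [ItemId], [Left], [Top], [Right], [Bottom]) " +
        "SELECT [ContainerId], [BinId], [Sequence], [ItemId], [Left], [Top], [Right], [Bottom] " +
        "FROM " + StagingTable + " ORDER BY [ContainerId], [BinId], [Sequence]; " +
        "TRUNCATE TABLE " + StagingTable + ";";

    public static void Flush(SqlConnection openConnection)
    {
        // Both statements run in one transaction: if anything fails, Dispose rolls back
        // the INSERT and the TRUNCATE together, so staged rows are neither lost nor duplicated.
        using SqlTransaction tx = openConnection.BeginTransaction();
        using var cmd = new SqlCommand(FlushSql, openConnection, tx) { CommandTimeout = 0 }; // 0 = no time limit
        cmd.ExecuteNonQuery();
        tx.Commit();
    }
}
```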

-- Based on the responses I have gotten, let me clarify a little bit:

Portman: I'm using a clustered index because when the data is all imported I will need to access data sequentially in that order. I don't particularly need the index to be there while importing the data. Is there any advantage to having a nonclustered PK index while doing the inserts as opposed to dropping the constraint entirely for import?

Chopeen: The data is being generated remotely on many other machines (my SQL Server can currently handle only about 10 of them, but I would love to be able to add more). It's not practical to run the entire process on the local machine because it would then have to process 50 times as much input data to generate the output.

Jason: I am not doing any concurrent queries against the table during the import process, I will try dropping the primary key and see if that helps.
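
For option 1 and the experiment Jason suggests, a minimal sketch of dropping the key before the import and recreating it afterwards. The constraint and column names come from the question's DDL; PrimaryKeyToggle and the importAll callback are hypothetical placeholders for the existing SqlBulkCopy loop.

```csharp
using System;
using System.Data.SqlClient; // On .NET Core/.NET 5+, reference the System.Data.SqlClient
                             // (or Microsoft.Data.SqlClient) NuGet package.

public static class PrimaryKeyToggle
{
    // Drops the clustered primary key so the import goes into a heap.
    public const string DropPkSql =
        "ALTER TABLE dbo.BulkData DROP CONSTRAINT PKBulkData;";

    // Recreates the same clustered primary key after all data has been imported.
    public const string CreatePkSql =
        "ALTER TABLE dbo.BulkData ADD CONSTRAINT PKBulkData " +
        "PRIMARY KEY CLUSTERED ([ContainerId] ASC, [BinId] ASC, [Sequence] ASC);";

    public static void Execute(SqlConnection openConnection, string sql)
    {
        using var cmd = new SqlCommand(sql, openConnection) { CommandTimeout = 0 }; // 0 = no time limit
        cmd.ExecuteNonQuery();
    }

    // Hypothetical driver. No error handling: if importAll throws, the table is left without its key.
    public static void ImportWithoutPk(SqlConnection openConnection, Action importAll)
    {
        Execute(openConnection, DropPkSql);
        importAll();                          // e.g. the existing SqlBulkCopy loop
        Execute(openConnection, CreatePkSql); // fails if the imported data contains duplicate keys
    }
}
```

Recreating a clustered primary key sorts and rewrites the whole table at the end, so part of the time saved during the import is spent there; whether it pays off has to be measured on the real data.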

Recommended answer

You're already using SqlBulkCopy (http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx), which is a good start.

However, just using the SqlBulkCopy class does not necessarily mean that SQL Server will perform an efficient, minimally logged bulk insert. There are a few requirements that must be met before SQL Server will do so.
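
The one requirement that the client code itself controls is the table lock: the minimal-logging prerequisites include specifying a table lock (TABLOCK), and SqlBulkCopy requests a bulk update lock for the duration of the copy when SqlBulkCopyOptions.TableLock is passed. The others, such as the database recovery model and whether the target table is a heap, empty, or indexed, are properties of the server-side database and table. A minimal sketch, assuming the destination is dbo.BulkData and the rows arrive as a DataTable; BulkLoader is a hypothetical name.

```csharp
using System.Data;
using System.Data.SqlClient; // On .NET Core/.NET 5+, reference the System.Data.SqlClient
                             // (or Microsoft.Data.SqlClient) NuGet package.

public static class BulkLoader
{
    // TableLock: take a bulk update lock for the whole copy instead of row locks.
    public const SqlBulkCopyOptions Options = SqlBulkCopyOptions.TableLock;

    public static SqlBulkCopy CreateBulkCopy(SqlConnection connection)
    {
        // null = no external transaction.
        return new SqlBulkCopy(connection, Options, null)
        {
            DestinationTableName = "dbo.BulkData",
            BatchSize = 0,       // 0 (the default) = each WriteToServer call is a single batch
            BulkCopyTimeout = 0  // 0 = no time limit (the default is 30 seconds)
        };
    }

    public static void Load(string connectionString, DataTable rows)
    {
        using var connection = new SqlConnection(connectionString);
        connection.Open();
        using SqlBulkCopy bulkCopy = CreateBulkCopy(connection);
        bulkCopy.WriteToServer(rows);
    }
}
```

Because the table lock blocks other writers for the duration of each copy, this fits the asker's situation (no concurrent queries during the import) better than it would a shared, busy table.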

Further reading:

  • Prerequisites for Minimal Logging in Bulk Import
  • Optimizing Bulk Import Performance

Out of curiosity, why is your index set up like that? It seems like ContainerId/BinId/Sequence is much better suited to be a nonclustered index. Is there a particular reason you wanted this index to be clustered?
