Need recommendations on pushing the envelope with SqlBulkCopy on SQL Server


Problem description

I am designing an application, one aspect of which is that it is supposed to be able to receive massive amounts of data into a SQL database. I designed the database structure as a single table with a bigint identity, something like this one:

    CREATE TABLE MainTable
    (
       _id bigint IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
        field1, field2, ...
    )
    

I will omit how I intend to perform queries, since it is irrelevant to the question I have.

I have written a prototype which inserts data into this table using SqlBulkCopy. It seemed to work very well in the lab: I was able to insert tens of millions of records at a rate of ~3K records/sec (each full record is rather large, ~4K). Since the only index on this table is the autoincrementing bigint, I did not see a slowdown even after a significant number of rows had been pushed.
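
For concreteness, a minimal sketch of that kind of loader; the connection string, table name and BatchSize value are placeholders, not the actual application's configuration:

    // Minimal SqlBulkCopy loader sketch; names and values are placeholders.
    using System.Data;
    using System.Data.SqlClient;

    static class Loader
    {
        public static void BulkLoad(DataTable data, string connectionString)
        {
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();
                using (var bcp = new SqlBulkCopy(conn))
                {
                    bcp.DestinationTableName = "MainTable";
                    bcp.BatchSize = 5000;       // rows per internal batch; a tuning knob
                    bcp.WriteToServer(data);    // streams rows through the bulk-load API
                }
            }
        }
    }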

Considering that the lab SQL Server was a virtual machine with a relatively weak configuration (4 GB RAM, disk subsystem shared with other VMs), I was expecting to get significantly better throughput on a physical machine, but it didn't happen, or let's say the performance increase was negligible: I could maybe get 25% faster inserts on the physical machine. Even after I configured a 3-drive RAID0, which performed 3 times faster than a single drive (as measured by benchmarking software), I got no improvement. Basically: a faster drive subsystem, a dedicated physical CPU and double the RAM translated into almost no performance gain.

I then repeated the test using the biggest instance on Azure (8 cores, 16 GB), and I got the same result. So, adding more cores did not change insert speed.

At this point I have played around with the following software parameters without any significant performance gain (a sketch combining these knobs follows the list):

• Modifying the SqlBulkCopy.BatchSize parameter
• Inserting from multiple threads simultaneously, and adjusting the number of threads
• Using the table lock option on SqlBulkCopy
• Eliminating network latency by inserting from a local process using the shared memory driver
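
A hedged sketch of those settings together; the server name, database and values are illustrative only, and the "lpc:" data-source prefix is what forces the shared memory protocol for a local server:

    // Illustrative combination of the knobs above; values are not recommendations.
    using System.Data.SqlClient;

    static SqlBulkCopy GetTunedBulkCopy()
    {
        // "lpc:" forces the shared-memory protocol, taking the network stack
        // out of the picture when client and server share a machine.
        var conn = new SqlConnection(
            "Data Source=lpc:localhost;Initial Catalog=MyDb;Integrated Security=true");
        conn.Open();

        return new SqlBulkCopy(conn, SqlBulkCopyOptions.TableLock, null)
        {
            DestinationTableName = "MainTable",
            BatchSize = 10000   // the BatchSize knob being tuned
        };
    }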

I am trying to increase performance at least 2-3 times, and my original idea was that throwing more hardware at the problem would get things done, but so far it doesn't.

So, can someone recommend:

• What resource could be suspected as the bottleneck here? How do I confirm it?
• Is there a methodology I could try to get a reliably scalable bulk insert improvement, considering there is a single SQL Server system?

UPDATE I am certain that the loading app is not the problem. It creates records in a temporary queue in a separate thread, so when there is an insert it goes like this (simplified):

    // ===> start logging time
    int batchCount = (queue.Count - 1) / targetBatchSize + 1;
    Enumerable.Range(0, batchCount).AsParallel()
        .WithDegreeOfParallelism(MAX_DEGREE_OF_PARALLELISM).ForAll(i =>
    {
        var batch = queue.Skip(i * targetBatchSize).Take(targetBatchSize);
        var data = MYRECORDTYPE.MakeDataTable(batch);   // materialize the batch
        var bcp = GetBulkCopy();
        bcp.WriteToServer(data);                        // one bulk copy per batch
    });
    // ===> end logging time
    

Timings are logged, and the part that creates the queue never takes any significant chunk of the time.

UPDATE2 I have implemented collecting how long each operation in that cycle takes, and the layout is as follows (a sketch of the instrumentation follows the list):

• queue.Skip().Take() - negligible
• MakeDataTable(batch) - 10%
• GetBulkCopy() - negligible
• WriteToServer(data) - 90%
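
The collection can be as simple as a Stopwatch restarted around each step. A sketch reusing the names from the loop above (the ToList() call is added so that the lazy Skip/Take cost is actually measured in its own step):

    // Per-step timing inside the insert cycle; variables come from the loop above.
    var sw = new System.Diagnostics.Stopwatch();

    sw.Restart();
    var batch = queue.Skip(i * targetBatchSize).Take(targetBatchSize).ToList();
    long tSkipTake = sw.ElapsedMilliseconds;   // negligible

    sw.Restart();
    var data = MYRECORDTYPE.MakeDataTable(batch);
    long tMakeTable = sw.ElapsedMilliseconds;  // ~10% of the cycle

    sw.Restart();
    var bcp = GetBulkCopy();
    long tGetBcp = sw.ElapsedMilliseconds;     // negligible

    sw.Restart();
    bcp.WriteToServer(data);
    long tWrite = sw.ElapsedMilliseconds;      // ~90% of the cycle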

UPDATE3 I am designing for the Standard edition of SQL Server, so I cannot rely on partitioning, since it is only available in the Enterprise edition. But I tried a variant of a partitioning scheme:

• created 16 filegroups (G0 to G15),
• made 16 tables for insertion only (T0 to T15), each bound to its individual filegroup. The tables have no indexes at all, not even a clustered int identity.
• threads that insert data cycle through all 16 tables. This makes it almost a guarantee that each bulk insert operation uses its own table.

That did yield a ~20% improvement in bulk insert throughput. CPU cores, the LAN interface and drive I/O were not maxed out, running at around 25% of capacity. A sketch of the round-robin scheme follows.
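
On the client side, the round-robin selection might look like the following; the T0..T15 naming matches the scheme above, while the counter mechanics are an assumption:

    // Round-robin over the 16 insert-only heap tables (T0..T15), so that
    // concurrent bulk inserts almost never target the same table.
    using System.Threading;

    static class StagingTables
    {
        private static int _counter = -1;

        public static string Next()
        {
            int n = Interlocked.Increment(ref _counter) & 15;   // cycles 0..15
            return "T" + n;                                     // one table per filegroup
        }
    }

    // Per batch, inside the parallel loop shown earlier:
    //   var bcp = GetBulkCopy();
    //   bcp.DestinationTableName = StagingTables.Next();
    //   bcp.WriteToServer(data);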

UPDATE4 I think it is now as good as it gets. I was able to push inserts to a reasonable speed using the following techniques:

• Each bulk insert goes into its own table, then the results are merged into the main one (see the merge sketch after this list)
• Tables are recreated fresh for every bulk insert, and table locks are used
• Used an IDataReader implementation from here instead of a DataTable
• Bulk inserts are done from multiple clients
• Each client accesses SQL Server over an individual gigabit VLAN
• Side processes accessing the main table use the NOLOCK option
• I examined sys.dm_os_wait_stats and sys.dm_os_latch_stats to eliminate contentions
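
A hedged sketch of that merge step; table and column names are placeholders, and since Standard edition rules out partition switching, the merge is assumed to be a plain INSERT ... SELECT under table locks:

    // Drain one staging table into the main table, then empty it for the next
    // bulk insert. stagingTable is drawn from the fixed T0..T15 list above.
    using System.Data.SqlClient;

    static void MergeStagingIntoMain(SqlConnection conn, string stagingTable)
    {
        string sql =
            "INSERT INTO MainTable WITH (TABLOCK) (field1, field2) " +
            "SELECT field1, field2 FROM " + stagingTable + " WITH (TABLOCK); " +
            "TRUNCATE TABLE " + stagingTable + ";";

        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.CommandTimeout = 0;     // large merges can exceed the default timeout
            cmd.ExecuteNonQuery();
        }
    }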

I have a hard time deciding at this point who gets credit for answering the question. To those of you who don't get an "answered": I apologize, it was a really tough decision, and I thank you all.

UPDATE5: The following item could use some optimization:

• Used an IDataReader implementation from here instead of a DataTable

Unless you run your program on a machine with a massive CPU core count, it could use some refactoring. Since it uses reflection to generate the get/set methods, that becomes a major load on the CPUs. If performance is key, you gain a lot by coding the IDataReader manually, so that it is compiled instead of going through reflection. A sketch follows.
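
For illustration, a minimal hand-coded IDataReader over an assumed two-field record type; MyRecord, field1 and field2 are placeholders, and only the members SqlBulkCopy actually exercises do real work:

    // Hand-coded IDataReader: column access is plain compiled code, no reflection.
    using System;
    using System.Collections.Generic;
    using System.Data;

    sealed class MyRecord { public long Field1; public string Field2; }

    sealed class MyRecordReader : IDataReader
    {
        private readonly IEnumerator<MyRecord> _rows;
        public MyRecordReader(IEnumerable<MyRecord> rows) { _rows = rows.GetEnumerator(); }

        // The members SqlBulkCopy drives during WriteToServer:
        public int FieldCount => 2;
        public bool Read() => _rows.MoveNext();
        public object GetValue(int i) =>
            i == 0 ? (object)_rows.Current.Field1 :
            i == 1 ? (object)_rows.Current.Field2 :
            throw new IndexOutOfRangeException();
        public int GetOrdinal(string name) =>
            name == "field1" ? 0 : name == "field2" ? 1 :
            throw new IndexOutOfRangeException(name);

        // The rest are mechanical stubs a plain column-mapped load does not hit.
        public void Dispose() { _rows.Dispose(); }
        public void Close() { }
        public bool IsClosed => false;
        public int Depth => 0;
        public int RecordsAffected => -1;
        public bool NextResult() => false;
        public DataTable GetSchemaTable() => throw new NotSupportedException();
        public bool IsDBNull(int i) => GetValue(i) is DBNull;
        public object this[int i] => GetValue(i);
        public object this[string name] => GetValue(GetOrdinal(name));
        public string GetName(int i) => i == 0 ? "field1" : "field2";
        public Type GetFieldType(int i) => i == 0 ? typeof(long) : typeof(string);
        public string GetDataTypeName(int i) => GetFieldType(i).Name;
        public int GetValues(object[] values)
        {
            int n = Math.Min(values.Length, FieldCount);
            for (int i = 0; i < n; i++) values[i] = GetValue(i);
            return n;
        }
        public bool GetBoolean(int i) => (bool)GetValue(i);
        public byte GetByte(int i) => (byte)GetValue(i);
        public char GetChar(int i) => (char)GetValue(i);
        public DateTime GetDateTime(int i) => (DateTime)GetValue(i);
        public decimal GetDecimal(int i) => (decimal)GetValue(i);
        public double GetDouble(int i) => (double)GetValue(i);
        public float GetFloat(int i) => (float)GetValue(i);
        public Guid GetGuid(int i) => (Guid)GetValue(i);
        public short GetInt16(int i) => (short)GetValue(i);
        public int GetInt32(int i) => (int)GetValue(i);
        public long GetInt64(int i) => (long)GetValue(i);
        public string GetString(int i) => (string)GetValue(i);
        public long GetBytes(int i, long fo, byte[] buf, int off, int len) => throw new NotSupportedException();
        public long GetChars(int i, long fo, char[] buf, int off, int len) => throw new NotSupportedException();
        public IDataReader GetData(int i) => throw new NotSupportedException();
    }

With this in place the load becomes bcp.WriteToServer(new MyRecordReader(batch)): no DataTable to materialize and no reflection on the per-row path.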

Solution

For recommendations on tuning SQL Server for bulk loads, see the Data Loading and Performance Guide paper from Microsoft, and also Guidelines for Optimising Bulk Import from Books Online. Although they focus on bulk loading from within SQL Server, most of the advice applies to bulk loading using the client API. These papers apply to SQL 2008 - you don't say which SQL Server version you're targeting.
Both have quite a lot of information that is worth going through in detail. However, some highlights:

• Minimally log the bulk operation: use bulk-logged or simple recovery. You may need to enable trace flag 610 (but see the caveats on doing this); a sketch follows the list
• Tune the batch size
• Consider partitioning the target table
• Consider dropping indexes during the bulk load
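
As a hedged illustration of the first bullet (the database name is a placeholder; BULK_LOGGED recovery and the server-wide trace flag both carry operational caveats, so check the guide before using them):

    // Put the database into bulk-logged recovery and enable trace flag 610
    // before a large load; illustrative, not a blanket recommendation.
    using System.Data.SqlClient;

    static void PrepareForBulkLoad(SqlConnection conn)
    {
        string sql =
            "ALTER DATABASE MyDb SET RECOVERY BULK_LOGGED; " +
            "DBCC TRACEON (610, -1);";   // -1 makes the flag server-wide

        using (var cmd = new SqlCommand(sql, conn))
            cmd.ExecuteNonQuery();
    }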

These recommendations are nicely summarised in a flow chart in the Data Loading and Performance Guide.

As others have said, you need to get some performance counters to establish the source of the bottleneck, since your experiments suggest that IO might not be the limitation. The Data Loading and Performance Guide includes a list of SQL wait types and performance counters to monitor (there are no anchors in the document to link to, but this is about 75% of the way through, in the section "Optimizing Bulk Load"). A sketch of sampling the wait stats follows.
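
For example, a minimal way to sample the top waits from sys.dm_os_wait_stats (the DMV and its columns are standard; the surrounding helper is illustrative):

    // Sample the top server-wide wait types; capture before and after a load and
    // diff the numbers to see where SQL Server spent its time.
    using System;
    using System.Data.SqlClient;

    static void DumpTopWaits(SqlConnection conn)
    {
        string sql =
            "SELECT TOP (10) wait_type, wait_time_ms, waiting_tasks_count " +
            "FROM sys.dm_os_wait_stats ORDER BY wait_time_ms DESC;";

        using (var cmd = new SqlCommand(sql, conn))
        using (var rdr = cmd.ExecuteReader())
            while (rdr.Read())
                Console.WriteLine("{0}: {1} ms over {2} waits",
                    rdr.GetString(0), rdr.GetInt64(1), rdr.GetInt64(2));
    }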

UPDATE

It took me a while to find the link, but this SQLBits talk by Thomas Kejser is also well worth watching - the slides are available if you don't have time to watch the whole thing. It repeats some of the material linked here, but also covers a couple of other suggestions for how to deal with high incidences of particular performance counters.
