并行插入单个表的最快方法 [英] Fastest way to insert in parallel to a single table

查看:155
本文介绍了并行插入单个表的最快方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的公司被寄生的共生伙伴关系所诅咒.要从寄生虫获取数据,我们必须使用痛苦缓慢的odbc连接.我最近确实注意到了,尽管可以通过并行运行查询(甚至在同一张表上)来获得更高的吞吐量.

My company is cursed by a symbiotic partnership turned parasitic. To get our data from the parasite, we have to use a painfully slow odbc connection. I did notice recently though that I can get more throughput by running queries in parallel (even on the same table).

有一个特别大的表,我想从中提取数据并将其移到我们的本地表中.并行运行查询可以更快地获取数据,但是我也可以想象这可能会导致尝试将多个查询中的数据一次写入同一表中而引起问题.

There is a particularly large table that I want to extract data from and move it into our local table. Running queries in parallel I can get data faster, but I also imagine that this could cause issues with trying to write data from multiple queries into the same table at once.

关于如何最好地处理这种情况,您能给我什么建议,以便我可以利用并行使用查询的更快速度?

What advice can you give me on how to best handle this situation so that I can take advantage of the increased speed of using queries in parallel?

我在这里得到了很多反馈,但是我认为我是通过链接服务器(使用odbc驱动程序)提取数据这一事实并不清楚.换句话说,这意味着我可以运行普通的INSERT语句,并且我相信它将提供比SqlBulkCopy或BULK INSERT更好的性能(实际上,我不认为BULK INSERT甚至是一种选择).

I've gotten some great feedback here, but I think I wasn't completely clear on the fact that I'm pulling data via a linked server (which uses the odbc drivers). In other words that means I can run normal INSERT statements and I believe that would provide better performance than either SqlBulkCopy or BULK INSERT (actually, I don't believe BULK INSERT would even be an option).

推荐答案

您是否已阅读

  1. 运行与可用CPU数量一样多的加载过程.如果你有 32个CPU,运行32个并行负载.如果您有8个CPU,则并行运行8个 加载.
  2. 如果您可以控制输入文件的创建,请进行输入 大小可以均匀地除以您的加载线程数 想要并行运行.还要确保所有记录都属于一个 分区,如果要使用交换机分区策略.
  3. 如果要在BULK上运行进程,请使用BULK插入而不是BCP. SQL Server计算机.
  4. 使用表分区来获得另外8-10%的收益,但前提是您的输入 保证文件匹配您的分区功能,这意味着 一个文件中的所有记录必须位于同一分区中.
  5. 使用TABLOCK避免一次锁定行.
  6. 使用ROWS PER BATCH = 2500,如果您是,则使用接近此的值 将多个流导入到一个表中.
  1. Run as many load processes as you have available CPUs. If you have 32 CPUs, run 32 parallel loads. If you have 8 CPUs, run 8 parallel loads.
  2. If you have control over the creation of your input files, make them of a size that is evenly divisible by the number of load threads you want to run in parallel. Also make sure all records belong to one partition if you want to use the switch partition strategy.
  3. Use BULK insert instead of BCP if you are running the process on the SQL Server machine.
  4. Use table partitioning to gain another 8-10%, but only if your input files are GUARANTEED to match your partitioning function, meaning that all records in one file must be in the same partition.
  5. Use TABLOCK to avoid row at a time locking.
  6. Use ROWS PER BATCH = 2500, or something near this if you are importing multiple streams into one table.

对于SQL Server 2008,在某些情况下,您可以使用

For SQL Server 2008, there are certain circumstances where you can utilize minimal logging for a standard INSERT SELECT:

SQL Server 2008增强了它可以以最少的方式处理的方法 记录.它支持最少记录的常规INSERT SELECT 陈述.此外,打开跟踪标志610可使SQL Server 2008支持针对新密钥的非空B树进行最少日志记录 导致分配新页面的范围.

SQL Server 2008 enhances the methods that it can handle with minimal logging. It supports minimally logged regular INSERT SELECT statements. In addition, turning on trace flag 610 lets SQL Server 2008 support minimal logging against a nonempty B-tree for new key ranges that cause allocations of new pages.

这篇关于并行插入单个表的最快方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆