SSIS程序包全表加载缓慢 [英] SSIS Package Full Table Load Slow

查看:177
本文介绍了SSIS程序包全表加载缓慢的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我们有一个SSIS程序包,开发团队显然将其称为慢程序。由于他们没有SSIS ETL人员,因此作为一名DBA,我尝试进行深入研究。以下是我找到的信息:
SQL Server已于2014年版本升级-就地升级到2017年,因此具有两个版本的SSIS。

We have an SSIS package that is apparently termed as 'slow' by the development team. Since they do not have a person with SSIS ETL, as a DBA I tried digging into it. Below is the information I found: SQL Server was 2014 version upgraded -inplace to 2017 so it has SSIS of both versions.


  1. 它们加载了将大小为200 GB的SQL Server表插入SSIS,然后使用命令行zip功能将数据压缩为平面文件。

  2. 数据流任务只需在视图中单击 select * -视图只包含表,没有其他花哨的连接。

  3. 在进行故障排除时,我发现在SQL Server上几乎没有负载,可能因为select命令是在单线程中运行,而不是利用SQL Server内核。

  4. 当我运行相同的select *命令(仅5秒钟,因为它是200 GB的表)时,即使我的命令是单线程的。

  5. 该软件包具有一个配置文件,该SQL文件显示了SQL作业(这是该软件包的运行方式)以及一些连接设置。

  6. 在BIDS中打开程序包,将defaultBufferMaxRows仅显示为10000(可能是默认值va lue)(因为配置文件或任何变量都没有客户值,所以我猜这也是程序包所使用的。)

  1. They load a SQL Server table of size 200 GB into SSIS and then zip the data into flatfile using command line zip functionality.
  2. The data flow task simple hits a select * from view - the view is nothing but containing the table with no other fancy joins.
  3. While troubleshooting I found that on SQL Server, there is hardly any load coming, possibly because the select command is running in single thread and not utilizing SQL server cores.
  4. When I run the same select * command (only for 5 seconds, since it is 200 GB table), even my command is single threaded.
  5. The package has a configuration file that the SQL job shows (this is how the package runs) with some connection settings.
  6. Opening the package in BIDS show defaultBufferMaxRows as 10000 only (possibly default value) (since configuration file or any variables does not has a customer value, I guess this is what the package is using too).

SQL和SSIS在同一服务器上。已为SQL分配了最大的内存,可为SSIS和OS留出大约100 GB的空间。

Both SQL and SSIS are on same server. SQL has been allocated max memory leaving around 100 GB for SSIS and OS.

请与我分享有关如何强制SQL Server使用多个线程运行此select命令的所有想法,以便整个表

Kindly share any ideas on how can I force the SQL Server to run this select command using multiple threads so that entire table gets inside SSIS buffer pool faster.

Edit :我知道 bcp 可以比任何进程更快地读取数据并将其保存到平面文件,但此时必须保持对SSIS软件包的更改最小,并探索可以纳入SSIS软件包的选项。

Edit: I am aware that bcp can read data faster than any process and save it to flatfile but at this point changes to the SSIS package has to be kept minimum and exploring options that can be incorporated within SSIS package.

Edit2 :并行性非常适合我为许多其他查询验证的SQL Server。该表为200 GB。仅仅是SSIS的东西并没有尽我所能地重击数据库。

Edit2: Parallelism works perfectly for my SQL Server as I verified for a lot of other queries.The table in question is 200 GB. It is something with SSIS only which is not hammering my DB as hard as it should.

Edit3 :我已经取得了一些进展,将缓冲区值调整为100 MB和最大行数达到100000,现在该程序包似乎做得更好。当我直接使用dtexec实用程序在服务器上运行此程序包时,它每秒产生40-50 MB的良好负载,但是通过SQL作业,它永远不会产生超过10 MB的负载。因此,我试图找出这种现象。

Edit3: I have made some progress, adjusted the buffer value to 100 MB and max rows to 100000 and now the package seem to be doing better. when I run this package on the server directly using dtexec utility, it generates good load of 40- 50 MB per second but through SQL job it never generates lod more than 10 MB. so I am trying to figure out this behavior.

Edit4 :我发现当我直接从登录到服务器并调用dtexec实用程序运行程序包时,它运行良好,因为它在DB上产生了良好的负载,导致数据I\O保持稳定在30-50 MB\sec之间。
SQL作业中的同一件事永远不会超过10 MB\sec。

Edit4: I found that when I run the package directly from logging to the server and invoking dtexec utility, it runs good because it generates good load on the DB causing data I\O to remain steady between 30-50 MB\sec. The same thing from SQL job never exceeds the I\O more than 10 MB\sec.

我什至尝试使用代理运行程序包并选择cmdline操作但没有变化。代理人确实在这里很烂,在这里有什么问题的任何指针?

I even tried to run the package using agent and opting for cmdline operation but no changes. Agent literally sucks here, any pointers on what could be wrong here?

最终尝试:
我被观察困扰了,我终于:
1)相同的包装通过调用dtexc实用程序
从Windows节点的命令提示符运行时,运行速度快3倍2)完全相同的程序包被具有Windows和SQL Server上的sysadmin权限的SQL Agent调用时,运行速度比上述速度慢3倍

Final Try: I am stumped at the observation I have finally: 1)Same package runs 3x faster when run from command prompt from windows node by invoking dtexc utility 2) Exact same package runs 3 times slower than above when involked by SQL agent which has sysadmin permissions on windows as well as SQL Server

在两种情况下,我都试图查看它们调用的DTEXEC版本,并且它们都调用相同的版本。所以为什么我会这么慢是出于我的理解。

In both cases, I tried to see the version of DTEXEC they invoke, and they both invoke the same version. So why one would be so slow is out of my understanding.

推荐答案

我不认为有一个通用的解决方案此问题,因为在特殊情况下您没有提供太多信息。由于您的数据流任务中包含两个组件(OLE DB源和平面文件目标),因此我将尝试给出与每个组件相关的一些建议。

I don't think that there is a general solution to this issue since it is a particular case that you didn't provide much information. Since there are two components in your data flow task (OLE DB Source and Flat File Destination), I will try to give some suggestions related to each component.

在为每个组件提供建议之前,最好提及以下内容:

Before giving suggestions for each component, it is good to mention the following:


  1. 如果数据流任务中未应用任何转换,则不建议使用此任务。最好使用 bcp实用程序

  2. 检查TempDb和数据库日志大小。

  3. 如果存在聚集索引,请尝试重建它。如果不是,请尝试创建聚簇索引。

  4. 要检查正在减慢包执行速度的组件,请在Visual Studio中打开该包,然后尝试删除平面文件目标并将其替换为虚拟脚本组件(编写任何无用的代码,例如: string s =; 。然后运行包;如果足够快,则说明问题是由平面文件目标引起的,否则您需要对OLE DB源进行故障排除。

  5. 尝试在SQL Server Management Studio中执行查询并显示执行计划。

  6. 检查程序包目标服务器版本属性,并确保它正确无误。

  1. If no transformations are applied within the data flow task, It is not recommended to use this task. It is preferable to use bcp utility
  2. Check the TempDb and the database log size.
  3. If a clustered index exists, try to rebuild it. If not, try to create a clustered index.
  4. To check the component that is slowing the package execution, open the package in Visual Studio and try to remove the flat file destination and replace it with a dummy Script Component (write any useless code, for example: string s = "";). And then run the package; if it is fast enough, then the problem is caused by the Flat File Destination, else you need to troubleshoot the OLE DB Source.
  5. Try executing the query in the SQL Server management studio and shows the execution plan.
  6. Check the package TargetServerVersion property within the package configuration and make sure it is correct.


< h2> OLE DB源

如前所述,您正在使用 Select * from view 查询,其中数据存储在表中,包含大量数据。 SQL Server查询优化器可能会发现,使用表扫描读取数据要比从索引读取更为有效,尤其是在您的表没有聚集索引(行存储或列存储)的情况下。

OLE DB Source

As you mentioned, you are using a Select * from view query where data is stored in a table that contains a considerable amount of data. The SQL Server query optimizer may find that reading data using Table Scan is more efficient than reading from indexes, especially if your table does not have a clustered index (row store or column store).

您可以尝试通过许多方法来改善数据负载:

There are many things you may try to improve data load:


  1. 尝试替换从视图中选择*

  2. 尝试更改OLE DB连接管理器中使用的数据提供程序:SQL Server本机客户端,用于SQL Server的Microsoft OLE DB提供程序 (不是旧版本)

  3. 尝试增加 DefaultBufferMaxRows DefaultBufferSize 属性。 更多信息

  4. 尝试使用带有特定列名的SQL Command代替而不是选择视图名(视图数据访问模式表)。 更多信息

  5. 尝试加载数据块

  1. Try replacing the Select * from view with the original query used to create the view.
  2. Try changing the data provider used in the OLE DB Connection Manager: SQL Server Native Client, Microsoft OLE DB provider for SQL Server (not the old one).
  3. Try increasing the DefaultBufferMaxRows and DefaultBufferSize properties. more info
  4. Try replacing using SQL Command with specific column names instead of selecting the view name (Table of View data access mode). more info
  5. Try to load data in chunks


平面文件目标



  1. 检查平面文件目录是否不在安装SQL Server实例的同一驱动器上

  2. 检查平面文件不在繁忙的驱动器上

  3. 尝试将数据导出到多个平面文件中,而不是一个大文件(将数据拆分成较小的文件),因为当导出的数据大小增加时,单个文件,写入该文件的速度变慢,然后程序包的速度变慢。 (检查上面的第5条建议)

  1. Check that the flat file directory is not located on the same drive where SQL Server instance is installed
  2. Check that the flat file is not located on a busy drive
  3. Try to export data into multiple flat files instead of one huge file (split data into smaller files) , since when the exported data size increase in a single file, writing to this file become slower, then the package will become slower. (Check the 5th suggestion above)

这篇关于SSIS程序包全表加载缓慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆