How can I delete columns in a Data Flow task in SSIS?

Question

I use SQL Server 2016 and I have a very busy Data Flow task. In that Data Flow task, I use a Multicast component for some reason. After creating a new flow in the Data Flow, I need to delete some of the columns in the new flow because they are useless.

For context, I need to do this because my flow has more than 200 columns and I need fewer than 10 of them.

How can I delete columns in a Data Flow task in SSIS?

Recommended answer

You can add an extra component of some sort. However, this will never reduce complexity or improve performance. Logically, you are adding one more interface that needs to be maintained. Performance-wise, anything that eliminates columns means copying a set of rows from one buffer to an entirely separate buffer. This is called an asynchronous transformation, and it is better described here and here. As you can imagine, copying rows is less efficient than updating them in place.
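The difference between updating rows in place and copying them to a second buffer can be illustrated with a purely conceptual Python sketch. It does not use any SSIS API: the list of dicts standing in for a buffer, the column names, and the functions `update_in_place` and `copy_without_columns` are all assumptions made for illustration.

```python
# Conceptual illustration only: a "buffer" is modeled as a list of dict rows.
# Nothing here is an SSIS API; every name is hypothetical.

def update_in_place(buffer):
    """Synchronous style: modify rows inside the existing buffer."""
    for row in buffer:
        row["name"] = row["name"].upper()
    return buffer  # the same object that was passed in


def copy_without_columns(buffer, keep):
    """Asynchronous style: build a second buffer holding only the kept columns."""
    return [{col: row[col] for col in keep} for row in buffer]


source = [
    {"id": 1, "name": "alice", "unused_1": "x", "unused_2": "y"},
    {"id": 2, "name": "bob", "unused_1": "x", "unused_2": "y"},
]

same = update_in_place(source)                               # no new buffer
trimmed = copy_without_columns(source, keep=["id", "name"])  # every row is copied
```

Dropping the columns forces a new list to be allocated and every row to be copied into it, whereas the in-place update touches only the existing rows. That extra copy is the cost the answer is pointing at.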

Here are some recommendations for reducing complexity, which will, in turn, improve performance:

  • Reduce the columns at the source. If you are selecting columns that are not subsequently used in any way, remove them from the query or uncheck them in the source component. Removing columns this way keeps them out of the buffer, which then occupies less memory.
  • Reduce the number of components in the dataflow. Very long dataflows are easy to create, a pain to test, and even harder to maintain. A dataflow expects a unit of work, i.e. a data stream from here to there with a few things in the middle. This is where dataflows shine; in fact, they protect themselves from complexity with memory limitations and a maximum number of threads. It is better to divide the work into separate dataflows or stored procedures. For example, you could stage the data into a table and read it twice rather than use a Multicast.
  • Use the database. SSIS is as much an orchestration tool as it is a data-moving tool. I have often found that simple dataflows that stage the data, followed by calls to stored procedures that process it, outperform an all-in-one dataflow.
  • Increase the number of times you write the data. This is completely counterintuitive, but if you process data in smaller sets of operations, it runs faster and is easier to test. Given a clean slate, I will often design an ETL to write data from the source to a staging table, perform a cleansing step from the staging table to another table, optionally add a conforming step that combines data from different sources into yet another table and, finally, run a last step that loads the target table. Note that each source is pushed to its own target table and combined later, leveraging the database. The first and last steps are set up to run fast and to avoid locking or blocking at either end.
  • Bulk load. The prior step works well only when you ensure that bulk loading is actually happening. This can be tricky, but generally you can get there by using "fast load" in the OLE DB Destination and by never using the OLE DB Command transformation, which runs a statement for each row. Removing indexes and re-adding them afterwards is faster than loading with the indexes in place (with few exceptions).
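The stage, cleanse, and load pattern described above can be sketched as follows. This is a minimal illustration, not the answer author's implementation: `sqlite3` stands in for SQL Server so the sketch runs anywhere, and every table and column name (`source_wide`, `stage_orders`, `clean_orders`, `target_orders`, `unused_1`, and so on) is hypothetical.

```python
import sqlite3

# sqlite3 stands in for SQL Server so the sketch is runnable anywhere.
# All table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A wide source table: only id, name, and amount are needed downstream.
cur.execute(
    "CREATE TABLE source_wide (id INTEGER, name TEXT, amount REAL, "
    "unused_1 TEXT, unused_2 TEXT)"
)
cur.executemany(
    "INSERT INTO source_wide VALUES (?, ?, ?, ?, ?)",
    [
        (1, " alice ", 10.0, "x", "y"),
        (2, "bob", None, "x", "y"),
        (3, "carol", 7.5, "x", "y"),
    ],
)

cur.execute("CREATE TABLE stage_orders (id INTEGER, name TEXT, amount REAL)")
cur.execute("CREATE TABLE clean_orders (id INTEGER, name TEXT, amount REAL)")
cur.execute("CREATE TABLE target_orders (id INTEGER, name TEXT, amount REAL)")

# Step 1: stage. Select only the needed columns at the source,
# so the unused columns never enter the pipeline.
cur.execute("INSERT INTO stage_orders SELECT id, name, amount FROM source_wide")

# Step 2: cleanse. One set-based statement run by the database.
cur.execute(
    "INSERT INTO clean_orders "
    "SELECT id, TRIM(name), amount FROM stage_orders WHERE amount IS NOT NULL"
)

# Step 3: load the target table from the cleansed table.
cur.execute("INSERT INTO target_orders SELECT id, name, amount FROM clean_orders")
conn.commit()

rows = cur.execute(
    "SELECT id, name, amount FROM target_orders ORDER BY id"
).fetchall()
```

In an SSIS package, step 1 would roughly correspond to a simple Data Flow whose source query lists only the needed columns, and steps 2 and 3 to stored procedures called from the control flow. That mapping is an assumption about how the advice would be applied, not something spelled out in the answer.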

These guidelines will get you headed in the right general direction, but do post more questions if you need help tuning specific performance problems.
