When to use Lookup and Merge Join transformations in SSIS

Question

I am new to SSIS (Integration Services). I am in a bit of a dilemma about when to use the Lookup and Merge Join transformations in SSIS.

Please don't tell me the differences between them; I already know those. I want to know in which scenarios I should use Lookup and in which I should use Merge Join.

Recommended answer

The only case I can think of for Merge Join is when you have two huge unsorted data sources and you cannot pull them entirely into memory to sort them in the pipeline.

Please search the internet as well; you will surely find more on this.

Edit

Use Merge Join instead of Lookup

If you need to do a one-time join in your data flow (as opposed to multiple lookups), consider using a Merge Join transformation instead of a Lookup transformation. Jamie Thomson has a great post that compares the two approaches and demonstrates that Merge Join can be a lot more efficient than Lookup. The main reason is that Merge Join takes a streaming approach rather than spending time pre-caching its values. The streaming logic was further improved in SSIS 2012: Merge Join now prevents one input from getting too many buffers when one source is much faster than the other.
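To make the streaming versus pre-caching point concrete, here is a rough Python analogy. It is not SSIS code; the function names, the `Row` type, and the sample data are made up for illustration, and the merge version assumes both inputs are already sorted by key with unique keys on the right side.

```python
from typing import Dict, Iterable, Iterator, Tuple

Row = Tuple[int, str]  # (join key, payload); a hypothetical row shape
Joined = Tuple[int, str, str]


def lookup_join(rows: Iterable[Row], reference: Iterable[Row]) -> Iterator[Joined]:
    """Full-cache style: read the whole reference set first, then probe it."""
    # Pre-cache step: nothing is emitted until every reference row is in memory.
    cache: Dict[int, str] = dict(reference)
    for key, value in rows:
        if key in cache:
            yield key, value, cache[key]


def merge_join(left: Iterable[Row], right: Iterable[Row]) -> Iterator[Joined]:
    """Streaming inner join; both inputs must already be sorted by key."""
    right_iter = iter(right)
    current = next(right_iter, None)
    for key, value in left:
        # Advance the right side until its key catches up with the left key.
        while current is not None and current[0] < key:
            current = next(right_iter, None)
        if current is None:
            break  # right side exhausted, no further matches possible
        if current[0] == key:
            yield key, value, current[1]


# Sample inputs, both sorted by key.
orders = [(1, "a"), (2, "b"), (4, "d")]
customers = [(1, "X"), (3, "Y"), (4, "Z")]
merged = list(merge_join(orders, customers))
looked_up = list(lookup_join(orders, customers))
```

Both functions return the same matches here. The difference is that `merge_join` only ever holds one row from each side, while `lookup_join` has to hold every reference row before it emits anything.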

Keep the following things in mind when considering this approach:

  1. Both inputs must be sorted. Ideally, this sort can be pushed into the source query. If the data isn't already sorted (i.e. no indexes), the cost of the sort might outweigh the benefits of this approach.
  2. A source component doesn't end until it has read all of its data, so if your incoming data has a small number of rows and you're joining with a much larger data set (which is the case in this particular customer's scenario), the Merge Join approach isn't going to be ideal. A Partial Cache lookup tends to work best in these types of scenarios.
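The partial cache idea from the last point can be sketched the same way. Again this is only an analogy in Python, not how SSIS implements it: the `PartialCacheLookup` class, its `fetch` callback, the `max_entries` limit, and the least-recently-used eviction are all assumptions chosen for the illustration.

```python
from collections import OrderedDict
from typing import Callable, Optional


class PartialCacheLookup:
    """Fetches reference rows on demand and keeps a bounded cache of results."""

    def __init__(self, fetch: Callable[[int], Optional[str]], max_entries: int = 1000) -> None:
        self._fetch = fetch              # stands in for a query against the large table
        self._max_entries = max_entries  # hypothetical cache size limit
        self._cache: "OrderedDict[int, Optional[str]]" = OrderedDict()
        self.queries = 0                 # how many times the reference source was hit

    def get(self, key: int) -> Optional[str]:
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        self.queries += 1
        value = self._fetch(key)
        self._cache[key] = value
        if len(self._cache) > self._max_entries:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return value


# A large reference set and a handful of incoming keys.
big_reference = {i: f"name-{i}" for i in range(100_000)}
lookup = PartialCacheLookup(big_reference.get, max_entries=2)
results = [lookup.get(k) for k in [5, 7, 5, 9, 7]]
```

With a few incoming rows, only the keys that actually show up are fetched, so the large reference set never has to be read in full the way a Merge Join source or a full-cache Lookup would read it.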

Courtesy:

  1. MSDN website
  2. Matt Masson
