Order By的排序方式与SSIS(集成服务)中的Sort Transformation方式不同 [英] Order By sorts differently than Sort Transformation in SSIS (Integration services)

查看:91
本文介绍了Order By的排序方式与SSIS(集成服务)中的Sort Transformation方式不同的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

使用SSIS由于使用两个OLE DB源进行排序,因此在加入时遇到了问题.

Using SSIS I've encountered issues when joining due to sorting using two OLE DB Sources.

我发现,如果我使用订单转换和排序转换,则会产生不同的结果,从而导致合并联接到产品不正确的数据

I've discovered that if I use an order by vs. a sort transformation it produces different results that causes a merge join to product incorrect data

示例:

表1

Id int PK
JoinKey varchar(25)
OriginalValue varchar(25)

表2

Id int PK
JoinKey varchar(25)
ExpectedValue varchar(25)

SSIS中的程序流程如下:

The Program Flow in SSIS is as follows:

  1. 从表1中选择OLE数据源
  2. 通过JoinKey asc然后按ExpectedValue asc对Table1排序转换
  3. 从Table2中按JoinKey asc顺序选择的OLE数据源
  4. 使用Join1上的Table1和Table2合并联接转换(左外部联接),并选择OriginalValue和ExpectedValue

这样做-对于某些数据集,我最终得到了ExpectedValue的空值.直接在SQL中进行查询会返回预期的值.

In doing this - I end up with a null value for ExpectedValue, for some of my data set. Doing the query directly in SQL returns the values expected.

如果我在步骤5之前添加了排序转换,则它正确加入了ExpectedValue,但是SSIS发出警告验证警告.数据已按照指定的顺序排序,因此可以删除该转换."

If I add a Sort Transformation before step 5, it correctly joins the ExpectedValue however then SSIS has the warning "Validation warning. The data is already sorted as specified so the transform can be removed."

两个问题:

  1. 是否不应该使用order by和sort转换来产生相同的结果?
  2. 为什么在实际需要排序时SSIS会显示警告?

我已经浏览了一些相关的文章,但是它们似乎只关注效率而不是功能上的差异.

I've browse some related posts but they seem to focus on efficiency not differences in functionality.

  1. 为什么我们需要在ssis中进行合并联接转换之前使用排序转换
  2. SQL Server集成服务左联接
  1. Is it better to sort data at the application layer, or with an order by clause?
  2. why we need to use sort transformation before merge join transformation in ssis
  3. SQL Server Integration Service Left Join

推荐答案

我过去曾遇到过这种情况,答案是否定的,在源SQL中使用ORDER BY进行排序并不总是与使用Sort产生相同的结果.在您的数据流任务中进行转换.某些特殊(非字母数字)字符(可能为NULLS)的处理方式有所不同;我不记得确切.我不记得找到任何有关此的文档(这是几年前的事),但是我通过自己的测试证实了这一点.

I have experienced this in the past and the answer is NO, sorting with an ORDER BY in your Source SQL does not always produce the same results as using a Sort Transformation in your Dataflow task. There is a difference in the way certain special (non alpha-numeric) characters (and maybe NULLS) are handled; I can't remember exactly which. I don't recall finding any documentation about this (it was years ago), but I confirmed it through my own testing.

我的结论是,要在数据流中进行JOIN,请确保在JOIN的两侧都使用相同的排序方法.

My conclusion was, to do a JOIN in the dataflow, make sure you use the same method of sorting on both sides of the JOIN.

关于第二个问题,SSIS在数据源上具有IsSorted属性.如果将DataSource上的IsSorted属性设置为true,那么如果尝试执行排序转换",则它所知道的只是IsSorted为true,并且会发出不需要排序的警告.它不知道是否需要排序转换"才能匹配使用排序转换的另一个源.

As for your second question, SSIS has an IsSorted property on a datasource. If you set the IsSorted property on the DataSource to true, then if you try to do a Sort Transformation, all it knows is that IsSorted is true and it will give the warning that sorting is not needed. It doesn't know that the Sort Transformation is "needed" in order to match another source that used a Sort Transformation.

如果要在JOIN的两侧使用排序转换,请将源的IsSorted属性设置为false.

If you want to use Sort Transformations on both sides of the JOIN, set the IsSorted property of your source to false.

这篇关于Order By的排序方式与SSIS(集成服务)中的Sort Transformation方式不同的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆