DTS/SSIS与Informatica Power Center [英] DTS/SSIS vs. Informatica Power Center

查看:112
本文介绍了DTS/SSIS与Informatica Power Center的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我确定这是一个非常模糊的问题,很难回答,但我对此主题的一般想法将不胜感激.

I'm sure that this is a pretty vague question that is difficult to answer but I would be grateful for any general thoughts on the subject.

让我给你一个快速的背景.

Let me give you a quick background.

十年前,我们曾经编写数据加载程序,从旧版应用程序读取输入平面文件并将其加载到我们的Datamart中.最初,我们的加载程序是用VB6编写的,并在平面文件中定位,并为每个记录执行以下一般过程:

A decade ago, we used to write data loads reading input flat files from legacy applications and load them into our Datamart. Originally, our load programs were written in VB6 and cursored through the flat file and for each record, performed this general process:

1)查找记录.如果找到,请更新 2)否则插入新记录

1) Look up the record. If found, update it 2) else insert new record

然后,我们最终更改了此过程,以使用SQL Server DTS临时表中的平面文件,然后我们将对临时表和目标生产表执行大规模的基于集合的联接,并从临时表中获取数据并使用它来更新目标表.插入了未加入的记录.

Then we ended up changing this process to use SQL Server to DTS the flat file in a temp table and then we would perform a massive set base join on the temp table with the target production table, taking the data from the temp table and using it to update the target table. Records that didn't join were inserted.

这是流程的简化,但是从本质上讲,该流程已从迭代方法转到基于集合",不再一次执行更新1条记录.结果,我们获得了巨大的性能提升.

This is a simplification of the process, but essentially, the process went from an iterative approach to "set based", no longer performing updates 1 record at a time. As a result, we got huge performance gains.

然后,我认为在DLL中创建了一组功能强大的共享功能,以使用这种方法执行常见功能/更新模式.它极大地简化了开发过程,并真正减少了开发时间.

Then we created what was in my opinion a powerful set of shared functions in a DLL to perform common functions/update patterns using this approach. It greatly abstracted the development and really cut down on the development time.

然后,一个ETL工具Informatica PowerCenter出现了,mgt希望对该工具进行标准化,并重写使用DTS的旧VB负载.

Then Informatica PowerCenter, an ETL tool, came around and mgt wants to standardize on the tool and rewrite the old VB loads that used DTS.

我听说PowerCenter会反复处理记录,但是我知道它确实做了一些优化技巧,所以我很好奇Informatica的性能.

I heard that PowerCenter processes records iteratively, but I know that it does do some optimization tricks, so I am curious how Informatica would perform.

任何人在使用DTS或SSIS方面都有过任何经验,以便能够对一般情况下的性能更好地做出直觉的评价吗?

Does anyone have any experience with using DTS or SSIS to be able to make a gut performance predition as to which would generally perform better?

推荐答案

我加入了同时使用Informatica PowerCenter 8.1.1.的组织.尽管我不能说一般的Informatica设置,但是我可以说Informatica在这家公司效率极低.主要问题是Informatica在后端生成了一些非常糟糕的SQL代码.当我观察它对探查器的操作以及查看文本日志时,它为需要插入/更新/删除的每一行生成了单独的插入,更新和删除语句.我没有尝试修复Inf​​ormatica实施,而只是将其替换为SSIS 2008.

I joined an organization that used both Informatica PowerCenter 8.1.1. Although I can't speak for general Informatica setups, I can say that at this company Informatica was exceedingly inefficient. The main problem is that Informatica generated some really henious SQL code in the back-end. When I watched what it was doing with profiler and from reviewing the text logs, it generated separate insert, update, and delete statements for each row that needed to be inserted/updated/deleted. Instead of trying to fix the Informatica implementation, I simply replaced it with SSIS 2008.

我对Informatica遇到的另一个问题是管理并行化.在DTS和SSIS中,并行化任务都非常简单-不要定义优先级约束,您的任务将并行运行.在Informatica中,您可以定义一个起点,然后定义用于并行运行进程的分支.除非我通过链接工作集或任务明确定义了并行进程的数量,否则我无法找到一种方法来限制并行进程的数量.

Another problem I had with Informatica was managing parallelization. In both DTS and SSIS, parallelizing tasks was pretty simple -- don't define precedence constraints and your tasks will run in parallel. In Informatica, you define a starting point and then define the branches for running processes in parallel. I couldn't find a way for it to limit the number of parallel processes unless I explicitly defined them by chaining the worklets or tasks.

就我而言,SSIS的表现明显优于Informatica.我们使用Informatica进行加载的过程大约花费了8到12个小时.我们使用SSIS和SQL Server代理作业的加载过程大约需要1-2个小时.我敢肯定,如果我们对Informatica进行了适当的调整,我们可以将负载减少到3-4个小时,但是我仍然认为这样做不会做得更好.

In my case, SSIS substantially outperformed Informatica. Our load process with Informatica took about 8-12 hours. Our load process with SSIS and SQL Server Agent Jobs was about 1-2 hours. I am certain had we properly tuned Informatica we could have reduced the load to 3-4 hours, but I still don't think it would have done much better.

这篇关于DTS/SSIS与Informatica Power Center的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆