从多个文件中获取数据并将其加载到 ssis 中的原始文件目的地(原始文件应该是动态的) [英] fetching data from multiple file and loading it into raw file destination(raw file should be dynamic) in ssis

查看:22
本文介绍了从多个文件中获取数据并将其加载到 ssis 中的原始文件目的地(原始文件应该是动态的)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个源文件夹,其中包含 4 个 csv 文件,每个文件中的列数不同.我只需要从每个 csv 中获取 3 列(所有 4 个文件中的这 3 列的元数据相同),并从源文件夹中可用的所有文件中加载 Raw Destination 中的列.原始目标输出文件名必须与我们正在获取的输入文件名 + 时间戳一样.

I have a source folder which contains 4 csv files with different no of columns in each of the file. I need to fetch only 3 columns(metadata same this 3 columns in all the 4 files) from each csv and load the columns inside Raw Destination from all the files avaiable in source folder. And Raw destination Output file name has to be like wht the inputfilename we are fetching + time stamp.

在下一个级别,我需要将此输出原始作为原始源获取并将此记录插入到 oledb 目标中.并且目标表也必须是动态的.

And at next level, i need to fetch this output raw as raw source and insert this records into oledb destination . and the destination table also has to be in dynamic.

例如我有 4 个 csv 文件,test1.csv(10 列).test2.csv(8), test3.csv(6), test4.csv(10) 以及时间戳.

所有这 4 个文件都有 position_id、asofdate、sumassured 列的共同点,现在我只想将这 3 列加载到原始目的地.如果我加载 test1.csv,那么我的原始目标输出文件名必须是 RW_test1_20120119_222222.RW.同样,如果我将第二个文件的文件名加载为原始目标输出..

all this 4 files has columns position_id, asofdate, sumassured in common, now i want to load only these 3 columns to raw destination. If i load test1.csv then my raw destination outputfile name has to be RW_test1_20120119_222222.RW. similalrly if i load second file its filename as raw destination output..

谢谢

萨蒂什

推荐答案

一如既往,分解你的问题,直到你把它变成你可以管理的东西.

As always, decompose your problems until you've got it into a something you can manage.

按照下面的两个问题和答案将生成一个包,其中包含配置为对文件夹 @[User::InputFolder] 中的 CSV 进行操作的 OLEDB 连接管理器.3 个变量 CurrentFileName、InputFolder 和 Query 已定义为在 Query 上设置的表达式.@[User::Query] 的表达式看起来像 "SELECT position_id, asofdate, sumassured FROM " + @[User::CurrentFileName]

Following the two questions and answers below will result in a package with an OLEDB Connection Manager configured to operate on CSVs in the folder @[User::InputFolder]. 3 variables CurrentFileName, InputFolder and Query have been defined with an expression set on Query. The expression for your @[User::Query] would look like "SELECT position_id, asofdate, sumassured FROM " + @[User::CurrentFileName]

SSIS 任务用于不一致的列数导入?

此时,您的包裹应该类似于下面的中心部分.验证您可以正确枚举文件夹中的所有 CSV,并且 OLEDB 查询部分有效.

At this point, your package should resemble the center piece below. Verify you can correctly enumerate all of the CSVs in the folder and the OLEDB query piece works.

我不是 RAW 文件使用方面的专家,因此可能有更好的方式与它们交互.这将使用第四个变量 RawFileName.在其上设置一个表达式,例如 @[User::InputFolder] + "RawFile.raw" 这将导致文件被写入 C:\ssisdata\so\satishkumar\RawFile.raw

I'm not an expert on RAW file usage so there may be better ways of interacting with them. This will use the fourth variable, RawFileName. Set an expression on it like @[User::InputFolder] + "RawFile.raw" which would result in the file being written to C:\ssisdata\so\satishkumar\RawFile.raw

我的一般方法是让数据流带有一个脚本任务,该任务不将行发送到 RAW 文件目标.

My general approach is to have a dataflow with a script task that sends no rows into a RAW File Destination.

将目的地配置为

  • 访问模式:来自变量的文件名
  • 变量名:User::RawFileName
  • 写入选项:始终创建

这里的概念是将所有数据附加到在初始步骤中创建的 RAW 文件中.

The concept here is to append all the data into the RAW file that was created in the initial step.

您的源应该已经配置为

  • OLE DB 连接管理器:FlatFile
  • 数据访问方式:来自变量的 SQL 命令
  • 变量名:User::Query

将目的地配置为

  • 访问模式:来自变量的文件名
  • 变量名:User::RawFileName
  • 写入选项:附加

至此,foreach枚举器已经完成,所有数据都已经加载到暂存文件中.现在是时候使用它并将数据发送到目的地了.

At this point, the foreach enumerator has completed and all the data has been loaded into the staging file. Now it is time to consume that and send data on to the destination.

Raw File Source Transformation 拖到您的数据流上.不出所料,您将配置为

Drag a Raw File Source Transformation onto your data flow. Unsurprisingly, you will configure as

  • 访问模式:来自变量的文件名
  • 变量名:User::RawFileName

不是模拟目标,而是将其连接到正确的数据目标.

Instead of Simulate destination, wire it up to the correct data destination.

使用带有 GETDATE/GETUTCDATE 的表达式来定义文件名时要小心,因为它们会不断地被评估.2005 年,我们使用了 FileName_HHMMSS 并且遇到了问题,因为在创建文件和使用该文件的下一个任务之间的处理没有在同一秒内完成.相反,我使用动态但固定的起点获得了更好的成功,通常是系统变量 StartTime @[System::StartTime]

Be careful when using an expression with GETDATE/GETUTCDATE to define filenames as they are constantly evaluated. In 2005, we had used FileName_HHMMSS and had issues because processing didn't complete in the same second between the creation of a file and the next task that consumed the file. Instead, I have had better success using a dynamic but fixed starting point and generally, that is the system variable, StartTime @[System::StartTime]

这篇关于从多个文件中获取数据并将其加载到 ssis 中的原始文件目的地(原始文件应该是动态的)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆