SSIS - How do I load data from text files where the path of files is inside another text file?

Question

I have a text file that contains a list of files to load into a database.

The list contains two columns:

FilePath,Type
c:\f1.txt,A
c:\f2.txt,B
c:\f3.txt,B

I want to provide this file as the source to SSIS. I then want it to go through it line by line. For each line, I want it to read the file in the FilePath column and check the Type.

If the type is A, I want it to skip the first 4 lines of the file named in the FilePath column of the current line and then load the rest of that file's data into a table. If the type is B, I want it to open the file and, for every line, copy the first column into table 1 and the second column into table 2.

I would really appreciate it if someone could provide a high-level list of the steps I need to follow.

Any help is appreciated.

Answer

Here is one way of doing it in SSIS. The steps below apply to SSIS 2008 R2.

  • Create an SSIS package with three package variables: FileName, FilesToRead and Type. FilesToRead will hold the list of files and their types; because it stores a recordset, it must be of data type Object. A loop will go through each of those records and, on every iteration, store the current values in the FileName and Type variables.

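  • On the Control Flow tab, place a Data Flow Task followed by a Foreach Loop Container. The data flow task reads the file that contains the list of files to process, and the loop then goes through each file. At this point the tasks show errors because nothing has been configured yet; we will get to that shortly.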

  • In the Connection Managers section, you need four connections:
  • First, an OLE DB connection to the database. Name it SQLServer.
  • Second, a flat file connection manager to read the file that contains the list of files and types. Configure it with two columns, FileName and Type, and name it Files.
  • Third, another flat file connection manager to read all files of type A. Name it Type_A. In this connection manager, enter the value 4 in the Header rows to skip text box so that the first four rows are always skipped.
  • Fourth, one more flat file connection manager to read all files of type B. Name it Type_B.

  • Back on the control flow, double-click the first data flow task. Inside it, place a Flat File Source that reads the list of files through the Files connection manager, followed by a Recordset Destination. In the Recordset Destination, set the variable to FilesToRead.
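
SSIS needs no code for this step, but as a rough Python analogy (an illustration only, not part of the package), the sketch below does what the first data flow task does: it parses the list file, assumed to be comma-separated with a FilePath,Type header row as in the question, into in-memory (path, type) rows that stand in for the recordset held in FilesToRead. The function name read_file_list is hypothetical.

```python
import csv
import io
from typing import List, Tuple


def read_file_list(text: str) -> List[Tuple[str, str]]:
    """Parse a "FilePath,Type" listing into (path, type) rows.

    Stands in for Flat File Source -> Recordset Destination: every row of
    the list ends up in memory, like the object stored in User::FilesToRead.
    """
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the "FilePath,Type" header row
    return [(row[0].strip(), row[1].strip()) for row in reader if row]


# Same content as the sample list in the question.
files_txt = "FilePath,Type\nc:\\f1.txt,A\nc:\\f2.txt,B\nc:\\f3.txt,B\n"
files_to_read = read_file_list(files_txt)
for path, file_type in files_to_read:
    print(path, file_type)
```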

  • Now go back to the Control Flow tab and configure the Foreach loop to go through the recordset stored in the variable FilesToRead (use the Foreach ADO Enumerator with FilesToRead as the ADO object source variable). Because the recordset contains two columns, map FileName to index 0 and Type to index 1 on the Variable Mappings page; each time the loop moves to the next record, these variables are assigned the values of the current record.
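
As a Python analogy of this loop configuration (hypothetical names, not an SSIS API), the generator below walks an in-memory recordset and, for each row, produces the values the Variable Mappings assign: User::FileName from column index 0 and User::Type from index 1.

```python
from typing import Dict, Iterable, Iterator, Sequence


def foreach_ado(recordset: Iterable[Sequence[str]],
                mappings: Dict[str, int]) -> Iterator[Dict[str, str]]:
    """Yield the variable values assigned on each iteration.

    `mappings` plays the role of the Variable Mappings page:
    variable name -> zero-based column index in the current row.
    """
    for row in recordset:
        yield {variable: row[index] for variable, index in mappings.items()}


mappings = {"User::FileName": 0, "User::Type": 1}
recordset = [("c:\\f1.txt", "A"), ("c:\\f2.txt", "B"), ("c:\\f3.txt", "B")]

for variables in foreach_ado(recordset, mappings):
    print(variables["User::FileName"], variables["User::Type"])
```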

  • Inside the Foreach loop container there are two data flow tasks, Type A files and Type B files. Configure each of them to read the files through its connection manager as your requirements dictate. However, each task must be disabled depending on the type of file being read:
  • The Type A files data flow task should be enabled only when type A files are being processed.
  • Similarly, the Type B files data flow task should be enabled only when type B files are being processed.
  • To achieve this, click the Type A files data flow task and press F4 to bring up its properties. Click the ellipsis button on the Expressions property.
  • In the Property Expressions Editor, select the Disable property and enter the expression !(@[User::Type] == "A")

  • Similarly, click the Type B files data flow task and press F4 to bring up its properties. Click the ellipsis button on the Expressions property.
  • In the Property Expressions Editor, select the Disable property and enter the expression !(@[User::Type] == "B")
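
A minimal Python sketch of what the two Disable expressions evaluate to on each iteration; disable_flags is a hypothetical helper that simply mirrors the expressions above, where True means the task is disabled.

```python
from typing import Dict


def disable_flags(current_type: str) -> Dict[str, bool]:
    """Evaluate both Disable expressions for one loop iteration.

    "Type A files": !(@[User::Type] == "A")
    "Type B files": !(@[User::Type] == "B")
    """
    return {
        "Type A files": not (current_type == "A"),
        "Type B files": not (current_type == "B"),
    }


for file_type in ["A", "B", "X"]:
    flags = disable_flags(file_type)
    enabled = [task for task, disabled in flags.items() if not disabled]
    print(file_type, enabled)
```

In this sketch any other Type value leaves both tasks disabled, so such a row would simply be skipped.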

  • For example, if Files.txt lists only type A files, only the Type A files data flow task runs when the package is executed.

  • Likewise, if Files.txt lists only type B files, only the Type B files data flow task runs.

  • If Files.txt contains both type A and type B files, the loop executes the appropriate data flow task for the type of each file being processed.
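
Putting the loop and the Disable expressions together, this hypothetical Python sketch shows the dispatch for a mixed list: each row runs only the handler for its type. The handlers here just record which task would run; in the package, that work is done by the Type A files and Type B files data flow tasks.

```python
from typing import Callable, Dict, List, Tuple


def process_files(file_list: List[Tuple[str, str]],
                  handlers: Dict[str, Callable[[str], None]]) -> List[str]:
    """Run the handler matching each row's Type; return the paths processed."""
    processed = []
    for path, file_type in file_list:
        handler = handlers.get(file_type)
        if handler is None:
            continue  # both tasks disabled: nothing runs for this row
        handler(path)
        processed.append(path)
    return processed


log: List[str] = []
handlers = {
    "A": lambda path: log.append(f"Type A files <- {path}"),
    "B": lambda path: log.append(f"Type B files <- {path}"),
}
file_list = [("c:\\f1.txt", "A"), ("c:\\f2.txt", "B"), ("c:\\f3.txt", "B")]
process_files(file_list, handlers)
print("\n".join(log))
```
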
  • Let's assume that your type A flat files have a three-column, comma-separated layout. Viewed in Notepad++ with all special characters shown, each line ends in CR LF, meaning the lines are terminated by a carriage return and a line feed. This file is stored at the path C:\f1.txt.

  • We need a table in the database to import the data into. Create a table named dbo.Table_A in the SQL Server database.

  • Now go to the SSIS package and configure the flat file connection manager named Type_A. Give the connection manager its name and specify the value 4 in the Header rows to skip text box.
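
To make the effect of Header rows to skip concrete, here is a small Python sketch (an analogy, assuming the comma-separated, CR LF-terminated type A layout described above) that drops the first four lines and splits the remainder into columns. The function name and the sample content are made up for illustration.

```python
import csv
from typing import List

# Same value as "Header rows to skip" on the Type_A connection manager.
HEADER_ROWS_TO_SKIP = 4


def read_type_a(text: str, skip: int = HEADER_ROWS_TO_SKIP) -> List[List[str]]:
    """Drop the first `skip` lines, then split the rest on commas.

    str.splitlines() handles CR LF line endings, as in the sample files.
    """
    data_lines = text.splitlines()[skip:]
    return [row for row in csv.reader(data_lines) if row]


# Hypothetical type A file: four header lines, then three-column rows.
sample = "line 1\r\nline 2\r\nline 3\r\nline 4\r\n1,alpha,10\r\n2,beta,20\r\n"
print(read_type_a(sample))
```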

  • On the Advanced tab, you can rename the columns as needed.

  • Now that the connection manager is configured, configure the Type A files data flow task to process the corresponding files. Double-click the Type A files data flow task and place a Flat File Source and an OLE DB Destination inside it.

  • The Flat File Source must be configured to read the file through the Type_A flat file connection manager.

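  • The data flow task does nothing special: it reads type A flat files and inserts the data into the table dbo.Table_A. Configure the OLE DB Destination to insert the data into the database. Because the column names configured in the flat file connection manager differ from those in the table, they must be mapped manually.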
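
The manual mapping is simply a pairing of source columns with table columns. The Python sketch below expresses that pairing as a dictionary and derives a parameterized INSERT from it. All column names are hypothetical: Column 0 to Column 2 stand for whatever names the Type_A connection manager uses, and Id, Name and Amount for the real columns of dbo.Table_A.

```python
from typing import Dict, List, Sequence

# Hypothetical names: flat-file columns on the left, assumed table columns
# on the right. Substitute the real ones from Type_A and dbo.Table_A.
COLUMN_MAP: Dict[str, str] = {
    "Column 0": "Id",
    "Column 1": "Name",
    "Column 2": "Amount",
}


def map_row(source_columns: Sequence[str], row: Sequence[str],
            mapping: Dict[str, str]) -> Dict[str, str]:
    """Rename one row's fields from source column names to table column names."""
    return {mapping[src]: value
            for src, value in zip(source_columns, row) if src in mapping}


def insert_sql(table: str, mapping: Dict[str, str]) -> str:
    """Build a parameterized INSERT listing the mapped destination columns."""
    columns = list(mapping.values())
    placeholders = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"


source_columns: List[str] = list(COLUMN_MAP)
print(map_row(source_columns, ["1", "alpha", "10"], COLUMN_MAP))
print(insert_sql("dbo.Table_A", COLUMN_MAP))
```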

  • The data flow task is now configured. Next, make sure that the file path read from Files.txt is passed to the connection manager correctly. To do this, click the Type_A flat file connection manager and press F4 to bring up its properties. Set the DelayValidation property to True, then click the ellipsis button on the Expressions property.

  • In the Property Expressions Editor, select the ConnectionString property and set its expression to @[User::FileName]
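
Why DelayValidation matters can be shown with a toy Python model (a hypothetical class, not an SSIS object), assuming FileName has no default value: the connection string is an expression over User::FileName, which is still empty when the package starts and only receives a real path inside the loop, so validating eagerly fails while validating later succeeds. The file paths are illustrative.

```python
from typing import Dict


class FlatFileConnection:
    """Toy model of the Type_A connection manager.

    ConnectionString is an expression over User::FileName, so it is
    re-evaluated every time it is read.
    """

    def __init__(self, variables: Dict[str, str], delay_validation: bool) -> None:
        self.variables = variables
        if not delay_validation:
            # Eager validation runs before the loop has assigned a file name.
            self.validate()

    @property
    def connection_string(self) -> str:
        return self.variables["User::FileName"]

    def validate(self) -> None:
        if not self.connection_string:
            raise ValueError("ConnectionString is empty; nothing to validate yet")


variables = {"User::FileName": ""}
type_a = FlatFileConnection(variables, delay_validation=True)
for path in ["c:\\f01.txt", "c:\\f02.txt"]:  # hypothetical paths
    variables["User::FileName"] = path  # what the Foreach loop does
    type_a.validate()  # passes: the expression now yields a real path
    print(type_a.connection_string)
```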

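  • As a test, run the package with a Files.txt that lists only type A files, for example the sample type A files f01.txt and f02.txt. After the package executes, their data appears in the table Table_A.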

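  • For type B files, follow the same configuration steps as above. Because the processing logic differs, however, the Type B files data flow task looks slightly different: since the two columns of a type B file go into different tables, use a Multicast transformation to create copies of the input data, and send each Multicast output to a different transformation or destination.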
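
The Multicast split can be pictured with this Python sketch (an analogy with a hypothetical helper, assuming a type B file with two comma-separated columns and no header row): every row goes to both outputs, and each output keeps only the column destined for its table. The sample content is made up.

```python
import csv
from typing import List, Tuple


def split_type_b(text: str) -> Tuple[List[str], List[str]]:
    """Send every row to two outputs; keep column 1 for table 1, column 2 for table 2."""
    table_1: List[str] = []
    table_2: List[str] = []
    for row in csv.reader(text.splitlines()):
        if not row:
            continue
        # Multicast: the same row reaches both outputs.
        table_1.append(row[0])  # output 1 keeps only the first column
        table_2.append(row[1])  # output 2 keeps only the second column
    return table_1, table_2


# Hypothetical type B file with two comma-separated columns and no header.
sample_b = "x1,y1\r\nx2,y2\r\nx3,y3\r\n"
table_1, table_2 = split_type_b(sample_b)
print(table_1, table_2)
```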

Hope that helps you to achieve your task.
