How can I skip items in a Foreach Loop container set to the Foreach ADO enumerator?


Problem description

I have an SSIS import module that loops through the files in a folder and, for each file, loops through the worksheets within it. From these files I only want the data on specific worksheets that share a naming convention, e.g. 2006 - claims report, 2007 - claims report and so on.

Is there any way to import only the worksheets that follow that naming convention and skip all the others?

Would this take a bit of scripting within the Foreach container, using a regex on the variable value, or would it require an expression on the Foreach container itself?

Recommended answer

Yes, one possible option is to use a Script Task so that only the worksheets of your choice are processed.

The following example was created using SSIS 2008 R2 and Excel 2010. The working folder for this example is C:\Temp\. I think the logic should also hold for earlier versions.

In the folder C:\Temp\, create an Excel 2007 spreadsheet file named Country_States.xlsx with three worksheets named US_1, US_2 and Canada_1.

Sheet US_1 of Country_States.xlsx contained the following data:

[Screenshot: contents of sheet US_1]

Sheet US_2 of Country_States.xlsx contained the following data:

[Screenshot: contents of sheet US_2]

Sheet Canada_1 of Country_States.xlsx contained the following data:

[Screenshot: contents of sheet Canada_1]

Create a table named dbo.Destination in SQL Server using the script below. The Excel sheet data will be inserted into this table.

CREATE TABLE [dbo].[Destination](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [State] [nvarchar](255) NULL,
    [Country] [nvarchar](255) NULL,
    [FilePath] [nvarchar](255) NULL,
    [SheetName] [nvarchar](255) NULL,
CONSTRAINT [PK_Destination] PRIMARY KEY CLUSTERED ([Id] ASC)) ON [PRIMARY]
GO

The table is currently empty.

Create a new SSIS package and, on the package, create the following six variables. FolderPath contains the folder where the Excel files are stored. FilePattern contains the extension of the files to loop through; this example works only for .xlsx. FilePath is assigned a value by the Foreach Loop container at run time, but it needs a valid path at design time, so it is initially set to C:\temp\Country_States.xlsx, the path of the first Excel file. SheetName holds the name of the sheet currently being processed; it is given the initial value US_1$ to avoid design-time errors. ProcessTheSheet holds true or false and defaults to false. PatternToMatch contains the pattern used to match only the worksheets of our choice.

In the package's connection managers, create an ADO.NET connection with the following configuration and name it ExcelSchema.

Under .Net Providers for OleDb, select the provider Microsoft Office 12.0 Access Database Engine OLE DB Provider. Provide the file path C:\temp\Country_States.xlsx.

Click the All section on the left side and set the property Extended Properties to Excel 12.0 to denote the version of Excel. In this case, 12.0 denotes Excel 2007 or above. Click Test Connection to make sure that the connection succeeds.
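In text form, the provider and Extended Properties chosen in the dialog correspond to a connection string along the following lines. This is only a sketch: the class and method names are hypothetical, and it assumes the provider is registered under its usual invariant name Microsoft.ACE.OLEDB.12.0.

```csharp
using System;

public static class ExcelConnectionSketch
{
    // Hypothetical helper: builds an OLE DB connection string for one workbook.
    // Provider and Extended Properties mirror the dialog settings described above.
    public static string Build(string filePath)
    {
        return "Provider=Microsoft.ACE.OLEDB.12.0;"
             + "Data Source=" + filePath + ";"
             + "Extended Properties=\"Excel 12.0\";";
    }

    public static void Main()
    {
        Console.WriteLine(Build(@"C:\temp\Country_States.xlsx"));
    }
}
```

Only the Data Source part changes from file to file, which is why the later steps bind the file path to the FilePath variable.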

Create an Excel connection manager named Excel that points to the same file.

Create an OLE DB connection to SQL Server named SQLServer. The package should now have three connections: ExcelSchema, Excel and SQLServer.

We need to make the following connection string changes so that the Excel file changes dynamically as the files are looped through.

On the connection ExcelSchema, configure an expression on the ServerName property so that it uses the variable FilePath. Click the ellipsis button to configure the expression.

Similarly, on the connection Excel, configure an expression on the ServerName property so that it uses the variable FilePath.

On the Control Flow, place two Foreach Loop containers, one inside the other. The outer Foreach Loop container, named Loop files, loops through the files. The inner Foreach Loop container, named Loop sheets, loops through the sheets within each file. Within the inner Foreach Loop container, place a Script Task that decides whether the current sheet should be processed, and a Data Flow Task that reads the Excel file and loads the data into SQL Server.

Configure the first Foreach Loop container, named Loop files, as shown below:

[Screenshot: configuration of the Loop files container]
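In code terms, the outer loop behaves roughly like the sketch below, assuming it uses the Foreach File enumerator with FolderPath as the folder and FilePattern as the file mask. This is an illustration of the looping behaviour rather than SSIS code, and FileLoopSketch and FindWorkbooks are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileLoopSketch
{
    // Returns the full paths of the files in folderPath that match filePattern,
    // e.g. FindWorkbooks(@"C:\Temp\", "*.xlsx"). Subfolders are not searched.
    public static List<string> FindWorkbooks(string folderPath, string filePattern)
    {
        var paths = new List<string>(Directory.GetFiles(folderPath, filePattern));
        paths.Sort(StringComparer.OrdinalIgnoreCase);
        return paths;
    }

    public static void Main(string[] args)
    {
        // Assumption for the demo: the folder comes from the command line,
        // falling back to the current directory.
        string folder = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();
        foreach (string filePath in FindWorkbooks(folder, "*.xlsx"))
        {
            // In the package, this value would be assigned to User::FilePath.
            Console.WriteLine(filePath);
        }
    }
}
```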

Configure the second Foreach Loop container, named Loop sheets, as shown below:

[Screenshot: configuration of the Loop sheets container]

Configure the Script Task with the following code, which checks the SheetName value against the pattern stored in the PatternToMatch variable. If SheetName matches the pattern, the variable ProcessTheSheet is set to True; otherwise it is set to False.

C# code for SSIS 2008 and above

Include the statement using System.Text.RegularExpressions; so that the Regex class is available.

public void Main()
{
    // Lock the variables the task reads and writes, then fetch them.
    Variables varCollection = null;
    Dts.VariableDispenser.LockForRead("User::SheetName");
    Dts.VariableDispenser.LockForRead("User::PatternToMatch");
    Dts.VariableDispenser.LockForWrite("User::ProcessTheSheet");
    Dts.VariableDispenser.GetVariables(ref varCollection);

    string sheetName = varCollection["User::SheetName"].Value.ToString();
    string pattern = varCollection["User::PatternToMatch"].Value.ToString();

    // Match through the Regex instance so that IgnoreCase actually takes effect.
    Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
    Match match = rgx.Match(sheetName);

    // true lets the precedence constraint continue to the Data Flow Task.
    varCollection["User::ProcessTheSheet"].Value = match.Success;

    // Release the variable locks before the task completes.
    varCollection.Unlock();

    Dts.TaskResult = (int)ScriptResults.Success;
}
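
Note that PatternToMatch is interpreted as a .NET regular expression, not as a file-style wildcard. The console sketch below is separate from the package and uses a hypothetical SheetPatternSketch class; it applies a few patterns to sheet names in the form the loop supplies them (with a trailing $, as in the initial value US_1$). The sheet name Mexico_1$ is invented for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SheetPatternSketch
{
    // Case-insensitive regular-expression match against the sheet name.
    public static bool Matches(string sheetName, string pattern)
    {
        return Regex.IsMatch(sheetName, pattern, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string[] sheets = { "US_1$", "US_2$", "Canada_1$", "Mexico_1$" };
        string[] patterns = { "CA*", "^CA", "^US_" };
        foreach (string pattern in patterns)
        {
            foreach (string sheet in sheets)
            {
                Console.WriteLine("{0,-5} {1,-10} {2}", pattern, sheet, Matches(sheet, pattern));
            }
        }
    }
}
```

As a regular expression, CA* means "C followed by zero or more A", so with case-insensitive matching it accepts any name containing the letter C, including Mexico_1$ in this sketch. An anchored pattern such as ^CA or ^US_ is closer to a starts-with test.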

Right-click the connector that joins the Script Task and the Data Flow Task and select Edit. This opens the Precedence Constraint Editor dialog. Set Evaluation operation to Expression and set Expression to @[User::ProcessTheSheet]. With this expression, the package continues to the Data Flow Task only if the sheet name matches the pattern provided in the variable PatternToMatch. Notice that the connector now shows an fx symbol, which means an expression is in place, and that its color changes from green to blue.
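
The effect of the Script Task plus the precedence constraint can be pictured as a plain loop with a conditional skip. The sketch below is a stand-in for the inner container, not SSIS code; SheetLoopSketch and SheetsToLoad are hypothetical names, and the sheet list is passed in directly instead of being read from a workbook.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SheetLoopSketch
{
    // Walks the sheet names of one workbook and returns those that would reach
    // the Data Flow Task, i.e. those for which ProcessTheSheet evaluates to true.
    public static List<string> SheetsToLoad(IEnumerable<string> sheetNames, string patternToMatch)
    {
        var loaded = new List<string>();
        foreach (string sheetName in sheetNames)
        {
            // Script Task: decide whether this sheet qualifies.
            bool processTheSheet = Regex.IsMatch(sheetName, patternToMatch, RegexOptions.IgnoreCase);

            // Precedence constraint: the expression gates the next task.
            if (!processTheSheet)
            {
                continue; // sheet is skipped; the loop moves on to the next one
            }

            // The Data Flow Task would run here for this sheet.
            loaded.Add(sheetName);
        }
        return loaded;
    }

    public static void Main()
    {
        var sheets = new[] { "US_1$", "US_2$", "Canada_1$" };
        Console.WriteLine(string.Join(", ", SheetsToLoad(sheets, "^US_")));
    }
}
```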

Inside the Data Flow Task, place an Excel Source, a Derived Column and an OLE DB Destination, in that order.

Configure the Excel Source to read the appropriate Excel file and the sheet that is currently being looped through.

Configure the Derived Column to create new columns for the file name and the sheet name. These columns are there only to make the example easier to follow; they have no other significance.

Configure the OLE DB Destination to insert the data into the SQL table.

The package executed successfully.

When the PatternToMatch variable is set to the value CA*, the table is populated only with the values from the sheet Canada_1.

I then deleted all the rows from the table and changed the PatternToMatch variable value to US*. This time the table was populated only with the values from the sheets US_1 and US_2.
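
Applied to the original question, PatternToMatch could be a regular expression for the "year - claims report" naming convention. The sketch below rests on several assumptions: the sheet names are taken from the question, the trailing $ follows the US_1$ form used for SheetName, the optional single quotes allow for the way the Excel OLE DB provider typically wraps sheet names that contain spaces, and Summary$ is an invented non-matching sheet. Check the actual SheetName values in your package before relying on it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ClaimsReportPatternSketch
{
    // Hypothetical value for User::PatternToMatch: four digits, " - claims report",
    // the trailing $ of a sheet name, and optional surrounding single quotes.
    public const string Pattern = @"^'?\d{4} - claims report\$'?$";

    public static bool IsClaimsReport(string sheetName)
    {
        return Regex.IsMatch(sheetName, Pattern, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string[] sheets = { "'2006 - claims report$'", "2007 - claims report$", "Summary$" };
        foreach (string sheet in sheets)
        {
            Console.WriteLine("{0} -> {1}", sheet, IsClaimsReport(sheet));
        }
    }
}
```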

Hope that helps.
