How to loop through Excel files and load them into a database using an SSIS package?


Problem description

I need to create an SSIS package that imports data from multiple Excel files into a SQL database. I plan to use nested Foreach Loop containers to achieve this: a Foreach File Enumerator and, nested within it, a Foreach ADO.NET Schema Rowset Enumerator.

Problem to consider: sheet names differ between the Excel files, but the structure remains the same.

I have created an Excel Connection Manager, but the Schema Rowset Enumerator does not accept that connection manager in the enumerator configuration.

After some research, I found that you can use the Jet OLE DB provider to connect to an Excel file. However, I can only specify Microsoft Access database files as the data source. Attempting to specify an Excel file as the data source fails.

After more research, I found that you can use the ODBC data provider with a connection string instead of a DSN. After I entered a connection string specifying the Excel file, this also failed.

I have been told not to use a Script Task to accomplish this. As a last-ditch effort I tried extracting the data by accessing the sheets by index, but I found that the sheet indexes differ between the Excel files.

Any help would be greatly appreciated.

Recommended answer

Here is one possible way of doing this. It assumes that there are no blank sheets in the Excel files, that all the sheets follow exactly the same structure, and that the file extension is always .xlsx.

The following example was created using SSIS 2008 R2 and Excel 2007. The working folder for this example is F:\Temp.

In the folder F:\Temp, create an Excel 2007 spreadsheet file named States_1.xlsx with two worksheets.

Sheet 1 of States_1.xlsx contained the following data

Sheet 2 of States_1.xlsx contained the following data

In the folder F:\Temp, create another Excel 2007 spreadsheet file named States_2.xlsx with two worksheets.

Sheet 1 of States_2.xlsx contained the following data

Sheet 2 of States_2.xlsx contained the following data

Create a table named dbo.Destination in SQL Server using the script below. The Excel sheet data will be inserted into this table.

CREATE TABLE [dbo].[Destination](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [State] [nvarchar](255) NULL,
    [Country] [nvarchar](255) NULL,
    [FilePath] [nvarchar](255) NULL,
    [SheetName] [nvarchar](255) NULL,
CONSTRAINT [PK_Destination] PRIMARY KEY CLUSTERED ([Id] ASC)) ON [PRIMARY]
GO

The table is currently empty.

Create a new SSIS package and create the following 4 variables on it. FolderPath contains the folder where the Excel files are stored. FilePattern contains the extension of the files to loop through; this example works only for .xlsx. FilePath is assigned a value by the Foreach Loop container at run time, but it needs a valid path at design time, so it is initially set to F:\Temp\States_1.xlsx, the path of the first Excel file. SheetName will hold the actual sheet name at run time, but it needs the initial value Sheet1$ to avoid a design-time error.

In the package's connection managers, create an ADO.NET connection with the following configuration and name it ExcelSchema.

Under .Net Providers for OleDb, select the provider Microsoft Office 12.0 Access Database Engine OLE DB Provider. Provide the file path F:\Temp\States_1.xlsx.

Click the All section on the left side and set the Extended Properties property to Excel 12.0 to denote the version of Excel. In this case, 12.0 denotes Excel 2007. Click Test Connection to make sure that the connection succeeds.
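To make the two dialog settings above concrete, here is a minimal Python sketch of the connection string they would roughly correspond to. It assumes that Microsoft.ACE.OLEDB.12.0 is the ProgID behind the "Microsoft Office 12.0 Access Database Engine OLE DB Provider" entry; the function name is hypothetical and only illustrates how the file path and the Extended Properties value end up in one string.

```python
def excel_connection_string(file_path: str, excel_version: str = "Excel 12.0") -> str:
    """Build an ACE OLE DB connection string for a single Excel workbook.

    Assumption: the provider ProgID is Microsoft.ACE.OLEDB.12.0.
    excel_version mirrors the Extended Properties value set in the dialog.
    """
    return (
        "Provider=Microsoft.ACE.OLEDB.12.0;"
        f"Data Source={file_path};"
        f'Extended Properties="{excel_version}"'
    )


# Design-time value used in this example
print(excel_connection_string(r"F:\Temp\States_1.xlsx"))
```

Only the Data Source part needs to change from one file to the next, which is why a single connection can be reused for every workbook in the folder.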

Create an Excel connection manager named Excel as shown below.

Create an OLE DB connection to SQL Server named SQLServer. The package should now have three connections, as shown below.

The connection strings need the following changes so that the Excel file changes dynamically as the files are looped through.

On the connection ExcelSchema, configure an expression on the ServerName property to use the variable FilePath. Click the ellipsis button to configure the expression.

Similarly, on the connection Excel, configure an expression on the ServerName property to use the variable FilePath. Click the ellipsis button to configure the expression.

On the Control Flow, place two Foreach Loop containers, one inside the other. The outer Foreach Loop container, named Loop files, loops through the files. The inner Foreach Loop container loops through the sheets within each file. Inside the inner Foreach Loop container, place a Data Flow Task that reads the Excel file and loads the data into SQL Server.
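The control flow described above is two nested loops around one load step. The Python sketch below only mirrors that shape and is not how SSIS executes a package. The three callables and the sheet names in the stand-in catalog are hypothetical, chosen to show that sheet names may differ from file to file.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def run_package(
    list_files: Callable[[], Iterable[str]],
    list_sheets: Callable[[str], Iterable[str]],
    load_sheet: Callable[[str, str], None],
) -> None:
    """Mirror the control flow: files outside, sheets inside, one load per sheet."""
    for file_path in list_files():                  # outer container: Loop files
        for sheet_name in list_sheets(file_path):   # inner container: loops the sheets
            load_sheet(file_path, sheet_name)       # Data Flow Task


# Stand-in catalog with hypothetical sheet names that differ per file
catalog: Dict[str, List[str]] = {
    r"F:\Temp\States_1.xlsx": ["East$", "West$"],
    r"F:\Temp\States_2.xlsx": ["North$", "South$"],
}
loaded: List[Tuple[str, str]] = []

run_package(
    list_files=lambda: catalog.keys(),
    list_sheets=lambda path: catalog[path],
    load_sheet=lambda path, sheet: loaded.append((path, sheet)),
)
print(loaded)
```

Because the inner loop asks each file for its own sheet list, nothing in the load step depends on a fixed sheet name or sheet index.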

Configure the first Foreach Loop container, named Loop files, as shown below:
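As a rough picture of what the file enumerator produces, here is a small Python sketch that lists fully qualified paths in a folder that match a pattern. It assumes the container is driven by the FolderPath and FilePattern variables and that the pattern is a wildcard such as *.xlsx; the function name is hypothetical, and the demo uses a temporary folder instead of F:\Temp so that it runs anywhere.

```python
import fnmatch
import os
import tempfile
from typing import List


def enumerate_files(folder_path: str, file_pattern: str) -> List[str]:
    """Return full paths of the files in folder_path that match file_pattern."""
    matches = []
    for name in sorted(os.listdir(folder_path)):
        full_path = os.path.join(folder_path, name)
        if os.path.isfile(full_path) and fnmatch.fnmatch(name, file_pattern):
            matches.append(full_path)
    return matches


# Demo in a temporary folder standing in for the working folder
with tempfile.TemporaryDirectory() as folder:
    for name in ("States_1.xlsx", "States_2.xlsx", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    demo_files = enumerate_files(folder, "*.xlsx")

print([os.path.basename(path) for path in demo_files])
```

Each path returned here plays the role of the value assigned to the FilePath variable on one iteration of the outer loop.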

Configure the second Foreach Loop container, named Loop sheets, as shown below:
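The sheet loop needs the worksheet names of whichever file is current. In the package this comes from the schema rowset enumerator on the ExcelSchema connection. The Python sketch below uses a different technique to show the same idea: it reads the sheet names straight from the xl/workbook.xml part inside the .xlsx archive. The trailing $ is appended to match the Sheet1$ style used for the SheetName variable; the function name and the demo sheet names are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
import zipfile
from typing import List

# Namespace of the SpreadsheetML workbook part
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def list_sheet_names(xlsx_path: str) -> List[str]:
    """Return worksheet names in workbook order, each with a trailing '$'."""
    with zipfile.ZipFile(xlsx_path) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [sheet.attrib["name"] + "$" for sheet in root.iter(MAIN_NS + "sheet")]


# Demo: a stand-in archive that contains only the workbook part
workbook_xml = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    '<sheets><sheet name="East" sheetId="1"/><sheet name="West" sheetId="2"/></sheets>'
    "</workbook>"
)
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "States_1.xlsx")
    with zipfile.ZipFile(demo_path, "w") as archive:
        archive.writestr("xl/workbook.xml", workbook_xml)
    demo_names = list_sheet_names(demo_path)

print(demo_names)
```

Whatever the mechanism, the point is that the names are discovered per file at run time, so differently named sheets in different workbooks do not need to be known in advance.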

Inside the Data Flow Task, place an Excel Source, a Derived Column and an OLE DB Destination as shown below:

Configure the Excel Source to read the appropriate Excel file and the sheet that is currently being looped through.

Configure the Derived Column to create new columns for the file name and the sheet name. This is only for demonstration in this example and has no other significance.
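In effect, the derived column step stamps every row with the file and sheet it came from. A minimal Python sketch of that transformation follows. The FilePath and SheetName keys match the columns of dbo.Destination, while the function name and the sample row values are hypothetical.

```python
from typing import Dict, Iterable, Iterator


def add_lineage_columns(
    rows: Iterable[Dict[str, str]], file_path: str, sheet_name: str
) -> Iterator[Dict[str, str]]:
    """Yield each row with FilePath and SheetName columns added."""
    for row in rows:
        yield {**row, "FilePath": file_path, "SheetName": sheet_name}


# Hypothetical sample rows in the State/Country shape of the destination table
sheet_rows = [
    {"State": "Alabama", "Country": "USA"},
    {"State": "Alaska", "Country": "USA"},
]
tagged = list(add_lineage_columns(sheet_rows, r"F:\Temp\States_1.xlsx", "Sheet1$"))
print(tagged[0])
```

Recording the source alongside each row makes it possible to check afterwards which file and sheet every loaded row came from.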

Configure the OLE DB Destination to insert the data into the SQL table.

The screenshot below shows a successful execution of the package.

The screenshot below shows that the data from the 4 worksheets in the 2 Excel files created at the beginning of this answer is correctly loaded into the SQL table dbo.Destination.
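One way to check the result without a screenshot is to count the loaded rows per file and sheet. The sketch below is an assumption-laden stand-in: it uses an in-memory SQLite table in place of SQL Server so that it runs anywhere, and the sample rows are hypothetical. The GROUP BY query itself is plain SQL and should carry over to dbo.Destination.

```python
import sqlite3
from typing import List, Tuple


def rows_per_sheet(conn: sqlite3.Connection) -> List[Tuple[str, str, int]]:
    """Count loaded rows for every (FilePath, SheetName) pair."""
    query = (
        "SELECT FilePath, SheetName, COUNT(*) "
        "FROM Destination "
        "GROUP BY FilePath, SheetName "
        "ORDER BY FilePath, SheetName"
    )
    return conn.execute(query).fetchall()


# SQLite stand-in for dbo.Destination (not the SQL Server DDL)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Destination ("
    "Id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "State TEXT, Country TEXT, FilePath TEXT, SheetName TEXT)"
)
sample_rows = [  # hypothetical data
    ("Alabama", "USA", r"F:\Temp\States_1.xlsx", "Sheet1$"),
    ("Alaska", "USA", r"F:\Temp\States_1.xlsx", "Sheet1$"),
    ("Arizona", "USA", r"F:\Temp\States_1.xlsx", "Sheet2$"),
    ("Colorado", "USA", r"F:\Temp\States_2.xlsx", "Sheet1$"),
    ("Florida", "USA", r"F:\Temp\States_2.xlsx", "Sheet2$"),
]
conn.executemany(
    "INSERT INTO Destination (State, Country, FilePath, SheetName) VALUES (?, ?, ?, ?)",
    sample_rows,
)

for file_path, sheet_name, row_count in rows_per_sheet(conn):
    print(file_path, sheet_name, row_count)
```

With two files of two sheets each, four groups are expected; a missing group would point to a sheet that was skipped.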

Hope that helps.
