How to loop through Excel files and load them into a database using an SSIS package?


Question

I need to create an SSIS package that imports data from multiple Excel files into a SQL database. I plan to use nested Foreach Loop containers to achieve this: a Foreach File Enumerator and, nested within it, a Foreach ADO.NET Schema Rowset Enumerator.

Problem to consider: sheet names differ between the Excel files, but the structure remains the same.

I have created an Excel Connection Manager, but the Schema Rowset Enumerator does not accept that connection manager in the enumerator configuration.

After some research, I found that the Jet OLE DB provider can be used to connect to an Excel file. However, I can only specify Microsoft Access database files as the data source. Attempting to specify an Excel file as the data source fails.

After more research, I found that the ODBC data provider can be used with a connection string instead of a DSN. After I entered a connection string specifying the Excel file, this also failed.

I have been told not to use a Script Task to accomplish this. As a last-ditch effort, I tried extracting data by accessing the sheets by index, but I found that the sheet indexes differ between the Excel files.

Any help would be greatly appreciated.

Recommended answer

Here is one possible way of doing this, based on the assumptions that the Excel files contain no blank sheets and that all sheets follow exactly the same structure. It also assumes that the file extension is always .xlsx.

The following example was created using SSIS 2008 R2 and Excel 2007. The working folder for this example is F:\Temp\

In the folder F:\Temp\, create an Excel 2007 spreadsheet file named States_1.xlsx with two worksheets.

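Sheet 1 of States_1.xlsx contains the following data. [screenshot]

Sheet 2 of States_1.xlsx contains the following data. [screenshot]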

In the folder F:\Temp\, create another Excel 2007 spreadsheet file named States_2.xlsx with two worksheets.

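Sheet 1 of States_2.xlsx contains the following data. [screenshot]

Sheet 2 of States_2.xlsx contains the following data. [screenshot]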

Create a table named dbo.Destination in SQL Server using the script below. The Excel sheet data will be inserted into this table.

CREATE TABLE [dbo].[Destination](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [State] [nvarchar](255) NULL,
    [Country] [nvarchar](255) NULL,
    [FilePath] [nvarchar](255) NULL,
    [SheetName] [nvarchar](255) NULL,
CONSTRAINT [PK_Destination] PRIMARY KEY CLUSTERED ([Id] ASC)) ON [PRIMARY]
GO

The table is currently empty.

Create a new SSIS package and create the following four variables on it. FolderPath contains the folder where the Excel files are stored. FilePattern contains the extension of the files to loop through; this example works only for .xlsx. FilePath will be assigned a value by the Foreach Loop container, but it needs a valid path at design time, so it is initially set to the path of the first Excel file, F:\Temp\States_1.xlsx. SheetName will contain the actual sheet name, but it needs the initial value Sheet1$ to avoid a design-time error.

In the package's connection managers, create an ADO.NET connection with the following configuration and name it ExcelSchema.

Under .Net Providers for OleDb, select the provider Microsoft Office 12.0 Access Database Engine OLE DB Provider. Provide the file path F:\Temp\States_1.xlsx

Click the All section on the left side and set the property Extended Properties to Excel 12.0 to denote the version of Excel. In this case, 12.0 denotes Excel 2007. Click Test Connection to make sure that the connection succeeds.

Create an Excel connection manager named Excel as shown below.

Create an OLE DB connection to SQL Server named SQLServer. The package should now have three connections, as shown below.

We need to make the following connection string changes so that the Excel file changes dynamically as the files are looped through.

On the connection ExcelSchema, configure an expression on the ServerName property to use the variable FilePath. Click the ellipsis button to configure the expression.

Similarly, on the connection Excel, configure an expression on the ServerName property to use the variable FilePath. Click the ellipsis button to configure the expression.
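To make the effect of these expressions concrete, here is a minimal Python sketch of the idea: only the data source part of the connection string changes from one file to the next. The function name excel_connection_string is hypothetical, and the exact string that SSIS generates internally is an assumption. The provider name and the Extended Properties value simply mirror the settings chosen above.

```python
def excel_connection_string(file_path: str) -> str:
    """Build an ACE OLE DB connection string for one Excel file.

    Assumption: this mirrors the ExcelSchema settings described above
    (ACE 12.0 provider, Extended Properties = Excel 12.0). It is an
    illustration, not the string SSIS itself produces.
    """
    return (
        "Provider=Microsoft.ACE.OLEDB.12.0;"
        f"Data Source={file_path};"
        'Extended Properties="Excel 12.0";'
    )


# Only the Data Source part differs between loop iterations.
for path in (r"F:\Temp\States_1.xlsx", r"F:\Temp\States_2.xlsx"):
    print(excel_connection_string(path))
```

In the package itself no code is needed; the property expression re-evaluates the connection whenever the Foreach Loop assigns a new value to FilePath.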

On the Control Flow, place two Foreach Loop containers, one within the other. The outer Foreach Loop container, named Loop files, will loop through the files. The inner Foreach Loop container, named Loop sheets, will loop through the sheets within each file. Within the inner Foreach Loop container, place a Data Flow Task that will read the Excel files and load the data into SQL Server.
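The nested loop logic can be sketched outside SSIS as follows. This is a rough Python illustration of the control flow only, not a replacement for the package: it reads the sheet names straight from the .xlsx package (a zip archive containing xl/workbook.xml) instead of using the ADO.NET schema rowset. The function names are hypothetical, and the sketch assumes plain sheet names without spaces; it appends the trailing $ that the OLE DB provider uses for worksheet tables.

```python
import glob
import os
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the elements in xl/workbook.xml inside an .xlsx package.
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def list_sheet_tables(file_path):
    """Return the sheet names of one workbook, each with a trailing '$'."""
    with zipfile.ZipFile(file_path) as package:
        root = ET.fromstring(package.read("xl/workbook.xml"))
    return [sheet.get("name") + "$" for sheet in root.iter(MAIN_NS + "sheet")]


def iter_files_and_sheets(folder_path, file_pattern="*.xlsx"):
    """Yield (FilePath, SheetName) pairs: outer loop over files, inner over sheets."""
    for file_path in sorted(glob.glob(os.path.join(folder_path, file_pattern))):
        for sheet_name in list_sheet_tables(file_path):
            yield file_path, sheet_name


if __name__ == "__main__":
    # Each pair corresponds to one execution of the Data Flow Task.
    for file_path, sheet_name in iter_files_and_sheets(r"F:\Temp"):
        print(file_path, sheet_name)
```

The point of the sketch is that sheet names are discovered at run time for each file, which is why differing sheet names between files are not a problem.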

Configure the first Foreach Loop container, named Loop files, as shown below:

Configure the second Foreach Loop container, named Loop sheets, as shown below:

Inside the Data Flow Task, place an Excel Source, a Derived Column and an OLE DB Destination as shown below:

Configure the Excel Source to read the appropriate Excel file and the sheet that is currently being looped through.

Configure the Derived Column to create new columns for the file name and the sheet name. These columns are only there for demonstration in this example and have no other significance.
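As a rough illustration of what the Derived Column step does to each row, the following Python sketch appends the current FilePath and SheetName values to every source row. The function name add_derived_columns and the placeholder row values are hypothetical; the column names follow the dbo.Destination table defined earlier.

```python
def add_derived_columns(rows, file_path, sheet_name):
    """Return copies of the source rows with FilePath and SheetName added."""
    return [dict(row, FilePath=file_path, SheetName=sheet_name) for row in rows]


# Placeholder row standing in for data read from the current sheet.
source_rows = [{"State": "<state>", "Country": "<country>"}]

for row in add_derived_columns(source_rows, r"F:\Temp\States_1.xlsx", "Sheet1$"):
    print(row)
```

Because the two values come from the package variables, every row loaded during one iteration carries the same file path and sheet name.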

Configure the OLE DB Destination to insert the data into the SQL table.

The screenshot below shows successful execution of the package.

The screenshot below shows that the data from the 4 worksheets in the 2 Excel files created at the beginning of this answer is correctly loaded into the SQL table dbo.Destination.
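One way to check the load without relying on a screenshot is to count the rows per file and sheet. The sketch below is only an illustration under stated assumptions: it uses an in-memory SQLite table as a stand-in for dbo.Destination, the rows are placeholders rather than the real data, and the sheet names Sheet1$ and Sheet2$ are made up. The same GROUP BY query should also work against dbo.Destination in SQL Server.

```python
import sqlite3

CHECK_QUERY = """
SELECT FilePath, SheetName, COUNT(*) AS LoadedRows
FROM Destination
GROUP BY FilePath, SheetName
ORDER BY FilePath, SheetName
"""


def rows_per_sheet(conn):
    """Return one (FilePath, SheetName, LoadedRows) tuple per loaded sheet."""
    return conn.execute(CHECK_QUERY).fetchall()


def build_demo_table():
    """Create a stand-in Destination table holding one placeholder row per sheet."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Destination (Id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "State TEXT, Country TEXT, FilePath TEXT, SheetName TEXT)"
    )
    for file_path in (r"F:\Temp\States_1.xlsx", r"F:\Temp\States_2.xlsx"):
        for sheet_name in ("Sheet1$", "Sheet2$"):  # hypothetical sheet names
            conn.execute(
                "INSERT INTO Destination (State, Country, FilePath, SheetName) "
                "VALUES (?, ?, ?, ?)",
                ("<state>", "<country>", file_path, sheet_name),
            )
    return conn


for file_path, sheet_name, loaded in rows_per_sheet(build_demo_table()):
    print(file_path, sheet_name, loaded)
```

With two files of two sheets each, four groups are expected, one per worksheet.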

Hope that helps.
