导入具有可变标题的Excel文件 [英] Importing excel files having variable headers
问题描述
我有SSIS软件包,它将excel文件加载到数据库中。我创建了Excel Source任务,以将excel列名映射到数据库表列名及其工作正常。
I have the SSIS package, which will load the excel file into Database. I have created Excel Source task to map the excel column name to Database table column name and its working fine.
在极少数情况下,我们收到的excel文件列名带有某些空格(例如:列名称为 ABC,但我们收到的是 ABC),这会导致映射问题和SSIS失败。
In rare case, We are receiving the excel file column name with some space (for example : Column name is "ABC" but we are receiving "ABC ") and which cause the mapping issue and SSIS got failed.
是否可以在不打开excel的情况下修剪列名称。
Is there any possible to trim the column name without opening the excel.
注意:页面名称将是动态的,列位置可能会更改(例如:列 ABC可能存在于第一行,第二行或..)。
推荐答案
第一总而言之,我的解决方案基于@DrHouseofSQL和@Bhouse答案,因此您必须先阅读@DrHouseofSQL答案,然后再阅读@BHouse答案,然后继续此答案
注意:页面名称将是动态的,列位置可能会更改(例如:列 ABC可能存在于第一行或第二行行或...
Note : Page name will be dynamic and Column position may change (eg: Column "ABC may exist in first row or second row or ...
这种情况有点复杂,可以使用以下解决方法解决:
This situation is a little complex and can be solved using the following workaround:
- 在导入数据的数据流任务之前添加脚本任务
- 您必须使用脚本任务打开excel文件并获取工作表名称和标题行
- 构建查询并将其存储在变量中
- 第二个数据流任务,您必须使用上面存储的查询作为源(请注意,您必须将
Delay Validation
属性设置为true )
- Add a script task before the data flow task that import the data
- You have to use the script task to open the excel file and get the Worksheet name and the header row
- Build the Query and store it in a variable
- in the second Data Flow task you have to use the query stored above as source (Note that you have to set
Delay Validation
property to true)
解决方案详细信息
Solution Details
- 首先创建一个SSIS变量输入字符串(即@ [User :: strQuery])
- 添加另一个包含Excel文件路径的变量(即@ [User :: ExcelFilePath])
- 添加脚本任务,然后选择
@ [User :: strQuery]
作为ReadWrite变量,然后选择@ [ User :: ExcelFilePath]
作为只读变量(在脚本任务窗口中) - 将脚本语言设置为VB.Net,并在在脚本编辑器窗口中,编写以下脚本:
- First create an SSIS variable of type string (i.e. @[User::strQuery])
- Add another variable that contains the Excel File Path (i.e. @[User::ExcelFilePath])
- Add A Script Task, and select
@[User::strQuery]
as ReadWrite Variable, and@[User::ExcelFilePath]
as ReadOnly Variable (in the script task window) - Set the Script Language to VB.Net and in the script editor window write the following script:
注意:您必须导入 System.Data .OleDb
在下面的代码中,我们搜索excel的前15行以查找标题,您可以如果在15行之后可以找到标题,则增加数字。我也假设列的范围是从 A
到 I
m_strExcelPath = Dts.Variables.Item("ExcelFilePath").Value.ToString
Dim strSheetname As String = String.Empty
Dim intFirstRow As Integer = 0
m_strExcelConnectionString = Me.BuildConnectionString()
Try
Using OleDBCon As New OleDbConnection(m_strExcelConnectionString)
If OleDBCon.State <> ConnectionState.Open Then
OleDBCon.Open()
End If
'Get all WorkSheets
m_dtschemaTable = OleDBCon.GetOleDbSchemaTable(OleDbSchemaGuid.Tables,
New Object() {Nothing, Nothing, Nothing, "TABLE"})
'Loop over work sheet to get the first one (the excel may contains temporary sheets or deleted ones
For Each schRow As DataRow In m_dtschemaTable.Rows
strSheetname = schRow("TABLE_NAME").ToString
If Not strSheetname.EndsWith("_") AndAlso strSheetname.EndsWith("$") Then
Using cmd As New OleDbCommand("SELECT * FROM [" & strSheetname & "A1:I15]", OleDBCon)
Dim dtTable As New DataTable("Table1")
cmd.CommandType = CommandType.Text
Using daGetDataFromSheet As New OleDbDataAdapter(cmd)
daGetDataFromSheet.Fill(dtTable)
For intCount As Integer = 0 To 15
If Not String.IsNullOrEmpty(dtTable.Rows(intCount)(0).ToString) Then
'+1 because datatable is zero based indexed, +1 because we want to start from the second row
intFirstRow = intCount + 2
End If
Next
End Using
If intFirstRow = 0 Then Throw New Exception("header not found")
End Using
'when the first correct sheet is found there is no need to check others
Exit For
End If
Next
OleDBCon.Close()
End Using
Catch ex As Exception
Throw New Exception(ex.Message, ex)
End Try
Dts.Variables.Item("strQuery").Value = "SELECT * FROM [" & strSheetname & "A" & intFirstRow.ToString & ":I]"
Dts.TaskResult = ScriptResults.Success
End Sub
- 然后,您必须添加一个Excel连接管理器,然后选择要导入的Excel文件(只需选择一个示例来定义其元数据
- 将默认值
选择*从[Sheet1 $ A2:I]
分配给变量@ [User :: strQuery]
- 在数据流任务中添加一个Excel Source,从变量中选择 SQL命令,然后选择
@ [User :: strQuery]
- 转到列标签并使用@BHouse建议的相同方式命名列
- Then you have to add an Excel connection manager, and choose the excel file that you want to import (just select a sample to define the metadata for the first time only)
- Assign a default value of
Select * from [Sheet1$A2:I]
to the variable@[User::strQuery]
- In the Data Flow Task add an Excel Source, choose SQL Command from variable, and select
@[User::strQuery]
- Go to the columns tab and name the columns in the same way that @BHouse suggested
< img src = https://i.stack.imgur.com/i49yI.jpg alt =图片>
图片来自@BHouse答案
Image taken from @BHouse answer
- 设置DataFlow任务
延迟验证
物业True
- 向DataFlow Task添加其他组件
- Set the DataFlow Task
Delay Validation
property toTrue
- Add other components to DataFlow Task
< h1> UPDATE 1:
从OP注释中:有时会有空数据的excel。(即)我们只有标题行不是不是数据...在这种情况下,它会导致整个任务失败
解决方案:
如果您的excel文件不包含任何数据(仅标头),则必须执行以下步骤:
If your excel file contains no data (only header) you have to do these steps:
- 添加类型为boolean *(即
@ [User :: ImportFile]
) - 添加
@ [User :: ImportFile]
到脚本任务ReadWrite变量 - 在脚本任务中检查文件是否包含行
- 如果是,则设置
@ [User :: ImportFile]
= True,否则@@ [User :: ImportFile]
= False - 双击将脚本任务连接到DataFlow的箭头(优先约束)
- 将其类型设置为约束和表达式
-
写下以下表达式
- Add an SSIS variable of type boolean *(i.e.
@[User::ImportFile]
) - Add
@[User::ImportFile]
to the script task ReadWrite variables - In the Script Task check if the file contains rows
- If yes Set
@[User::ImportFile]
= True, else@[User::ImportFile]
= False - Double Click on the arrow (precedence constraint) that connect the script task to the DataFlow
- Set its type to Constraint and Expression
Write the following expression
@[User::ImportFile] == True
注意:新的脚本任务代码为:
m_strExcelPath = Dts.Variables.Item("ExcelFilePath").Value.ToString
Dim strSheetname As String = String.Empty
Dim intFirstRow As Integer = 0
m_strExcelConnectionString = Me.BuildConnectionString()
Try
Using OleDBCon As New OleDbConnection(m_strExcelConnectionString)
If OleDBCon.State <> ConnectionState.Open Then
OleDBCon.Open()
End If
'Get all WorkSheets
m_dtschemaTable = OleDBCon.GetOleDbSchemaTable(OleDbSchemaGuid.Tables,
New Object() {Nothing, Nothing, Nothing, "TABLE"})
'Loop over work sheet to get the first one (the excel may contains temporary sheets or deleted ones
For Each schRow As DataRow In m_dtschemaTable.Rows
strSheetname = schRow("TABLE_NAME").ToString
If Not strSheetname.EndsWith("_") AndAlso strSheetname.EndsWith("$") Then
Using cmd As New OleDbCommand("SELECT * FROM [" & strSheetname & "A1:I15]", OleDBCon)
Dim dtTable As New DataTable("Table1")
cmd.CommandType = CommandType.Text
Using daGetDataFromSheet As New OleDbDataAdapter(cmd)
daGetDataFromSheet.Fill(dtTable)
For intCount As Integer = 0 To 15
If Not String.IsNullOrEmpty(dtTable.Rows(intCount)(0).ToString) Then
'+1 because datatable is zero based indexed, +1 because we want to start from the second row
intFirstRow = intCount + 2
End If
Next
End Using
End Using
'when the first correct sheet is found there is no need to check others
Exit For
End If
Next
OleDBCon.Close()
End Using
Catch ex As Exception
Throw New Exception(ex.Message, ex)
End Try
If intFirstRow = 0 OrElse _
intFirstRow > dtTable.Rows.Count Then
Dts.Variables.Item("ImportFile").Value = False
Else
Dts.Variables.Item("ImportFile").Value = True
End If
Dts.Variables.Item("strQuery").Value = "SELECT * FROM [" & strSheetname & "A" & intFirstRow.ToString & ":I]"
Dts.TaskResult = ScriptResults.Success
End Sub
更新2:
从OP注释:是否有其他解决方法可用于处理数据流任务在不跳过所有数据流任务的情况下,实际上其中一项任务将记录文件名和数据计数以及所有这些文件名和数据计数,在此处均会丢失
解决方案:
- 只需添加另一个DATA FLOW任务
- 使用另一个连接器并使用表达式
@ [User :: ImportFile] == False
(第一个连接器的相同步骤),将此数据流与脚本任务连接。 em> - 在DataFlow任务中添加一个SCript组件作为源
- 创建要导入到日志的输出列
- 创建包含您需要导入的信息的行
- 添加日志目标
- Just add another DATA FLOW task
- Connect this dataflow with the script task using another connector and with the expression
@[User::ImportFile] == False
(same steps of the first connector) - In the DataFlow Task add a SCript Component as a Source
- Create the Output columns you want to import to Logs
- Create a Row that contains the information you need to import
- Add the Log Destination
或而不是添加另一个数据流任务
,您可以添加执行SQL任务
在日志表中插入一行
Or Instead of adding another Data Flow Task
, you can add an Execute SQL Task
to insert a row in the Log Table
这篇关于导入具有可变标题的Excel文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!