为什么在导入UTF-8平面文件时SSIS不能识别换行{LF}行定界符? [英] Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

查看:136
本文介绍了为什么在导入UTF-8平面文件时SSIS不能识别换行{LF}行定界符?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用SSIS将数据从utf-8编码的平面文件导入SQL Server 2008.这是行尾数据在Notepad ++中的样子:

我还有几张图像显示文件连接管理器的外观:

您可以在文件连接管理器预览中看到数据正确显示.当我尝试导入此数据时,不会导入任何行.我收到一条错误消息,指示未找到行分隔符.您可以在文件连接管理器图像中看到,标题行分隔符和行分隔符都设置为{LF}.这足以生成正确的预览,因此我迷失了为什么它无法导入的原因.我尝试了很多使结果为零的事情:

  • 在SSMS中尝试使用向导导入...结果相同
  • 尝试使用数据转换,没有影响
  • 尝试将行分隔符设置为(0a),结果相同

[平面文件源[582]]警告: 到达数据文件的末尾时, 读取标题行.确保 标头行定界符和 要跳过的标题行是正确的.

感谢您的关注,非常感谢您能提供的任何帮助.

解决方案

原因:

SSIS无法读取文件,并由于列分隔符 Ç (带有cedilla的"c" )和 由于行分隔符 {LF} (换行符).

[Read flat file [1]] Warning: The end of the data file was reached while 
reading header rows. Make sure the header row delimiter and the number of 
header rows to skip are correct.

这里是一个示例SSIS软件包,显示了如何使用Script Component解决问题,最后还有另一个示例可以模拟您的问题.

分辨率:

下面的示例程序包是用SSIS 2008 R2编写的.它读取带有行定界符 {LF} 作为单个列值的平面文件;然后使用Script Component拆分数据以将信息插入SQL Server 2008 R2数据库中的表中.

使用 Notepad ++ 创建一个简单的带有几行的平面文件.下面的示例文件在每行上以 Ç 分隔,以 Product Id List Price 信息作为列定界符,并且每行以结尾> {LF} 定界符.

在Notepad ++上,单击 Encoding ,然后单击 Encoding in UTF-8 ,以 UTF-8 编码保存平面文件./p>

该示例将使用一个名为 Sora SQL Server 2008 R2数据库.使用以下给定脚本创建一个名为dbo.ProductListPrice的新表. SSIS会将平面文件数据插入此表中.

USE Sora;
GO

CREATE TABLE dbo.ProductListPrice
(
        ProductId   nvarchar(30)    NOT NULL
    ,   ListPrice   numeric(12,2)   NOT NULL
);
GO

使用 Business Intelligence Development Studio(BIDS)2008 R2 创建SSIS程序包.将该包命名为 SO_6268205.dtsx .创建一个名为 Sora.ds 的数据源,以连接到 SQL Server 2008 R2 中的数据库Sora.

右键单击程序包内的任何位置,然后单击 Variables 以查看变量窗格.在包范围 SO_6268205 中创建一个数据类型为 String 的名为 ColumnDelimiter 的新变量,并将该变量的值设置为 Ç

右键单击 Connection Managers 并单击 New Flat File Connection... 创建一个连接以读取平面文件.

平面文件连接管理器编辑器 General 页面上,执行以下操作:

  • 连接管理器名称设置为 ProductListPrice
  • 说明设置为 Flat file connection manager to read product list price information.
  • 选择平面文件路径.我在路径 C:\Siva\StackOverflow\Files\6268205\ProductListPrice.txt
  • 中有文件
  • 标题行分隔符
  • 中选择 {LF}
  • 检查 Column names in the first data row
  • 点击 Columns 页面

平面文件连接管理器编辑器 Columns 页面上,确认 Column delimiter 为空白并被禁用.点击 Advanced 页面.

平面文件连接管理器编辑器 Advanced 页面上,执行以下操作.

  • 名称设置为 LineData
  • 验证是否将列定界符设置为 {LF}
  • DataType 设置为 Unicode string [DT_WSTR]
  • OutputColumnWidth 设置为 255
  • 单击 Preview 页面.

平面文件连接管理器编辑器 Preview 页面上,验证显示的数据看起来正确无误,然后单击 OK

您将在包底部的 Connection Managers 选项卡上看到数据源 Sora 和平面文件连接管理器 ProductListPrice

Data Flow Task拖放到程序包的 Control Flow 标签上,并将其命名为 File to database - Without Cedilla delimiter

双击数据流任务,将视图切换到包上的 Data Flow 选项卡.将 Flat File Source 拖放到数据流选项卡上.双击平面文件源以打开 Flat File Source Editor .

平面文件源编辑器 Connection Manager 页面上,选择平面文件连接管理器 ProductListPrice,然后单击列" 页面.

平面文件源编辑器 Columns 页面上,检查列 LineData 并单击 .

Script Component 拖放到 Flat File Source 下面的 Data Flow 标签上,选择 Transformation ,然后单击 OK .将绿色箭头从平面文件源连接到脚本组件.双击脚本组件以打开 Script Transformation Editor .

脚本转换编辑器中单击输入列,然后选择 LineData 列.点击输入和输出页面.

脚本转换编辑器 Inputs and Outputs 页面上,执行以下操作.

  • 将输入名称更改为FlatFileInput
  • 将输出名称更改为 SplitDataOutput
  • 选择输出列,然后单击 Add Column .再次重复此操作以添加另一列.
  • 将第一列命名为 ProductId
  • 将列 ProductId DataType 设置为 Unicode string [DT_WSTR]
  • Length 设置为 30

脚本转换编辑器 Inputs and Outputs 页面上,执行以下操作.

  • 将第二列命名为 ListPrice
  • 将列 ListPrice DataType 设置为 numeric [DT_NUMERIC]
  • Precision 设置为 12
  • Scale 设置为 2
  • 点击"脚本"页面以修改脚本

脚本转换编辑器 Script 页面上,执行以下操作.

  • ReadOnlyVariables 上单击省略号按钮,然后选择变量 User::ColumnDelimiter
  • 点击 Edit Script...

在脚本编辑器中粘贴以下C#.该脚本执行以下任务.

  • 使用变量 User :: ColumnDelimiter 中定义的列定界符值 Ç ,方法 FlatFileInput_ProcessInputRow 拆分输入值并将其分配给在脚本组件转换中定义的两个输出列.

C#中的脚本组件代码

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void PreExecute()
    {
        base.PreExecute();
    }

    public override void PostExecute()
    {
        base.PostExecute();
    }

    public override void FlatFileInput_ProcessInputRow(FlatFileInputBuffer Row)
    {
        const int COL_PRODUCT = 0;
        const int COL_PRICE = 1;

        char delimiter = Convert.ToChar(this.Variables.ColumnDelimiter);
        string[] lineData = Row.LineData.ToString().Split(delimiter);

        Row.ProductId = String.IsNullOrEmpty(lineData[COL_PRODUCT]) 
                            ? String.Empty 
                            : lineData[COL_PRODUCT];

        Row.ListPrice = String.IsNullOrEmpty(lineData[COL_PRICE]) 
                            ? 0 
                            : Convert.ToDecimal(lineData[COL_PRICE]);
    }
}

OLE DB Destination 拖放到 Data Flow 标签上.将绿色箭头从脚本组件连接到 OLE DB目标.双击 OLE DB目标以打开 OLE DB Destination Editor .

OLE DB目标编辑器 Connection Manager 页面上,执行以下操作.

  • OLE DB连接管理器
  • 中选择 Sora
  • 数据访问模式
  • 中选择 Table or view - fast load
  • 从表或视图的名称中选择 [dbo].[ProductListPrice]
  • 点击"映射"页面

如果输入和输出列名称相同,则在 OLE DB目标编辑器上单击

页面Mappings将自动映射列.点击 OK .

配置所有组件后,

数据流标签应该看起来像这样.

SQL Server Management Studio(SSMS)中执行查询select * from dbo.ProductListPrice,以查找表中的行数.在执行包之前,它应该为空.

执行程序包.您会注意到该程序包成功处理了 9 行.平面文件包含 10 行,但第一行是带有列名的标题.

SQL Server Management Studio(SSMS)中执行查询select * from dbo.ProductListPrice,以查找成功插入表中的 9 行.数据应与平面文件数据匹配.

上面的示例说明了如何使用脚本组件手动分割数据,因为平面文件连接管理器在配置列定界符 Ç <时遇到错误/strong>

问题模拟:

此示例显示了一个单独的 Flat File Connection Manager ,该文件配置了列定界符 Ç ,该行执行但遇到警告并且不处理任何行.

Connection Managers 上单击鼠标右键,然后单击 New Flat File Connection... 创建一个连接以读取平面文件.在平面文件连接管理器编辑器 General 页面上,执行以下操作:

  • 连接管理器名称设置为 ProductListPrice_Cedilla
  • 将说明设置为 Flat file connection manager with Cedilla column delimiter.
  • 我的文件位于路径 C:\Siva\StackOverflow\Files\6268205\ProductListPrice.txt 中.选择平面文件路径.
  • 标题行分隔符
  • 中选择 {LF}
  • 检查 Column names in the first data row
  • 点击 Columns 页面

平面文件连接管理器编辑器 Columns 页面上,执行以下操作:

  • 行定界符设置为 {LF}
  • 列分隔符字段可能被禁用.点击 Reset Columns
  • 列定界符设置为 Ç
  • 点击 Advanced 页面

平面文件连接管理器编辑器 Advanced 页面上,执行以下操作:

  • 名称设置为 ProductId
  • ColumnDelimiter 设置为 Ç
  • DataType 设置为 Unicode string [DT_WSTR]
  • Length 设置为 30
  • 点击列ListPrice

平面文件连接管理器编辑器 Advanced 页面上,执行以下操作:

  • 名称设置为 ListPrice
  • ColumnDelimiter 设置为 {LF}
  • DataType 设置为 numeric [DT_NUMERIC]
  • DataPrecision 设置为 12
  • DataScale 设置为 2
  • 点击 OK

Data Flow task拖放到 Control Flow 选项卡上,并将其命名为 File to database - With Cedilla delimiter .禁用第一个数据流任务.

使用 Flat File Source OLE DB Destination

配置第二个数据流任务

双击平面文件源以打开 Flat File Source Editor .在平面文件源编辑器 Connection Manager 页面上,选择平面文件连接管理器 ProductListPrice_Cedilla,然后单击页面来配置列.点击 OK .

执行程序包.所有组件将显示绿色,表示该过程已成功完成,但不会处理任何行.您会看到 Flat File Source OLE DB Destination

之间没有行号指示

单击 Progress 标签,您将注意到以下警告消息.

[Read flat file [1]] Warning: The end of the data file was reached while 
reading header rows. Make sure the header row delimiter and the number of 
header rows to skip are correct.

I am trying to import data from a utf-8 encoded flat file into SQL Server 2008 using SSIS. This is what the end of the row data looks like in Notepad++:

I have a couple more images showing what the file connection manager looks like:

You can see that the data shows correctly in the file connection manager preview. When I try to import this data, no rows import. I get an error message indicating that the row delimiter was not found. You can see in the file connection manager images that the header row delimiter and the row delimiter are both set to {LF}. This was sufficient to generate the correct preview, so I am lost to why it did not work to import. I have tried a number of things that have brought zero results:

  • Tried using the Wizard import in SSMS...same results
  • Tried using data conversion, no impact
  • Tried setting the row delimiter to (0a), same results

[Flat File Source [582]] Warning: The end of the data file was reached while reading header rows. Make sure the header row delimiter and the number of header rows to skip are correct.

Thanks for looking at this and I really appreciate any help you can offer.

解决方案

Cause:

SSIS fails to read the file and displays the below warning due to the column delimiter Ç ("c" with cedilla) and not due to the line delimiter {LF} (Line Feed).

[Read flat file [1]] Warning: The end of the data file was reached while 
reading header rows. Make sure the header row delimiter and the number of 
header rows to skip are correct.

Here is a sample SSIS package that shows how to resolve the issue using Script Component and at the end there is another example that simulates your issue.

Resolution:

Below sample package is written in SSIS 2008 R2. It reads a flat file with row delimiter {LF} as a single column value; then splits the data using Script Component to insert the information into a table in SQL Server 2008 R2 database.

Use Notepad++ to create a simple flat file with few rows. The below sample file has Product Id and List Price information on each row separated by Ç as column delimiter and each row ends with {LF} delimiter.

On the Notepad++, click Encoding and then click Encoding in UTF-8 to save the flat file in UTF-8 encoding.

The sample will use an SQL Server 2008 R2 database named Sora. Create a new table named dbo.ProductListPrice using the below given script. SSIS will insert the flat file data into this table.

USE Sora;
GO

CREATE TABLE dbo.ProductListPrice
(
        ProductId   nvarchar(30)    NOT NULL
    ,   ListPrice   numeric(12,2)   NOT NULL
);
GO

Create an SSIS package using Business Intelligence Development Studio (BIDS) 2008 R2. Name the package as SO_6268205.dtsx. Create a data source named Sora.ds to connect to the database Sora in SQL Server 2008 R2.

Right-click anywhere inside the package and then click Variables to view the variables pane. Create a new variable named ColumnDelimiter of data type String in the package scope SO_6268205 and set the variable with the value Ç

Right-click on the Connection Managers and click New Flat File Connection... to create a connection to read the flat file.

On the General page of the Flat File Connection Manager Editor, perform the following actions:

  • Set Connection manager name to ProductListPrice
  • Set Description to Flat file connection manager to read product list price information.
  • Select the flat file path. I have the file in the path C:\Siva\StackOverflow\Files\6268205\ProductListPrice.txt
  • Select {LF} from Header Row Delimiter
  • Check Column names in the first data row
  • Click Columns page

On the Columns page of the Flat File Connection Manager Editor, verify that the Column delimiter is blank and disabled. Click Advanced page.

On the Advanced page of the Flat File Connection Manager Editor, perform the following actions.

  • Set the Name to LineData
  • Verify that the Column delimiter is set to {LF}
  • Set the DataType to Unicode string [DT_WSTR]
  • Set the OutputColumnWidth to 255
  • Click the Preview page.

On the Preview page of the Flat File Connection Manager Editor, verify that the displayed data looks correct and click OK.

You will see the data source Sora and the flat file connection manager ProductListPrice on the Connection Managers tab at the bottom of the package.

Drag and drop Data Flow Task onto the Control Flow tab of the package and name it as File to database - Without Cedilla delimiter

Double-click the Data Flow Task to switch the view to the Data Flow tab on the package. Drag and drop a Flat File Source on the Data Flow tab. Double-click the Flat File Source to open Flat File Source Editor.

On the Connection Manager page of the Flat File Source Editor, select the Flat File Connection Manager ProductListPrice and click Columns page.

On the Columns page of the Flat File Source Editor, check the column LineData and click OK.

Drag and drop a Script Component onto the Data Flow tab below the Flat File Source, select Transformation and click OK. Connect the green arrow from Flat File Source to Script Component. Double-click Script Component to open Script Transformation Editor.

Click Input Columns on Script Transformation Editor and select LineData column. Click Inputs and Outputs page.

On the Inputs and Outputs page of the Script Transformation Editor, perform the following actions.

  • Change the inputs name to FlatFileInput
  • Change the outputs name to SplitDataOutput
  • Select Output Columns and click Add Column. Repeat this again to add another column.
  • Name the first column ProductId
  • Set the DataType of column ProductId to Unicode string [DT_WSTR]
  • Set the Length to 30

On the Inputs and Outputs page of the Script Transformation Editor, perform the following actions.

  • Name the second column ListPrice
  • Set the DataType of column ListPrice to numeric [DT_NUMERIC]
  • Set the Precision to 12
  • Set the Scale to 2
  • Click Script page to modify the script

On the Script page of the Script Transformation Editor, perform the following actions.

  • Click the ellipsis button against ReadOnlyVariables and select the variable User::ColumnDelimiter
  • Click Edit Script...

Paste the below C# in the Script Editor. The script performs the following tasks.

  • Using the column delimiter value Ç defined in the variable User::ColumnDelimiter, the method FlatFileInput_ProcessInputRow splits the incoming value and assigns it to the two output columns defined in the Script Component transformation.

Script component code in C#

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void PreExecute()
    {
        base.PreExecute();
    }

    public override void PostExecute()
    {
        base.PostExecute();
    }

    public override void FlatFileInput_ProcessInputRow(FlatFileInputBuffer Row)
    {
        const int COL_PRODUCT = 0;
        const int COL_PRICE = 1;

        char delimiter = Convert.ToChar(this.Variables.ColumnDelimiter);
        string[] lineData = Row.LineData.ToString().Split(delimiter);

        Row.ProductId = String.IsNullOrEmpty(lineData[COL_PRODUCT]) 
                            ? String.Empty 
                            : lineData[COL_PRODUCT];

        Row.ListPrice = String.IsNullOrEmpty(lineData[COL_PRICE]) 
                            ? 0 
                            : Convert.ToDecimal(lineData[COL_PRICE]);
    }
}

Drag and drop OLE DB Destination onto the Data Flow tab. Connect the green arrow from Script Component to OLE DB Destination. Double-click OLE DB Destination to open OLE DB Destination Editor.

On the Connection Manager page of the OLE DB Destination Editor, perform the following actions.

  • Select Sora from OLE DB Connection Manager
  • Select Table or view - fast load from Data access mode
  • Select [dbo].[ProductListPrice] from Name of the table or the view
  • Click Mappings page

Click Mappings page on the OLE DB Destination Editor would automatically map the columns if the input and output column names are same. Click OK.

Data Flow tab should look something like this after configuring all the components.

Execute the query select * from dbo.ProductListPrice in the SQL Server Management Studio (SSMS) to find the number of rows in the table. It should be empty before executing the package.

Execute the package. You will notice that the package successfully processed 9 rows. The flat file contains 10 lines but the first row is header with column names.

Execute the query select * from dbo.ProductListPrice in the SQL Server Management Studio (SSMS) to find the 9 rows successfully inserted into the table. The data should match with flat file data.

The above example illustrated how to manually split the data using Script Component because the Flat File Connection Manager encounters error when configured the column delimiter Ç

Issue Simulation:

This example shows a separate Flat File Connection Manager configured with column delimiter Ç, which executes but encounters a warning and does not process any lines.

Right-click on the Connection Managers and click New Flat File Connection... to create a connection to read the flat file. On the General page of the Flat File Connection Manager Editor, perform the following actions:

  • Set Connection manager name to ProductListPrice_Cedilla
  • Set Description to Flat file connection manager with Cedilla column delimiter.
  • I have the file in the path C:\Siva\StackOverflow\Files\6268205\ProductListPrice.txt Select the flat file path.
  • Select {LF} from Header Row Delimiter
  • Check Column names in the first data row
  • Click Columns page

On the Columns page of the Flat File Connection Manager Editor, perform the following actions:

  • Set Row delimiter to {LF}
  • The column delimiter field may be disabled. Click Reset Columns
  • Set Column delimiter to Ç
  • Click Advanced page

On the Advanced page of the Flat File Connection Manager Editor, perform the following actions:

  • Set the Name to ProductId
  • Set the ColumnDelimiter to Ç
  • Set the DataType to Unicode string [DT_WSTR]
  • Set the Length to 30
  • Click column ListPrice

On the Advanced page of the Flat File Connection Manager Editor, perform the following actions:

  • Set the Name to ListPrice
  • Set the ColumnDelimiter to {LF}
  • Set the DataType to numeric [DT_NUMERIC]
  • Set the DataPrecision to 12
  • Set the DataScale to 2
  • Click OK

Drag and drop a Data Flow task onto the Control Flow tab and name it as File to database - With Cedilla delimiter. Disable the first data flow task.

Configure the second data flow task with Flat File Source and OLE DB Destination

Double-click the Flat File Source to open Flat File Source Editor. On the Connection Manager page of the Flat File Source Editor, select the Flat File Connection Manager ProductListPrice_Cedilla and click Columns page to configure the columns. Click OK.

Execute the package. All the components will display green color to indicate that the process was success but no rows will be processed. You can see that there are no rows numbers indication between the Flat File Source and OLE DB Destination

Click the Progress tab and you will notice the following warning message.

[Read flat file [1]] Warning: The end of the data file was reached while 
reading header rows. Make sure the header row delimiter and the number of 
header rows to skip are correct.

这篇关于为什么在导入UTF-8平面文件时SSIS不能识别换行{LF}行定界符?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆