带有áåóñ的散装扁平utf-8 [英] Bulk insert flat utf-8 with áåóñ

查看:161
本文介绍了带有áåóñ的散装扁平utf-8的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个普通文件,我想批量插入SQL Server(2016年).

I have a flat file witch I would like to bulk insert into SQL server (2016).

 BULK INSERT table FROM 'path\flatfile.dat'
   WITH 
      (
        FORMATFILE='path\fileformat.fmt', 
        CODEPAGE = '65001', 
        DATAFILETYPE = 'Char'
      );

平面文件没有字段终止符:

Flat file has no field terminator:

                      9999030000000001001ab Baujahr                                                  0
                      9999030000000001004from construction year                                      0
                      9999030000000001006à partir d'année de construction                            0
                      9999030000000001007da anno                                                     0
                      9999030000000001008año construcción desde                                      0
                      9999030000000001009Vanaf bouwjaar                                              0
                      9999030000000001010fra byggeår                                                 0
                      9999030000000001011från årsmodell                                              0
                      9999030000000001012fra årsmodell                                               0
                      9999030000000001013Vuosimallista                                               0

但是在格式文件中,我设置了字段的长度

but in format file I set the lenght of field

12.0
    7
    1       SQLCHAR           0       22      ""         1     Field1            ""
    2       SQLCHAR           0       4       ""         2     field2            ""
    3       SQLCHAR           0       3       ""         3     field3            ""
    4       SQLCHAR           0       9       ""         4     field4            ""
    5       SQLCHAR           0       3       ""         5     field             ""
    6       SQLCHAR           0       60      ""         6     Textfield         ""
    7       SQLCHAR           0       1       "\r\n"     7     flag              ""

除此文件外,其他格式相似的其他文件也很好用. 我遇到的错误是包含字符áåóñ...的行被截断了...

The other files I have with similar format works great except this one. The error I'm getting is truncation of rows which contains caracters áåóñ...

    Msg 4863, Level 16, State 1, Line 6
Bulk load data conversion error (truncation) for row 3, column 7 (LFlag).
Msg 4863, Level 16, State 1, Line 6
Bulk load data conversion error (truncation) for row 5, column 7 (LFlag).
Msg 4863, Level 16, State 1, Line 6
Bulk load data conversion error (truncation) for row 7, column 7 (LFlag).
Msg 4863, Level 16, State 1, Line 6

有什么想法吗?

推荐答案

目前尚不清楚BCP在使用UTF-8时是否完全支持格式文件的使用-您的经验表明并非如此. 导入UTF-8文件时,字节数和字符数之间似乎有些混淆.我认为这可能是BCP中的错误.

It's unclear whether the use of format files is fully supported by BCP when working with UTF-8 - your experience suggests that it isn't. It appears that there's some confusion between byte count and character count when importing a UTF-8 file. I think this is possibly a bug in BCP.

具有以下测试表

USE tempdb
GO
CREATE table t1
(f1  nvarchar(max),
f2  nvarchar(max),
f3  nvarchar(max),
f4  nvarchar(max),
f5  nvarchar(max),
f6  nvarchar(max),
f7  nvarchar(max))

我能够使用问题中的格式文件以及相同的BULK INSERT命令导入您的文件.

I was able to import your file using the format file in the question, and the same BULK INSERT command.

但是,值得注意的是,最后一列(我表中的f7)用左开头填充了许多前导空格,这些空格等于第6列中的双字节字符数.

However, what's notable is that the final column (f7 in my table) is left-padded with a number of leading spaces equal to the number double-byte characters in column 6.

在第3行(第6列的值à partir d'année de construction)上,第7列用两个空格填充.第5行和第8行也是如此,它们也有两个双字节字符.

On row 3 (column 6 value à partir d'année de construction), column 7 is padded with two spaces. The same is true of rows 5 and 8, which also have two double-byte characters.

在第9行(第6列的值fra årsmodell)上,第7列填充了一个空格.第7行也是如此.

On row 9 (column 6 value fra årsmodell), column 7 is padded with one space. The same is true of row 7.

BCP似乎无法正确跟踪字节数和字符数.

It looks like the byte-count and character-count are not being tracked correctly by BCP.

假设输入文件的格式是固定的,那么我能看到的解决此问题的唯一方法是将数据文件导入具有单个nvarchar(max)列的表中,然后使用LEFTRIGHTSUBSTRING语句从行中提取数据值:

Assuming the format of your input file is fixed, the only option I can see to work around this issue would be to import the data file into a table with single nvarchar(max) column, then use LEFT, RIGHT and SUBSTRING statements to extract data values from the rows:

CREATE TABLE t2
(f1  nvarchar(max))

BULK INSERT t2 FROM 'file\location' WITH (CODEPAGE = '65001',DATAFILETYPE = 'Char' );

SELECT LEFT(f1,22), SUBSTRING(f1,23,4), SUBSTRING(f1,27,3),
       SUBSTRING(f1,30,9), SUBSTRING(f1,39,3), SUBSTRING(f1,42,60), 
       SUBSTRING(f1,102,1)
FROM t2

修改

测试了XML格式的文件,该文件也无法正常工作.尽管使用XML格式的文件可以使文件正确加载,但第7列的值会丢失并用空格代替.

Tested an XML format file, which doesn't work correctly either. Although using an XML format file allows the file to load without error, the value for column 7 is lost and replaced by a space.

格式化文件

<?xml version="1.0"?>  
<BCPFORMAT  
       xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"  
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">  
  <RECORD>  
    <FIELD ID="1" xsi:type="CharFixed" LENGTH="22"/>  
    <FIELD ID="2" xsi:type="CharFixed" LENGTH="4"/>  
    <FIELD ID="3" xsi:type="CharFixed" LENGTH="3"/>  
    <FIELD ID="4" xsi:type="CharFixed" LENGTH="9"/>  
    <FIELD ID="5" xsi:type="CharFixed" LENGTH="3"/>  
    <FIELD ID="6" xsi:type="CharFixed" LENGTH="60"/>  
    <FIELD ID="7" xsi:type="CharFixed" LENGTH="1"/>  
    <FIELD ID="8" xsi:type="CharTerm" TERMINATOR="\r\n"  />
  </RECORD>  
  <ROW>  
    <COLUMN SOURCE="1" NAME="C1" xsi:type="SQLCHAR" />  
    <COLUMN SOURCE="2" NAME="C2" xsi:type="SQLCHAR" />  
    <COLUMN SOURCE="3" NAME="C3" xsi:type="SQLCHAR" />  
    <COLUMN SOURCE="4" NAME="C4" xsi:type="SQLCHAR" />  
    <COLUMN SOURCE="5" NAME="C5" xsi:type="SQLCHAR" />  
    <COLUMN SOURCE="6" NAME="C6" xsi:type="SQLCHAR" />  
    <COLUMN SOURCE="7" NAME="C7" xsi:type="SQLCHAR" />  
  </ROW>  
</BCPFORMAT>

(BULK INSERT命令与问题中的命令相同,但格式文件名称除外.)

(The BULK INSERT command is the same as in the question, other than the format file name.)

这篇关于带有áåóñ的散装扁平utf-8的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆