带有áåóñ的散装扁平utf-8 [英] Bulk insert flat utf-8 with áåóñ
问题描述
我有一个普通文件,我想批量插入SQL Server(2016年).
I have a flat file witch I would like to bulk insert into SQL server (2016).
BULK INSERT table FROM 'path\flatfile.dat'
WITH
(
FORMATFILE='path\fileformat.fmt',
CODEPAGE = '65001',
DATAFILETYPE = 'Char'
);
平面文件没有字段终止符:
Flat file has no field terminator:
9999030000000001001ab Baujahr 0
9999030000000001004from construction year 0
9999030000000001006à partir d'année de construction 0
9999030000000001007da anno 0
9999030000000001008año construcción desde 0
9999030000000001009Vanaf bouwjaar 0
9999030000000001010fra byggeår 0
9999030000000001011från årsmodell 0
9999030000000001012fra årsmodell 0
9999030000000001013Vuosimallista 0
但是在格式文件中,我设置了字段的长度
but in format file I set the lenght of field
12.0
7
1 SQLCHAR 0 22 "" 1 Field1 ""
2 SQLCHAR 0 4 "" 2 field2 ""
3 SQLCHAR 0 3 "" 3 field3 ""
4 SQLCHAR 0 9 "" 4 field4 ""
5 SQLCHAR 0 3 "" 5 field ""
6 SQLCHAR 0 60 "" 6 Textfield ""
7 SQLCHAR 0 1 "\r\n" 7 flag ""
除此文件外,其他格式相似的其他文件也很好用. 我遇到的错误是包含字符áåóñ...的行被截断了...
The other files I have with similar format works great except this one. The error I'm getting is truncation of rows which contains caracters áåóñ...
Msg 4863, Level 16, State 1, Line 6
Bulk load data conversion error (truncation) for row 3, column 7 (LFlag).
Msg 4863, Level 16, State 1, Line 6
Bulk load data conversion error (truncation) for row 5, column 7 (LFlag).
Msg 4863, Level 16, State 1, Line 6
Bulk load data conversion error (truncation) for row 7, column 7 (LFlag).
Msg 4863, Level 16, State 1, Line 6
有什么想法吗?
推荐答案
目前尚不清楚BCP在使用UTF-8时是否完全支持格式文件的使用-您的经验表明并非如此. 导入UTF-8文件时,字节数和字符数之间似乎有些混淆.我认为这可能是BCP中的错误.
It's unclear whether the use of format files is fully supported by BCP when working with UTF-8 - your experience suggests that it isn't. It appears that there's some confusion between byte count and character count when importing a UTF-8 file. I think this is possibly a bug in BCP.
具有以下测试表
USE tempdb
GO
CREATE table t1
(f1 nvarchar(max),
f2 nvarchar(max),
f3 nvarchar(max),
f4 nvarchar(max),
f5 nvarchar(max),
f6 nvarchar(max),
f7 nvarchar(max))
我能够使用问题中的格式文件以及相同的BULK INSERT
命令导入您的文件.
I was able to import your file using the format file in the question, and the same BULK INSERT
command.
但是,值得注意的是,最后一列(我表中的f7
)用左开头填充了许多前导空格,这些空格等于第6列中的双字节字符数.
However, what's notable is that the final column (f7
in my table) is left-padded with a number of leading spaces equal to the number double-byte characters in column 6.
在第3行(第6列的值à partir d'année de construction
)上,第7列用两个空格填充.第5行和第8行也是如此,它们也有两个双字节字符.
On row 3 (column 6 value à partir d'année de construction
), column 7 is padded with two spaces. The same is true of rows 5 and 8, which also have two double-byte characters.
在第9行(第6列的值fra årsmodell
)上,第7列填充了一个空格.第7行也是如此.
On row 9 (column 6 value fra årsmodell
), column 7 is padded with one space. The same is true of row 7.
BCP似乎无法正确跟踪字节数和字符数.
It looks like the byte-count and character-count are not being tracked correctly by BCP.
假设输入文件的格式是固定的,那么我能看到的解决此问题的唯一方法是将数据文件导入具有单个nvarchar(max)
列的表中,然后使用LEFT
,RIGHT
和SUBSTRING
语句从行中提取数据值:
Assuming the format of your input file is fixed, the only option I can see to work around this issue would be to import the data file into a table with single nvarchar(max)
column, then use LEFT
, RIGHT
and SUBSTRING
statements to extract data values from the rows:
CREATE TABLE t2
(f1 nvarchar(max))
BULK INSERT t2 FROM 'file\location' WITH (CODEPAGE = '65001',DATAFILETYPE = 'Char' );
SELECT LEFT(f1,22), SUBSTRING(f1,23,4), SUBSTRING(f1,27,3),
SUBSTRING(f1,30,9), SUBSTRING(f1,39,3), SUBSTRING(f1,42,60),
SUBSTRING(f1,102,1)
FROM t2
修改
测试了XML格式的文件,该文件也无法正常工作.尽管使用XML格式的文件可以使文件正确加载,但第7列的值会丢失并用空格代替.
Tested an XML format file, which doesn't work correctly either. Although using an XML format file allows the file to load without error, the value for column 7 is lost and replaced by a space.
格式化文件
<?xml version="1.0"?>
<BCPFORMAT
xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharFixed" LENGTH="22"/>
<FIELD ID="2" xsi:type="CharFixed" LENGTH="4"/>
<FIELD ID="3" xsi:type="CharFixed" LENGTH="3"/>
<FIELD ID="4" xsi:type="CharFixed" LENGTH="9"/>
<FIELD ID="5" xsi:type="CharFixed" LENGTH="3"/>
<FIELD ID="6" xsi:type="CharFixed" LENGTH="60"/>
<FIELD ID="7" xsi:type="CharFixed" LENGTH="1"/>
<FIELD ID="8" xsi:type="CharTerm" TERMINATOR="\r\n" />
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="C1" xsi:type="SQLCHAR" />
<COLUMN SOURCE="2" NAME="C2" xsi:type="SQLCHAR" />
<COLUMN SOURCE="3" NAME="C3" xsi:type="SQLCHAR" />
<COLUMN SOURCE="4" NAME="C4" xsi:type="SQLCHAR" />
<COLUMN SOURCE="5" NAME="C5" xsi:type="SQLCHAR" />
<COLUMN SOURCE="6" NAME="C6" xsi:type="SQLCHAR" />
<COLUMN SOURCE="7" NAME="C7" xsi:type="SQLCHAR" />
</ROW>
</BCPFORMAT>
(BULK INSERT
命令与问题中的命令相同,但格式文件名称除外.)
(The BULK INSERT
command is the same as in the question, other than the format file name.)
这篇关于带有áåóñ的散装扁平utf-8的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!