Insert rows with Unicode characters using BCP
Question
I'm using BCP to bulk upload data from a CSV file to SQL Azure (because BULK INSERT is not supported). This command runs and uploads the rows:
bcp [resource].dbo.TableName in C:\data.csv -t "," -r "0x0a" -c -U bcpuser@resource -S tcp:resource.database.windows.net
But data.csv is UTF-8 encoded and contains non-ASCII strings, and these get corrupted. I've tried changing the -c option to -w:
bcp [resource].dbo.TableName in C:\data.csv -t "," -r "0x0a" -w -U bcpuser@resource -S tcp:resource.database.windows.net
But then I get '0 rows copied'.
What am I doing wrong, and how do I bulk insert Unicode characters using BCP?
Recommended answer
"But data.csv is UTF8 encoded"
The UTF-8 encoding is the primary issue. Using -w won't help because in Microsoft-land, the term "Unicode" nearly always refers to UTF-16 Little Endian.
The solution depends on which version of BCP you are using, because an option for this was added in version 13.0 (SQL Server 2016), the newest version at the time of writing:
- If you are using BCP from before SQL Server 2016 (version 13.0), then you need to convert the csv file to UTF-16 Little Endian (LE), as that is what Windows / SQL Server / .NET use for all strings, and use the -w switch.
I got this to work by encoding a file as "UCS-2 LE BOM" in Notepad++, whereas that same import file failed using the -c switch.
- If you are using BCP that came with SQL Server 2016 (version 13.0) or newer, then you can simply add -c -C 65001 to the command line. -C is for "code page", and 65001 is the code page for UTF-8.
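For the first option (BCP older than version 13.0), the Notepad++ step can also be scripted. The following is only a minimal sketch in Python, used here in place of the manual Notepad++ conversion; the function name convert_utf8_to_utf16le and the file paths in the usage comment are hypothetical and not part of the original answer. It re-encodes a UTF-8 file as UTF-16 LE with a byte order mark, which is what Notepad++ labels "UCS-2 LE BOM" for text limited to the Basic Multilingual Plane.

```python
def convert_utf8_to_utf16le(src_path, dst_path, chunk_size=64 * 1024):
    """Re-encode a UTF-8 text file as UTF-16 LE with a BOM.

    Hypothetical helper for illustration; not part of bcp or any SQL Server tool.
    """
    # "utf-8-sig" silently drops a UTF-8 BOM if the source file has one.
    # newline="" passes line endings through unchanged, so the row
    # terminator in the data stays exactly as it was in the source file.
    with open(src_path, "r", encoding="utf-8-sig", newline="") as fin, \
         open(dst_path, "w", encoding="utf-16-le", newline="") as fout:
        # U+FEFF encoded as UTF-16 LE produces the bytes FF FE (the BOM).
        fout.write("\ufeff")
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)


# Example call (paths are assumptions):
# convert_utf8_to_utf16le(r"C:\data.csv", r"C:\data_utf16.csv")
```

The converted file would then be the one passed to bcp together with the -w switch, as described in the first option.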
The MSDN page for bcp Utility states (in the explanation of the -C switch):
"Versions prior to version 13 (SQL Server 2016) do not support code page 65001 (UTF-8 encoding). Versions beginning with 13 can import UTF-8 encoding to earlier versions of SQL Server."
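For the second option (BCP version 13.0 or newer), the change to the question's command amounts to keeping -c and adding -C 65001. As a rough sketch, assuming the import is driven from a Python script, the argument list could be assembled like this; the function name build_bcp_utf8_import is hypothetical, and the table, file, user, and server values are simply the ones from the question.

```python
def build_bcp_utf8_import(table, data_file, user, server):
    """Build the bcp argument list for importing a UTF-8 CSV file.

    Hypothetical helper for illustration; it only assembles arguments
    and mirrors the question's command with -C 65001 added.
    """
    return [
        "bcp", table, "in", data_file,
        "-t", ",",       # field terminator, as in the question
        "-r", "0x0a",    # row terminator, as in the question
        "-c",            # character mode (not -w)
        "-C", "65001",   # code page 65001 = UTF-8; needs bcp 13.0 or newer
        "-U", user,
        "-S", server,
    ]


# Example (values taken from the question; running it requires bcp on PATH):
# import subprocess
# subprocess.run(build_bcp_utf8_import(
#     "[resource].dbo.TableName", r"C:\data.csv",
#     "bcpuser@resource", "tcp:resource.database.windows.net"), check=True)
```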
Update
Support for UTF-8 / code page 65001 was added to SQL Server 2014 via SP2, as noted in this Microsoft KB article: