Using Excel to create a CSV file with special characters and then importing it into a database using SSIS


Problem description

Take this XLS file

I then save this XLS file as CSV and then open it up with a text editor. This is what I see:

Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB""C","D,E",F,03,"3,2"

I see that the value in column C, which contains a double quote character, was stored as "AB""C": the column value was enclosed in quotation marks, and the double quote character in the data was replaced with two double quote characters to indicate that the quote occurs within the data and does not terminate the column value. I also see that the value for column G, 3,2, is enclosed in quotes so that it is clear that the comma occurs within the data rather than indicating a new column. So far, so good.
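As a cross-check of the reading described above, here is a minimal sketch that feeds the same two lines to Python's standard `csv` module, whose default dialect treats a doubled quote inside a quoted field as one literal quote. The variable names are only illustrative, and this says nothing about how Excel or SSIS are implemented.

```python
import csv
import io

# The two lines Excel wrote, exactly as quoted in the question.
excel_csv = (
    'Col1,Col2,Col3,Col4,Col5,Col6,Col7\r\n'
    '1,ABC,"AB""C","D,E",F,03,"3,2"\r\n'
)

# newline="" leaves line-ending handling to the csv module.
rows = list(csv.reader(io.StringIO(excel_csv, newline="")))
header, record = rows

# The doubled quote collapses to one, and quoted commas stay in the data.
print(record)  # ['1', 'ABC', 'AB"C', 'D,E', 'F', '03', '3,2']
```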

I am a little surprised that not all of the column values are enclosed in quotes, but even this seems reasonable when I assume that Excel only adds text qualifiers when special characters such as a comma or a double quote character exist in the data.

Now I try to use SQL Server to import the CSV file. Note that I specify a double quote character as the text qualifier and a comma character as the column delimiter. However, SSIS imports column 3 incorrectly: it does not translate the two consecutive double quote characters into a single occurrence of a double quote character.

What do I have to do to get Excel and SSIS to get along?

Generally, people avoid the issue by using column delimiter characters that are LESS LIKELY to occur in the data, but this is not a real solution.

I find that if I modify the file from this

Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB""C","D,E",F,03,"3,2"

...to this:

Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB"C","D,E",F,03,"3,2"

i.e., replacing the two consecutive quotes in column C's value with a single quote, the data is loaded properly. However, this is a little confusing to me. First of all, how does SSIS determine that the double quote between the B and the C is not terminating that column value? Is it because the following character is neither a comma column delimiter nor a row delimiter (CRLF)? And why does Excel export it this way?
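The guess in the previous paragraph can be made concrete with a small sketch. It is purely hypothetical: nothing in this thread confirms how the SSIS flat file parser works internally, and `naive_split` is an invented name. The sketch only shows that a parser which ends a qualified field at a qualifier followed by a delimiter (or end of line), and which never unescapes doubled qualifiers, would reproduce both observations.

```python
def naive_split(line, delimiter=",", qualifier='"'):
    """Hypothetical splitter: a qualified field ends only at a qualifier
    that is followed by the delimiter or by the end of the line."""
    fields = []
    i, n = 0, len(line)
    while i <= n:
        if i < n and line[i] == qualifier:
            # Scan for the qualifier that really closes the field.
            j = i + 1
            while j < n:
                if line[j] == qualifier and (j + 1 == n or line[j + 1] == delimiter):
                    break
                j += 1
            fields.append(line[i + 1:j])  # note: no unescaping of ""
            i = j + 2                     # skip closing qualifier and delimiter
        else:
            j = line.find(delimiter, i)
            if j == -1:
                j = n
            fields.append(line[i:j])
            i = j + 1
    return fields


excel_line = '1,ABC,"AB""C","D,E",F,03,"3,2"'
edited_line = '1,ABC,"AB"C","D,E",F,03,"3,2"'

print(naive_split(excel_line)[2])   # AB""C  (doubled quote left as-is)
print(naive_split(edited_line)[2])  # AB"C   (lone quote kept as data)
```

Under that assumption, the lone quote between B and C survives simply because the character after it is neither a comma nor the end of the line.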

According to Wikipedia, here are a couple of traits of a CSV file:

  1. Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes. For example:

    "aaa","b CRLF bb","ccc" CRLF zzz,yyy,xxx

  2. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:

    "aaa","b""bb","ccc"

However, it looks like SSIS doesn't like it that way when importing. What can be done to get Excel to create a CSV file whose data could contain ANY of the special characters used as column delimiters, text delimiters or row delimiters? There's no reason that it can't work using the approach described in Wikipedia, which is what I thought the old MS DTS packages used to do...
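For reference, the two Wikipedia rules quoted above can be sketched as a small writer. The helper names `quote_field` and `format_row` are made up for this illustration; the sketch quotes a field only when it needs to, which matches what the Excel output in the question looks like.

```python
def quote_field(value, delimiter=",", qualifier='"'):
    """Apply the two rules: enclose fields that contain special characters,
    and double any qualifier that appears inside the field."""
    needs_quotes = any(ch in value for ch in (delimiter, qualifier, "\r", "\n"))
    if not needs_quotes:
        return value
    return qualifier + value.replace(qualifier, qualifier * 2) + qualifier


def format_row(values, delimiter=","):
    """Join already-stringified values into one CSV line (no line ending)."""
    return delimiter.join(quote_field(v, delimiter) for v in values)


row = ["1", "ABC", 'AB"C', "D,E", "F", "03", "3,2"]
line = format_row(row)
print(line)  # 1,ABC,"AB""C","D,E",F,03,"3,2"
```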

Update:

If I use Notepad to change the input file to

Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8
"1","ABC","AB""C","D,E","F","03","3,2","AB""C"

Excel reads it just fine

but SSIS returns

The preview sample contains embedded text qualifiers ("). The flat file parser does not support embedding text qualifiers in data. Parsing columns that contain data with text qualifiers will fail at run time.

Solution

Conclusion:

Just like the error message says in your update...

The flat file parser does not support embedding text qualifiers in data. Parsing columns that contain data with text qualifiers will fail at run time.

This is a confirmed bug on Microsoft Connect. I encourage everyone reading this to click the aforementioned link and vote to have them fix this stinker. It is among the top 10 most egregious bugs I have encountered.
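Until the parser is fixed, one possible stopgap is to preprocess the file before SSIS sees it. The sketch below is only an assumption about what might work, not something confirmed in this thread. It parses the Excel output with Python's `csv` module, picks a delimiter that is verified not to occur anywhere in the data, and writes the rows without any text qualifiers. The function name and the candidate delimiters are invented for the example, and the flat file source would then have to be configured with that delimiter and no text qualifier.

```python
import csv
import io


def rewrite_without_qualifiers(src_text, candidates=("|", "\t", "\x1f")):
    """Return (delimiter, text) where text carries the same rows as src_text,
    separated by a delimiter that is absent from every field."""
    rows = list(csv.reader(io.StringIO(src_text, newline="")))
    fields = [f for row in rows for f in row]
    # Without qualifiers, a line break inside a field cannot be represented.
    if any("\r" in f or "\n" in f for f in fields):
        raise ValueError("embedded line breaks need a different approach")
    for delim in candidates:
        if not any(delim in f for f in fields):
            body = "".join(delim.join(row) + "\r\n" for row in rows)
            return delim, body
    raise ValueError("every candidate delimiter occurs in the data")


src = (
    'Col1,Col2,Col3,Col4,Col5,Col6,Col7\r\n'
    '1,ABC,"AB""C","D,E",F,03,"3,2"\r\n'
)
delim, converted = rewrite_without_qualifiers(src)
print(converted.split("\r\n")[1])  # 1|ABC|AB"C|D,E|F|03|3,2
```

Unlike simply choosing a delimiter that is less likely to appear, this checks the actual data and fails loudly when no safe delimiter exists. It is still a workaround rather than a fix.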
