How to fix the embedded text qualifier issue while exporting data to CSV flat file?


Problem description

    RFC 4180:

    RFC 4180 defines the Common Format and MIME Type for Comma-Separated Values (CSV) Files. One of its requirements is quoted below; it is point 7 in section 2 of the RFC.

    If double-quotes are used to enclose fields, then a double-quote
    appearing inside a field must be escaped by preceding it with
    another double quote.  For example:
    
    "aaa","b""bb","ccc"
    

    SQL Server 2000:

    The DTS Export/Import Wizard in SQL Server 2000 seems to conform to the above-mentioned standard, even though RFC 4180 itself seems to have been published only in October 2005. I am using the SQL Server 2000 version stated below.

    Microsoft SQL Server  2000 - 8.00.2039 (Intel X86) 
    May  3 2005 23:18:38 
    Copyright (c) 1988-2003 Microsoft Corporation
    Standard Edition on Windows NT 5.0 (Build 2195: Service Pack 4)
    

    SQL Server 2012:

    The SQL Server Import and Export Wizard in SQL Server 2012 does not export data from a table to a CSV file according to the standard defined in RFC 4180. I am using the SQL Server 2012 version stated below.

    Microsoft SQL Server 2012 - 11.0.2316.0 (X64) 
    Apr  6 2012 03:20:55 
    Copyright (c) Microsoft Corporation
    Enterprise Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1) (Hypervisor)
    

    Issue Simulation:

    Here is a sample that I ran in both SQL Server 2000 and SQL Server 2012. I ran the query below to create a table and insert a few records. The ItemDesc column has data with double-quotes in it. My intention is to export the data from both SQL Server versions using their built-in export data wizards and compare the generated CSV files.

    CREATE TABLE dbo.ItemInformation(
        ItemId nvarchar(20) NOT NULL,
        ItemDesc nvarchar(100) NOT NULL
    ) 
    GO
    
    INSERT INTO dbo.ItemInformation (ItemId, ItemDesc) VALUES ('100338754', 'Crown Bolt 3/8"-16 x 1" Stainless-Steel Hex Bolt');
    INSERT INTO dbo.ItemInformation (ItemId, ItemDesc) VALUES ('202255836', 'Simpson Strong-Tie 5/8" SSTB Anchot Bolt');
    INSERT INTO dbo.ItemInformation (ItemId, ItemDesc) VALUES ('100171631', 'Grip-Rite #11 x 1-1/2" Electro-Galvanized Steel Roofing Nails');
    INSERT INTO dbo.ItemInformation (ItemId, ItemDesc) VALUES ('202210289', 'Crown Bolt 1/2" x 3" "Zinc-Plated" Universal Clevis Pin');
    INSERT INTO dbo.ItemInformation (ItemId, ItemDesc) VALUES ('100136988', 'Tapcon 3/16" x 1-3/4" Climaseal Steel "Flat-Head" Phillips Concrete Anchors (75-Pack)');
    INSERT INTO dbo.ItemInformation (ItemId, ItemDesc) VALUES ('203722101', 'KwikTap 3/16" x 2-1/4" "Flat-Head" Concrete Screws (100-Pack)');
    GO
    

    On the DTS Export/Import Wizard in SQL Server 2000, I used the below settings to export the data to CSV file. I saved the file under the name SQLServer2000_ItemInformation.csv.

    On the SQL Server Import and Export Wizard in SQL Server 2012, I used the below settings to export the data to CSV file. I saved the file under the name SQLServer2012_ItemInformation.csv.

    Here is the comparison between the two files using Beyond Compare. The left side contains the file generated by SQL Server 2000 and the right side contains the file generated by SQL Server 2012. Notice that the left-side file from SQL Server 2000 contains additional double-quotes to compensate for the embedded quotes in the data column. This conforms to the standard specified in RFC 4180, but the additional double-quotes are clearly missing from the file generated by SQL Server 2012.
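The difference can also be checked mechanically. The sketch below is only an illustration, not the actual wizard output: the two sample lines are reconstructed by hand from the first row of the test table, assuming a comma delimiter and a double-quote text qualifier, and Python's standard `csv` module in strict mode stands in for a reader that follows RFC 4180. The helper name `parse_strict` is made up for this example.

```python
import csv

# Hand-built lines for the first test row (assumed settings: comma
# delimiter, double-quote text qualifier). Not copied from the real files.
escaped_line = '"100338754","Crown Bolt 3/8""-16 x 1"" Stainless-Steel Hex Bolt"'
unescaped_line = '"100338754","Crown Bolt 3/8"-16 x 1" Stainless-Steel Hex Bolt"'


def parse_strict(line):
    """Parse one CSV record; return the fields, or None if it is malformed."""
    try:
        return next(csv.reader([line], strict=True))
    except csv.Error:
        return None


print(parse_strict(escaped_line))    # two fields, quotes restored
print(parse_strict(unescaped_line))  # None: a bare quote ends the field early
```

A line with doubled quotes parses back into the original two values, while the line with bare embedded quotes is rejected as soon as the reader sees a character other than a delimiter after what looks like a closing qualifier.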

    Searches on the web:

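    I searched for this bug on the web and found the following bug reports on Microsoft Connect. All of them seem to be about importing a file; none is about exporting data. All of them have been closed as Fixed.

      • Flat File Connection Manager does not handle text delimiters in CSV files
      • Embedded quotes in flat file import fail
      • BUG: Flat File Connection Manager: multi-character text qualifier does not load all data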

    A post on the MSDN blog states that, in SQL Server 2012, the Flat File Source supports embedded qualifiers and a variable number of columns per row.

    Another post on MSDN blog states the same under the section Embedded Qualifiers.

    Workaround that I know of:

    I know a workaround for the issue: write a query that replaces every double-quote (") in my column data with two double-quotes ("") so that the exported file ends up with correctly escaped embedded qualifiers. This means exporting from that query instead of pulling the data directly from the table as it is.
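To make the effect of that replacement concrete, here is a small sketch of the same doubling rule written in Python rather than T-SQL. The helper names `double_qualifier` and `to_csv_line` are made up for illustration, and a comma delimiter with a double-quote qualifier is assumed.

```python
import csv


def double_qualifier(value, qualifier='"'):
    """Double every embedded qualifier, like REPLACE(col, '"', '""') in SQL."""
    return value.replace(qualifier, qualifier * 2)


def to_csv_line(fields):
    """Wrap each field in double quotes after doubling embedded quotes."""
    return ",".join('"' + double_qualifier(f) + '"' for f in fields)


row = ["202210289", 'Crown Bolt 1/2" x 3" "Zinc-Plated" Universal Clevis Pin']
line = to_csv_line(row)
print(line)

# A strict CSV reader recovers the original values from the escaped line.
assert next(csv.reader([line], strict=True)) == row
```

In the workaround itself only the inner doubling would be done in the query; the export wizard is assumed to add the outer qualifiers on its own.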

    My questions:

    • I don't know if this issue has been truly fixed in SQL Server 2012. Has this issue been fixed only for importing files that have embedded text qualifiers and not for exporting data to CSV?

    • Probably, I am clearly doing something wrong and missing the obvious. Could someone please explain to me what I am doing wrong here?

    Microsoft Connect:

    I have submitted a bug report on Microsoft Connect website to get their feedback. Here is the link to the bug report. If you agree that this is a bug, please visit the below link to vote up on Microsoft Connect website.

    Embedded text qualifier during export to CSV does not conform to RFC 4180

    Solution

    I wouldn't offer this answer except that you worked so hard to document it and it's been upvoted with no answer after a month. So, here goes. Your only choices appear to be to change the data or change the tool.

    Probably, I am clearly doing something wrong and missing the obvious. Could someone please explain to me what I am doing wrong here?

    When the tool is broken and the vendor doesn't care, it's a mistake to keep trying. It's time to switch. You put a lot of effort into researching exactly how it's broken and demonstrating that it violates not only the RFC but the tool's own prior version. How much more evidence do you need?

    CSV is a boat anchor too. If you have the option, you're better off using an ordinary delimited file format. For lots of applications, tab-delimited is good. The best delimiter IMO is '\' because that character has no place in English text. (On the other hand it won't work for data containing Windows pathnames.)

    CSV has two problems as an exchange format. First, it's not all that standard; different applications recognize different versions, whatever the RFC may say. Second (and related) is that it doesn't constitute a regular language in CS terms, which is why it can't be parsed as a regular expression. Compare with ^([^\t]*\t)*[^\t]*$ for a tab-delimited line. The practical implication of the complexity of CSV's definition is (see above) the relative dearth of tools to handle it and their tendency to be incompatible, particularly during the wee hours.
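As a rough illustration of how little machinery the tab-delimited case needs, the sketch below checks a line against a pattern of that shape with Python's `re` module and then splits it. The pattern is a variant, (?:[^\t\r\n]*\t)*[^\t\r\n]*, that also excludes line breaks so that the check is not vacuous; the function name and the sample line are invented for this example.

```python
import re

# Zero or more "field + tab" groups, then a final field. Fields may hold
# any character except a tab or a line break, so quotes are plain data.
TAB_LINE = re.compile(r"(?:[^\t\r\n]*\t)*[^\t\r\n]*")


def split_tab_line(line):
    """Split one tab-delimited line into fields; no quoting rules needed."""
    if TAB_LINE.fullmatch(line) is None:
        raise ValueError("not a single tab-delimited line")
    return line.split("\t")


print(split_tab_line('100338754\tCrown Bolt 3/8"-16 x 1" Stainless-Steel Hex Bolt'))
```

The embedded double quotes need no escaping at all here; the price is that the data itself must never contain a tab or a line break.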

    If you give CSV and DTS the boot, you have good options, one of which is bcp.exe. It's very fast, and safe because Microsoft hasn't been tempted to update it for years. I don't know much about DTS, but in case you have to use it for automation, IIRC there is a way to invoke external utilities. Beware though, that bcp.exe does not return error status to the shell dependably.
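For what it's worth, here is a minimal sketch of driving bcp.exe from a script, assuming bcp is on the PATH and Windows authentication is in use. The function names and whatever server and database names get passed in are placeholders. Because the exit status is not dependable, the sketch inspects the output file rather than the return code.

```python
import os
import subprocess


def build_bcp_command(query, out_path, server, database):
    """Build an argument list for a bcp character-mode export of a query."""
    return [
        "bcp", query, "queryout", out_path,
        "-c",            # character mode; tab is the default field terminator
        "-S", server,    # server to connect to
        "-d", database,  # database to run the query in
        "-T",            # trusted (Windows) authentication
    ]


def export_with_bcp(query, out_path, server, database):
    """Run bcp, then check the output file instead of trusting the exit code."""
    subprocess.run(build_bcp_command(query, out_path, server, database))
    if not os.path.exists(out_path) or os.path.getsize(out_path) == 0:
        raise RuntimeError("bcp produced no output: " + out_path)
```

A call might look like export_with_bcp("SELECT ItemId, ItemDesc FROM dbo.ItemInformation", "items.txt", "MYSERVER", "MyDatabase"), where the server and database names are hypothetical. A stricter check could compare the number of lines in the file with the row count returned by the query.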

    If you're determined to use DTS and to stick with CSV, then really your best remaining option is to write a view that prepares the data appropriately for it. I would, if backed into that corner, create a schema called, say, "DTS2012CSV", so that I could write select * from DTS2012CSV.tablename, giving anyone who cares a fighting chance to understand it (because you'll document it, won't you, in comments in the view text?). If need be, others can copy its technique for other broken extracts.

    HTH.
