Replace double quotes (within qualifiers) in CSV for SSIS import

Question

I have an SSIS package importing data from a .csv file. This file uses double quotes (") as text qualifiers around each entry, but quotes also appear inside some of the values. I also set commas (,) as the column delimiter. I can't share the original data I'm working with, but here is an example of how my data arrives in the Flat File Source:

"ID-1","A "B"", C, D, E","Today"
"ID-2","A, B, C, D, E,F","Yesterday"
"ID-3","A and nothing else","Today"

As you can see, the second column can contain quotes (and commas), which breaks my SSIS import with an error pointing at this line. I'm not really familiar with regular expressions, but I've heard they might help in this case.

As I see it, I need to replace all double quotes (") with single quotes ('), except...


  • ...all quotes at the beginning of a line
  • ...all quotes at the end of a line
  • ...quotes that are part of "," (the delimiter between two qualified values)
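
For comparison, the three exceptions above can also be applied without a regex by scanning each line character by character. The following C# sketch assumes one CSV record per line and reads "part of ","" as "the quote is the first or last character of a quote-comma-quote sequence". QuoteFixer and ReplaceInnerQuotes are hypothetical names, not part of SSIS or .NET.

```csharp
using System;
using System.Text;

static class QuoteFixer
{
    // Hypothetical helper: replaces every " in one CSV line with ', except
    // quotes at the start or end of the line and quotes that belong to a ","
    // sequence between two qualified fields.
    public static string ReplaceInnerQuotes(string line)
    {
        // First and last non-whitespace positions; a quote there counts as
        // being at the beginning or end of the line.
        int first = 0;
        while (first < line.Length && char.IsWhiteSpace(line[first])) first++;
        int last = line.Length - 1;
        while (last >= 0 && char.IsWhiteSpace(line[last])) last--;

        var sb = new StringBuilder(line.Length);
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c != '"')
            {
                sb.Append(c);
                continue;
            }

            bool atStart = i == first;
            bool atEnd = i == last;
            // The quote opens a "," sequence: it is followed by ,"
            bool opensDelimiter = i + 2 < line.Length && line[i + 1] == ',' && line[i + 2] == '"';
            // The quote closes a "," sequence: it is preceded by ",
            bool closesDelimiter = i >= 2 && line[i - 1] == ',' && line[i - 2] == '"';

            bool keep = atStart || atEnd || opensDelimiter || closesDelimiter;
            sb.Append(keep ? '"' : '\'');
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Sample rows from the question.
        string[] rows =
        {
            "\"ID-1\",\"A \"B\"\", C, D, E\",\"Today\"",
            "\"ID-2\",\"A, B, C, D, E,F\",\"Yesterday\"",
            "\"ID-3\",\"A and nothing else\",\"Today\"",
        };
        foreach (string row in rows)
            Console.WriteLine(ReplaceInnerQuotes(row));
    }
}
```

On the sample rows only the ID-1 row changes, becoming "ID-1","A 'B'', C, D, E","Today". Both quotes after B are replaced, because neither of them starts or ends a "," sequence.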

Answer

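To replace double quotes with single quotes according to your specification, use this simple regex. It also allows whitespace at the beginning and/or end of a line. Note that the lookbehind (?<!^\s*|,) has variable length; the .NET regex engine supports this, but not every regex engine does.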

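using System.Text.RegularExpressions;
// Turn " into ' unless it starts or ends a line, follows a comma, or is followed by ,"
string pattern = @"(?<!^\s*|,)""(?!,""|\s*$)";
string resultString = Regex.Replace(subjectString, pattern, "'", RegexOptions.Multiline);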

Here is an explanation of the pattern:

// (?<!^\s*|,)"(?!,"|\s*$)
// 
// Options: ^ and $ match at line breaks
// 
// Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!^\s*|,)»
//    Match either the regular expression below (attempting the next alternative only if this one fails) «^\s*»
//       Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
//       Match a single character that is a "whitespace character" (spaces, tabs, and line breaks) «\s*»
//          Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//    Or match regular expression number 2 below (the entire group fails if this one fails to match) «,»
//       Match the character "," literally «,»
// Match the character """ literally «"»
// Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!,"|\s*$)»
//    Match either the regular expression below (attempting the next alternative only if this one fails) «,"»
//       Match the characters ","" literally «,"»
//    Or match regular expression number 2 below (the entire group fails if this one fails to match) «\s*$»
//       Match a single character that is a "whitespace character" (spaces, tabs, and line breaks) «\s*»
//          Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//       Assert position at the end of a line (at the end of the string or before a line break character) «$»
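
To check the pattern against the sample rows from the question, here is a small self-contained C# console program. It assumes the whole file has been read into one string, so that RegexOptions.Multiline lets ^ and $ match at each line break; RegexDemo and Fix are illustrative names only.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexDemo
{
    // Same pattern as above: a " is replaced unless it is at the start of a
    // line (after optional whitespace), directly follows a comma, is followed
    // by ," or is at the end of a line (before optional whitespace).
    const string Pattern = @"(?<!^\s*|,)""(?!,""|\s*$)";

    public static string Fix(string csv) =>
        Regex.Replace(csv, Pattern, "'", RegexOptions.Multiline);

    static void Main()
    {
        // Sample rows from the question, joined into one multi-line string.
        string csv = string.Join("\n",
            "\"ID-1\",\"A \"B\"\", C, D, E\",\"Today\"",
            "\"ID-2\",\"A, B, C, D, E,F\",\"Yesterday\"",
            "\"ID-3\",\"A and nothing else\",\"Today\"");

        Console.WriteLine(Fix(csv));
    }
}
```

With the sample rows this prints:

"ID-1","A 'B'', C, D, E","Today"
"ID-2","A, B, C, D, E,F","Yesterday"
"ID-3","A and nothing else","Today"

Only the quotes inside the ID-1 value change, so " remains usable as the text qualifier. One limitation: because the lookbehind excludes any quote preceded by a comma, a quote that directly follows a comma inside a value (for example A,"B) would still be left unchanged.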
