sed statement to change/modify CSV separators and delimiters


Problem description

I have some CSV files that contain comma-separated values, and some of the column values can contain characters like ,.<>!/\;&

I am trying to convert them into comma-separated CSV in which every field is enclosed in double quotes.

Sample data:

DateCreated,DateModified,SKU,Name,Category,Description,Url,OriginalUrl,Image,Image50,Image100,Image120,Image200,Image300,Image400,Price,Brand,ModelNumber
2012-10-19 10:52:50,2013-06-11 02:07:16,34,Austral Foldaway 45 Rotary Clothesline,Home & Garden > Household Supplies > Laundry Supplies > Drying Racks & Hangers,"Watch the Product Video            Plenty of Space to Hang a Family Wash  Austral's Foldaway 45 rotary clothesline is a folding head rotary clothes hoist beautifully finished in either Beige or Heritage Green.  Even though the Foldaway 45 is compact, you still get a large 45 metres of line space, big enough for a full family wash.  If you want the advantage of a rotary hoist, but dont want to lose your yard, then the Austral Foldaway 45 is the clothesline for you.&nbsp;  Installation Note:&nbsp;A core hole is only required when installing into existing concrete, e.g. a pathway. Not required in the ground(grass/soil).  To watch video on YouTube, click the following link:&nbsp;Austral Foldaway 45 Rotary Clothesline      &nbsp;            //           Customer Video Reviews  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;",https://track.commissionfactory.com.au/p/10604/1718695,http://www.lifestyleclotheslines.com.au/austral-foldaway-45-rotary-clothesline/,http://content.commissionfactory.com.au/Products/7228/1718695.jpg,http://content.commissionfactory.com.au/Products/7228/1718695@50x50.jpg,http://content.commissionfactory.com.au/Products/7228/1718695@100x100.jpg,http://content.commissionfactory.com.au/Products/7228/1718695@120x120.jpg,http://content.commissionfactory.com.au/Products/7228/1718695@200x200.jpg,http://content.commissionfactory.com.au/Products/7228/1718695@300x300.jpg,http://content.commissionfactory.com.au/Products/7228/1718695@400x400.jpg,309.9000 AUD,Austral,FA45GR

The output I want to achieve is:

"DateCreated","DateModified","SKU","Name","Category","Description","Url","OriginalUrl","Image","Image50","Image100","Image120","Image200","Image300","Image400","Price","Brand","ModelNumber"
"2012-10-19 10:52:50","2013-06-11 02:07:16","34","Austral Foldaway 45 Rotary Clothesline","Home & Garden > Household Supplies > Laundry Supplies > Drying Racks & Hangers","Watch the Product Video            Plenty of Space to Hang a Family Wash  Austral's Foldaway 45 rotary clothesline is a folding head rotary clothes hoist beautifully finished in either Beige or Heritage Green.  Even though the Foldaway 45 is compact, you still get a large 45 metres of line space, big enough for a full family wash.  If you want the advantage of a rotary hoist, but dont want to lose your yard, then the Austral Foldaway 45 is the clothesline for you.&nbsp;  Installation Note:&nbsp;A core hole is only required when installing into existing concrete, e.g. a pathway. Not required in the ground(grass/soil).  To watch video on YouTube, click the following link:&nbsp;Austral Foldaway 45 Rotary Clothesline      &nbsp;            //           Customer Video Reviews  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;","https://track.commissionfactory.com.au/p/10604/1718695","http://www.lifestyleclotheslines.com.au/austral-foldaway-45-rotary-clothesline/","http://content.commissionfactory.com.au/Products/7228/1718695.jpg","http://content.commissionfactory.com.au/Products/7228/1718695@50x50.jpg","http://content.commissionfactory.com.au/Products/7228/1718695@100x100.jpg","http://content.commissionfactory.com.au/Products/7228/1718695@120x120.jpg","http://content.commissionfactory.com.au/Products/7228/1718695@200x200.jpg","http://content.commissionfactory.com.au/Products/7228/1718695@300x300.jpg","http://content.commissionfactory.com.au/Products/7228/1718695@400x400.jpg","309.9000 AUD","Austral","FA45GR"

Any assistance is greatly appreciated.

Recommended answer

First, let's try the trivial (and "not good enough") solution, which simply wraps every field in double quotes, including fields that are already quoted (which isn't what you want):

sed -r 's/([^,]*)/"\1"/g'

The first part matches each run of characters that contains no comma, the second part wraps that run in double quotes, and the final 'g' applies the substitution to every match on the line instead of only the first.

This would turn

abc,345, some words ,"some text","text,with,commas"

into

"abc","345"," some words ",""some text"",""text","with","commas""

Some notes:

  • It correctly treats " some words " as a single field despite the spaces between the words, but it also puts the leading and trailing spaces inside the quotes. I assume that's OK; if not, it can be fixed.

  • If a field already had quotes, it gets quoted again, which is bad. This needs to be fixed.

  • If a field already had quotes and its inner text contained commas (which shouldn't be treated as field separators), the text is split at those commas and each piece is quoted separately. This also needs to be fixed.
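The observations above can be reproduced with a small script. This is a minimal sketch that assumes GNU sed; it spells the option as -E, which GNU sed accepts as an equivalent of the -r used in this answer. The variable names `line` and `naive` are only illustrative.

```shell
# Sample line from the answer: plain, space-padded, quoted,
# and quoted-with-commas fields.
line='abc,345, some words ,"some text","text,with,commas"'

# Naive version: wrap every run of non-comma characters in double quotes.
# Quote characters count as ordinary characters here, so fields that are
# already quoted get a second pair, and quoted text is split at its commas.
naive=$(printf '%s\n' "$line" | sed -E 's/([^,]*)/"\1"/g')

printf '%s\n' "$naive"
```

Comparing the printed line with the output shown above makes the double quoting and the split "text,with,commas" field easy to spot.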

So we want to match either of two regexps: an already quoted string, or a run of characters containing neither commas nor quotes:

sed -r 's/([^,"]*|"[^"]*")/"\1"/g'

Now the result is:

"abc","345"," some words ",""some text"",""text,with,commas""

As you can see, the originally quoted text now has doubled quotes around it. We have to remove these with a second sed command:

sed -r 's/([^,"]*|"[^"]*")/"\1"/g' | sed 's/""/"/g'

which results in

"abc","345"," some words ","some text","text,with,commas"

Yay!
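The two steps can also be combined into a single sed invocation with two -e expressions, which avoids the second process. The following is a sketch under some assumptions, not a general CSV parser: it assumes GNU sed, and the function name `quote_csv_fields` is hypothetical. It also assumes the input has no empty fields and no quotes escaped as "" inside a field, since the second substitution would most likely damage both.

```shell
# Hypothetical helper: read simple CSV on stdin, write it back with
# every field enclosed in double quotes.
quote_csv_fields() {
    # Expression 1: wrap each field in quotes, treating an already
    #               quoted string (commas included) as a single unit.
    # Expression 2: collapse the doubled quotes that expression 1
    #               leaves around the originally quoted strings.
    sed -E -e 's/([^,"]*|"[^"]*")/"\1"/g' -e 's/""/"/g'
}

line='abc,345, some words ,"some text","text,with,commas"'
quoted=$(printf '%s\n' "$line" | quote_csv_fields)

printf '%s\n' "$quoted"
```

For whole files, the function could be used as `quote_csv_fields < input.csv > output.csv`, where both file names are placeholders.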
