Using sed to split on commas while handling commas within quotes and multiply-quoted data without commas


Problem description

I have the following data:

123,"john,test",John"test,""john"",345

The above needs to be split as follows:

123

"john,test"

John"test

""john""

345

I tried using sed to handle commas within quotes while splitting, but data enclosed in multiple double quotes is not displayed correctly, and data with a double quote in the middle is not handled either. I also tried awk, but could not use the FPAT feature because we have an older version of awk.

Can you help with a solution?

Recommended answer

This might work for you (GNU sed):

sed -r 's/([^",]*("[^"]*"[^",]*)*),/\1\n/g' file

Replace every comma that is not enclosed in double quotes with a newline.

In more depth: the capture group matches zero or more characters that are neither double quotes nor commas, followed by zero or more repetitions of this sequence: a double quote, zero or more non-double-quote characters (which may include commas), a closing double quote, and then zero or more characters that are neither double quotes nor commas. The comma that follows the group is replaced with a newline, while the group itself is kept via \1. The g flag applies this to every match on each line of the file.
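As a quick check, the command can be run against the sample line from the question. This is a minimal sketch assuming GNU sed (the -r option and \n in the replacement are GNU features); split_fields is a hypothetical wrapper name used only for illustration, not part of the original answer.

```shell
# Hypothetical helper wrapping the answer's sed command (assumes GNU sed).
split_fields() {
  # Regex pieces:
  #   [^",]*             leading run containing no double quote and no comma
  #   ("[^"]*"[^",]*)*   zero or more quoted chunks, each with an optional
  #                      unquoted tail that contains no comma
  #   ,                  the separating comma, replaced by a newline
  sed -r 's/([^",]*("[^"]*"[^",]*)*),/\1\n/g'
}

# Feed the sample line from the question through the helper.
printf '%s\n' '123,"john,test",John"test,""john"",345' | split_fields
```

With GNU sed this should print the five fields listed in the question, one per line. On this input the unbalanced quote in John"test comes out as expected because no match can begin at John, so sed resumes matching at test; other inputs with unbalanced quotes may split differently.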

Now, if the double quotes or commas are themselves quoted ...
