Escaping separator within double quotes, in awk
Question
I am using awk to parse my data with "," as the separator, since the input is a CSV file. However, some fields contain a "," inside the data, and those fields are enclosed in double quotes ("...").
Example
filed1,filed2,field3,"field4,FOO,BAR",field5
How can I ignore the comma "," inside the double quotes so that awk parses the fields correctly? I know this can be done in Excel, but how do we do it in awk?
Recommended answer
With GNU awk 4 this is straightforward:
zsh-4.3.12[t]% awk '{
for (i = 0; ++i <= NF;)
printf "field %d => %s\n", i, $i
}' FPAT='([^,]+)|("[^"]+")' infile
field 1 => filed1
field 2 => filed2
field 3 => field3
field 4 => "field4,FOO,BAR"
field 5 => field5
Adding some comments, as the OP requested.
From the GNU awk manual, on defining fields by content:
The value of FPAT should be a string that provides a regular expression. This regular expression describes the contents of each field. In the case of CSV data as presented above, each field is either "anything that is not a comma," or "a double quote, anything that is not a double quote, and a closing double quote." If written as a regular expression constant, we would have /([^,]+)|("[^"]+")/. Writing this as a string requires us to escape the double quotes, leading to:
FPAT = "([^,]+)|(\"[^\"]+\")"
Because + is used twice, this does not work properly for empty fields, but that can be fixed as well:
As written, the regexp used for FPAT requires that each field contain at least one character. A straightforward modification (changing the first ‘+’ to ‘*’) allows fields to be empty:
FPAT = "([^,]*)|(\"[^\"]+\")"