Replace a tab inside an enclosed string in a tab-delimited file (Linux)
Problem description
I have a tab-delimited txt file in which the third column contains a quoted string that may itself contain a tab. Because of this extra tab, I get 5 columns when I read the file. So I want to replace the tab inside the quotes with a space.
Below is a sample file.
col1 col2 col3 col4
1 abc "pqr xyz" asd
2 asd "lmn pqr" aws
3 abc "asd" lmn
I want output like this, with the tab inside the quotes replaced by a space:
col1 col2 col3 col4
1 abc "pqr xyz" asd
2 asd "lmn pqr" aws
3 abc "asd" lmn
This is what I tried:
awk -F"\t" '{ gsub("\t","",$3); print $3 }' file.txt
After that I get the following output:
col3
"pqr
"lmn
"asd"
Please help.
Recommended answer
With GNU awk (gawk), you can use the following command:
gawk '{gsub("\t"," ",$3)}1' OFS='\t' FPAT='"[^"]*"|[^\t]*' file
The key here is the FPAT variable. It defines what a field looks like, instead of just specifying the field delimiter.
In our case, a field is either a sequence of non-double-quote characters enclosed in double quotes, "[^"]*", or a sequence of zero or more non-tab characters, [^\t]* (zero, so that empty fields are handled properly).
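To see how this pattern splits a line, the sketch below prints every field gawk finds in one sample row. It assumes GNU awk is installed (FPAT is a gawk extension and is not available in mawk or busybox awk). The helper name show_fields and the sample row are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print each field found on stdin, numbered, one per line.
# FPAT describes the fields themselves: a quoted string, or a run of non-tabs.
show_fields() {
    gawk -v FPAT='"[^"]*"|[^\t]*' '{
        for (i = 1; i <= NF; i++)
            printf "%d: %s\n", i, $i
    }'
}

# One row whose third column holds a real tab between the quotes.
if command -v gawk >/dev/null 2>&1; then
    printf '1\tabc\t"pqr\txyz"\tasd\n' | show_fields
else
    echo "gawk not found; FPAT needs GNU awk" >&2
fi
```

With a working FPAT, the quoted column should come out as a single field even though it contains a tab.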
For a quoted field that contains a tab, the quoted alternative is the one that matches: awk takes the longest match at each position, and only "[^"]*" can extend past the embedded tab to the closing quote.
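FPAT is only available in gawk. Where only a POSIX awk (such as mawk) is installed, one possible alternative is to split each line on the double quote instead: the even-numbered pieces are then the text inside quotes, and tabs can be replaced in those pieces alone. This is a different technique from the answer above, not a drop-in equivalent. It assumes quotes are balanced on every line and never escaped. The helper name detab_quoted and the temporary sample file are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: replace tabs inside double-quoted text with a space.
# Splitting on '"' puts quoted text in fields 2, 4, 6, ...
# Assumes balanced, unescaped quotes on each line.
detab_quoted() {
    awk -F'"' -v OFS='"' '{
        for (i = 2; i <= NF; i += 2)
            gsub(/\t/, " ", $i)   # touch only the text between quotes
        print
    }' "$@"
}

# Build a small sample file with real tabs (illustrative data).
sample=$(mktemp)
printf 'col1\tcol2\tcol3\tcol4\n'   >  "$sample"
printf '1\tabc\t"pqr\txyz"\tasd\n'  >> "$sample"
printf '3\tabc\t"asd"\tlmn\n'       >> "$sample"

# Print the number of tab-separated columns on each line after the fix.
detab_quoted "$sample" | awk -F'\t' '{ print NF }'
rm -f "$sample"
```

Lines without quotes pass through unchanged, because the loop only visits the pieces between a pair of quotes.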