Insert double quotes for each field (enhancement)

Problem description

I am looking for the output below, based on the sample input provided.

Sample

eno~ename~address~zip
123~abc~~560000~"a~b~c"
245~"abc ~ def"~hyd~560102
333~"ghi~jkl"~pub~560103
444~ramdev "abc def"~ram~10000

Expected output

"eno"~"ename"~"address"~"zip"
"123"~"abc"~""~"560000"~"a~b~c"
"245"~"abc ~ def"~"hyd"~"560102"
"333"~"ghi~jkl"~"pub"~"560103"
"444"~"ramdev ""abc def"""~"ram"~"10000"

Current code:

awk 'BEGIN{s1="\"";FS=OFS="~"} {for(i=1;i<=NF;i++){if($i!~/^\"|\"$/){$i=s1 $i s1}}} 1' sample

The current code doesn't work for the last line. This is an enhancement of "insert quotes for each field using awk".
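
To see the failure concretely, the awk command from the question can be fed just the last sample line. This is only a minimal sketch: the `printf` pipeline and the variable names `last_line` and `result` are illustrative stand-ins for the `sample` file and are not part of the original question.

```shell
# Feed only the last sample line to the awk command from the question.
# The printf pipeline replaces the 'sample' file for this illustration.
last_line='444~ramdev "abc def"~ram~10000'
result=$(printf '%s\n' "$last_line" |
  awk 'BEGIN{s1="\"";FS=OFS="~"} {for(i=1;i<=NF;i++){if($i!~/^\"|\"$/){$i=s1 $i s1}}} 1')
printf '%s\n' "$result"
# prints: "444"~ramdev "abc def"~"ram"~"10000"
```

The second field ends with a double quote, so the test `$i!~/^\"|\"$/` treats it as already quoted and leaves it untouched, whereas the expected output wraps the field and doubles the embedded quotes.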

Recommended answer

This might work for you (GNU sed):

cat <<\! | sed -Ef - file
:a;s/^([^"~][^~]*~+("[^~"]*"~+[^"~][^~]*~+)*[^"]*"[^"~]*)~/\1\n/;ta; #1
s/.*/~&/                                                             #2
s/~"([^"]*)"/~\1/g                                                   #3
s/"/""/g                                                             #4
s/.//                                                                #5
s/[^~]*/"&"/g                                                        #6
y/\n/~/;                                                             #7
!

This sed script works as follows (the step numbers match the #1 to #7 comments in the script):

  1. A ~ within a string can be confused with a field delimiter, so each such ~ needs to be replaced by a unique character that is not present in the current line. As sed uses newlines to delimit its input, a newline cannot be present in the pattern space at this point and is therefore the perfect choice for such a character. Fields consist of three types of strings:

     a) Strings which do not start and end with double quotes and contain no quoted strings.

     b) Double-quoted strings.

     c) Strings which do not start and end with double quotes but contain quoted strings.

     Quoted strings need any ~ within them replaced by \n. This can be achieved by looping through the current line, skipping over fields of type a, b or c that do not contain a ~, and replacing only the ~'s that fall inside quoted strings.

  2. To make the next step easier, prepend a field delimiter to the first field.

  3. Remove the double quotes enclosing fields (see 1b).

  4. All remaining double quotes are within strings of type 1c and can be escaped by prefixing each with a ".

  5. Remove the initial field delimiter introduced in step 2.

  6. Surround all fields with double quotes.

  7. Replace the newlines introduced in step 1 by their original value, i.e. ~.

N.B. It appears that GNU sed has a bug whereby, if the translate command (y/../../) is the last command in a script or in a one-line command, it needs to be followed by a ;.
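
Step 1 is the least obvious part of the script, so it can help to run it in isolation. The sketch below assumes GNU sed and applies only the loop from step 1 to one sample line, then rewrites each inserted newline as a visible marker. The `<NL>` marker, the `printf` pipeline and the variable name `step1` are illustrative additions, not part of the answer.

```shell
# Run only step 1 (the loop) on one sample line, then make the inserted
# newlines visible by rewriting each one as the marker <NL>.
# The marker and the printf pipeline are illustrative additions.
step1=$(printf '%s\n' '245~"abc ~ def"~hyd~560102' |
  sed -E ':a;s/^([^"~][^~]*~+("[^~"]*"~+[^"~][^~]*~+)*[^"]*"[^"~]*)~/\1\n/;ta;s/\n/<NL>/g')
printf '%s\n' "$step1"
# prints: 245~"abc <NL> def"~hyd~560102
```

Only the ~ inside the quoted field is replaced; the three delimiters between fields are left alone, which is what lets the later steps treat every remaining ~ as a field separator.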

The above solution can also be entered as one long line:

sed -E ':a;s/^([^"~][^~]*~+("[^~"]*"~+[^"~][^~]*~+)*[^"]*"[^"~]*)~/\1\n/;ta;s/.*/~&/;s/~"([^"]*)"/~\1/g;s/"/""/g;s/.//;s/[^~]*/"&"/g;y/\n/~/;' file
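
As a quick check, the one-liner can be run against the sample from the question. This is a sketch that assumes GNU sed; the wrapper name `quote_fields` and the variable `actual` are hypothetical, and the here-document stands in for `file`.

```shell
# Hypothetical wrapper around the one-liner from the answer (GNU sed assumed).
quote_fields() {
  sed -E ':a;s/^([^"~][^~]*~+("[^~"]*"~+[^"~][^~]*~+)*[^"]*"[^"~]*)~/\1\n/;ta;s/.*/~&/;s/~"([^"]*)"/~\1/g;s/"/""/g;s/.//;s/[^~]*/"&"/g;y/\n/~/;' "$@"
}

# Run it on the sample input; the here-document replaces 'file'.
actual=$(quote_fields <<'EOF'
eno~ename~address~zip
123~abc~~560000~"a~b~c"
245~"abc ~ def"~hyd~560102
333~"ghi~jkl"~pub~560103
444~ramdev "abc def"~ram~10000
EOF
)
printf '%s\n' "$actual"
```

For this sample the result should match the expected output listed in the question, including the last line with its doubled inner quotes. Inputs shaped differently from the sample, such as two quoted fields in a row, are not covered by this check.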
