Is it possible to handle fields containing line breaks in awk?


Problem description

Suppose I have a text file with records of the following form, where the FS is generally speaking a comma, and the RS is generally speaking a newline.

However, the exception to this rule is that if a field is in quotes, it should treat the line breaks and commas as part of the field.

"This field contains
line breaks and is
quoted but it 
should be treated as a 
single field",1,2,3,"another field"

How can I use awk to parse such a file correctly, where I can still access $1,$2..., as I usually would, but with the above interpretation of fields?

I have already looked at this wiki page, but the solution presented there does not solve the problem of line breaks.

Solution

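You can probably use a double newline as the record separator, assuming the records in your file are separated by blank lines. If you also set the comma as the field separator, each block of text can be handled as a field. Note that a multi-character RS is an extension (gawk and mawk support it, for example), and that a comma inside a quoted field would still split that field: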

awk -v RS="\n\n" -v FS="," '...' file

For your given file, let's show the field number together with the field itself:

$ awk -v RS="\n\n" -v FS="," '{for (i=1; i<=NF; i++) print i, $i}' file
1 "This field contains
line breaks and is
quoted but it 
should be treated as a 
single field"
2 1
3 2
4 3
5 "another field"
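
If the records are not separated by blank lines, or quoted fields may contain commas, one possible alternative is to join physical lines until the double quotes balance and then re-split the record. The following is only a sketch under stated assumptions, not the method from the answer above: the function name parse_quoted_csv is hypothetical, quotes inside a field are assumed to be doubled (""), and the data is assumed never to contain the \034 character, which is used as a temporary field separator.

```shell
# Hypothetical helper: prints each field of a quoted CSV with its field number.
parse_quoted_csv() {
    awk '
    BEGIN { FS = "\034" }   # assumption: \034 never occurs in the data
    {
        # Append this physical line to the pending logical record.
        rec = (pending ? rec "\n" : "") $0
        pending = 1

        # gsub returns the number of replacements, i.e. the number of quotes.
        tmp = rec
        if (gsub(/"/, "", tmp) % 2) next   # odd count: still inside a quoted field
        pending = 0

        # Replace commas that lie outside quotes with the temporary separator.
        out = ""; inq = 0
        n = length(rec)
        for (i = 1; i <= n; i++) {
            c = substr(rec, i, 1)
            if (c == "\"") inq = !inq
            out = out ((c == "," && !inq) ? FS : c)
        }

        $0 = out   # re-split, so $1, $2, ... follow the quoted-field rule
        for (i = 1; i <= NF; i++) printf "%d: %s\n", i, $i
    }' "$@"
}

# Usage with a shortened version of the sample record from the question.
printf '%s\n' '"This field contains' 'line breaks and is' \
    'a single field",1,2,3,"another field"' | parse_quoted_csv
```

Inside the awk program, the print loop can be replaced by any code that uses $1, $2 and NF as usual; the surrounding quotes are left in the field values.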
