Need to selectively remove newline characters from a file using Unix (Solaris)


Problem description

I am trying to find a way to selectively remove newline characters from a file. Removing all of them is no problem, but I need some to remain.

Here is an example of the bad input file. Note that the rows with Permit ID COO789 and COO012 have newlines embedded in the description field, and I need to remove those newlines.

"Permit Id","Permit Name","Description","Start Date","End Date"
"COO123","Music Festival",,"02/12/2013","02/12/2013"
"COO456","Race Weekend",,"02/23/2013","02/23/2013"
"COO789","Basketball Final 8 Championships - Media vs. Politicians
Skills Competition",,"02/22/2013","02/22/2013"
"COO012","Dragonboat race 
weekend",,"05/11/2013","05/11/2013"

Here is an example of how I need the file to look:

"Permit Number/Id","Permit Name","Description","Start Date","End Date"
"COO123","Music Festival",,"02/12/2013","02/12/2013"
"COO456","Race Weekend",,"02/23/2013","02/23/2013"
"COO789","Basketball Final 8 Championships - Media vs. Politicians Skills Competition",,"02/22/2013","02/22/2013"
"COO012","Dragonboat race weekend",,"05/11/2013","05/11/2013"

NOTE: I simplified the file by removing a few extra columns. The logic should be able to accommodate any number of columns, though. The actual full header line, with all columns, is shown below. Technically, I expect the "extra" newlines to be found in the Description and Location columns.

"Permit Number/Id","Permit Name","Description","Start Date","End Date","Custom Status","Owner Name","Total Expected Attendance","Location"

I have tried sed, cut, tr, nawk, etc. I am open to any solution that can do this and that can be called from within a Unix script.

Thanks!

Recommended answer

If you must remove newline characters only from within the 'Description' and 'Location' fields, you will need a proper CSV parser (think Text::CSV). You could also do this fairly easily with GNU awk, but unfortunately you won't have access to gawk on Solaris. The next best solution is therefore to join every line that doesn't start with a double quote to the previous line. You can do this with sed. I've written this with compatibility in mind:

sed -e :a -e '$!N; s/ *\n\([^"]\)/ \1/; ta' -e 'P;D' file
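
For readers who find the one-liner dense, here is the same command split into one sed command per -e option, with each step described in the comments. This is only a readability sketch; the function name join_continuations is hypothetical and not part of the original answer.

```shell
# join_continuations: hypothetical wrapper around the sed command above.
# Reads the named file(s), or stdin when no file is given.
#
#   :a       label to loop back to
#   $!N      unless on the last line, append the next input line
#   s/.../   if the appended line does not start with a double quote,
#            replace the newline (and any spaces before it) with one space
#   ta       if a substitution happened, loop to :a and try to join more
#   P        print the first line of the pattern space
#   D        delete that line and restart with what is left
join_continuations() {
    sed -e ':a' \
        -e '$!N' \
        -e 's/ *\n\([^"]\)/ \1/' \
        -e 'ta' \
        -e 'P' \
        -e 'D' "$@"
}
```

It can be called as join_continuations file > fixed_file, or used at the end of a pipeline.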

Results:

"Permit Id","Permit Name","Description","Start Date","End Date"
"COO123","Music Festival",,"02/12/2013","02/12/2013"
"COO456","Race Weekend",,"02/23/2013","02/23/2013"
"COO789","Basketball Final 8 Championships - Media vs. Politicians Skills Competition",,"02/22/2013","02/22/2013"
"COO012","Dragonboat race weekend",,"05/11/2013","05/11/2013"

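The sed approach assumes that a continuation line never begins with a double quote. If that cannot be guaranteed, one alternative is to count double quotes and keep joining lines until the running total is even, which means every quoted field has been closed. The sketch below is a different technique from the one in the answer. It assumes quotes inside a field are doubled ("") in the usual CSV way, and that a POSIX-style awk is available; on Solaris that likely means nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk. The function name and the AWK variable are hypothetical.

```shell
# join_quoted_lines: read CSV on stdin, write repaired CSV on stdout.
# Set AWK to choose the interpreter, e.g. AWK=nawk (assumption: POSIX-style awk).
join_quoted_lines() {
    ${AWK:-awk} '
    {
        # gsub returns how many double quotes it replaced on this line;
        # work on a copy so the original line is left untouched.
        line = $0
        quotes += gsub(/"/, "\"", line)

        if (!pending) {
            record = $0
        } else {
            # Continuation of an open quoted field: drop trailing spaces
            # and join with a single space instead of the newline.
            sub(/ *$/, "", record)
            record = record " " $0
        }

        if (quotes % 2 == 0) {
            # Even total: all quoted fields are closed, so emit the record.
            print record
            pending = 0
            quotes = 0
        } else {
            pending = 1
        }
    }
    END {
        # Emit an unterminated record rather than dropping it silently.
        if (pending) print record
    }'
}
```

Usage would be along the lines of join_quoted_lines < file > fixed_file. Like the sed version, it joins on any embedded newline regardless of column, so it is not a substitute for a real CSV parser when only specific fields should be changed.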