Need to selectively remove newline characters from a file using Unix (Solaris)
Problem description
I am trying to find a way to selectively remove newline characters from a file. I have no trouble removing all of them, but I need some to remain.
Here is an example of the bad input file. Note that the rows with Permit ID COO789 and COO012 have newlines embedded in the description field, which I need to remove.
"Permit Id","Permit Name","Description","Start Date","End Date"
"COO123","Music Festival",,"02/12/2013","02/12/2013"
"COO456","Race Weekend",,"02/23/2013","02/23/2013"
"COO789","Basketball Final 8 Championships - Media vs. Politicians
Skills Competition",,"02/22/2013","02/22/2013"
"COO012","Dragonboat race
weekend",,"05/11/2013","05/11/2013"
Here is an example of how I need the file to look:
"Permit Number/Id","Permit Name","Description","Start Date","End Date"
"COO123","Music Festival",,"02/12/2013","02/12/2013"
"COO456","Race Weekend",,"02/23/2013","02/23/2013"
"COO789","Basketball Final 8 Championships - Media vs. Politicians Skills Competition",,"02/22/2013","02/22/2013"
"COO012","Dragonboat race weekend",,"05/11/2013","05/11/2013"
NOTE: I simplified the file by removing a few extra columns. The logic should be able to accommodate any number of columns, though. Technically, I expect the "extra" newlines to be found in the Description and Location columns. The actual full header line, with all columns, is:
"Permit Number/Id","Permit Name","Description","Start Date","End Date","Custom Status","Owner Name","Total Expected Attendance","Location"
I have tried sed, cut, tr, nawk, etc. I am open to any solution that can do this and can be called from within a Unix script.
Thanks!
Recommended answer
If you must remove newline characters only from within the 'Description' and 'Location' fields, you will need a proper CSV parser (think Text::CSV). You could also do this fairly easily using GNU awk, but unfortunately you won't have access to gawk on Solaris. Therefore, the next best solution is to join lines that don't start with a double quote to the previous line. You can do this using sed. I've written this with compatibility in mind:
sed -e :a -e '$!N; s/ *\n\([^"]\)/ \1/; ta' -e 'P;D' file
Results:
"Permit Id","Permit Name","Description","Start Date","End Date"
"COO123","Music Festival",,"02/12/2013","02/12/2013"
"COO456","Race Weekend",,"02/23/2013","02/23/2013"
"COO789","Basketball Final 8 Championships - Media vs. Politicians Skills Competition",,"02/22/2013","02/22/2013"
"COO012","Dragonboat race weekend",,"05/11/2013","05/11/2013"
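The sed command above relies on continuation lines never starting with a double quote. As a possible alternative that does not depend on that, here is a minimal sketch that joins physical lines until the number of double quotes seen so far is even, which is when every quoted field has been closed. It is not part of the original answer. The function name join_quoted_newlines is made up for illustration, and the sketch assumes a POSIX-style awk with gsub(). On Solaris that would most likely mean substituting nawk or /usr/xpg4/bin/awk for awk, which is an assumption that has not been tested there. It also assumes that any quote inside a field is escaped by doubling it, so the parity check still holds.

```shell
# Hypothetical helper: join lines that fall inside an unclosed quoted field.
# Reads the named files, or standard input when no file is given.
join_quoted_newlines() {
    awk '
    {
        # Glue this physical line onto the record being assembled.
        if (pending) record = record " " $0
        else         record = $0

        # gsub() returns how many double quotes it found on this line.
        quotes += gsub(/"/, "&")

        if (quotes % 2 == 0) {
            # Every quoted field is closed, so the record is complete.
            print record
            pending = 0
            quotes = 0
        } else {
            # A quoted field is still open; keep reading.
            pending = 1
        }
    }
    # Emit a trailing unterminated record rather than dropping it.
    END { if (pending) print record }
    ' "$@"
}
```

It could be called as join_quoted_newlines file > fixed.csv. Unlike the sed version, it does not trim spaces that precede the embedded newline, and like the sed version it joins broken lines in any column, not only Description and Location.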