Unable to remove carriage returns and line feeds in columns enclosed in double quotes

Problem description

I want to remove any non-printable newline characters from the column data.

I have enclosed every column in double quotes so that newline characters inside a column can be deleted easily while the record delimiter at the end of each line is left alone.

Say I have 4 columns separated by commas and enclosed in quotes in a text file. I'm trying to remove \n and \r characters only when they appear between the double quotes.

I currently use tr, but it deletes every line break and turns the file into one long sequence without any record separator.

tr -d '\n\r' < in.txt > out.txt

Sample data:

"1","test\n
Sample","data","col4"\n
"2\n
","Test","Sample","data" \n
"3","Sam\n
ple","te\n
st","data"\n

Expected Output:

"1","testSample","data","col4"\n
"2","Test","Sample","data" \n
"3","Sample","test","data"\n

Any suggestions? Thanks in advance.

Solution

Here's a possible solution:

perl -pe 'if (tr/"// % 2) { chomp; $_ .= <>; redo; }'

If the current line has unbalanced quotes (i.e. an odd number of "), it must end in the middle of a field, so we chomp out the newline, append the next input line, and restart the loop.
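For readers who would rather not use Perl, the same quote-parity idea can be sketched in Python. This is only an illustration under stated assumptions, not part of the original answer: the function name join_quoted_lines is hypothetical, the input is assumed to be an iterable of lines that keep their line endings, and every field is assumed to be properly quoted. Unlike chomp, which removes only the trailing newline by default, this sketch also strips a trailing \r.

```python
def join_quoted_lines(lines):
    """Yield logical records, merging physical lines until the quotes balance.

    `lines` is any iterable of strings that still carry their line endings
    (hypothetical helper, written for illustration only).
    """
    pending = ""
    for line in lines:
        pending += line
        if pending.count('"') % 2:
            # Odd number of double quotes: the line break sits inside a
            # quoted field, so drop the trailing \r/\n and keep reading.
            pending = pending.rstrip("\r\n")
            continue
        yield pending
        pending = ""
    if pending:
        # Input ended inside an unterminated quoted field; emit what is left.
        yield pending


if __name__ == "__main__":
    import sys

    # Read from stdin without newline translation so bare \r is preserved
    # until the function strips it.
    sys.stdin.reconfigure(newline="")
    sys.stdout.writelines(join_quoted_lines(sys.stdin))
```

Assuming the script is saved as join_quoted.py (a made-up file name), it could be run as python3 join_quoted.py < in.txt > out.txt. Doubled quotes used as an escape inside a field ("") add two to the count, so they do not disturb the parity check.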
