Removing Windows newlines on Linux (sed vs. awk)
Question
I have some delimited files with improperly placed newline characters in the middle of fields (not at line ends), which appear as ^M in Vim. They come from freebcp exports (on CentOS 6) of an MSSQL database. A hex dump of the data shows the pattern:
$ xxd test.txt | grep 0d0a
0000190: 3932 3139 322d 3239 3836 0d0a 0d0a 7c43
I can remove them with awk, but I am unable to do the same with sed.
This works in awk, removing the line breaks completely:
awk 'gsub(/\r/,""){printf "%s", $0;next}{print}'
But this in sed does not, leaving line feeds in place:
sed -i 's/\r//g'
This appears to have no effect:
sed -i 's/\r\n//g'
Using ^M in the sed expression (Ctrl+V, Ctrl+M) also does not seem to work.
For this sort of task, sed is easier to grok, but I am working on learning more about both. Am I using sed improperly, or is there a limitation?
Recommended answer
I believe some versions of sed will not recognize \r as a character. However, you can use a bash feature to work around that limitation:
echo "$string" | sed $'s/\r//'
Here, you let bash replace '\r' with the actual carriage return character inside the $'...' construct before passing that to sed as its command. (Assuming you use bash; other shells should have a similar construct.)
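To make the $'...' workaround concrete, here is a minimal sketch under stated assumptions: it requires bash, and the variable names and sample string are hypothetical, modeled loosely on the hex dump in the question rather than taken from the real export. It strips every carriage return and then dumps the remaining bytes so the result can be checked.

```shell
#!/usr/bin/env bash
# Hypothetical sample: a field interrupted by two CRLF pairs,
# loosely modeled on the hex dump in the question.
sample=$'92192-2986\r\n\r\n|C\n'

# bash expands \r inside $'...' to a real carriage return before sed runs,
# so this does not depend on sed itself understanding the \r escape.
cleaned=$(printf '%s' "$sample" | sed $'s/\r//g')

# Dump the remaining bytes; no \r should appear in the od output.
printf '%s\n' "$cleaned" | od -c
```

This only deletes the carriage returns. The line feeds stay, because sed processes its input one line at a time and the trailing line feed is not part of the text the substitution sees. That matches the behavior described in the question, where an expression containing \r\n had no effect.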