从文件中删除控制字符 [英] Removing Control Characters from a File

查看:295
本文介绍了从文件中删除控制字符的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想使用linux bash命令从文件中删除所有控制字符.

I want to delete all the control characters from my file using linux bash commands.

有一些控制字符,例如EOF(0x1A),尤其是当我在另一软件中加载文件时引起问题.我要删除它.

There are some control characters like EOF (0x1A) especially which are causing the problem when I load my file in another software. I want to delete this.

这是我到目前为止尝试过的:

Here is what I have tried so far:

这将列出所有控制字符:

this will list all the control characters:

cat -v -e -t file.txt | head -n 10

^A+^X$
^A1^X$
^D ^_$
^E-^D$
^E-^S$
^E1^V$
^F%^_$
^F-^D$
^F.^_$
^F/^_$
^F4EZ$
^G%$

这将使用grep列出所有控制字符:

This will list all the control characters using grep:

$ cat file.txt | head -n 10 | grep '[[:cntrl:]]'
+
1

-
-
1
%
-
.
/

与cat命令的上述输出匹配.

matches the above output of cat command.

现在,我运行以下命令以显示所有不包含控制字符的行,但仍显示与上述相同的输出(包含控制字符的行)

Now, I ran the following command to show all lines not containing control characters but it is still showing the same output as above (lines with control characters)

$ cat file.txt | head -n 10 | grep '[^[:cntrl:]]'
+
1

-
-
1
%
-
.
/

这是十六进制格式的输出:

here is the output in hex format:

$ cat file.txt | head -n 10 | grep '[[:cntrl:]]' | od -t x2
0000000 2b01 0a18 3101 0a18 2004 0a1f 2d05 0a04
0000020 2d05 0a13 3105 0a16 2506 0a1f 2d06 0a04
0000040 2e06 0a1f 2f06 0a1f
0000050

如您所见,十六进制值0x01、0x18是控制字符.

as you can see, the hex values, 0x01, 0x18 are control characters.

我尝试使用tr命令删除控制字符,但出现错误:

I tried using the tr command to delete the control characters but got an error:

$ cat file.txt | tr -d "\r\n" "[:cntrl:]" >> test.txt
tr: extra operand `[:cntrl:]'
Only one string may be given when deleting without squeezing repeats.
Try `tr --help' for more information.

如果删除所有控制字符,最终将删除换行符和回车符,这些换行符也用作Windows上的换行符.如何删除所有仅保留必需的控制字符,例如"\ r \ n"?

If I delete all control characters, I will end up deleting the newline and carriage return as well which is used as the newline characters on windows. How do I delete all the control characters keeping only the ones required like "\r\n"?

谢谢.

推荐答案

而不是使用预定义的[:cntrl:]集(如您所见,其中包括\n\r),只需列出(八进制)控制字符即可想摆脱:

Instead of using the predefined [:cntrl:] set, which as you observed includes \n and \r, just list (in octal) the control characters you want to get rid of:

$ tr -d '\000-\011\013\014\016-\037' < file.txt > newfile.txt

这篇关于从文件中删除控制字符的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆