Identify and remove specific hidden characters from text file


Problem description

I have a text file that contains several hidden characters. Using cat -v, I can see that they include the following:

^M

^[[A

There are also \n characters at the end of each line, which I would like to display as well.
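To make the line endings visible, you can combine cat -v with -e, which marks each line end with $, and -t, which shows tabs as ^I. GNU cat also accepts -A as shorthand for -vET. od -c goes a step further and lists every byte, writing CR and LF as \r and \n. The sketch below is minimal, and the sample text and function names are made up for illustration:

```shell
# Hypothetical sample input: a CR, an ESC [A sequence, a tab and a newline.
sample() {
  printf 'a\r\033[Ab\tc\n'
}

# -v prints control characters in caret notation (^M = CR, ^[ = ESC),
# -e also prints '$' at each line end, and -t prints tabs as ^I.
show_visible() {
  cat -vet
}

# od -c dumps every byte, writing CR, LF and TAB as \r, \n and \t.
show_bytes() {
  od -c
}

sample | show_visible   # prints: a^M^[[Ab^Ic$
sample | show_bytes
```

To inspect a real file, pass it as an argument, e.g. cat -vet org.txt or od -c org.txt.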

Then I would like to selectively remove these hidden characters, for example with cut or sed. How would I go about accomplishing this?

I've tried dos2unix, but it didn't remove any of the ^M characters. I've also tried sed s/^M//g, where I entered the ^M by pressing Ctrl+V M.
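In its default mode, dos2unix converts CR LF pairs at line ends. The ^M characters in this log sit at the start or in the middle of lines rather than just before the newline, which may be why they were left in place. The minimal sketch below deletes every carriage return instead, assuming a POSIX shell (the function names are only illustrative). The sed version takes its CR from printf, so nothing has to be typed with Ctrl+V:

```shell
# A literal carriage return taken from printf instead of typing it with
# Ctrl+V (command substitution strips trailing newlines, not CRs).
cr=$(printf '\r')

# Delete every CR with tr; '\r' is a standard tr escape.
strip_cr_tr() {
  tr -d '\r'
}

# The same with sed, using the literal CR stored in $cr, so it does not
# depend on the sed implementation understanding \r.
strip_cr_sed() {
  sed "s/${cr}//g"
}

printf 'abc\rdef\n' | strip_cr_tr    # prints: abcdef
printf 'abc\rdef\n' | strip_cr_sed   # prints: abcdef
```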


Raw data

Output from cat -v on the raw data, also available at: http://pastebin.com/Vk2i81JC

^MCopying non-tried blocks... Pass 1 (forwards)^M^[[A^[[A^[[Arescued:         0 B,  errsize:       0 B,  current rate:        0 B/s
   ipos:         0 B,   errors:       0,    average rate:        0 B/s
   opos:         0 B, run time:       1 s,  successful read:       1 s ago
^MFinished

Output wanted

Also available at: http://pastebin.com/wfDnrELm

rescued:         0 B,  errsize:       0 B,  current rate:        0 B/s
   ipos:         0 B,   errors:       0,    average rate:        0 B/s
   opos:         0 B, run time:       1 s,  successful read:       1 s ago
Finished

Solution

Try tr, which translates or deletes characters. The command below deletes every character except those listed (in octal) inside the quotes.

Octal \11 is TAB (^I), octal \12 is newline (\n), and octal \40 through \176 are the printable ASCII characters (space through ~). These are the characters that are kept.

For a complete table of octal values, see: https://courses.engr.illinois.edu/ece390/books/labmanual/ascii-code-table.html

    tr -cd '\11\12\40-\176' < org.txt > new.txt
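A quick check of that whitelist on a tiny, made-up input shows what it keeps and what it drops (keep_printable is just an illustrative name):

```shell
# Keep only TAB (\11), newline (\12) and printable ASCII (\40-\176):
# -c takes the complement of that set and -d deletes it.
keep_printable() {
  tr -cd '\11\12\40-\176'
}

# CR (\r) and ESC (\033) are deleted, but the "[A" after ESC is printable,
# so it stays in the output.
printf 'x\ry\033[Az\n' | keep_printable   # prints: xy[Az

# Bytes above \176, such as the UTF-8 encoding of an accented letter, go too.
printf 'caf\303\251\n' | keep_printable   # prints: caf
```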

new.txt will then contain the text with those characters removed.

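To also remove the text between ^M characters (everything from the first ^M to the last ^M on the same line) and then strip the remaining control characters, use the command below. It relies on sed understanding \r, which GNU sed does; other sed implementations may need a literal carriage return instead.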

    sed "s/\r.*\r//g" org.txt | tr -cd '\11\12\40-\176' > new.txt
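Running that pipeline on a shortened, hand-made line shaped like the raw data suggests one gap. tr deletes the ESC byte of each ^[[A, but [ and A are printable, so [A[A[A is left in front of rescued:. The sketch below reproduces this. It also tries a variant that first deletes simple ESC [ ... letter sequences and then keeps only the text after the last CR on each line. The function names are hypothetical, and $cr and $esc hold literal bytes from printf so the script does not depend on GNU sed's \r:

```shell
# Literal CR and ESC bytes from printf, so the sed scripts below do not rely
# on GNU sed's \r escape.
cr=$(printf '\r')
esc=$(printf '\033')

# The answer's pipeline, with $cr standing in for \r.
answer_pipeline() {
  sed "s/${cr}.*${cr}//g" | tr -cd '\11\12\40-\176'
}

# Variant (hypothetical name): delete simple "ESC [ digits/semicolons letter"
# sequences such as ESC [A (cursor up), then keep only the text after the
# last CR on each line. Other kinds of escape sequence are not handled.
clean_log() {
  sed -e "s/${esc}\[[0-9;]*[A-Za-z]//g" -e "s/.*${cr}//"
}

# Abbreviated, hand-made input shaped like the raw data in the question.
sample() {
  printf '\rCopying... Pass 1 (forwards)\r\033[A\033[A\033[Arescued: 0 B\n\rFinished\n'
}

sample | answer_pipeline   # prints "[A[A[Arescued: 0 B", then "Finished"
sample | clean_log         # prints "rescued: 0 B", then "Finished"
```

On the real file this would be clean_log < org.txt > new.txt. Traced by hand against the cat -v output above, it appears to leave exactly the wanted lines, assuming the file holds only the bytes cat -v displays.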
