Identify and remove specific hidden characters from text file
Problem description
I have a text file that contains several hidden characters. Using cat -v I am able to see that they include the following:
^M
^[[A
There are also \n characters at the end of each line. I would like to be able to display these as well somehow.
Then I would like to be able to selectively cut and sed these hidden characters. How would I go about accomplishing this?
I've tried dos2unix, but that didn't help remove any of the ^M characters. I've also tried sed s/^M//g, wherein I pressed Ctrl+V M.
Raw data
Output from cat -v on the raw data, also available at: http://pastebin.com/Vk2i81JC
^MCopying non-tried blocks... Pass 1 (forwards)^M^[[A^[[A^[[Arescued: 0 B, errsize: 0 B, current rate: 0 B/s
ipos: 0 B, errors: 0, average rate: 0 B/s
opos: 0 B, run time: 1 s, successful read: 1 s ago
^MFinished
Output wanted
Also available at: http://pastebin.com/wfDnrELm
rescued: 0 B, errsize: 0 B, current rate: 0 B/s
ipos: 0 B, errors: 0, average rate: 0 B/s
opos: 0 B, run time: 1 s, successful read: 1 s ago
Finished
Solution
Try the tr command below, which translates or deletes characters. With the -c and -d options it deletes every character except the ones listed, which are given here in octal within the quotes:
octal \11 - TAB (^I)
octal \12 - newline (\n)
octal \40-\176 - the printable ASCII characters (space through ~)
For a complete reference of octal values, refer to this page: https://courses.engr.illinois.edu/ece390/books/labmanual/ascii-code-table.html
tr -cd '\11\12\40-\176' < org.txt > new.txt
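As a quick check of what this command keeps and what it drops, the sketch below builds a small sample file with printf, runs the same tr command on it, and dumps the bytes before and after with od -c. The sample text and the temporary directory are made up for illustration. od -c also makes the end-of-line \n characters visible, which cat -v does not show.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample: CR, ESC [ A (a cursor-up sequence), a TAB and newlines.
printf '\rCopying\r\033[Arescued: 0 B\n\tFinished\n' > org.txt

# Show every byte, including \r, 033 (ESC), \t and \n.
od -c org.txt

# Keep only TAB (\11), newline (\12) and printable ASCII (\40-\176).
tr -cd '\11\12\40-\176' < org.txt > new.txt

# CR and ESC are gone; the "[A" that followed ESC is printable, so it remains.
od -c new.txt
cat new.txt
```

Comparing the two od -c dumps shows which bytes were deleted. It also shows that only the ESC byte of an escape sequence is removed, not the characters that follow it.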
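After running the tr command, the file new.txt will contain the text with all other characters removed.
To also remove the text between two ^M characters, in addition to the unnecessary control characters, use the command below (it relies on sed understanding \r as a carriage return, as GNU sed does):
sed "s/\r.*\r//g" org.txt | tr -cd '\11\12\40-\176' > new.txt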
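Note that tr -cd deletes only the ESC byte of a sequence such as ^[[A and leaves the printable [A behind. A minimal sketch of one way to drop the whole sequence first is shown below; it is not part of the original answer. It assumes the escape sequences have the common form ESC [ parameters letter, and it builds the CR and ESC characters with printf so that it does not depend on sed interpreting \r. The function name clean_hidden and the sample file are hypothetical.

```shell
# Literal control characters, built portably with printf.
cr=$(printf '\r')
esc=$(printf '\033')

# clean_hidden FILE: print FILE with the hidden characters removed.
clean_hidden() {
    # 1. Drop text enclosed between two carriage returns (as in the command above).
    # 2. Drop ANSI sequences of the form ESC [ digits/semicolons letter.
    # 3. Delete anything left that is not TAB, newline or printable ASCII.
    sed -e "s/${cr}.*${cr}//g" \
        -e "s/${esc}\[[0-9;]*[A-Za-z]//g" "$1" |
        tr -cd '\11\12\40-\176'
}

# Hypothetical sample resembling the first and last lines of the raw data.
sample=$(mktemp)
printf '\rCopying non-tried blocks... Pass 1 (forwards)\r\033[A\033[A\033[Arescued: 0 B\n\rFinished\n' > "$sample"

clean_hidden "$sample"
```

On this sample the function prints the two lines "rescued: 0 B" and "Finished". Escape sequences that do not match the assumed ESC [ ... letter form would need their own pattern.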