How to delete duplicate lines in a file... AWK, SED, UNIQ not working on my file
Problem Description
I found many ways to do this (AWK, SED, UNIQ), but none of them work on my file.
I want to delete duplicate lines. Here is an example of part of my file:
KTBX
KFSO
KCLK
KTBX
KFSO
KCLK
PAJZ
PAJZ
NOTE: I had to manually add line feeds when I cut and pasted from the file...for some reason it was putting all the variables on one line. Makes me think that my 44,000 line text file actually has only "1" line? Is there a way to modify it so I can delete dups?
You can see all non-printed characters with this command:
od -c oldfile
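For example, if the records were joined by spaces instead of newlines, `od -c` makes that immediately visible (the sample data below is illustrative, not from the original 44,000-line file):

```shell
# Build a sample "one-line" file: records separated by spaces,
# with only a single newline at the very end
printf 'KTBX KFSO KCLK KTBX\n' > oldfile

# od -c prints every byte as a character, rendering non-printing
# bytes as escapes such as \n, \t, and \r -- so a file whose records
# lack newline separators is easy to diagnose
od -c oldfile
```

If the output shows a `\n` only at the very end (or `\r \n` pairs, which indicate Windows line endings), that explains why line-oriented tools see a single record.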
If all your records are on one line, you can use sed to replace each run of whitespace (spaces or tabs) with a newline:
sed -e 's/\s\+/\n/g' oldfile > oldfile.1
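One caveat: `\s` in the pattern and `\n` in the replacement are GNU sed extensions, so the command above fails on BSD/macOS sed. A portable sketch of the same step, reusing the `oldfile`/`oldfile.1` names from above, squeezes whitespace runs with tr instead:

```shell
# Portable alternative: translate every whitespace character to a
# newline, and squeeze (-s) runs of them into a single newline
tr -s '[:space:]' '\n' < oldfile > oldfile.1
```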
Once you have multiple lines, this awk one-liner removes the duplicates while keeping the first occurrence of each line:
awk '!x[$0]++' oldfile.1 > newfile
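The expression `!x[$0]++` counts how many times each whole line (`$0`) has been seen; the count is zero only the first time, so only first occurrences are printed and the original order is preserved. A quick demonstration on the sample data from the question:

```shell
# Recreate the sample input with duplicates
printf 'KTBX\nKFSO\nKCLK\nKTBX\nKFSO\nKCLK\nPAJZ\nPAJZ\n' > oldfile.1

# Print each line only the first time it appears
awk '!x[$0]++' oldfile.1 > newfile
cat newfile
# KTBX
# KFSO
# KCLK
# PAJZ
```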
My output file:
KTBX
KFSO
KCLK
PAJZ
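If the original order of the lines doesn't matter, `sort -u` is an even shorter alternative, though note that it reorders the output:

```shell
# sort -u sorts the lines and keeps one copy of each unique line
printf 'KTBX\nKFSO\nKCLK\nKTBX\nPAJZ\nPAJZ\n' | sort -u
# KCLK
# KFSO
# KTBX
# PAJZ
```

This is why the awk one-liner is usually preferred when the file's ordering carries meaning.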