How might I remove duplicate lines from a file?
Question
I have a file with one column. How do I delete repeated lines from it?
Recommended answer
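On Unix/Linux, use the uniq command, as per David Locke's answer, or sort, as per William Pursell's comment. Note that uniq only removes repeated lines that are adjacent to each other, so on an unsorted file it is normally run after sort (e.g. sort file | uniq).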
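uniq is usually paired with sort because it only collapses runs of identical adjacent lines. The minimal Python sketch below mimics that behaviour with itertools.groupby. The function names uniq_like and sort_uniq_like are hypothetical and chosen for this illustration. The sketch ignores uniq's options. Also note that Python's sorted() orders by code point, which may differ from sort's locale-dependent ordering.

```python
from itertools import groupby
from typing import Iterable, List


def uniq_like(lines: Iterable[str]) -> List[str]:
    """Keep one line from each run of identical *adjacent* lines, like plain uniq."""
    # groupby yields (key, run) for each run of consecutive equal items.
    return [key for key, _run in groupby(lines)]


def sort_uniq_like(lines: Iterable[str]) -> List[str]:
    """Sort first so every copy of a line becomes adjacent, then collapse runs."""
    return uniq_like(sorted(lines))


if __name__ == "__main__":
    data = ["b\n", "a\n", "a\n", "b\n"]
    print(uniq_like(data))       # ['b\n', 'a\n', 'b\n']: the second "b" is not adjacent, so it survives
    print(sort_uniq_like(data))  # ['a\n', 'b\n']
```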
If you need a Python script:
lines_seen = set()  # holds lines already seen
# "with" closes both files automatically, even if an error occurs
with open(infilename, "r") as infile, open(outfilename, "w") as outfile:
    for line in infile:
        if line not in lines_seen:  # not a duplicate
            outfile.write(line)
            lines_seen.add(line)
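The same idea can be wrapped in functions. The order-preserving logic then works on any iterable of lines, so it can be tested without touching the filesystem. This is a minimal sketch, and the names unique_lines and dedupe_file are hypothetical.

```python
from typing import Iterable, Iterator


def unique_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line the first time it is seen, preserving the original order."""
    seen = set()  # lines already yielded
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


def dedupe_file(infilename: str, outfilename: str) -> None:
    """Copy infilename to outfilename, dropping repeated lines."""
    with open(infilename, "r") as infile, open(outfilename, "w") as outfile:
        outfile.writelines(unique_lines(infile))
```

For example, list(unique_lines(["x\n", "y\n", "x\n"])) returns ['x\n', 'y\n']. Because the function is a generator, lines are streamed rather than read all at once. Only the set of distinct lines is kept in memory. The original script has the same caveat: if the file's last line has no trailing newline, it is not recognised as a duplicate of an earlier identical line, because "a" and "a\n" are different strings.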
Update: The sort/uniq combination will remove duplicates but return a file with the lines sorted, which may or may not be what you want. The Python script above won't reorder lines; it just drops duplicates. Of course, to make the script above sort as well, just leave out outfile.write(line) and instead, immediately after the loop, do outfile.writelines(sorted(lines_seen)).
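The sorted variant described in the update could look roughly like the sketch below. The function name dedupe_file_sorted is hypothetical. The "if line not in lines_seen" check is dropped because set.add is already idempotent.

One deliberate change from the update's prose: each line's trailing newline is stripped before it is stored and added back on output. Without this, a final line that lacks a newline could be sorted into the middle of the output and run into the following line.

```python
def dedupe_file_sorted(infilename: str, outfilename: str) -> None:
    """Drop repeated lines and write the distinct lines in sorted order."""
    lines_seen = set()  # holds lines already seen, without trailing newlines
    with open(infilename, "r") as infile:
        for line in infile:
            # No per-line write here; just record the line.
            lines_seen.add(line.rstrip("\n"))
    with open(outfilename, "w") as outfile:
        # Re-append a newline to every line, including the last one.
        outfile.writelines(line + "\n" for line in sorted(lines_seen))
```

Because every distinct line must be held in memory until the input has been fully read, this variant cannot stream its output the way the order-preserving script does.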