How might I remove duplicate lines from a file?

Question

I have a file with one column. How can I delete repeated lines in the file?

Recommended answer

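On Unix/Linux, use the uniq command, as per David Locke's answer, or sort, as per William Pursell's comment. Note that uniq only collapses adjacent duplicate lines, so it is normally run on sorted input (sort file | uniq, or equivalently sort -u file).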
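The difference between the two tools matters in practice: plain uniq drops a line only when it repeats the line immediately before it, while sort -u (or sort | uniq) removes every duplicate but reorders the file. The Python sketch below imitates both behaviours for illustration; the function names uniq_adjacent and sort_unique are hypothetical, not part of any library. One assumption to keep in mind: Python's sorted() compares strings by code point, whereas sort(1) may follow the current locale's collation, so the resulting order can differ.

```python
from itertools import groupby


def uniq_adjacent(lines):
    """Imitate plain uniq: collapse runs of identical consecutive lines only."""
    return [key for key, _group in groupby(lines)]


def sort_unique(lines):
    """Imitate sort -u: keep one copy of each distinct line, in sorted order."""
    return sorted(set(lines))


data = ["b\n", "a\n", "a\n", "b\n"]
print(uniq_adjacent(data))  # ['b\n', 'a\n', 'b\n']  (the second "b" survives)
print(sort_unique(data))    # ['a\n', 'b\n']
```

This is why uniq on its own leaves non-adjacent duplicates in place.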

If you need a Python script:

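infilename = "input.txt"    # placeholder: path of the file to deduplicate
outfilename = "output.txt"  # placeholder: path for the result (must differ from infilename)

lines_seen = set()  # holds lines already seen
with open(infilename, "r") as infile, open(outfilename, "w") as outfile:
    for line in infile:
        if line not in lines_seen:  # not a duplicate: keep its first occurrence
            outfile.write(line)
            lines_seen.add(line)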
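Since Python 3.7, dicts preserve insertion order, so the same order-preserving deduplication can also be sketched with dict.fromkeys. The function name remove_duplicate_lines is hypothetical. One design difference: the loop above opens the output file before reading, so pointing both names at the same file would empty it, whereas this sketch reads and closes the input before opening the output, which lets infilename and outfilename be the same path.

```python
import os
import tempfile


def remove_duplicate_lines(infilename, outfilename):
    """Copy infilename to outfilename, keeping only the first occurrence of each line."""
    with open(infilename, "r") as infile:
        # Keys are the distinct lines, in first-seen order (dicts are ordered in 3.7+).
        unique_lines = dict.fromkeys(infile)
    with open(outfilename, "w") as outfile:
        outfile.writelines(unique_lines)  # iterating a dict yields its keys


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.txt")
        dst = os.path.join(tmp, "out.txt")
        with open(src, "w") as f:
            f.write("apple\nbanana\napple\ncherry\nbanana\n")
        remove_duplicate_lines(src, dst)
        with open(dst) as f:
            print(f.read(), end="")  # prints apple, banana, cherry on separate lines
```

Like the loop, this compares whole lines including the trailing newline, so a final line without a newline is treated as distinct from an otherwise identical earlier line.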

Update: The sort/uniq combination will remove duplicates but return a file with the lines sorted, which may or may not be what you want. The Python script above does not reorder lines; it only drops duplicates. To make the script sort as well, leave out outfile.write(line) and instead, immediately after the loop (while outfile is still open), call outfile.writelines(sorted(lines_seen)).
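A sketch of that sorted variant follows. It adds one safeguard that the update does not mention, based on an assumption about typical input: if the file's last line has no trailing newline, it compares unequal to an identical earlier line, and once sorted it would be written directly in front of the next line with nothing separating them. This version therefore strips the newline before deduplicating and re-adds it when writing. The function name write_sorted_unique is hypothetical.

```python
import os
import tempfile


def write_sorted_unique(infilename, outfilename):
    """Write the distinct lines of infilename to outfilename in sorted order."""
    lines_seen = set()  # holds lines already seen, without their newline
    with open(infilename, "r") as infile:
        for line in infile:
            lines_seen.add(line.rstrip("\n"))  # treat "fig" and "fig\n" as the same line
    with open(outfilename, "w") as outfile:
        # Re-add exactly one newline per line so no two lines get glued together.
        outfile.writelines(line + "\n" for line in sorted(lines_seen))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.txt")
        dst = os.path.join(tmp, "out.txt")
        with open(src, "w") as f:
            f.write("pear\napple\npear\nfig")  # last line has no trailing newline
        write_sorted_unique(src, dst)
        with open(dst) as f:
            print(repr(f.read()))  # 'apple\nfig\npear\n'
```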
