How might I remove duplicate lines from a file?

Question

I have a file with one column. How can I delete repeated lines in the file?

Solution

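On Unix/Linux, use the uniq command, as per David Locke's answer, or sort, as per William Pursell's comment. Note that uniq only removes duplicate lines that are adjacent, so unsorted input is usually passed through sort first (sort file | uniq, or equivalently sort -u file).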
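
To illustrate why uniq is normally paired with sort, here is a rough Python analogue of the two commands. The function names are hypothetical, and this is only a sketch: the real sort orders lines according to locale rules, whereas Python's sorted() compares code points.

```python
from itertools import groupby

def uniq_adjacent(lines):
    """Rough analogue of `uniq`: collapse runs of identical *adjacent* lines."""
    # groupby yields one (key, run) pair per run of equal consecutive items.
    return [line for line, _run in groupby(lines)]

def sort_uniq(lines):
    """Rough analogue of `sort | uniq`: sort first so equal lines become adjacent."""
    return uniq_adjacent(sorted(lines))

data = ["b", "a", "a", "b"]
print(uniq_adjacent(data))  # ['b', 'a', 'b']  (the second "b" is not adjacent, so it survives)
print(sort_uniq(data))      # ['a', 'b']
```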

If you need a Python script:

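# Hypothetical file names; substitute your own paths.
infilename = "input.txt"
outfilename = "output.txt"

lines_seen = set()  # holds lines already seen
with open(infilename, "r") as infile, open(outfilename, "w") as outfile:
    for line in infile:
        if line not in lines_seen:  # not a duplicate
            outfile.write(line)
            lines_seen.add(line)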
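
The same order-preserving idea can also be packaged as a reusable function. This sketch swaps the explicit set for dict.fromkeys, which keeps the first occurrence of each key in insertion order (guaranteed since Python 3.7). The names unique_lines and remove_duplicate_lines are hypothetical, chosen for illustration.

```python
def unique_lines(lines):
    """Return the lines with duplicates removed, keeping first occurrences in order."""
    # dict keys are unique and remember insertion order (Python 3.7+),
    # so this drops repeats without reordering anything.
    return list(dict.fromkeys(lines))

def remove_duplicate_lines(infilename, outfilename):
    """Copy infilename to outfilename with duplicate lines dropped."""
    with open(infilename, "r") as infile:
        result = unique_lines(infile)  # the whole input is read before writing
    with open(outfilename, "w") as outfile:
        outfile.writelines(result)
```

Because the input is fully read and closed before the output is opened, this version can write back to the same path it read from. The streaming script above cannot do that: opening the output with "w" truncates the file before it has been read.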

Update: The sort/uniq combination will remove duplicates but return a file with the lines sorted, which may or may not be what you want. The Python script above won't reorder lines; it just drops duplicates, keeping the first occurrence of each line. To make the script sort as well, leave out the outfile.write(line) and instead, immediately after the loop (while outfile is still open), call outfile.writelines(sorted(lines_seen)).
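
The sorting variant described in the update might look like the following sketch. The function name is hypothetical, and the line-ending normalization is an addition not found in the original answer: without it, a final line that lacks a trailing newline could be sorted into the middle of the output and run into the line after it.

```python
def remove_duplicate_lines_sorted(infilename, outfilename):
    """Write each distinct line of infilename to outfilename once, in sorted order."""
    lines_seen = set()  # holds lines already seen
    with open(infilename, "r") as infile, open(outfilename, "w") as outfile:
        for line in infile:
            # Addition to the original answer: give every line a trailing "\n"
            # so a newline-less last line cannot merge with its new neighbour.
            lines_seen.add(line.rstrip("\n") + "\n")
        # No outfile.write(line) inside the loop; write all unique lines once,
        # sorted, while outfile is still open.
        outfile.writelines(sorted(lines_seen))
```

Like sort -u, this holds every distinct line in memory before writing anything.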
