Removing duplicate rows from a CSV file using a Python script


Question

Goal

I have downloaded a CSV file from hotmail, but it has a lot of duplicates in it. These duplicates are complete copies and I don't know why my phone created them.

I want to remove the duplicates.

Method

Write a python script to remove duplicates.

Specifications



Windows XP SP 3
Python 2.7
CSV file with 400 contacts

Answer

Update: 2016

If you are happy to use the helpful more_itertools external library:

from more_itertools import unique_everseen

with open('1.csv','r') as f, open('2.csv','w') as out_file:
    # unique_everseen yields each line only the first time it is seen,
    # preserving the original order
    out_file.writelines(unique_everseen(f))
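Note that more_itertools is a third-party package, so it has to be installed first (for example with pip install more_itertools). As a quick illustration of what unique_everseen does, assuming the package is available:

from more_itertools import unique_everseen

# duplicates are dropped, first-seen order is kept
print(list(unique_everseen(['a\n', 'b\n', 'a\n', 'c\n'])))
# -> ['a\n', 'b\n', 'c\n']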


A more efficient version of @IcyFlame's solution:

with open('1.csv','r') as in_file, open('2.csv','w') as out_file:
    seen = set() # set for fast O(1) amortized lookup
    for line in in_file:
        if line in seen: continue # skip duplicate

        seen.add(line)
        out_file.write(line)
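Both snippets above compare entire lines, which is exactly what the question needs since the duplicates are complete copies. If you ever wanted to deduplicate on the parsed fields instead (for example to ignore differences in quoting), a rough sketch using the standard csv module could look like this (file names are just placeholders, as above):

import csv

# Python 2: open csv files in binary mode
with open('1.csv', 'rb') as in_file, open('2.csv', 'wb') as out_file:
    reader = csv.reader(in_file)
    writer = csv.writer(out_file)
    seen = set()
    for row in reader:
        key = tuple(row)   # the whole parsed row is the dedup key
        if key in seen:
            continue       # skip duplicate
        seen.add(key)
        writer.writerow(row)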

To edit the same file in-place, you could use this:

import fileinput
seen = set() # set for fast O(1) amortized lookup
for line in fileinput.FileInput('1.csv', inplace=1):
    if line in seen: continue # skip duplicate

    seen.add(line)
    print line, # standard output is now redirected to the file
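With inplace=1, fileinput redirects standard output into the file being rewritten, so printing a line writes it back; the trailing comma after print line, is Python 2 syntax that suppresses the extra newline. Purely as a sketch for anyone on a newer interpreter than the Python 2.7 in the question, the same idea in Python 3 would be:

import fileinput

seen = set()  # set for fast O(1) amortized lookup
for line in fileinput.input('1.csv', inplace=True):
    if line in seen:
        continue  # skip duplicate
    seen.add(line)
    print(line, end='')  # stdout is redirected into the file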
