Python: Removing duplicate CSV entries

Question

I have a CSV file with multiple entries. Example CSV:

user, phone, email
joe, 123, joe@x.com
mary, 456, mary@x.com
ed, 123, ed@x.com

I'm trying to remove duplicates based on a specific column in the CSV. With the code below, however, I get a "list index out of range" error. I thought that by comparing row[1] with newrows[1] I would find all duplicates and write only the unique entries to file2.csv. This doesn't work, though, and I can't understand why.

f1 = csv.reader(open('file1.csv', 'rb'))
newrows = []
for row in f1:
    if row[1] not in newrows[1]:
        newrows.append(row)
writer = csv.writer(open("file2.csv", "wb"))
writer.writerows(newrows)

The end result should be a list that maintains the order of the file (a set won't work... right?), which should look like this:

user, phone, email
joe, 123, joe@x.com
mary, 456, mary@x.com

Recommended answer

row[1] refers to the second column in the current row (phone). That's all well and good.

However, newrows.append(row) adds the entire row to the list.

The error itself comes from newrows[1]: newrows is empty on the first iteration, so indexing it raises "list index out of range". Even if you check row[1] in newrows instead, you are checking an individual phone number against a list of complete rows, which never matches. What you need is to check against a list or set of just the phone numbers. For that, keep track of a set of the phone numbers seen so far, separately from the rows.
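To make the two failure modes concrete, here is a minimal sketch that uses in-memory rows instead of the asker's file. The sample values are copied from the question; the variable names (error, found_in_rows, seen_phones, found_in_phones) are only illustrative. It shows that indexing an empty list raises the reported error, and that testing a phone number against a list of whole rows never matches, while testing against a set of phone numbers does.

```python
# Rows as csv.reader would yield them (values copied from the question's sample).
rows = [["joe", "123", "joe@x.com"], ["ed", "123", "ed@x.com"]]

newrows = []
try:
    newrows[1]  # the list is still empty on the first iteration
    error = None
except IndexError as exc:
    error = str(exc)

newrows.append(rows[0])
# A phone number is a string, but newrows holds whole rows (lists),
# so this membership test is never True.
found_in_rows = rows[1][1] in newrows

# Checking against a set of phone numbers does detect the duplicate.
seen_phones = {row[1] for row in newrows}
found_in_phones = rows[1][1] in seen_phones
```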

Something like this, which tracks the phone numbers already seen in a set:

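import csv

# Python 3: open CSV files in text mode with newline='' (Python 2 used 'rb'/'wb')
with open('file1.csv', newline='') as src, open('file2.csv', 'w', newline='') as dst:
    writer = csv.writer(dst)
    phone_numbers = set()  # phone numbers already written
    for row in csv.reader(src):
        # keep only the first row seen for each phone number
        if row[1] not in phone_numbers:
            writer.writerow(row)
            phone_numbers.add(row[1])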
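The same idea can be wrapped in a small helper and tried without touching the disk. The sketch below is one possible arrangement, not the answer's own code: unique_by_column is a hypothetical name, skipping rows that are too short is an assumption, and skipinitialspace=True is used only because the sample data has a space after each comma.

```python
import csv
import io


def unique_by_column(rows, column):
    """Yield rows whose value in `column` has not been seen before, in order."""
    seen = set()
    for row in rows:
        if len(row) <= column:
            continue  # assumption: silently skip blank or short rows
        key = row[column]
        if key not in seen:
            seen.add(key)
            yield row


# Sample data from the question, held in memory instead of file1.csv.
sample = (
    "user, phone, email\n"
    "joe, 123, joe@x.com\n"
    "mary, 456, mary@x.com\n"
    "ed, 123, ed@x.com\n"
)
reader = csv.reader(io.StringIO(sample), skipinitialspace=True)
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(unique_by_column(reader, 1))
result = out.getvalue()
```

The set is used only to test membership; the output order comes from iterating over the file, so the original sequence is preserved even though sets themselves are unordered.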
