Finding total number of duplicates in CSV file

Views: 111 | Published: 2021/4/27 19:42:09 | Tags: python, csv

Question

I am parsing a CSV file that contains duplicate addresses. I want Python to give me the total number of duplicate addresses and the total number of unique addresses, and then list them. I have already got as far as printing whether each address is unique or a duplicate, but now I also want the respective counts.

import csv

# open() replaces the Python 2-only file(); the raw string keeps the backslashes literal
with open(r'T:\DataDump\Book1.csv', newline='') as f:
    csv_data = csv.reader(f)

    next(csv_data)  # skip the header row

    already_seen = set()

    for row in csv_data:
        address = row[6]
        if address in already_seen:
            print('{} is a duplicate Address'.format(address))
        else:
            print('{} is a unique Address'.format(address))
            already_seen.add(address)

Recommended answer

You can detect duplicates on the fly in a single pass, but you have to read the whole file before you can know that an address is not a duplicate, and before you can count how many duplicates there are.
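To see why a single pass is not enough, here is a minimal sketch that labels addresses as they stream by. The function name `label_on_the_fly` and the sample addresses are made up for illustration. The first occurrence of an address that repeats later is still reported as unique, because the rest of the input has not been read at that point.

```python
def label_on_the_fly(addresses):
    """Label each address using only what has been seen so far."""
    seen = set()
    labels = []
    for address in addresses:
        if address in seen:
            labels.append((address, "duplicate"))
        else:
            # We cannot know yet whether this address will show up again later.
            labels.append((address, "unique"))
            seen.add(address)
    return labels


# Hypothetical sample data: "1 Main St" appears twice.
sample = ["1 Main St", "2 Oak Ave", "1 Main St"]
labels = label_on_the_fly(sample)
for address, label in labels:
    print(address, "->", label)
```

The first `1 Main St` comes out labelled "unique" even though it is repeated two rows later, which is the gap the counting approach closes.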

So two passes are required here: the first reads the file and counts each address, the second walks the counts to report them. Use collections.Counter like this:

import csv
import collections

with open(r"T:\DataDump\Book1.csv", newline="") as f:
    csv_data = csv.reader(f,delimiter=",")

    next(csv_data)  # skip title line

    count = collections.Counter()

    # first pass: read the file
    for row in csv_data:
        address = row[6]
        count[address] += 1

    # second pass: display duplicate info & compute total
    total_dups = 0
    for address,nb in count.items():
        if nb>1:
            total_dups += nb
            print('{} is a duplicate address, seen {} times'.format(address,nb))
        else:
            print('{} is a unique address'.format(address))
    print("Total duplicate addresses {}".format(total_dups))
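The question also asks for the number of unique addresses and for both groups to be listed. The following is a rough sketch of one way to get that from the same counter, not the answerer's own code. The `summarize_addresses` helper, the in-memory `SAMPLE_CSV` data and its column layout are all assumptions made so the example runs without the real file.

```python
import collections
import csv
import io

# Hypothetical sample standing in for Book1.csv; here the address is column 1.
SAMPLE_CSV = """name,address
Ann,1 Main St
Bob,2 Oak Ave
Cid,1 Main St
Dee,3 Elm Rd
Eve,1 Main St
"""


def summarize_addresses(lines, column):
    """Count every address, then split them into unique and duplicated ones."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    counts = collections.Counter(row[column] for row in reader)
    unique = [addr for addr, n in counts.items() if n == 1]
    duplicated = {addr: n for addr, n in counts.items() if n > 1}
    return unique, duplicated


unique, duplicated = summarize_addresses(io.StringIO(SAMPLE_CSV), column=1)
print("Unique addresses ({}): {}".format(len(unique), unique))
print("Duplicated addresses ({}): {}".format(len(duplicated), duplicated))
```

For the real file, the same helper could be called with the open file object and `column=6`, matching the index used in the question.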

To print the total number of duplicate addresses, you could also compute it directly:

    print("Total duplicate addresses {}".format(sum(x for x in count.values() if x > 1)))
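Note that "total number of duplicates" can be read in more than one way, and the sum above counts every row whose address occurs more than once. As a small illustration with made-up addresses (the variable names are hypothetical), the same counter gives three different totals depending on the definition:

```python
import collections

# Hypothetical addresses: "1 Main St" appears 3 times, "2 Oak Ave" twice.
count = collections.Counter(
    ["1 Main St", "2 Oak Ave", "1 Main St", "3 Elm Rd", "1 Main St", "2 Oak Ave"]
)

# Every row whose address occurs more than once (what the one-liner computes).
rows_with_duplicated_address = sum(n for n in count.values() if n > 1)

# How many distinct addresses are duplicated at all.
distinct_duplicated_addresses = sum(1 for n in count.values() if n > 1)

# Extra copies only, i.e. rows that would go away after de-duplication.
redundant_rows = sum(n - 1 for n in count.values() if n > 1)

print(rows_with_duplicated_address, distinct_duplicated_addresses, redundant_rows)
```

Which of the three is wanted depends on how the result will be used, so it is worth stating the definition next to the number.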
