Python - Display rows with repeated values in csv files

Question

I have a .csv file with several columns, one of which is filled with random numbers, and I want to find duplicated values in that column. If there are any (an unlikely case, but it is what I want to check), I would like to display or store the complete rows that contain those values.

To make it clear, I have something like this:


First, Whatever, 230, Whichever, etc
Second, Whatever, 11, Whichever, etc
Third, Whatever, 46, Whichever, etc
Fourth, Whatever, 18, Whichever, etc
Fifth, Whatever, 14, Whichever, etc
Sixth, Whatever, 48, Whichever, etc
Seventh, Whatever, 91, Whichever, etc
Eighth, Whatever, 18, Whichever, etc
Ninth, Whatever, 67, Whichever, etc


and I would like to get the rows that share a repeated value:

Fourth, Whatever, 18, Whichever, etc
Eighth, Whatever, 18, Whichever, etc

To find duplicated values, I read that column into a Counter (a dictionary subclass) and count every key to find out how many times each value appears.

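import csv
from collections import Counter, defaultdict, OrderedDict

# `file` is the path to the .csv file; `col_2` is the index of the column to check
with open(file, 'rt') as inputfile:
    data = csv.reader(inputfile)

    seen = defaultdict(set)
    # count how many times each value of the column appears
    counts = Counter(row[col_2] for row in data)

print("Numbers and times they appear: %s" % counts)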

Then I see:

Counter({' 18 ': 2, ' 46 ': 1, ' 67 ': 1, ' 48 ': 1,...})

This is where the problem starts, because I can't manage to link each key to its number of repetitions and use that later. If I do

for value in counts:
        if counts > 1:
            print counts

I only get the keys, which is not what I want: I need the count for each value as well (not to mention that I want to print the whole line, not just the number).
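
For reference, iterating over a `Counter` directly yields only its keys, which is why the loop above never sees the counts. A minimal sketch of pairing each key with its count through `items()`, assuming Python 3 and using the numbers from the example rows rather than the real file:

```python
from collections import Counter

# Values copied from the third column of the example rows (not read from a file).
counts = Counter(["230", "11", "46", "18", "14", "48", "91", "18", "67"])

# items() yields (key, count) pairs, so each count can be tested per key.
repeated = {value: n for value, n in counts.items() if n > 1}
print(repeated)  # prints {'18': 2}
```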

Basically, I'm looking for a way of doing

If there's a repeated number:
        print rows containing that number
else:
        print "No repetitions"

Thanks in advance.

Recommended answer

Try this; it might work for you.

entries = []            # values of the third column seen so far
duplicate_entries = []  # values that have been seen more than once

# First pass: collect the values that occur more than once.
with open('in.txt', 'r') as my_file:
    for line in my_file:
        columns = line.strip().split(',')
        if columns[2] not in entries:
            entries.append(columns[2])
        else:
            duplicate_entries.append(columns[2])

if len(duplicate_entries) > 0:
    # Second pass: print and save every row that holds a duplicated value.
    with open('out.txt', 'w') as out_file:
        with open('in.txt', 'r') as my_file:
            for line in my_file:
                columns = line.strip().split(',')
                if columns[2] in duplicate_entries:
                    print(line.strip())
                    out_file.write(line)
else:
    print("No repetitions")
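
The same idea can also be sketched with the `csv` module and a `Counter`, so the rows are parsed once and kept in memory instead of reading the file twice. This is only an illustration under stated assumptions, not part of the answer above: the function name `find_duplicate_rows`, its `column` parameter, and the choice to strip whitespace around each value are hypothetical, and the code targets Python 3.

```python
import csv
from collections import Counter


def find_duplicate_rows(lines, column=2):
    """Return the rows whose value in `column` appears more than once.

    `lines` is any iterable of CSV lines (an open file object works too).
    Rows are returned in their original order.
    """
    # skipinitialspace drops the blank that follows each comma in the sample data;
    # rows too short to have the column (e.g. blank lines) are ignored.
    rows = [row for row in csv.reader(lines, skipinitialspace=True)
            if len(row) > column]
    counts = Counter(row[column].strip() for row in rows)
    return [row for row in rows if counts[row[column].strip()] > 1]


# Hypothetical in-memory sample; with a real file, pass the open file instead.
sample = [
    "First, Whatever, 230, Whichever, etc",
    "Fourth, Whatever, 18, Whichever, etc",
    "Fifth, Whatever, 14, Whichever, etc",
    "Eighth, Whatever, 18, Whichever, etc",
]

duplicates = find_duplicate_rows(sample)
if duplicates:
    for row in duplicates:
        print(", ".join(row))
else:
    print("No repetitions")
```

With a file on disk, the call would look like `find_duplicate_rows(open('in.txt', newline=''))` inside a `with` block. Holding all rows in memory trades some space for a single read, which seems reasonable for small files but may not suit very large ones.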
