python csv: getting a subset

Question

Here is a snapshot of my CSV:

alex    123f    1
harry   fwef    2
alex    sef 3
alex    gsdf    4
alex    wf35    6
harry   sdfsdf  3

I would like to get the subset of this data where the value in the first column (harry, alex) occurs at least 4 times. So I want the resulting data set to be:

alex    123f    1
alex    sef 3
alex    gsdf    4
alex    wf35    6

Recommended answer

Clearly, you cannot decide which rows are interesting until you have seen all of them: the very last row might be the one that takes some count from three to four, thereby making some previously seen rows interesting. So, unless your CSV file is horribly huge, first read it all into memory as a list:

import csv

# Python 2: the csv module expects the file to be opened in binary mode
with open('thefile.csv', 'rb') as f:
    data = list(csv.reader(f))

Then do the counting. Python 2.7 has a better way, but assuming you're still on 2.6 like most of us:

import collections
counter = collections.defaultdict(int)
for row in data:
    counter[row[0]] += 1
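The "better way" in Python 2.7 is presumably collections.Counter, which was added in that release; the answer does not name it, so treat that as an assumption. A minimal sketch of the same counting step, where the helper name count_first_column and the inline sample rows are made up for illustration:

```python
from collections import Counter

def count_first_column(rows):
    """Count how often each first-column value appears, skipping empty rows."""
    return Counter(row[0] for row in rows if row)

# Sample rows mirroring the snapshot in the question.
sample = [
    ["alex", "123f", "1"],
    ["harry", "fwef", "2"],
    ["alex", "sef", "3"],
    ["alex", "gsdf", "4"],
    ["alex", "wf35", "6"],
    ["harry", "sdfsdf", "3"],
]

counts = count_first_column(sample)
```

A Counter behaves like the defaultdict(int) above for lookups (missing keys read as 0), so the selection loop would work unchanged with it.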

And finally, do the selection loop:

for row in data:
    if counter[row[0]] >= 4:
        print row

Of course, this prints each interesting row as a rough-hewn list (with square brackets and quotes around the items), but it is easy to format it in any way you prefer.
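For readers on Python 3, the whole approach, including one possible way to format the output, might look like the sketch below. It assumes the file is tab-delimited (the snapshot in the question looks whitespace-separated, but the real delimiter is not stated), reads from an in-memory string instead of thefile.csv so that it runs on its own, and uses a hypothetical helper named filter_frequent.

```python
import csv
import io
from collections import Counter

def filter_frequent(rows, min_count=4):
    """Keep rows whose first field occurs at least min_count times overall."""
    rows = [row for row in rows if row]  # drop blank lines
    counts = Counter(row[0] for row in rows)
    return [row for row in rows if counts[row[0]] >= min_count]

# Stand-in for the real file; the tab delimiter is an assumption.
sample_text = (
    "alex\t123f\t1\n"
    "harry\tfwef\t2\n"
    "alex\tsef\t3\n"
    "alex\tgsdf\t4\n"
    "alex\twf35\t6\n"
    "harry\tsdfsdf\t3\n"
)

rows = list(csv.reader(io.StringIO(sample_text), delimiter="\t"))
kept = filter_frequent(rows)

# Write the kept rows back out in the same delimited format
# rather than printing raw Python lists.
out = io.StringIO()
csv.writer(out, delimiter="\t", lineterminator="\n").writerows(kept)
result = out.getvalue()
print(result, end="")
```

With a real file, the io.StringIO input would be replaced by open('thefile.csv', newline=''), which is how the Python 3 csv documentation recommends opening files.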
