Aggregate sets according to keys with defaultdict python


Problem description

I have a bunch of lines in text with names and teams in this format:

Team (year)|Surname1, Name1

e.g.

Yankees (1993)|Abbot, Jim
Yankees (1994)|Abbot, Jim
Yankees (1993)|Assenmacher, Paul
Yankees (2000)|Buddies, Mike
Yankees (2000)|Canseco, Jose

and so on for several years and several teams. I would like to aggregate the player names by team (year) combination, removing any duplicated names (the original database may contain some redundant information). In this example, my output should be:

Yankees (1993)|Abbot, Jim|Assenmacher, Paul
Yankees (1994)|Abbot, Jim
Yankees (2000)|Buddies, Mike|Canseco, Jose

I've written this code so far:

from collections import defaultdict

file_in = open('filein.txt')
file_out = open('fileout.txt', 'w+')

teams = defaultdict(set)  # 'Team (year)' -> set of player names

for line in file_in:
    # 'Yankees (1993)|Abbot, Jim' -> ['Yankees (1993)', 'Abbot, Jim']
    items = [entry.strip() for entry in line.split('|') if entry]
    team = items[0]
    name = items[1]
    teams[team].add(name)

I end up with a big dictionary made up of keys (the team name and the year) and sets of values, but I don't know exactly how to go on from there to produce the aggregated output.

I would also like to be able to compare my final sets of values (e.g. how many players do the 1993 and 1994 Yankees teams have in common?). How can I do this?

Any help is appreciated.

Solution

You can use a tuple as a key here, e.g. ('Yankees', '1994'):

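# Python 2 code (print statement, dict.iteritems()).
from collections import defaultdict

dic = defaultdict(list)  # (team, year) -> list of [surname, name] pairs
with open('abc') as f:
    for line in f:
        key, val = line.split('|')
        # 'Yankees (1993)' -> ('Yankees', '1993')
        keys = tuple(x.strip('()') for x in key.split())
        # 'Abbot, Jim\n' -> ['Abbot', 'Jim']
        vals = [x.strip() for x in val.split(', ')]
        dic[keys].append(vals)
print dic
for k, v in dic.iteritems():
    print "{}({})|{}".format(k[0], k[1], "|".join([", ".join(x) for x in v]))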

Output:

defaultdict(<type 'list'>, 
{('Yankees', '1994'): [['Abbot', 'Jim']],
 ('Yankees', '2000'): [['Buddies', 'Mike'], ['Canseco', 'Jose']],
 ('Yankees', '1993'): [['Abbot', 'Jim'], ['Assenmacher', 'Paul']]})

Yankees(1994)|Abbot, Jim
Yankees(2000)|Buddies, Mike|Canseco, Jose
Yankees(1993)|Abbot, Jim|Assenmacher, Paul
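
The code above is Python 2 and collects names in lists, so a repeated input line would show up twice in the output. A possible Python 3 variant that keeps the tuple keys but stores names in sets, which drops duplicates and also makes the 1993 vs. 1994 comparison a plain set intersection, might look like the sketch below. The `SAMPLE` text, the `KEY_PATTERN` regex and the helper names `aggregate` and `format_rosters` are illustrative assumptions, not part of the original answer, and the regex assumes every key ends in a four-digit year in parentheses.

```python
import re
from collections import defaultdict

# Hypothetical sample input mirroring the question; the repeated first
# line stands in for redundant rows in the original database.
SAMPLE = """\
Yankees (1993)|Abbot, Jim
Yankees (1993)|Abbot, Jim
Yankees (1994)|Abbot, Jim
Yankees (1993)|Assenmacher, Paul
Yankees (2000)|Buddies, Mike
Yankees (2000)|Canseco, Jose
"""

# Assumed key layout: "Team name (YYYY)". Multi-word team names are allowed.
KEY_PATTERN = re.compile(r"^(?P<team>.+?)\s*\((?P<year>\d{4})\)$")


def aggregate(lines):
    """Group player names into one set per (team, year) tuple."""
    rosters = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, _, name = line.partition("|")
        match = KEY_PATTERN.match(key.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        rosters[(match.group("team"), match.group("year"))].add(name.strip())
    return rosters


def format_rosters(rosters):
    """Return 'Team (year)|name|name' lines, sorted for a stable order."""
    return [
        "{} ({})|{}".format(team, year, "|".join(sorted(names)))
        for (team, year), names in sorted(rosters.items())
    ]


rosters = aggregate(SAMPLE.splitlines())
for row in format_rosters(rosters):
    print(row)

# Players common to the 1993 and 1994 Yankees: set intersection.
common = rosters[("Yankees", "1993")] & rosters[("Yankees", "1994")]
print(len(common), sorted(common))
```

To read from a file instead, pass the open file object to `aggregate` (for example `with open('filein.txt') as f: rosters = aggregate(f)`) and write the formatted rows to the output file. Sorting is only there to make the output order predictable; sets themselves have no order.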
