Aggregate sets according to keys with defaultdict python
Problem description
I have a bunch of lines in text with names and teams in this format:
Team (year)|Surname1, Name1
e.g.
Yankees (1993)|Abbot, Jim
Yankees (1994)|Abbot, Jim
Yankees (1993)|Assenmacher, Paul
Yankees (2000)|Buddies, Mike
Yankees (2000)|Canseco, Jose
and so on for several years and several teams. I would like to aggregate the names of players by team (year) combination, deleting any duplicated names (the original database may contain some redundant information). In the example, my output should be:
Yankees (1993)|Abbot, Jim|Assenmacher, Paul
Yankees (1994)|Abbot, Jim
Yankees (2000)|Buddies, Mike|Canseco, Jose
I've written this code so far:
file_in = open('filein.txt')
file_out = open('fileout.txt', 'w+')
from collections import defaultdict
teams = defaultdict(set)
for line in file_in:
    items = [entry.strip() for entry in line.split('|') if entry]
    team = items[0]
    name = items[1]
    teams[team].add(name)
I end up with a big dictionary made up of keys (the name of the team and the year) and sets of values. But I don't know exactly how to go on from there to produce the aggregated output.
I would also like to be able to compare my final sets of values (e.g. how many players do the Yankees teams of 1993 and 1994 have in common?). How can I do this?
Any help is appreciated.
You can use a tuple as a key here, e.g. ('Yankees', '1994'):
# Python 2 syntax (print statement, dict.iteritems)
from collections import defaultdict

dic = defaultdict(list)
with open('abc') as f:
    for line in f:
        key, val = line.split('|')
        # 'Yankees (1993)' -> ('Yankees', '1993')
        keys = tuple(x.strip('()') for x in key.split())
        # 'Abbot, Jim' -> ['Abbot', 'Jim']
        vals = [x.strip() for x in val.split(', ')]
        dic[keys].append(vals)
print dic
for k, v in dic.iteritems():
    print "{}({})|{}".format(k[0], k[1], "|".join([", ".join(x) for x in v]))
Output:
defaultdict(<type 'list'>,
{('Yankees', '1994'): [['Abbot', 'Jim']],
('Yankees', '2000'): [['Buddies', 'Mike'], ['Canseco', 'Jose']],
('Yankees', '1993'): [['Abbot', 'Jim'], ['Assenmacher', 'Paul']]})
Yankees(1994)|Abbot, Jim
Yankees(2000)|Buddies, Mike|Canseco, Jose
Yankees(1993)|Abbot, Jim|Assenmacher, Paul
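Note that the answer above collects names in a list, so a repeated input line would show up twice, and it does not touch the second part of the question. The following is a minimal Python 3 sketch of one way to cover both, keeping the `defaultdict(set)` from the question so duplicates are dropped automatically. The function names (`aggregate_players`, `format_teams`, `common_players`) and the `sample` data are hypothetical, chosen only for illustration, and the sketch assumes every valid line has the form `Team (year)|Surname, Name`.

```python
from collections import defaultdict


def aggregate_players(lines):
    """Map each 'Team (year)' key to the set of player names seen for it."""
    teams = defaultdict(set)
    for line in lines:
        team, sep, name = line.strip().partition('|')
        if not sep:
            continue  # skip blank or malformed lines (assumption: no '|' means no data)
        teams[team.strip()].add(name.strip())  # a set silently drops duplicates
    return teams


def format_teams(teams):
    """Build one 'Team (year)|Name|Name' line per key, sorted for stable output."""
    return ['|'.join([team] + sorted(names))
            for team, names in sorted(teams.items())]


def common_players(teams, key_a, key_b):
    """Return the players present in both rosters (set intersection)."""
    # .get avoids creating empty entries in the defaultdict for unknown keys
    return teams.get(key_a, set()) & teams.get(key_b, set())


# Hypothetical sample data, including one redundant line
sample = [
    "Yankees (1993)|Abbot, Jim",
    "Yankees (1994)|Abbot, Jim",
    "Yankees (1993)|Assenmacher, Paul",
    "Yankees (1993)|Abbot, Jim",
    "Yankees (2000)|Buddies, Mike",
    "Yankees (2000)|Canseco, Jose",
]

teams = aggregate_players(sample)
for row in format_teams(teams):
    print(row)

shared = common_players(teams, "Yankees (1993)", "Yankees (1994)")
print(len(shared), sorted(shared))
```

To work on the files from the question, the same functions could be fed an open file instead of the list, e.g. `aggregate_players(open('filein.txt'))`, and the rows from `format_teams` written to `fileout.txt` one per line.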