Why doesn't itertools.groupby() work?
Problem description
I've read several threads about groupby(), but I can't see what's wrong with my example:
import itertools
students = [{'name': 'Paul', 'mail': '@gmail.com'},
            {'name': 'Tom', 'mail': '@yahoo.com'},
            {'name': 'Jim', 'mail': 'gmail.com'},
            {'name': 'Jules', 'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'},
            {'name': 'Kathrin', 'mail': '@something.com'}]
key_func = lambda student: student['mail']
for key, group in itertools.groupby(students, key=key_func):
    print(key)
    print(list(group))
This prints each student separately. Why don't I get just three groups: @gmail.com, @yahoo.com and @something.com?
For starters, some of the mails are gmail.com and some are @gmail.com, which is why they are treated as separate groups.
groupby also expects the data to be pre-sorted by the same key function. It only merges adjacent items with equal keys, which explains why you get @something.com twice.
From the docs:
... Generally, the iterable needs to already be sorted on the same key function. ...
import itertools
students = [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Tom', 'mail': '@yahoo.com'},
            {'name': 'Jim', 'mail': 'gmail.com'}, {'name': 'Jules', 'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]
key_func = lambda student: student['mail']
# Sort by the same key function we later pass to groupby
students.sort(key=key_func)
for key, group in itertools.groupby(students, key=key_func):
    print(key)
    print(list(group))
# @gmail.com
# [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Gregory', 'mail': '@gmail.com'}]
# @something.com
# [{'name': 'Jules', 'mail': '@something.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]
# @yahoo.com
# [{'name': 'Tom', 'mail': '@yahoo.com'}]
# gmail.com
# [{'name': 'Jim', 'mail': 'gmail.com'}]
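To see the sorting requirement in isolation, here is a minimal sketch on a plain list of letters (the `letters` list is made up for illustration and is not part of the question's data). It shows that groupby starts a new group every time the key changes, so equal items that are not adjacent end up in separate groups:

```python
import itertools

letters = ["a", "b", "a", "a"]

# Unsorted input: the first "a" and the later run of "a" are not adjacent,
# so groupby yields the key "a" twice.
unsorted_groups = [(key, list(group)) for key, group in itertools.groupby(letters)]

# Sorted input: equal items are adjacent, so each key appears exactly once.
sorted_groups = [(key, list(group)) for key, group in itertools.groupby(sorted(letters))]

print(unsorted_groups)  # [('a', ['a']), ('b', ['b']), ('a', ['a', 'a'])]
print(sorted_groups)    # [('a', ['a', 'a', 'a']), ('b', ['b'])]
```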
After fixing both the sorting and the gmail.com / @gmail.com inconsistency, we get the expected output:
import itertools
students = [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Tom', 'mail': '@yahoo.com'},
            {'name': 'Jim', 'mail': '@gmail.com'}, {'name': 'Jules', 'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]
key_func = lambda student: student['mail']
students.sort(key=key_func)
for key, group in itertools.groupby(students, key=key_func):
    print(key)
    print(list(group))
# Output (wrapped here for readability; dicts print in insertion order on Python 3.7+):
# @gmail.com
# [{'name': 'Paul', 'mail': '@gmail.com'},
#  {'name': 'Jim', 'mail': '@gmail.com'},
#  {'name': 'Gregory', 'mail': '@gmail.com'}]
# @something.com
# [{'name': 'Jules', 'mail': '@something.com'},
#  {'name': 'Kathrin', 'mail': '@something.com'}]
# @yahoo.com
# [{'name': 'Tom', 'mail': '@yahoo.com'}]
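If sorting the list is undesirable, one alternative (not part of the original answer) is to collect the groups in a collections.defaultdict, which does not care about input order. The sketch below also normalizes the key instead of editing the data by hand. `normalized_mail` is a hypothetical helper, and it assumes a missing leading "@" is the only inconsistency in the mail values:

```python
from collections import defaultdict

students = [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Tom', 'mail': '@yahoo.com'},
            {'name': 'Jim', 'mail': 'gmail.com'}, {'name': 'Jules', 'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]


def normalized_mail(student):
    # Hypothetical helper: treat "gmail.com" and "@gmail.com" as the same key.
    mail = student['mail'].strip()
    return mail if mail.startswith('@') else '@' + mail


# Append each student's name to the list stored under the normalized key.
groups = defaultdict(list)
for student in students:
    groups[normalized_mail(student)].append(student['name'])

for mail, names in groups.items():
    print(mail, names)
```

The groups come out in the order each key is first seen, not in sorted order. Unlike groupby, this holds every group in memory at once, which is fine for a small list like this one.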