Why doesn't itertools.groupby() work?


Question

I've checked some topics about groupby() but I don't get what's wrong with my example:

import itertools

students = [{'name': 'Paul',    'mail': '@gmail.com'},
            {'name': 'Tom',     'mail': '@yahoo.com'},
            {'name': 'Jim',     'mail': 'gmail.com'},
            {'name': 'Jules',   'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'},
            {'name': 'Kathrin', 'mail': '@something.com'}]

key_func = lambda student: student['mail']

for key, group in itertools.groupby(students, key=key_func):
    print(key)
    print(list(group))

This prints each student separately. Why don't I get just three groups: @gmail.com, @yahoo.com and @something.com?

Solution

For starters, one of the mail values is gmail.com (without the @) while the other Gmail entries are @gmail.com, which is why they are treated as separate groups.
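If the data cannot be cleaned at the source, one option is to normalize the value inside the key function. The following is only a sketch: `normalized_mail` is a hypothetical helper that is not part of the original question, and it assumes every value should carry exactly one leading `@`.

```python
# Same records as in the question, including the inconsistent 'gmail.com'.
students = [{'name': 'Paul',    'mail': '@gmail.com'},
            {'name': 'Tom',     'mail': '@yahoo.com'},
            {'name': 'Jim',     'mail': 'gmail.com'},
            {'name': 'Jules',   'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'},
            {'name': 'Kathrin', 'mail': '@something.com'}]


def normalized_mail(student):
    # Hypothetical helper: trim whitespace, drop any leading '@',
    # then put a single '@' back so all values share one form.
    return '@' + student['mail'].strip().lstrip('@')


raw_keys = sorted({s['mail'] for s in students})
clean_keys = sorted({normalized_mail(s) for s in students})

print(raw_keys)    # ['@gmail.com', '@something.com', '@yahoo.com', 'gmail.com']
print(clean_keys)  # ['@gmail.com', '@something.com', '@yahoo.com']
```

Passing `normalized_mail` as the key instead of the plain lambda would then treat `gmail.com` and `@gmail.com` as the same key.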

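groupby also expects the data to be pre-sorted by the same key function: it only groups consecutive items that share a key, so equal keys that are not adjacent end up in separate groups. That explains why you get @something.com twice.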

From the docs:

... Generally, the iterable needs to already be sorted on the same key function. ...
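To see what "consecutive" means in practice, here is a minimal sketch on made-up data (the `letters` list is purely illustrative and not taken from the question). It counts the size of each group with and without sorting first.

```python
import itertools

# Illustrative data: equal values are not all adjacent.
letters = ['a', 'a', 'b', 'a', 'b', 'b']

# Without sorting, every run of equal neighbours becomes its own group.
unsorted_runs = [(key, len(list(group)))
                 for key, group in itertools.groupby(letters)]

# After sorting, equal values are adjacent, so each key appears once.
sorted_runs = [(key, len(list(group)))
               for key, group in itertools.groupby(sorted(letters))]

print(unsorted_runs)  # [('a', 2), ('b', 1), ('a', 1), ('b', 2)]
print(sorted_runs)    # [('a', 3), ('b', 3)]
```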

import itertools

students = [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Tom', 'mail': '@yahoo.com'},
            {'name': 'Jim', 'mail': 'gmail.com'}, {'name': 'Jules', 'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]

key_func = lambda student: student['mail']

students.sort(key=key_func)
# sorting by same key function we later use with groupby

for key, group in itertools.groupby(students, key=key_func):
    print(key)
    print(list(group))

#  @gmail.com
#  [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Gregory', 'mail': '@gmail.com'}]
#  @something.com
#  [{'name': 'Jules', 'mail': '@something.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]
#  @yahoo.com
#  [{'name': 'Tom', 'mail': '@yahoo.com'}]
#  gmail.com
#  [{'name': 'Jim', 'mail': 'gmail.com'}]

After fixing both the sorting and the gmail.com/@gmail.com inconsistency, we get the expected output:

import itertools

students = [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Tom', 'mail': '@yahoo.com'},
            {'name': 'Jim', 'mail': '@gmail.com'}, {'name': 'Jules', 'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]

key_func = lambda student: student['mail']

students.sort(key=key_func)

for key, group in itertools.groupby(students, key=key_func):
    print(key)
    print(list(group))

#  @gmail.com
#  [{'name': 'Paul', 'mail': '@gmail.com'},
#   {'name': 'Jim', 'mail': '@gmail.com'},
#   {'name': 'Gregory', 'mail': '@gmail.com'}]
#  @something.com
#  [{'name': 'Jules', 'mail': '@something.com'},
#   {'name': 'Kathrin', 'mail': '@something.com'}]
#  @yahoo.com
#  [{'name': 'Tom', 'mail': '@yahoo.com'}]
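If sorting the list is undesirable, a plain dictionary can collect the groups regardless of input order. This is a different technique from groupby, shown here only as a possible alternative; it uses `collections.defaultdict` and keeps just the names for brevity.

```python
from collections import defaultdict

# The corrected records from the answer, left in their original order.
students = [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Tom', 'mail': '@yahoo.com'},
            {'name': 'Jim', 'mail': '@gmail.com'}, {'name': 'Jules', 'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]

# Map each mail value to the names that share it; no sorting required.
groups = defaultdict(list)
for student in students:
    groups[student['mail']].append(student['name'])

for mail, names in groups.items():
    print(mail, names)

# @gmail.com ['Paul', 'Jim', 'Gregory']
# @yahoo.com ['Tom']
# @something.com ['Jules', 'Kathrin']
```

Unlike groupby, this holds every group in memory at once, and the keys come out in first-seen order rather than sorted order.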
