Group by and aggregate the values of a list of dictionaries in Python


Problem description




I'm trying to write a function, in an elegant way, that will group a list of dictionaries and aggregate (sum) the values of like-keys.

Example:

import datetime

my_dataset = [
    {
        'date': datetime.date(2013, 1, 1),
        'id': 99,
        'value1': 10,
        'value2': 10
    },
    {
        'date': datetime.date(2013, 1, 1),
        'id': 98,
        'value1': 10,
        'value2': 10
    },
    {
        'date': datetime.date(2013, 1, 2),
        'id': 99,
        'value1': 10,
        'value2': 10
    }
]

group_and_sum_dataset(my_dataset, 'date', ['value1', 'value2'])

"""
Should return:
[
    {
        'date': datetime.date(2013, 1, 1),
        'value1': 20,
        'value2': 20
    },
    {
        'date': datetime.date(2013, 1, 2),
        'value1': 10,
        'value2': 10
    }
]
"""

I've tried doing this using itertools for the groupby and summing each like-key value pair, but am missing something here. Here's what my function currently looks like:

def group_and_sum_dataset(dataset, group_by_key, sum_value_keys):
    keyfunc = operator.itemgetter(group_by_key)
    dataset.sort(key=keyfunc)
    new_dataset = []
    for key, index in itertools.groupby(dataset, keyfunc):
        d = {group_by_key: key}
        d.update({k:sum([item[k] for item in index]) for k in sum_value_keys})
        new_dataset.append(d)
    return new_dataset
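
One likely cause, offered as a guess because the question does not show the actual output: each group yielded by itertools.groupby is a one-shot iterator, so the comprehension exhausts it while summing the first key, and every later key sums to 0. A minimal sketch that materializes each group before summing is shown here. The names `group_and_sum_dataset_fixed`, `rows`, and `sample` are illustrative and not from the original post, and `sorted()` is swapped in for the in-place `dataset.sort()` so the caller's list is left untouched.

```python
import datetime
import itertools
import operator


def group_and_sum_dataset_fixed(dataset, group_by_key, sum_value_keys):
    keyfunc = operator.itemgetter(group_by_key)
    new_dataset = []
    # groupby only merges adjacent items, so the input must be sorted by the key.
    for key, group in itertools.groupby(sorted(dataset, key=keyfunc), keyfunc):
        rows = list(group)  # materialize: the group iterator can be read only once
        d = {group_by_key: key}
        d.update({k: sum(row[k] for row in rows) for k in sum_value_keys})
        new_dataset.append(d)
    return new_dataset


# Illustrative data mirroring the example in the question
sample = [
    {'date': datetime.date(2013, 1, 1), 'id': 99, 'value1': 10, 'value2': 10},
    {'date': datetime.date(2013, 1, 1), 'id': 98, 'value1': 10, 'value2': 10},
    {'date': datetime.date(2013, 1, 2), 'id': 99, 'value1': 10, 'value2': 10},
]
result = group_and_sum_dataset_fixed(sample, 'date', ['value1', 'value2'])
```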

Solution

You can use collections.Counter and collections.defaultdict.

With a dict, the grouping can be done in a single O(N) pass over the data, whereas the sort-based approach needs O(N log N) time.

from collections import defaultdict, Counter

def solve(dataset, group_by_key, sum_value_keys):
    # One Counter of running totals per distinct group key
    dic = defaultdict(Counter)
    for item in dataset:
        key = item[group_by_key]
        vals = {k: item[k] for k in sum_value_keys}
        dic[key].update(vals)  # Counter.update adds to the existing totals
    return dic

>>> d = solve(my_dataset, 'date', ['value1', 'value2'])
>>> d
defaultdict(<class 'collections.Counter'>,
{
 datetime.date(2013, 1, 2): Counter({'value2': 10, 'value1': 10}),
 datetime.date(2013, 1, 1): Counter({'value2': 20, 'value1': 20})
})
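
The question asks for a list of dictionaries rather than a defaultdict of Counters. One possible conversion is sketched below, assuming the output should be ordered by the group key. The helper name `to_records` and the `sample` data are hypothetical and not part of the original answer; `solve` is repeated only so the block runs on its own.

```python
import datetime
from collections import Counter, defaultdict


def solve(dataset, group_by_key, sum_value_keys):
    dic = defaultdict(Counter)
    for item in dataset:
        dic[item[group_by_key]].update({k: item[k] for k in sum_value_keys})
    return dic


def to_records(grouped, group_by_key):
    # Hypothetical helper: flatten {key: Counter} into a list of plain dicts,
    # sorted by the group key.
    return [
        {group_by_key: key, **totals}
        for key, totals in sorted(grouped.items(), key=lambda kv: kv[0])
    ]


sample = [
    {'date': datetime.date(2013, 1, 1), 'id': 99, 'value1': 10, 'value2': 10},
    {'date': datetime.date(2013, 1, 1), 'id': 98, 'value1': 10, 'value2': 10},
    {'date': datetime.date(2013, 1, 2), 'id': 99, 'value1': 10, 'value2': 10},
]
records = to_records(solve(sample, 'date', ['value1', 'value2']), 'date')
```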

The advantage of Counter is that its update() method adds the incoming values to the totals already stored under matching keys, instead of replacing them.

Example:

>>> c = Counter(**{'value1': 10, 'value2': 5})
>>> c.update({'value1': 7, 'value2': 3})
>>> c
Counter({'value1': 17, 'value2': 8})
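
For contrast, a small sketch of how a plain dict behaves under the same call (the variable names `plain` and `counted` are illustrative only): dict.update overwrites existing values, while Counter.update accumulates them.

```python
from collections import Counter

plain = {'value1': 10, 'value2': 5}
plain.update({'value1': 7, 'value2': 3})    # dict.update overwrites existing values

counted = Counter({'value1': 10, 'value2': 5})
counted.update({'value1': 7, 'value2': 3})  # Counter.update adds to existing values
```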
