Python: Elegantly merge dictionaries with sum() of values

Question

I'm trying to merge logs from several servers. Each log is a list of tuples (date, count). A date may appear more than once, and I want the resulting dictionary to hold the sum of all counts from all servers.

Here's my attempt, with some example data:

from collections import defaultdict

a = [("13.5", 100)]
b = [("14.5", 100), ("15.5", 100)]
c = [("15.5", 100), ("16.5", 100)]
input = [a, b, c]  # one list of (date, count) tuples per server

output = defaultdict(int)  # missing dates start at 0
for d in input:
    for item in d:
        output[item[0]] += item[1]
print(dict(output))

This gives:

{'14.5': 100, '16.5': 100, '13.5': 100, '15.5': 200}

As expected.

I'm about to go bananas because a colleague saw the code. She insists that there must be a more Pythonic and elegant way to do it, without these nested for loops. Any ideas?

Recommended answer

Doesn't get simpler than this, I think:

from collections import Counter

a = [("13.5", 100)]
b = [("14.5", 100), ("15.5", 100)]
c = [("15.5", 100), ("16.5", 100)]
input = [a, b, c]

# Turn each log into a Counter, then add them up starting from an empty Counter
print(sum(
    (Counter(dict(x)) for x in input),
    Counter()))

Note that Counter (also known as a multiset) is the most natural data structure for your data: a type of set to which elements can belong more than once, or equivalently, a map with the semantics Element -> OccurrenceCount. You could have used it in the first place, instead of lists of tuples.
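One caveat, offered as a hedged aside rather than part of the original answer: dict() keeps only the last value for a repeated key, so Counter(dict(x)) silently drops counts when the same date occurs twice inside a single log. A minimal sketch that accumulates straight into one Counter instead; the helper name merge_logs and the sample data are made up for illustration:

```python
from collections import Counter
from itertools import chain

def merge_logs(logs):
    """Sum counts per date across all logs (hypothetical helper)."""
    totals = Counter()
    # chain.from_iterable flattens the per-server lists into one stream of tuples
    for date, count in chain.from_iterable(logs):
        totals[date] += count
    return totals

# "15.5" appears twice inside the first log
logs = [[("15.5", 100), ("15.5", 50)], [("15.5", 100), ("16.5", 100)]]

merged = merge_logs(logs)
# For comparison: going through dict() loses the first ("15.5", 100) entry
naive = sum((Counter(dict(log)) for log in logs), Counter())
```

Here merged counts 250 for "15.5" while naive counts only 150. If a date is guaranteed to be unique within each log, both approaches agree.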

Also possible:

from collections import Counter
from functools import reduce  # built in on Python 2, must be imported on Python 3
from operator import add

print(reduce(add, (Counter(dict(x)) for x in input)))

Using reduce(add, seq) instead of sum(seq, initialValue) is generally more flexible and allows you to skip passing the redundant initial value.
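One thing to keep in mind when skipping the initial value: with an empty sequence, reduce has nothing to return and raises TypeError, whereas sum falls back to its start value. A small sketch of that edge case (the variable names are illustrative only):

```python
from collections import Counter
from functools import reduce
from operator import add

no_logs = []  # e.g. no server reported anything

# sum() falls back to its start value
total = sum((Counter(dict(x)) for x in no_logs), Counter())

# reduce() without an initializer cannot produce a result from nothing
try:
    reduce(add, (Counter(dict(x)) for x in no_logs))
    raised = False
except TypeError:
    raised = True

# Passing an initializer restores the safe behaviour
safe_total = reduce(add, (Counter(dict(x)) for x in no_logs), Counter())
```

So the initial value is only redundant when the list of logs is known to be non-empty.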

Note that you could also use operator.and_ to find the intersection of the multisets instead of the sum.
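As a quick sketch of the intersection variant: Counter's & operator keeps only the keys present in both operands, with the minimum of the two counts. The per-server Counters below are invented sample data:

```python
from collections import Counter
from functools import reduce
from operator import and_

server_a = Counter({"14.5": 100, "15.5": 300})
server_b = Counter({"15.5": 200, "16.5": 100})
server_c = Counter({"15.5": 250, "16.5": 50})

# Only "15.5" is present on every server; its smallest count is 200
common = reduce(and_, [server_a, server_b, server_c])
```

With the question's original data the result would be an empty Counter, since no date appears in all three logs.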

The sum() and reduce(add, ...) variants above are terribly slow, because a new Counter is created on every step. Let's fix that.

We know that Counter + Counter returns a new Counter with the merged data. This is OK, but we want to avoid the extra object creation. Let's use Counter.update instead:

update(self, iterable=None, **kwds) unbound collections.Counter method

Like dict.update() but add counts instead of replacing them. Source can be an iterable, a dictionary, or another Counter instance.

That's what we want. Let's wrap it with a function compatible with reduce and see what happens.

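from functools import reduce  # needed on Python 3

def updateInPlace(a, b):
    # Counter.update returns None, so return the mutated Counter explicitly
    a.update(b)
    return a

print(reduce(updateInPlace, (Counter(dict(x)) for x in input)))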

This is only marginally slower than the OP's solution.

Benchmark: http://ideone.com/7IzSx (updated with another solution, thanks to astynax)

(Also: if you desperately want a one-liner, you can replace updateInPlace with lambda x, y: x.update(y) or x, which works the same way and even proves to be a split second faster, but fails at readability. Don't :-))
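For reference, a small sketch of the two behaviours the update-based version relies on: Counter.update adds counts where dict.update replaces them, and update returns None, which is why both the wrapper function and the "or x" trick are needed. The sample values are illustrative:

```python
from collections import Counter

plain = {"15.5": 100}
plain.update({"15.5": 100, "16.5": 100})  # dict.update replaces existing values

counted = Counter({"15.5": 100})
returned = counted.update({"15.5": 100, "16.5": 100})  # Counter.update adds counts

# update() mutates in place and returns None, so "x.update(y) or x" evaluates to x
same_object = (counted.update({}) or counted) is counted
```

After this runs, plain still holds 100 for "15.5", while counted holds 200.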
