Python: Elegantly merge dictionaries with sum() of values

Question

I'm trying to merge logs from several servers. Each log is a list of tuples (date, count). date may appear more than once, and I want the resulting dictionary to hold the sum of all counts from all servers.

Here's my attempt, with some data for example:

from collections import defaultdict

a = [("13.5", 100)]
b = [("14.5", 100), ("15.5", 100)]
c = [("15.5", 100), ("16.5", 100)]
input = [a, b, c]

output = defaultdict(int)
for d in input:
    for item in d:
        output[item[0]] += item[1]
print(dict(output))

Which gives (key order may vary with the Python version):

{'14.5': 100, '16.5': 100, '13.5': 100, '15.5': 200}

As expected.

I'm about to go bananas because of a colleague who saw the code. She insists that there must be a more Pythonic and elegant way to do it, without these nested for loops. Any ideas?

Solution

Doesn't get simpler than this, I think:

from collections import Counter

a = [("13.5", 100)]
b = [("14.5", 100), ("15.5", 100)]
c = [("15.5", 100), ("16.5", 100)]
input = [a, b, c]

# Counter() is the start value; sum() would otherwise start from the int 0
print(sum(
    (Counter(dict(x)) for x in input),
    Counter()))

Note that Counter (also known as a multiset) is the most natural data structure for your data: a type of set to which elements can belong more than once, or equivalently, a map with semantics Element -> OccurrenceCount. You could have used it in the first place, instead of lists of tuples.
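
One caveat worth keeping in mind: dict(x) keeps only the last count when a date repeats inside a single log. Here is a minimal sketch, assuming such repeats should be summed too, that feeds every (date, count) pair into one Counter. The names merge_logs and logs, and the extra ("13.5", 50) entry, are made up for illustration and are not part of the original answer.

```python
from collections import Counter
from itertools import chain


def merge_logs(logs):
    """Sum counts per date across all logs, including repeats within one log."""
    totals = Counter()
    # chain.from_iterable flattens the list of logs into one stream of pairs
    for date, count in chain.from_iterable(logs):
        totals[date] += count
    return totals


a = [("13.5", 100), ("13.5", 50)]  # "13.5" appears twice in the same log
b = [("14.5", 100), ("15.5", 100)]
c = [("15.5", 100), ("16.5", 100)]

merged = merge_logs([a, b, c])
print(dict(merged))
```

With this data, Counter(dict(a)) would report 50 for "13.5", while merge_logs reports 150.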


Also possible:

from collections import Counter
from functools import reduce  # required on Python 3; reduce is also a builtin on Python 2
from operator import add

print(reduce(add, (Counter(dict(x)) for x in input)))

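Using reduce(add, seq) instead of sum(seq, initialValue) is generally more flexible and allows you to skip passing the redundant initial value. The trade-off is that reduce without an initial value raises a TypeError when the sequence is empty.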

Note that you could also use operator.and_ to find the intersection of the multisets instead of the sum.
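
A small sketch of the intersection variant, using made-up counts so that the minimum is visible (the data and the names logs and common are assumptions for illustration):

```python
from collections import Counter
from functools import reduce
from operator import and_

a = [("13.5", 100), ("15.5", 40)]
b = [("14.5", 100), ("15.5", 100)]
c = [("15.5", 70), ("16.5", 100)]
logs = [a, b, c]

# Counter & Counter keeps only the keys present in both operands,
# each with the smaller of the two counts.
common = reduce(and_, (Counter(dict(x)) for x in logs))
print(common)
```

Only "15.5" appears in all three logs, so it is the only key that survives, with the smallest of its three counts.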


The above variant is terribly slow, because a new Counter is created on every step. Let's fix that.

We know that Counter+Counter returns a new Counter with merged data. This is OK, but we want to avoid extra creation. Let's use Counter.update instead:

update(self, iterable=None, **kwds) unbound collections.Counter method

Like dict.update() but add counts instead of replacing them. Source can be an iterable, a dictionary, or another Counter instance.
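
A short sketch of the difference the docstring describes, with illustrative values only:

```python
from collections import Counter

plain = {"15.5": 100}
plain.update({"15.5": 100})      # dict.update replaces the stored value

counts = Counter({"15.5": 100})
counts.update({"15.5": 100})     # Counter.update adds to the stored count
counts.update(["15.5", "16.5"])  # an iterable of elements adds 1 per occurrence

print(plain["15.5"], counts["15.5"], counts["16.5"])
```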

That's what we want. Let's wrap it with a function compatible with reduce and see what happens.

def updateInPlace(a, b):
    a.update(b)  # add b's counts into a without building a new Counter
    return a

print(reduce(updateInPlace, (Counter(dict(x)) for x in input)))

This is only marginally slower than the OP's solution.
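
To check the timing claim on your own machine, a rough sketch along these lines may help. It is not the linked benchmark: the workload size, the repeat count, and the function names are assumptions chosen for illustration, and the numbers will differ between machines and Python versions.

```python
from collections import Counter, defaultdict
from functools import reduce
from operator import add
import timeit

a = [("13.5", 100)]
b = [("14.5", 100), ("15.5", 100)]
c = [("15.5", 100), ("16.5", 100)]
logs = [a, b, c] * 100  # hypothetical workload: 300 small logs


def nested_loops():
    # The approach from the question
    output = defaultdict(int)
    for d in logs:
        for date, count in d:
            output[date] += count
    return dict(output)


def reduce_add():
    # Builds a new Counter on every step
    return dict(reduce(add, (Counter(dict(x)) for x in logs)))


def update_in_place(acc, other):
    acc.update(other)
    return acc


def reduce_update():
    # Reuses the first Counter as the accumulator
    return dict(reduce(update_in_place, (Counter(dict(x)) for x in logs)))


# All three approaches should agree before their timings are compared
assert nested_loops() == reduce_add() == reduce_update()

for fn in (nested_loops, reduce_add, reduce_update):
    print(fn.__name__, timeit.timeit(fn, number=100))
```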

Benchmark: http://ideone.com/7IzSx (Updated with yet another solution, thanks to astynax)

(Also: If you desperately want a one-liner, you can replace updateInPlace with lambda x,y: x.update(y) or x, which works the same way and even proves to be a split second faster, but fails at readability. Don't :-))
