Python: Elegantly merge dictionaries with sum() of values
Problem description
I'm trying to merge logs from several servers. Each log is a list of tuples (date, count). A date may appear more than once, and I want the resulting dictionary to hold the sum of all counts from all servers.

Here's my attempt, with some example data:
    from collections import defaultdict

    a = [("13.5", 100)]
    b = [("14.5", 100), ("15.5", 100)]
    c = [("15.5", 100), ("16.5", 100)]
    input = [a, b, c]

    output = defaultdict(int)
    for d in input:
        for item in d:
            output[item[0]] += item[1]
    print(dict(output))
Which gives:
{'14.5': 100, '16.5': 100, '13.5': 100, '15.5': 200}
As expected.
I'm about to go bananas because of a colleague who saw the code. She insists that there must be a more Pythonic and elegant way to do it, without these nested for loops. Any ideas?
Solution
Doesn't get simpler than this, I think:
    from collections import Counter

    a = [("13.5", 100)]
    b = [("14.5", 100), ("15.5", 100)]
    c = [("15.5", 100), ("16.5", 100)]
    input = [a, b, c]

    print(sum((Counter(dict(x)) for x in input), Counter()))
Note that Counter (also known as a multiset) is the most natural data structure for your data: a type of set to which elements can belong more than once, or equivalently a map with the semantics Element -> OccurrenceCount. You could have used it in the first place, instead of lists of tuples.
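A minimal sketch of how Counter behaves with data shaped like the question's logs. The variable names `merged`, `repeated`, `lossy` and `safe` are illustrative, and the last part assumes a date might repeat inside a single log, which the question does not state explicitly:

```python
from collections import Counter

# Sample logs in the question's shape: lists of (date, count) tuples.
b = [("14.5", 100), ("15.5", 100)]
c = [("15.5", 100), ("16.5", 100)]

# dict(x) turns the pairs into a mapping; Counter treats it as a multiset.
# Adding two Counters sums the counts key by key.
merged = Counter(dict(b)) + Counter(dict(c))
print(dict(merged))

# A key that was never counted reads as 0 instead of raising KeyError.
print(merged["13.5"])

# Caveat (assumption: a date may repeat inside ONE log): dict() keeps only
# the last pair per key, so the earlier count is lost.
repeated = [("15.5", 100), ("15.5", 50)]
lossy = Counter(dict(repeated))

# Accumulating pair by pair keeps every count.
safe = Counter()
for date, count in repeated:
    safe[date] += count
print(lossy["15.5"], safe["15.5"])
```

If dates are unique within each log, the `Counter(dict(x))` conversion is safe as written.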
Also possible:
    from collections import Counter
    from functools import reduce  # built in on Python 2; this import is needed on Python 3
    from operator import add

    print(reduce(add, (Counter(dict(x)) for x in input)))
Using reduce(add, seq) instead of sum(seq, initialValue) is generally more flexible and allows you to skip passing the redundant initial value.

Note that you could also use operator.and_ to find the intersection of the multisets instead of the sum.
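To make the difference between the two operators concrete, here is a small sketch. The sample counts (70 in the second log) are made up for illustration and are not from the question:

```python
from collections import Counter
from functools import reduce  # built in on Python 2, in functools on Python 3
from operator import add, and_

# Hypothetical sample logs; the counts differ so the intersection is visible.
logs = [
    [("14.5", 100), ("15.5", 100)],
    [("15.5", 70), ("16.5", 100)],
]
counters = [Counter(dict(log)) for log in logs]

total = reduce(add, counters)    # every key, counts summed
common = reduce(and_, counters)  # shared keys only, minimum count per key

print(dict(total))
print(dict(common))

# reduce() without an initial value raises TypeError on an empty sequence,
# so pass one when the input may be empty.
empty_total = reduce(add, [], Counter())
```

The trade-off is that skipping the initial value only works when at least one log is present.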
The above variant is terribly slow, because a new Counter is created on every step. Let's fix that.
We know that Counter + Counter returns a new Counter with merged data. This is OK, but we want to avoid extra creation. Let's use Counter.update instead:

    update(self, iterable=None, **kwds) unbound collections.Counter method
        Like dict.update() but add counts instead of replacing them.
        Source can be an iterable, a dictionary, or another Counter instance.
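A short sketch contrasting the two update methods, using made-up values in the same style as the question's data:

```python
from collections import Counter

# dict.update replaces the value for an existing key.
plain = {"15.5": 100}
plain.update({"15.5": 100, "16.5": 100})

# Counter.update adds to the existing count instead.
counts = Counter({"15.5": 100})
counts.update({"15.5": 100, "16.5": 100})

print(plain["15.5"], counts["15.5"])

# An iterable source is counted element by element: each occurrence adds 1.
counts.update(["16.5", "16.5"])
print(counts["16.5"])

# Like dict.update, Counter.update mutates in place and returns None.
result = counts.update({})
```

Because the method returns None, it cannot be passed to reduce directly, which is why a small wrapper that returns the accumulator is needed.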
That's what we want. Let's wrap it with a function compatible with reduce and see what happens.

    def updateInPlace(a, b):
        a.update(b)
        return a

    print(reduce(updateInPlace, (Counter(dict(x)) for x in input)))
This is only marginally slower than the OP's solution.
Benchmark: http://ideone.com/7IzSx (Updated with yet another solution, thanks to astynax)
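For readers who want to time the variants locally, the following is a rough sketch using timeit. It is not the code behind the linked benchmark; the function names and the repetition count are assumptions chosen for illustration, and results will vary by machine and Python version:

```python
import timeit
from collections import Counter, defaultdict
from functools import reduce
from operator import add

a = [("13.5", 100)]
b = [("14.5", 100), ("15.5", 100)]
c = [("15.5", 100), ("16.5", 100)]
logs = [a, b, c]

def nested_loops(logs):
    # The question's original approach.
    out = defaultdict(int)
    for log in logs:
        for date, count in log:
            out[date] += count
    return dict(out)

def with_sum(logs):
    return dict(sum((Counter(dict(x)) for x in logs), Counter()))

def with_reduce_add(logs):
    return dict(reduce(add, (Counter(dict(x)) for x in logs)))

def update_in_place(acc, other):
    acc.update(other)
    return acc

def with_update(logs):
    return dict(reduce(update_in_place, (Counter(dict(x)) for x in logs)))

variants = (nested_loops, with_sum, with_reduce_add, with_update)
for fn in variants:
    # number=1000 is an arbitrary, small repetition count.
    seconds = timeit.timeit(lambda: fn(logs), number=1000)
    print("%-16s %.4f s" % (fn.__name__, seconds))
```

With only three tiny logs the timings are dominated by fixed overhead, so larger inputs give a more meaningful comparison.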
(Also: if you desperately want a one-liner, you can replace updateInPlace with lambda x, y: x.update(y) or x, which works the same way and even proves to be a split second faster, but fails at readability. Don't :-))
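For completeness, a sketch of why the `or x` trick works, using the question's sample data. The name `merged` is illustrative:

```python
from collections import Counter
from functools import reduce  # built in on Python 2, in functools on Python 3

logs = [
    [("13.5", 100)],
    [("14.5", 100), ("15.5", 100)],
    [("15.5", 100), ("16.5", 100)],
]

# Counter.update() mutates x in place and returns None. Since None is falsy,
# `x.update(y) or x` evaluates to x itself, after it has been updated.
merged = reduce(
    lambda x, y: x.update(y) or x,
    (Counter(dict(log)) for log in logs),
)
print(dict(merged))
```

The expression relies on a side effect inside a boolean operator, which is the readability cost the answer warns about.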