优化与dict.update()类似的功能,但添加了常用值 [英] Optimize function similiar to dict.update() but adds common values

查看:56
本文介绍了优化与dict.update()类似的功能,但添加了常用值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

嘿伙计们,


我以为我会把它扔出去,因为每个人都喜欢优化

问题(这是真的,结账对这类

问题的回复数量!)


现在我有一个如下所示的功能。我想结合两个

词典,并将值加在一起,哪里有重叠。

我认为这个功能会相当快,但我在每个字典中都有大约1

百万的键(约80%重叠)并且它只运行所有

晚上。


所以也许我的功能不是O(n)?我真的认为它可能是

字典查找是O(nlogn)?


无论如何这里是函数。任何关于完成这项任务的想法很快就会被贬值。


谢谢!


< code apology =抱歉,如果gmail杀死我的标签>


def add_freqs(freq1,freq2):

"""添加两个单词freq dicts" ;"

newfreq = {}

为密钥,值为freq1.items():

newfreq [key] = value + freq2 .get(key,0)

表示密钥,值为freq2.items():

newfreq [key] = value + freq1.get(key,0)

返回newfreq


freq1 = {

''dog'':1,

''猫'':2,

''人类'':3

}

freq2 = {

' 'perro'':1,

''gato'':1,

''human'':2

}

print add_freqs(freq1,freq2)


answer_I_want = {

''dog'':1,

''猫'':2,

''perro'':1,

''gato'':1,

' '人'':5 <无线电通信/>
}

< / code>

-

Gregory Pi?ero

首席创新官

混合技术

www。 blendedtechnologies.com

解决方案

Gregory Pi?ero写道:

def add_freqs( freq1,freq2):
****""" Add * two * word * freq * dicts""
**** newfreq = {}
* *** for * key,value * in * freq1.items():
******** newfreq [key] = value + freq2.get(key,0)
** ** for * key,value * in * freq2.items():
******** newfreq [key] = value + freq1.get(key,0)
*** * return * newfreq
* * * * * * * * * * * * * * * * * * * $ * $ $ $ $ $ $ $ $ $ $ $ $ $
将整个字典放入元组列表中;

iteritems()只需遍历现有字典并一次创建一个元组




在80%重叠的情况下,你正在查找并设置五个值中的四个

在你的for循环中两次。


转储对称并尝试其中之一:


def add_freqs2(freq1,freq2):

total = dict(freq1)

for key,freq2.iteritems()中的值:

如果键在freq1中:

总计[键] + =值

else:

总计[key] = value

返回总额

def add_freqs3(freq1,freq2):

total = dict(freq1)

for key,value in freq2.iteritems():

try:

total [key] + = value

除了KeyError:

总计[key] = value

返回总额


我的猜测是add_freqs3()表现最好。


Peter


谢谢Peter,这些都是非常好的想法。我等不及试试今晚他们出去了。


这是关于你的功能的问题。如果我只看看

freq2中的键,那么我不会错过freq1中的任何键而不是freq2中的键吗?

这就是为什么我的原始功能中有两个循环。


-Greg

12/14 / 05,Peter Otten< __ ******* @ web.de>写道:

Gregory Pi?ero写道:

def add_freqs(freq1,freq2):
"""" Addtwowordfreqdicts""
newfreq = {}
forkey,valueinfreq1.items():
newfreq [key] = value + freq2.get(key,0)
forkey,valueinfreq2.items ():
newfreq [key] = value + freq1.get(key,0)
returnnewfreq


Anyideasondoingthistaskalot更快将被appriciated。 / blockquote>

使用items()将整个字典复制到元组列表中;
iteritems()只是遍历现有字典并一次创建一个元组


80%重叠,你在for循环中查找并设置五个值中的四个


转储对称性并尝试其中一个:

def add_freqs2(freq1,freq2):
total = dict(freq1)
for key,freq2.iteritems()中的值:
如果键入f req1:
总[key] + = value
否则:
总[key] =值
返回总数

def add_freqs3(freq1,freq2) :
total = dict(freq1)
for key,value in freq2.iteritems():
尝试:
total [key] + = value
除了KeyError:
总共[key] =值
返回总数

我的猜测是add_freqs3()表现最好。

彼得
-
http://mail.python.org/mailman/ listinfo / python-list



-

Gregory Pi?ero

首席创新官

混合技术

www.blendedtechnologies.com


Gregory Pi?ero写道:

[top-posting rearranged]

2005年12月14日,Peter Otten< __ ******* @ web.de>写道:

Gregory Pi?ero写道:

def add_freqs(freq1,freq2):
" "" Addtwowordfreqdicts"""
newfreq = {}
forkey,valueinfreq1.items():
newfreq [key] = value + freq2.get(key,0)
forkey,valueinfreq2.items():
newfreq [key] = value + freq1.get(key,0)
returnnewfreq


Anyideasondoingthistaskalot更快将被appriciated。



使用items()您将整个字典复制到元组列表;
iteritems()只是遍历现有字典并创建一个元组
一次。

80%重叠,你正在查找并在你的for循环中设置两个五个值中的四个。

转储对称并尝试以下方法之一:

def add_freqs2(freq1,freq2):
total = dict(freq1)
for key,freq2中的值.iteritems():
如果键在freq1:
总[键] + =值
否则:
总[键] =值
返回总数

def add_freqs3(freq1,freq2):
total = dict(freq1)
for key,value in freq2.iteritems():
尝试:
total [key] + = value
除了KeyError:
总计[key] = value
返回总数

我的猜测是add_freqs3()表现最好。


谢谢Peter,这些都是非常好的想法。我迫不及待地想在今晚试一试。

这是关于你的功能的问题。如果我只看了
freq2中的键,那么我不会错过freq1中的任何键而不是freq2中的键吗?
这就是为什么我在原始函数中有两个循环的原因。



不,因为声明


total = dict(freq1)将total创建为freq1的浅表副本。因此

剩下要做的就是从freq2添加物品。


问候

史蒂夫

-

Steve Holden +44 150 684 7255 +1 800 494 3119

Holden Web LLC www.holdenweb.com

PyCon TX 2006 www.python.org/pycon/


Hey guys,

I thought I''d throw this out there since everyone loves optimization
questions (it''s true, check out the number of replies to those type of
questions!)

Right now I have a function shown below. I want to combine two
dictionaries and add the values together where-ever there is overlap.
I figured this function would be reasonably fast, but I have around 1
million keys in each dictionary (~80% overlap) and it just runs all
night.

So maybe my function is not O(n)? I really figured it was but maybe
dictionary lookups are O(nlogn)?

Anyway here is the function. Any ideas on doing this task a lot
faster would be appriciated.

Thanks!

<code apology=sorry if gmail kills my tabs>

def add_freqs(freq1,freq2):
"""Add two word freq dicts"""
newfreq={}
for key,value in freq1.items():
newfreq[key]=value+freq2.get(key,0)
for key,value in freq2.items():
newfreq[key]=value+freq1.get(key,0)
return newfreq

freq1={
''dog'':1,
''cat'':2,
''human'':3
}
freq2={
''perro'':1,
''gato'':1,
''human'':2
}
print add_freqs(freq1,freq2)

answer_I_want={
''dog'':1,
''cat'':2,
''perro'':1,
''gato'':1,
''human'':5
}
</code>
--
Gregory Pi?ero
Chief Innovation Officer
Blended Technologies
(www.blendedtechnologies.com)

解决方案

Gregory Pi?ero wrote:

def add_freqs(freq1,freq2):
****"""Add*two*word*freq*dicts"""
****newfreq={}
****for*key,value*in*freq1.items():
********newfreq[key]=value+freq2.get(key,0)
****for*key,value*in*freq2.items():
********newfreq[key]=value+freq1.get(key,0)
****return*newfreq Any*ideas*on*doing*this*task*a*lot faster would be appriciated.



With items() you copy the whole dictionary into a list of tuples;
iteritems() just walks over the existing dictionary and creates one tuple
at a time.

With "80% overlap", you are looking up and setting four out of five values
twice in your for-loops.

Dump the symmetry and try one of these:

def add_freqs2(freq1, freq2):
total = dict(freq1)
for key, value in freq2.iteritems():
if key in freq1:
total[key] += value
else:
total[key] = value
return total

def add_freqs3(freq1, freq2):
total = dict(freq1)
for key, value in freq2.iteritems():
try:
total[key] += value
except KeyError:
total[key] = value
return total

My guess is that add_freqs3() will perform best.

Peter


Thanks Peter, those are some really good ideas. I can''t wait to try
them out tonight.

Here''s a question about your functions. if I only look at the keys in
freq2 then won''t I miss any keys that are in freq1 and not in freq2?
That''s why I have the two loops in my original function.

-Greg
On 12/14/05, Peter Otten <__*******@web.de> wrote:

Gregory Pi?ero wrote:

def add_freqs(freq1,freq2):
"""Addtwowordfreqdicts"""
newfreq={}
forkey,valueinfreq1.items():
newfreq[key]=value+freq2.get(key,0)
forkey,valueinfreq2.items():
newfreq[key]=value+freq1.get(key,0)
returnnewfreq


Anyideasondoingthistaskalot faster would be appriciated.



With items() you copy the whole dictionary into a list of tuples;
iteritems() just walks over the existing dictionary and creates one tuple
at a time.

With "80% overlap", you are looking up and setting four out of five values
twice in your for-loops.

Dump the symmetry and try one of these:

def add_freqs2(freq1, freq2):
total = dict(freq1)
for key, value in freq2.iteritems():
if key in freq1:
total[key] += value
else:
total[key] = value
return total

def add_freqs3(freq1, freq2):
total = dict(freq1)
for key, value in freq2.iteritems():
try:
total[key] += value
except KeyError:
total[key] = value
return total

My guess is that add_freqs3() will perform best.

Peter
--
http://mail.python.org/mailman/listinfo/python-list


--
Gregory Pi?ero
Chief Innovation Officer
Blended Technologies
(www.blendedtechnologies.com)


Gregory Pi?ero wrote:
[top-posting rearranged]

On 12/14/05, Peter Otten <__*******@web.de> wrote:

Gregory Pi?ero wrote:

def add_freqs(freq1,freq2):
"""Addtwowordfreqdicts"""
newfreq={}
forkey,valueinfreq1.items():
newfreq[key]=value+freq2.get(key,0)
forkey,valueinfreq2.items():
newfreq[key]=value+freq1.get(key,0)
returnnewfreq


Anyideasondoingthistaskalot faster would be appriciated.



With items() you copy the whole dictionary into a list of tuples;
iteritems() just walks over the existing dictionary and creates one tuple
at a time.

With "80% overlap", you are looking up and setting four out of five values
twice in your for-loops.

Dump the symmetry and try one of these:

def add_freqs2(freq1, freq2):
total = dict(freq1)
for key, value in freq2.iteritems():
if key in freq1:
total[key] += value
else:
total[key] = value
return total

def add_freqs3(freq1, freq2):
total = dict(freq1)
for key, value in freq2.iteritems():
try:
total[key] += value
except KeyError:
total[key] = value
return total

My guess is that add_freqs3() will perform best.


Thanks Peter, those are some really good ideas. I can''t wait to try
them out tonight.

Here''s a question about your functions. if I only look at the keys in
freq2 then won''t I miss any keys that are in freq1 and not in freq2?
That''s why I have the two loops in my original function.


No, because the statement

total = dict(freq1) creates total as a shallow copy of freq1. Thus
all that remains to be done is to add the items from freq2.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/


这篇关于优化与dict.update()类似的功能,但添加了常用值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆