从给定的列表快速查找字典中的所有键 [英] Finding all keys in a dictionary from a given list QUICKLY

查看:125
本文介绍了从给定的列表快速查找字典中的所有键的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个(可能相当大的)字典和一个'可能的'键列表。我想快速找到哪些键在字典中具有匹配的值。我发现很多关于单个字典值的讨论这里这里,但没有讨论速度或多个条目。



我已经提出了四种方式,对于最有效的三种方法,我比较不同样本大小的速度下面 - 有更好的方法吗?如果人们可以建议合理的竞争者,我将会对他们进行分析。



样例列表和词典创建如下:

 导入cProfile 
从随机导入randint

长度= 100000

listOfRandomInts = [randint 0,length * length / 10-1)for x in range(length)]
dictionaryOfRandomInts = {randint(0,length * length / 10-1):在这里for x in range(length)}

 



方法1:'中的关键字:

  def way1(theList,theDict):
resultList = []
列表中的listItem:
如果listItem在theDict中:
resultsList.append(theDict [listItem])
返回resultList

cProfile.run('way1(listOfRandomInts,dictionaryOfRandomInts)')

32个函数调用在0.018秒



 



方法2:错误处理:

  def way2(theList,theDict):
resultsList = []
列表中的listItem:
try:
resultsList.append(theDict [listItem])
除了:
;
return resultsList

cProfile.run('way2(listOfRandomInts,dictionaryOfRandomInts)')

在0.087秒内执行32个函数调用



 



方法3:设置交集:

  def way3(theList,theDict):
返回列表(set(theList) .intersection(set(theDict.keys())))

cProfile.run('way3(listOfRandomInts,dictionaryOfRandomInts)')

26秒函数调用0.046秒





方法4:原生使用 dict.keys()



这是一个警告性的故事 - 这是我的第一次尝试,并且是最慢的!

  def way4(theList ,theDict):
resultsList = []
keys = theDict.keys()
在列表中的listItem:
如果key中的listItem:
resultsList.append(theDict [listItem])
返回resultList

cProfile.run('way4(listOfRandomInts,dictionaryOfRandomInts)')

248.552秒的12个功能调用





编辑:答案与我用于一致性的框架相同。许多人注意到,在Python 3.x中可以实现更多的性能提升,特别是基于列表推导的方法。非常感谢所有的帮助!



方法5:更好的执行交叉的方式(感谢jonrsharpe):


  def way5(theList,theDict):
return = list(set(theList).intersection(theDict))

0.037秒内的25个函数调用



 



方法6:列表理解(感谢jonrsharpe):

 code> def way6(theList,theDict):
return [item中的item的项目如果item中的项目]

0.020秒内的24个函数调用



 



方法7:使用& 关键字(感谢jonrsharpe):

  def way7(theList,theDict):
返回列表(theDict.viewkeys()& theList)

25秒函数调用0.026秒



对于方法1-3和5-7,我用lis长度为1000,10000,100000,1000000,10000000和100000000的t /字典,并显示所用时间的日志对数图。在所有长度上,交点和语句方法表现更好。梯度大约为1(可能稍高一点),表示O(n)或略微超线性缩放。



解决方案

我尝试了几种其他方法,最快的是一个简单的列表理解:

  def way6(theList,theDict):
return [item中的item的项目如果item中的项目]

这个运行方式与您最快的方法相同, way1 ,但更快。为了比较,最快的设置的方式是

  def way5 theList,theDict):
返回列表(set(theList).intersection(theDict))

timeit 结果:

 >>> import timeit 
>>>从__main__导入方式1中的setup =导入方式1,way5,way6
从随机导入randint
length = 100000
listOfRandomInts = [randint(0,length * length / 10-1)范围(长度)]
dictionaryOfRandomInts = {randint(0,length * length / 10-1):范围(长度)中x的这里)

> >> timeit.timeit('way1(listOfRandomInts,dictionaryOfRandomInts)',setup = setup,number = 1000)
14.550477756582723
>>> timeit.timeit('way5(listOfRandomInts,dictionaryOfRandomInts)',setup = setup,number = 1000)
19.597916393388232
>>> timeit.timeit('way6(listOfRandomInts,dictionaryOfRandomInts)',setup = setup,number = 1000)
13.652289059326904






添加了@ abarnert的建议:

  def way7(theList ,theDict):
返回列表(theDict.viewkeys()& theList)

和重新运行我现在得到的时间:

 >>> timeit.timeit('way1(listOfRandomInts,dictionaryOfRandomInts)',setup = setup,number = 1000)
13.110055883138497
>>>> timeit.timeit('way5(listOfRandomInts,dictionaryOfRandomInts)',setup = setup,number = 1000)
17.292466681101036
>>> timeit.timeit('way6(listOfRandomInts,dictionaryOfRandomInts)',setup = setup,number = 1000)
14.351759544463917
>>>> timeit.timeit('way7(listOfRandomInts,dictionaryOfRandomInts)',setup = setup,number = 1000)
17.206370930653392

way1 way6 已切换位置,所以我重新运行:

 >>> timeit.timeit('way1(listOfRandomInts,dictionaryOfRandomInts)',setup = setup,number = 1000)
13.648176054011941
>>> timeit.timeit('way6(listOfRandomInts,dictionaryOfRandomInts)',setup = setup,number = 1000)
13.847062579316628

所以看起来设置的方法比列表慢,但列表和列表的理解之间的差异(令人惊讶的是,至少对我来说)是一个有点变量。我会选择一个,而不用担心,除非它成为一个真正的瓶颈。


I have a (potentially quite big) dictionary and a list of 'possible' keys. I want to quickly find which of the keys have matching values in the dictionary. I've found lots of discussion of single dictionary values here and here, but no discussion of speed or multiple entries.

I've come up with four ways, and for the three that work best I compare their speed on different sample sizes below - are there better methods? If people can suggest sensible contenders I'll subject them to the analysis below as well.

Sample lists and dictionaries are created as follows:

import cProfile
from random import randint

length = 100000

listOfRandomInts = [randint(0,length*length/10-1) for x in range(length)]
dictionaryOfRandomInts = {randint(0,length*length/10-1): "It's here" for x in range(length)}

 

Method 1: the 'in' keyword:

def way1(theList,theDict):
    resultsList = []
    for listItem in theList:
        if listItem in theDict:
            resultsList.append(theDict[listItem])
    return resultsList

cProfile.run('way1(listOfRandomInts,dictionaryOfRandomInts)')

32 function calls in 0.018 seconds

 

Method 2: error handling:

def way2(theList,theDict):
    resultsList = []
    for listItem in theList:
        try:
            resultsList.append(theDict[listItem])
        except:
            ;
    return resultsList

cProfile.run('way2(listOfRandomInts,dictionaryOfRandomInts)')

32 function calls in 0.087 seconds

 

Method 3: set intersection:

def way3(theList,theDict):
    return list(set(theList).intersection(set(theDict.keys())))

cProfile.run('way3(listOfRandomInts,dictionaryOfRandomInts)')

26 function calls in 0.046 seconds

 

Method 4: Naive use of dict.keys():

This is a cautionary tale - it was my first attempt and BY FAR the slowest!

def way4(theList,theDict):
    resultsList = []
    keys = theDict.keys()
    for listItem in theList:
        if listItem in keys:
            resultsList.append(theDict[listItem])
    return resultsList

cProfile.run('way4(listOfRandomInts,dictionaryOfRandomInts)')

12 function calls in 248.552 seconds

 

EDIT: Bringing the suggestions given in the answers into the same framework that I've used for consistency. Many have noted that more performance gains can be achieved in Python 3.x, particularly list comprehension-based methods. Many thanks for all of the help!

Method 5: Better way of performing intersection (thanks jonrsharpe):

def way5(theList, theDict):
    return = list(set(theList).intersection(theDict))

25 function calls in 0.037 seconds

 

Method 6: List comprehension (thanks jonrsharpe):

def way6(theList, theDict):
    return [item for item in theList if item in theDict]

24 function calls in 0.020 seconds

 

Method 7: Using the & keyword (thanks jonrsharpe):

def way7(theList, theDict):
    return list(theDict.viewkeys() & theList)

25 function calls in 0.026 seconds

For methods 1-3 and 5-7 I timed them as above with list/dictionaries of length 1000, 10000, 100000, 1000000, 10000000 and 100000000 and show a log-log plot of time taken. Across all lengths the intersection and in-statement method perform better. The gradients are all about 1 (maybe a bit higher), indicating O(n) or perhaps slightly super-linear scaling.

解决方案

Of a couple of additional methods I've tried, the fastest was a simple list comprehension:

def way6(theList, theDict):
    return [item for item in theList if item in theDict]

This runs the same process as your fastest approach, way1, but more quickly. For comparison, the quickest set-based way was

def way5(theList, theDict):
    return list(set(theList).intersection(theDict))

timeit results:

>>> import timeit
>>> setup = """from __main__ import way1, way5, way6
from random import randint
length = 100000
listOfRandomInts = [randint(0,length*length/10-1) for x in range(length)]
dictionaryOfRandomInts = {randint(0,length*length/10-1): "It's here" for x in range(length)}
"""
>>> timeit.timeit('way1(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
14.550477756582723
>>> timeit.timeit('way5(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
19.597916393388232
>>> timeit.timeit('way6(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
13.652289059326904


Having added @abarnert's suggestion:

def way7(theList, theDict):
    return list(theDict.viewkeys() & theList)

and re-run the timing I now get:

>>> timeit.timeit('way1(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
13.110055883138497
>>> timeit.timeit('way5(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
17.292466681101036
>>> timeit.timeit('way6(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
14.351759544463917
>>> timeit.timeit('way7(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
17.206370930653392

way1 and way6 have switched places, so I re-ran again:

>>> timeit.timeit('way1(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
13.648176054011941
>>> timeit.timeit('way6(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
13.847062579316628

So it looks like the set approach is slower than the list, but the difference between the list and list comprehension is (surprisingly, to me at least) is a bit variable. I'd say just pick one, and not worry about it unless it becomes a real bottleneck later.

这篇关于从给定的列表快速查找字典中的所有键的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆