为什么dict查询总是比列表查询更好? [英] Why are dict lookups always better than list lookups?

查看:127
本文介绍了为什么dict查询总是比列表查询更好?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我当时使用字典作为查找表,但是我开始怀疑列表是否对我的应用程序更好-查找表中的条目数量并不大.我知道列表在后台使用C数组,这使我得出结论,在只有几个项目的列表中查找比在字典中查找更好(访问数组中的一些元素比计算哈希更快).

I was using a dictionary as a lookup table but I started to wonder if a list would be better for my application -- the amount of entries in my lookup table wasn't that big. I know lists use C arrays under the hood which made me conclude that lookup in a list with just a few items would be better than in a dictionary (accessing a few elements in an array is faster than computing a hash).

我决定介绍其他方法,但结果令我感到惊讶.仅使用一个元素就可以更好地进行列表查找!参见下图(对数-对数图):

I decided to profile the alternatives but the results surprised me. List lookup was only better with a single element! See the following figure (log-log plot):

因此出现了一个问题: 为什么列表查找的性能如此差? 我缺少什么?

So here comes the question: Why do list lookups perform so poorly? What am I missing?

在另一个问题上,引起我注意的其他事情是,在大约1000个条目之后,字典查找时间有点不连续".我只画了字典查找时间来显示它.

On a side question, something else that called my attention was a little "discontinuity" in the dict lookup time after approximately 1000 entries. I plotted the dict lookup time alone to show it.

ps1我知道数组和哈希表的O(n)与O(1)摊销时间,但是通常情况下,对于少数元素,在数组上进行迭代比使用更好哈希表.

p.s.2这是我用来比较字典和列表查找时间的代码:

import timeit

lengths = [2 ** i for i in xrange(15)]

list_time = []
dict_time = []
for l in lengths:
    list_time.append(timeit.timeit('%i in d' % (l/2), 'd=range(%i)' % l))
    dict_time.append(timeit.timeit('%i in d' % (l/2),
                                   'd=dict.fromkeys(range(%i))' % l))
    print l, list_time[-1], dict_time[-1]

p.s.3使用Python 2.7.13

推荐答案

我知道列表在幕后使用C数组,这使我得出结论,在只有几个项目的列表中查找比在字典中查找更好(访问数组中的几个元素比计算哈希更快). /p>

I know lists use C arrays under the hood which made me conclude that lookup in a list with just a few items would be better than in a dictionary (accessing a few elements in an array is faster than computing a hash).

当然,访问一些数组元素很便宜,但是在Python中计算==令人惊讶地是重量级的.看到第二张图中的峰值吗?那就是在那里计算两个整数的==的成本.

Accessing a few array elements is cheap, sure, but computing == is surprisingly heavyweight in Python. See that spike in your second graph? That's the cost of computing == for two ints right there.

您的列表查找所需要的计算==比您的字典查找要多得多.

Your list lookups need to compute == a lot more than your dict lookups do.

同时,对于许多对象而言,计算散列可能是一项相当繁重的操作,但对于此处涉及的所有int而言,它们只是对自身进行散列. (-1将散列为-2,大整数(技术上为long s)将散列为较小的整数,但这不适用于此处.)

Meanwhile, computing hashes might be a pretty heavyweight operation for a lot of objects, but for all ints involved here, they just hash to themselves. (-1 would hash to -2, and large integers (technically longs) would hash to smaller integers, but that doesn't apply here.)

在Python中,字典查找并不是真的那么糟糕,尤其是当您的键只是一个连续的整数范围时.这里的所有int都会散列在一起,并且Python使用自定义的开放寻址方案而不是链接,因此您所有的键在内存中的邻接度几乎都与使用列表相同(也就是说,指向键的指针结尾)在PyDictEntry的连续范围内).查找过程很快速,在您的测试用例中,它总是会在第一个探针上按右键.

Dict lookup isn't really that bad in Python, especially when your keys are just a consecutive range of ints. All ints here hash to themselves, and Python uses a custom open addressing scheme instead of chaining, so all your keys end up nearly as contiguous in memory as if you'd used a list (which is to say, the pointers to the keys end up in a contiguous range of PyDictEntrys). The lookup procedure is fast, and in your test cases, it always hits the right key on the first probe.

好的,回到图2中的峰值.第二张图中1024个条目的查找时间峰值是因为对于所有较小的尺寸,您要查找的整数都< = 256,因此它们都落了在CPython的小整数缓存范围内. Python的参考实现为-5到256(含)之间的所有整数保留规范的整数对象.对于这些整数,Python能够使用快速指针比较来避免经历计算==的(令人惊讶的重量级)过程.对于更大的整数,in的参数不再是与字典中匹配整数的对象相同的对象,Python必须经历整个==过程.

Okay, back to the spike in graph 2. The spike in the lookup times at 1024 entries in the second graph is because for all smaller sizes, the integers you were looking for were all <= 256, so they all fell within the range of CPython's small integer cache. The reference implementation of Python keeps canonical integer objects for all integers from -5 to 256, inclusive. For these integers, Python was able to use a quick pointer comparison to avoid going through the (surprisingly heavyweight) process of computing ==. For larger integers, the argument to in was no longer the same object as the matching integer in the dict, and Python had to go through the whole == process.

这篇关于为什么dict查询总是比列表查询更好?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆