Is there anything faster than dict()?


Question


I need a faster way to store and access around 3 GB of k:v pairs, where k is a string or an integer and v is an np.array() that can have different shapes. Is there any object that is faster than the standard Python dict for storing and accessing such a table, for example a pandas.DataFrame?


As far as I understand, the Python dict is a fairly fast hash table implementation. Is there anything better than that for my specific case?

Answer


No, there is nothing faster than a dictionary for this task, since both indexing and membership checking take approximately O(1) time on average.
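To illustrate the average O(1) lookup, here is a minimal sketch that compares a membership check on a list (a linear scan) against the same check on a dict built from the same keys. The size N and the key names are arbitrary choices for illustration, and absolute timings depend on the machine and Python version; only the gap between the two results matters.

```python
import timeit

# Arbitrary size chosen for illustration.
N = 100_000
keys = [f"key{i}" for i in range(N)]
as_list = keys                 # `in` on a list scans element by element: O(n)
as_dict = dict.fromkeys(keys)  # `in` on a dict hashes the key: ~O(1) on average

probe = keys[-1]  # worst case for the list: the last element

t_list = timeit.timeit(lambda: probe in as_list, number=100)
t_dict = timeit.timeit(lambda: probe in as_dict, number=100)

print(f"list membership: {t_list / 100 * 1e6:.2f} us per check")
print(f"dict membership: {t_dict / 100 * 1e6:.2f} us per check")
```

The list check grows with N while the dict check stays roughly flat, which is why the answer focuses on micro-optimizations rather than on a different container.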


Once your items are stored in a dictionary, you can access them in constant time, so the indexing process itself is not the problem. You may still be able to make lookups slightly faster by changing your objects and their types, which can enable some optimizations under the hood. For example, if your string keys are not very large, you can intern them, so that each distinct string value is cached in memory as a single shared object instead of as separate, equal copies. If the keys in a dictionary are interned and the lookup key is interned too, the key comparison (after hashing) can be done by a pointer comparison instead of a full string comparison, which makes access somewhat faster. Python provides the sys.intern() function for this purpose (in Python 2 it was the built-in intern()).



Enter string in the table of "interned" strings and return the interned string – which is string itself or a copy. Interning strings is useful to gain a little performance on dictionary lookup...
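A minimal sketch of interning keys while building such a table. The helper name build_table is hypothetical, and plain Python lists stand in for the np.array values from the question so the example has no third-party dependency. Integer keys are stored as-is, because sys.intern() only accepts str.

```python
import sys


def build_table(pairs):
    """Build a dict from (key, value) pairs, interning string keys.

    Hypothetical helper for illustration; int keys are left unchanged
    because sys.intern() raises TypeError for non-str arguments.
    """
    table = {}
    for key, value in pairs:
        if isinstance(key, str):
            key = sys.intern(key)
        table[key] = value
    return table


# Two equal strings built at runtime are usually distinct objects...
a = "".join(["mystr", "25"])
b = "mystr" + str(25)
print(a == b)                          # True
# ...but interning maps both to one shared object, so `is` holds.
print(sys.intern(a) is sys.intern(b))  # True

# Lists stand in for np.array values of different shapes.
table = build_table([("mystr25", [1, 2, 3]), (7, [[0, 1], [2, 3]])])
print(table[sys.intern(b)], table[7])  # [1, 2, 3] [[0, 1], [2, 3]]
```

Interning only pays off if the lookup keys are interned as well; string literals that look like identifiers are typically interned by CPython automatically, but strings built at runtime (from files, formatting, concatenation) are not.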

Here is an example, run in IPython with sys already imported:

In [49]: d = {'mystr{}'.format(i): i for i in range(30)}

In [50]: %timeit d['mystr25']
10000000 loops, best of 3: 46.9 ns per loop

In [51]: d = {sys.intern('mystr{}'.format(i)): i for i in range(30)}

In [52]: %timeit d['mystr25']
10000000 loops, best of 3: 38.8 ns per loop
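The same measurement can be reproduced outside IPython with the standard timeit module. This is a sketch: the function name time_lookup and its parameters are hypothetical, and the numbers you get will differ from the ones shown above depending on hardware and Python version, so treat any difference as indicative only.

```python
import sys
import timeit


def time_lookup(intern_keys, n_keys=30, number=1_000_000):
    """Return the average seconds per d['mystr25'] lookup.

    Hypothetical helper: builds a dict of n_keys string keys like the
    session above, optionally passing each key through sys.intern().
    """
    make_key = sys.intern if intern_keys else (lambda s: s)
    d = {make_key(f"mystr{i}"): i for i in range(n_keys)}
    # timeit compiles the statement string; CPython typically interns
    # identifier-like constants such as 'mystr25', so with interned dict
    # keys the lookup can succeed on a pointer (identity) comparison.
    total = timeit.timeit("d['mystr25']", globals={"d": d}, number=number)
    return total / number


plain = time_lookup(intern_keys=False)
interned = time_lookup(intern_keys=True)
print(f"plain keys:    {plain * 1e9:.1f} ns per lookup")
print(f"interned keys: {interned * 1e9:.1f} ns per lookup")
```

Running each variant several times and comparing the best result, as IPython's %timeit does, gives a more stable comparison than a single run.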
