Is there anything faster than dict()?

Question

I need a faster way to store and access around 3GB of k:v pairs, where k is a string or an integer and v is an np.array() that can have different shapes.

Is there any object that is faster than the standard Python dict at storing and accessing such a table? For example, a pandas.DataFrame?

As far as I understand, the Python dict is a fairly fast hash table implementation. Is there anything better than that for my specific case?

Recommended answer

No, there is nothing faster than a dictionary for this task, because indexing (getting and setting an item) and even membership checking are O(1) on average. (The complexity of the other operations is listed on the Python wiki: https://wiki.python.org/moin/TimeComplexity )

Once your items are stored in a dictionary, you can access them in constant time, so your performance problem is unlikely to have anything to do with dictionary indexing. That said, you may still be able to make the process slightly faster by changing your objects and their types in ways that enable some optimizations in the under-the-hood operations.
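The constant-time claim can be checked with a rough measurement. The following is a minimal sketch, not a rigorous benchmark: the helper name lookup_time and the chosen sizes are illustrative assumptions, and the reported figure includes the overhead of calling a lambda. The point is only that the per-lookup time should stay roughly flat as the dictionary grows.

```python
import timeit


def lookup_time(n_items, repeats=200_000):
    """Return the average seconds per lookup in a dict with n_items integer keys."""
    table = {i: None for i in range(n_items)}
    key = n_items // 2  # a key that is known to be present
    total = timeit.timeit(lambda: table[key], number=repeats)
    return total / repeats


if __name__ == "__main__":
    # Sizes are arbitrary; they only need to span a few orders of magnitude.
    for n in (1_000, 100_000, 1_000_000):
        print(f"{n:>9} items: {lookup_time(n) * 1e9:.1f} ns per lookup")
```

If the numbers do differ between sizes on a given machine, CPU cache effects are a more likely cause than the hash table itself.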

For example, if your string keys are not very large, you can intern both the lookup key and the dictionary's keys. Interning stores a string in Python's table of "interned" strings, so that equal strings share a single cached object instead of each being created as a separate object.

Python provides an intern() function in the sys module that you can use for this.
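As a small illustration of what sys.intern() does, the sketch below builds two equal strings at run time and interns both. The variable names are illustrative; the strings are joined at run time because literals written directly in source code may already be shared by the compiler.

```python
import sys

# Build two equal strings at run time so they are created independently.
parts = ["my", "str", "25"]
a = "".join(parts)
b = "".join(parts)

# sys.intern() returns the canonical object for a given string value.
a_interned = sys.intern(a)
b_interned = sys.intern(b)

print(a == b)                    # the two strings are equal by value
print(a_interned is b_interned)  # after interning they are the same object
```

Note that sys.intern() only accepts str objects, so this does not apply to the integer keys mentioned in the question.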

From the sys.intern() documentation: Enter string in the table of "interned" strings and return the interned string, which is string itself or a copy. Interning strings is useful to gain a little performance on dictionary lookup...

Also...

If the keys in a dictionary are interned and the lookup key is interned, the key comparison (after hashing) can be done by a pointer comparison instead of comparing the string values themselves, which reduces the access time to the object.
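The pointer-comparison shortcut can be observed indirectly. The sketch below assumes CPython, where a dictionary lookup checks object identity before falling back to an equality test. CountingKey is a hypothetical key class, introduced only to count how often __eq__ runs; it stands in for a string whose value comparison we want to avoid.

```python
class CountingKey:
    """Hypothetical key type that counts how often __eq__ is invoked."""

    eq_calls = 0

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return isinstance(other, CountingKey) and self.name == other.name


stored = CountingKey("mystr25")
table = {stored: 25}

# Lookup with the very same object: the identity check succeeds first.
CountingKey.eq_calls = 0
value_same = table[stored]
calls_same_object = CountingKey.eq_calls

# Lookup with an equal but distinct object: hashes match, so __eq__ must run.
CountingKey.eq_calls = 0
value_equal = table[CountingKey("mystr25")]
calls_equal_object = CountingKey.eq_calls

print(calls_same_object, calls_equal_object)
```

On CPython the first lookup should need no equality call and the second should need one. Interned strings benefit in the same way: when the stored key and the lookup key are the same object, the value comparison is skipped.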

Here is an example:

In [48]: import sys

In [49]: d = {'mystr{}'.format(i): i for i in range(30)}

In [50]: %timeit d['mystr25']
10000000 loops, best of 3: 46.9 ns per loop

In [51]: d = {sys.intern('mystr{}'.format(i)): i for i in range(30)}

In [52]: %timeit d['mystr25']
10000000 loops, best of 3: 38.8 ns per loop
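For readers without IPython, the same comparison can be approximated with the standard timeit module. This is a sketch under stated assumptions: time_lookup is an illustrative helper, and it relies on CPython typically interning the identifier-like literal 'mystr25' when the timed statement is compiled. Absolute numbers will differ from the ones above, and the gap may be small or lost in noise on some machines.

```python
import sys
import timeit

# Keys built with format() at run time are not interned automatically.
plain = {"mystr{}".format(i): i for i in range(30)}
# The same keys, explicitly interned.
interned = {sys.intern("mystr{}".format(i)): i for i in range(30)}


def time_lookup(table, number=1_000_000):
    """Return the average seconds per lookup of the key 'mystr25' in table."""
    total = timeit.timeit("table['mystr25']", globals={"table": table}, number=number)
    return total / number


if __name__ == "__main__":
    print(f"plain keys:    {time_lookup(plain) * 1e9:.1f} ns per lookup")
    print(f"interned keys: {time_lookup(interned) * 1e9:.1f} ns per lookup")
```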
