Optimizing Worst Case Time Complexity to O(1) for Python dicts

This article looks at how to achieve O(1) worst-case time complexity with Python dicts, and may serve as a useful reference if you are facing a similar problem.

Problem Description

I have to store 500M two-digit unicode characters in memory (RAM).



The data structure I use should have:

Worst Case Space Complexity: O(n)
Worst Case Time Complexity: O(1) <-- insertion, read, update, deletion

I was thinking of choosing dict, which is Python's implementation of a hash table, but the problem is that it only guarantees O(1) time complexity for the required operations in the average case, not the worst case. I have heard that if the number of entries is known, O(1) time complexity can be achieved even in the worst case.



How can that be done?



In case that is not possible in Python, can I access memory addresses and the data stored at them directly from my Python code? If yes, then how?

Solution

Mostly, the performance hits (usually taken on a collision) are amortized over all calls, so for most realistic use you won't get O(n) on every call. In fact, the only case where you would incur an O(n) hit on every call is the pathological one where every key's hash collides with an existing key's hash value (i.e. the worst possible, or most unfortunate, usage of a hashtable).



If, for example, you know your set of keys beforehand and you know that they will not have hash collisions (i.e. all their hashes are unique), then you will not suffer the collision cases. The other major O(n) operation is hashtable resizing, but how often this happens depends on the implementation (expansion factor, hash function, collision-resolution scheme, etc.) and it will also vary run-to-run depending on the input set.
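As a rough sketch of that pathological case (the BadKey class below is purely illustrative and not part of the original question): when every key hashes to the same value, each insertion has to probe past all previously inserted keys, so filling the dict degrades from roughly linear to roughly quadratic total time.

import timeit

class BadKey:
    """Illustrative key type whose instances all hash to the same bucket."""
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return 1          # every key collides -> probing degrades toward O(n) per insert
    def __eq__(self, other):
        return isinstance(other, BadKey) and self.value == other.value

def fill(n, key_type):
    d = {}
    for i in range(n):
        d[key_type(i)] = None
    return d

# Doubling n roughly doubles the fill time for int keys (amortized O(1) per insert),
# but roughly quadruples it for BadKey (O(n) per insert due to collisions).
for n in (500, 1000, 2000):
    t_int = timeit.timeit(lambda: fill(n, int), number=1)
    t_bad = timeit.timeit(lambda: fill(n, BadKey), number=1)
    print(f"n={n:5d}  int keys: {t_int:.4f}s   colliding keys: {t_bad:.4f}s")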



In either case, you can avoid sudden runtime slowdowns if you can pre-populate the dict with all of the keys. The values can simply be set to None and filled in with their real values later on. The only noticeable performance hit should then come when initially "priming" the dict with keys, and future value insertion should be constant time.
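A minimal sketch of that pre-population idea, assuming the full key set is known up front (the keys here are made-up two-character strings):

# Pre-size the dict once by inserting every key with a placeholder value.
keys = ["ab", "cd", "ef"]        # in practice: the full, known set of keys

table = dict.fromkeys(keys)      # every key present, every value None

# Filling in real values later only overwrites existing slots, so no
# resizing (and no sudden rehashing pause) happens during this phase.
table["ab"] = 42
table["cd"] = 7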



A completely different question is how you intend to read/query the structure. Do you need to attach separate values and access them via a key? Should it be ordered? Perhaps a set would be more appropriate than a dict, since you do not really require a key:value mapping.
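For illustration, if no per-key value is needed, a plain set already gives average O(1) membership tests without storing placeholder values:

seen = set()
seen.add("ab")              # insertion
print("ab" in seen)         # membership test -> True
print("zz" in seen)         # -> False
seen.discard("ab")          # deletion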



Update:



Based on your description in the comments, this is starting to sound like a job for a database, even if you are working with a temporary data set. You could use an in-memory relational database (e.g. with SQLite). Additionally, you could use an ORM like SQLAlchemy to interact with the database more pythonically, without having to write SQL.
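As a hedged sketch of that suggestion (SQLAlchemy 1.4+ style; the Record model and column names are invented for illustration, not taken from the question):

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Record(Base):
    __tablename__ = "records"
    id = Column(Integer, primary_key=True)
    chars = Column(String(2), index=True)    # the two-character payload

engine = create_engine("sqlite:///:memory:")   # nothing ever touches disk
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Record(chars="ab"))
    session.commit()
    found = session.query(Record).filter_by(chars="ab").first()
    print(found.chars)                         # prints "ab"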



It even sounds like you might be reading the data from a database to start with, so perhaps you can leverage that further?



Storing, querying, and updating a massive number of uniquely keyed, typed records is exactly what RDBMSs have been specialised for through decades of development and research. Using an in-memory version of a pre-existing relational database (such as SQLite's) will probably be the more pragmatic and sustainable choice.



Try using Python's built-in sqlite3 module, and try the in-memory version by providing ":memory:" as the database file path on construction:

  con = sqlite3.connect(":memory:")
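For completeness, a small self-contained sketch showing the in-memory database being used (the table and column names are illustrative, not from the original answer):

import sqlite3

con = sqlite3.connect(":memory:")   # the whole database lives in RAM
con.execute("CREATE TABLE chars (k INTEGER PRIMARY KEY, value TEXT)")
con.executemany("INSERT INTO chars VALUES (?, ?)", [(1, "ab"), (2, "cd")])

row = con.execute("SELECT value FROM chars WHERE k = ?", (1,)).fetchone()
print(row[0])                       # prints "ab"
con.close()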



This concludes the article on Optimizing Worst Case Time Complexity to O(1) for Python dicts. We hope the answer above is helpful, and thank you for supporting IT屋!
