从哈希表中删除值的成本是多少? [英] What is the cost of deleting a value from a hashtable?

查看:77
本文介绍了从哈希表中删除值的成本是多少?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

现在我有一个问题,问我在插入过程中使用线性探测时从哈希表中删除值的代价.

通过阅读互联网上的各种内容,我可以发现,它必须与负载因子有关.虽然我不确定,但是我读到了负载系数和所需的探头数之间的关系,它是探头数= 1/(1-LF).

因此,我认为成本必须取决于探针序列.但是随后又有一个想法毁了一切.

如果在p探针中插入了该元素,现在我试图删除该元素,该怎么办.但是在此之前,我已经删除了几个具有相同哈希码的元素,并且它们是插入小于p的探针中的一部分.

在这种情况下,我到达了一个哈希表中的插槽为空的阶段,但是我不确定要删除的元素是否已被删除或由于探测而位于其他位置. /p>

我还发现,一旦删除了元素,就必须用一些特殊的指示器标记​​该插槽以告知其可用,但这并不能解决我不确定要删除的元素的问题.

在这种情况下,有人可以建议如何找到费用吗? 如果是非线性探测,方法是否会有所不同?

解决方案

标准方法是查找元素,将其标记为已删除".标记显然具有O(1)成本,因此总操作成本与查找相同:O(1) expected .在简并的情况下(例如,所有元素具有相同的哈希值),它可能高达O( n ).理论上我们只能说O(1).

关于负载系数.负载系数越高(占用的桶数与总数之比),期望的系数就越大(但这不会改变理论上的O成本).请注意,在这种情况下,负载系数包括表元素中同时存在的两个数量以及之前标记为已删除的存储桶的数量.

其他探测类型(例如二次)不会改变理论成本,但可能会更改预期的恒定因子或其方差.如果您查看后备"序列,则按线性顺序将不同存储桶的序列重叠.这意味着如果某个存储桶的序列很长,相邻存储桶的序列也将很长.例如:如果存储桶4到10被占用,则存储桶#4的序列为7个存储桶长(4、5、6,...,10),对于#5,存储序列为6,依此类推.对于二次探测,情况并非如此.

但是,由于您检查彼此靠近的存储单元,因此线性探测具有更好的存储高速缓存行为的好处.但是,实际上,对于二次探测,后备序列很少足够长,以至于无济于事.

最后,在线性探测的情况下,可以在没有删除标记的情况下工作,但是为此,您必须使删除过程变得相当复杂(尽管预期仍为O(1),但是更高的常数因子).是否值得,必须通过实际分析来确定;例如,这简化了插入和查找的过程.对于C ++实现,这样做的缺点是,delete()会使迭代器无效.

Now I have this question where I was asked the cost of deleting a value from a hash table when we used linear probing while the insertion process.

What I could figure out from reading various stuff on the internet is that it has to do something with the load factor. Though I am not sure, but I read a relation between the load factor and no of probes required and it is No of probes = 1 / (1-LF).

So I believe the cost has to be dependent on the probe sequence. But then another thought ruins everything.

What if the element was inserted in p probes and now I am trying to delete this element. But before this I had already deleted few elements having the same hash code and were a part of insertion in probes less than p.

In this case I reach to a stage where I see a slot empty in the hash table but I am not sure if the element I am trying to delete is already deleted or is at some other location as a result of probing.

I also found that once I delete an element I must mark this slot with some special indicator to inform that it is available, but this doesn't solve my problem of being uncertain about the element which I am willing to delete.

Could anyone please suggest how to find the cost in such cases? Is the approach going to vary if it is non-linear probing?

解决方案

The standard approach is "lookup the element, mark as deleted". Marking obviously has O(1) cost, so the total operation cost is the same as just lookup: O(1) expected. It can be as high as O(n) in degenerate cases (e.g. all elements have the same hash). O(1) expected is all we can say theoretically.

About the load factor. The higher the load factor (ratio of number of occupied buckets to the total number), the larger is the expected factor (but this doesn't change the theoretical O cost). Note that in this case load factor includes number of both present in the table elements plus the number of buckets that got marked as deleted previously.

Other probing kinds (e.g. quadratic) don't change the theoretical cost, but may alter the expected constant factor or its variance. If you look at "fallback" sequences, in linear ordering the sequences of different buckets overlap. This means that if for some bucket the sequence is long, for adjacent buckets it will also be long. E.g.: if buckets 4 to 10 are occupied, sequence for bucket #4 is 7 bucket long (4, 5, 6, ..., 10), for #5 it's 6 and so on. For quadratic probing this is not the case.

However, linear probing has the benefit of better memory-cache behavior, since you check memory cells close to each other. In practice, though, for quadratic probing fallback sequences are rarely long enough for this to matter.

Finally, in linear probing case, it is possible to work without deleted mark, but for this you'd have to complicate deleting procedure considerably (still O(1) expected, though, but with much higher constant factor). Whether it is worth it has to be decided with actual profiling; for example, this simplifies inserting somewhat and lookup a bit. For a C++ implementation this would have the downside that erase() would invalidate iterators, though.

这篇关于从哈希表中删除值的成本是多少?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆