Designing an efficient hash

Published: 2019/6/7 12:59:05 · Tags: Algorithms, hash


Problem description

I have a set of nodes (N) arranged into one or more networks. Each network has one or more root nodes. Each node has a unique ID (UID) that is a 32-bit integer. 100,000 nodes would be a large problem - 1,000,000 would be almost infeasibly gigantic. I need to check whether every node is connected to at least one root node, and I'm trying to design a hash function to help me.

My current code starts at each root node and traces all paths, recording each connected node. It does this without hashing, since previously I could rely on the UIDs being relatively small and mostly contiguous. After subtracting a starting offset, it just uses the UID directly to index into a bitmap array. The bitmap is then compared against the members of N to determine which nodes are not connected.
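For readers who want the current approach spelled out, here is a minimal sketch of it. The question does not say how the links between nodes are stored, so the `neighbours` dictionary (UID to list of linked UIDs) and the function name are assumptions made for illustration only.

```python
from collections import deque


def find_unconnected_bitmap(uids, roots, neighbours):
    """Return the UIDs in `uids` that cannot be reached from any root.

    Assumes every root and every neighbour is itself a member of `uids`.
    """
    offset = min(uids)                    # the "starting offset"
    size = max(uids) - offset + 1         # one bit per *possible* UID
    bitmap = bytearray((size + 7) // 8)

    def mark(uid):
        i = uid - offset
        bitmap[i >> 3] |= 1 << (i & 7)

    def is_marked(uid):
        i = uid - offset
        return bool(bitmap[i >> 3] & (1 << (i & 7)))

    # Breadth-first trace from every root, marking each node once.
    queue = deque()
    for root in roots:
        if not is_marked(root):
            mark(root)
            queue.append(root)
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if not is_marked(nxt):
                mark(nxt)
                queue.append(nxt)

    # Compare the bitmap against the members of N.
    return sorted(u for u in uids if not is_marked(u))
```

The bitmap size depends on the spread between the smallest and largest UID, not on the number of nodes, which is why large gaps in the UIDs make it expensive.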

Now, however, I'm coming across situations where there are large gaps in the UIDs (e.g., jumping from 100,000 to 100,000,000) - this makes simply increasing the size of the bitmap more and more impractical, hence the need for a hash. In the general case, provided the user hasn't completely messed up, I would expect the number of unconnected nodes to be less than the number of connected ones (but not necessarily).

How do I, for instance, pick how many buckets I should use for hashing to balance memory use against likelihood of collisions? Can this (should this?) be done dynamically? (i.e., should I dynamically allocate my bucket array based on the size of N?) Is it more efficient to start by hashing all my node UIDs into buckets, then clearing the 'connected' hashes as they are discovered? How can I determine the best way to handle collisions? Is it possible to create a perfect hash dynamically?
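To make these questions concrete, here is one possible shape for a hashed replacement of the bitmap; it is a sketch under stated assumptions, not a recommended answer. It assumes the bucket array is sized dynamically from the size of N (at least twice as many slots as nodes), that collisions are handled by linear probing, and that UIDs are non-negative. The multiplier 2654435761 is a commonly used constant for multiplicative hashing of 32-bit keys. The `UidSet` class, `find_unconnected` function, and `neighbours` dictionary are hypothetical names.

```python
from array import array


class UidSet:
    """Fixed-capacity open-addressing set of non-negative 32-bit UIDs."""

    def __init__(self, expected_count):
        capacity = 8
        while capacity < 2 * expected_count:   # keep the table at most half full
            capacity *= 2
        self._mask = capacity - 1
        self._shift = 32 - (capacity.bit_length() - 1)
        self._slots = array("q", [-1]) * capacity   # -1 marks an empty slot
        self._count = 0

    def _home(self, uid):
        # Multiplicative hash: keep the top bits of the 32-bit product.
        return ((uid * 2654435761) & 0xFFFFFFFF) >> self._shift

    def add(self, uid):
        """Insert uid; return True if it was not already present."""
        i = self._home(uid)
        while self._slots[i] != -1:
            if self._slots[i] == uid:
                return False
            i = (i + 1) & self._mask           # linear probing on collision
        if self._count >= self._mask:          # always leave one empty slot
            raise OverflowError("UidSet is full")
        self._slots[i] = uid
        self._count += 1
        return True

    def __contains__(self, uid):
        i = self._home(uid)
        while self._slots[i] != -1:
            if self._slots[i] == uid:
                return True
            i = (i + 1) & self._mask
        return False


def find_unconnected(uids, roots, neighbours):
    """Return the UIDs in `uids` that cannot be reached from any root."""
    seen = UidSet(len(uids))
    stack = [r for r in roots if seen.add(r)]
    while stack:
        node = stack.pop()
        for nxt in neighbours.get(node, ()):
            if seen.add(nxt):
                stack.append(nxt)
    return sorted(u for u in uids if u not in seen)
```

This version records connected nodes as they are discovered rather than inserting every UID up front and clearing entries, because removing entries from a linear-probing table needs extra bookkeeping. In Python specifically, the built-in `set` already provides a dynamically resized hash table, so the class above is only there to show the mechanics.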

What I have tried:

Nothing much more than Googling. I'm not a computer scientist, so this is a bit new to me. I'm finding the masses of online information a bit hard to digest - hence the appeal to the experts.
