Dictionary with two hash functions in C#?
I've got a huge (>>10m) list of entries. Each entry offers two hash functions:
- Cheap: quickly computes hash, but its distribution is terrible (may put 99% of items in 1% of hash space)
- Expensive: takes a lot of time to compute, but the distribution is much better
An ordinary Dictionary lets me use only one of these hash functions. I'd like a Dictionary that uses the cheap hash function first, and checks the expensive one on collisions.
It seems like a good idea to use a dictionary inside a dictionary for this. I currently use this monstrosity:
Dictionary<int, Dictionary<int, List<Foo>>>;
I improved this design so the expensive hash gets called only if there are actually two items of the same cheap hash.
It fits perfectly and does a flawless job for me, but it looks like something that should have died 65 million years ago.
To my knowledge, this functionality is not included in the base framework. I am about to write a DoubleHashedDictionary class, but I wanted to know your opinion first.
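For illustration, the lazy variant described above (the expensive hash is computed only once two items share a cheap hash) might look roughly like this. It is a sketch, not the asker's actual code: TwoLevelIndex, Bucket, Candidates and the two delegate parameters are hypothetical names, and callers are assumed to do their own final equality check on the returned candidates.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; none of these names come from the original post.
public sealed class TwoLevelIndex<T>
{
    private sealed class Bucket
    {
        public T Single;                              // used while the bucket holds exactly one item
        public Dictionary<int, List<T>> ByExpensive;  // created on the first cheap-hash collision
    }

    private readonly Func<T, int> _cheap;
    private readonly Func<T, int> _expensive;
    private readonly Dictionary<int, Bucket> _buckets = new Dictionary<int, Bucket>();

    public TwoLevelIndex(Func<T, int> cheap, Func<T, int> expensive)
    {
        _cheap = cheap;
        _expensive = expensive;
    }

    public void Add(T item)
    {
        int key = _cheap(item);
        if (!_buckets.TryGetValue(key, out Bucket bucket))
        {
            // First item with this cheap hash: the expensive hash is not needed yet.
            _buckets[key] = new Bucket { Single = item };
            return;
        }
        if (bucket.ByExpensive == null)
        {
            // First collision: move the stored item into the inner dictionary,
            // which is the point where its expensive hash is finally computed.
            bucket.ByExpensive = new Dictionary<int, List<T>>();
            AddToInner(bucket.ByExpensive, bucket.Single);
            bucket.Single = default(T);
        }
        AddToInner(bucket.ByExpensive, item);
    }

    // Returns items that share the probe's cheap hash and, where the bucket
    // has already been split, its expensive hash as well.
    public IReadOnlyList<T> Candidates(T probe)
    {
        if (!_buckets.TryGetValue(_cheap(probe), out Bucket bucket))
            return Array.Empty<T>();
        if (bucket.ByExpensive == null)
            return new[] { bucket.Single };
        return bucket.ByExpensive.TryGetValue(_expensive(probe), out List<T> list)
            ? list
            : (IReadOnlyList<T>)Array.Empty<T>();
    }

    private void AddToInner(Dictionary<int, List<T>> inner, T item)
    {
        int key = _expensive(item);
        if (!inner.TryGetValue(key, out List<T> list))
            inner[key] = list = new List<T>();
        list.Add(item);
    }
}
```

The trade-off in this sketch is one extra object per cheap-hash bucket in exchange for never computing the expensive hash for items whose cheap hash is unique.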
As for my specific case:
First hash function = number of files in a file system directory (fast)
Second hash function = sum of size of files (slow)
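As a rough sketch of what these two functions could look like with System.IO: the DirectoryHashes class and its method names are made up here, and counting only the files directly inside the directory (not recursing into subdirectories) is an assumption, since the question does not say which is meant.

```csharp
using System.IO;
using System.Linq;

// Hypothetical helper; names and the non-recursive behaviour are assumptions.
public static class DirectoryHashes
{
    // Cheap: number of files directly inside the directory.
    public static int Cheap(string path) =>
        Directory.EnumerateFiles(path).Count();

    // Expensive: total size of those files in bytes, folded to an int
    // via long.GetHashCode().
    public static int Expensive(string path) =>
        new DirectoryInfo(path).EnumerateFiles().Sum(f => f.Length).GetHashCode();
}
```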
Edits:
- Changed title and added more information.
- Added quite important missing detail
First off, I think you're on the right path in implementing your own hashtable, if what you are describing is truly what you want. But as a critic, I'd like to ask a few questions:
Have you considered using something more unique for each entry?
I am assuming that each entry holds information about a file system directory. Have you considered using its full path as the key, perhaps prefixed with the computer name or IP address?
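A minimal sketch of such a key, assuming the machine name is an acceptable prefix. DirectoryKey and the "|" separator are invented for illustration; whether paths should compare case-insensitively depends on the file system and is left to the dictionary's comparer.

```csharp
using System;
using System.IO;

// Hypothetical helper; the class name and separator are illustrative only.
public static class DirectoryKey
{
    // Builds "<machine>|<absolute path>" so entries from different
    // machines cannot collide on the same path string.
    public static string For(string path) =>
        Environment.MachineName + "|" + Path.GetFullPath(path);
}
```

A key like this could then be used with an ordinary Dictionary<string, Foo>, optionally constructed with StringComparer.OrdinalIgnoreCase on case-insensitive file systems.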
On the other hand, if you're using the number of files as the hash key, are those directories never going to change? If the hash key changes after an entry has been stored, you will never be able to find that entry again.
While on this topic, if the directory content and size are never going to change, can you store that value somewhere so you don't have to recompute it each time?
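The problem can be reproduced in a few lines. In this sketch DirEntry and MutableKeyDemo are hypothetical types whose hash code is simply a mutable file count; after the count changes, the set can no longer locate the entry it still contains.

```csharp
using System.Collections.Generic;

// Hypothetical entry type whose hash depends on mutable state.
public sealed class DirEntry
{
    public int FileCount;
    public override int GetHashCode() => FileCount;
    public override bool Equals(object obj) => ReferenceEquals(this, obj);
}

public static class MutableKeyDemo
{
    // Returns true when the entry can no longer be found after its hash changed.
    public static bool LostAfterMutation()
    {
        var entry = new DirEntry { FileCount = 3 };
        var set = new HashSet<DirEntry> { entry };
        entry.FileCount = 4;           // hash changes while the entry is stored
        return !set.Contains(entry);   // lookup now uses the new hash and misses
    }
}
```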
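One way to do that, assuming the value really never changes, is to wrap the expensive computation in Lazy<int> so it runs at most once per entry. CachedHash is a hypothetical name, and the delegate passed in stands for whatever the slow computation is.

```csharp
using System;

// Hypothetical wrapper; computes the expensive value on first access only.
public sealed class CachedHash
{
    private readonly Lazy<int> _value;

    public CachedHash(Func<int> compute)
    {
        _value = new Lazy<int>(compute);
    }

    // The first read runs the delegate; later reads return the stored result.
    public int Value => _value.Value;
}
```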
Just my few cents.