Dictionary with two hash functions in C#?


Problem description

I've got a huge (>>10m) list of entries. Each entry offers two hash functions:

  • Cheap: computes the hash quickly, but its distribution is terrible (it may put 99% of items in 1% of the hash space)
  • Expensive: takes a lot of time to compute, but its distribution is much better

An ordinary Dictionary lets me use only one of these hash functions. I'd like a Dictionary that uses the cheap hash function first, and checks the expensive one on collisions.

It seems like a good idea to use a dictionary inside a dictionary for this. I currently basically use this monstrosity:

Dictionary<int, Dictionary<int, List<Foo>>>;

I improved this design so the expensive hash gets called only if there are actually two items of the same cheap hash.
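The lazy design described above might look roughly like the following. This is only a minimal sketch under assumptions: the type name `DoubleHashedSet`, the nested `Bucket` class, and the two delegate parameters are all hypothetical and not part of the framework or of the asker's actual code. It keeps a single item per cheap-hash bucket and only computes expensive hashes once a second item lands in the same bucket.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; all names here are illustrative, not a framework type.
public sealed class DoubleHashedSet<T>
{
    private sealed class Bucket
    {
        public T Single;                              // the only item, while the bucket holds one
        public Dictionary<int, List<T>> ByExpensive;  // created on the first cheap-hash collision
    }

    private readonly Func<T, int> _cheapHash;
    private readonly Func<T, int> _expensiveHash;
    private readonly Dictionary<int, Bucket> _buckets = new Dictionary<int, Bucket>();

    public DoubleHashedSet(Func<T, int> cheapHash, Func<T, int> expensiveHash)
    {
        _cheapHash = cheapHash;
        _expensiveHash = expensiveHash;
    }

    public void Add(T item)
    {
        int cheap = _cheapHash(item);
        if (!_buckets.TryGetValue(cheap, out Bucket bucket))
        {
            // First item with this cheap hash: the expensive hash is not needed yet.
            _buckets[cheap] = new Bucket { Single = item };
            return;
        }
        if (bucket.ByExpensive == null)
        {
            // Second item with the same cheap hash: promote the bucket
            // and hash the item that was already stored.
            bucket.ByExpensive = new Dictionary<int, List<T>>();
            AddToInner(bucket.ByExpensive, bucket.Single);
            bucket.Single = default(T);
        }
        AddToInner(bucket.ByExpensive, item);
    }

    public bool Contains(T item)
    {
        if (!_buckets.TryGetValue(_cheapHash(item), out Bucket bucket)) return false;
        if (bucket.ByExpensive == null)
            return EqualityComparer<T>.Default.Equals(bucket.Single, item);
        return bucket.ByExpensive.TryGetValue(_expensiveHash(item), out List<T> list)
            && list.Contains(item);
    }

    private void AddToInner(Dictionary<int, List<T>> inner, T item)
    {
        int expensive = _expensiveHash(item);
        if (!inner.TryGetValue(expensive, out List<T> list))
            inner[expensive] = list = new List<T>();
        list.Add(item);
    }
}
```

Note that a lookup in a promoted bucket still has to compute the expensive hash of the probe item, so the saving applies mainly to buckets that never see a collision.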

It fits perfectly and does a flawless job for me, but it looks like something that should have died 65 million years ago.

To my knowledge, this functionality is not included in the basic framework. I am about to write a DoubleHashedDictionary class, but I wanted to hear your opinion first.

As for my specific case:
First hash function = number of files in a file system directory (fast)
Second hash function = sum of the sizes of the files (slow)
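For illustration, the two hash functions for this case could be sketched as below. The class and method names are hypothetical, and the sketch assumes only the top-level files of a directory are considered (the question does not say whether subdirectories count).

```csharp
using System.IO;
using System.Linq;

// Hypothetical helper; assumes only top-level files are considered.
public static class DirectoryHashes
{
    // Cheap: enumerates the directory's files and counts them.
    public static int FileCount(string path)
    {
        return Directory.EnumerateFiles(path).Count();
    }

    // Expensive: reads the length of every file and folds the sum into an int.
    public static int TotalSizeHash(string path)
    {
        long total = new DirectoryInfo(path).EnumerateFiles().Sum(f => f.Length);
        return total.GetHashCode();
    }
}
```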

Edits:

  • Changed title and added more information.
  • Added quite an important missing detail.

Solution

First off, I think you're on the right path in implementing your own hashtable, if what you are describing is truly what you want. But as a critic, I'd like to ask a few questions:

Have you considered using something more unique for each entry?

I am assuming that each entry holds information about a file system directory. Have you considered using its full path as the key, perhaps prefixed with the computer name or IP address?

On the other hand, if you're using the number of files as the hash key, are those directories never going to change? Because if the hash key/result changes after an entry has been stored, you will never be able to find it again.
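The lookup failure can be reproduced with a small sketch. `DirEntry` and `MutableKeyDemo` are hypothetical types made up for this illustration; the key's hash is simply its mutable file count.

```csharp
using System.Collections.Generic;

// Hypothetical key type whose hash depends on mutable state.
public sealed class DirEntry
{
    public int FileCount;
    public override int GetHashCode() { return FileCount; }
    public override bool Equals(object obj) { return ReferenceEquals(this, obj); }
}

public static class MutableKeyDemo
{
    // Returns whether the entry can still be found after its hash has changed.
    public static bool FoundAfterChange()
    {
        var entry = new DirEntry { FileCount = 3 };
        var map = new Dictionary<DirEntry, string>();
        map[entry] = "some directory";

        entry.FileCount = 4;   // e.g. a file was added; the hash is now different

        // The dictionary stored the entry under the old hash, so this lookup misses.
        return map.ContainsKey(entry);
    }
}
```

The entry is still held by the dictionary (its `Count` stays at 1), it just cannot be reached through a lookup with the changed key.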

While on this topic, if the directory content/size is never going to change, can you store that value somewhere so you don't have to spend the time calculating it again?
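One way to store the value, assuming the directory really never changes, is to compute it at most once per entry with `Lazy<T>`. The `CachedEntry` type and the `computeTotalSize` delegate are hypothetical names used only for this sketch.

```csharp
using System;

// Hypothetical entry type that caches its expensive value after first use.
public sealed class CachedEntry
{
    private readonly Lazy<long> _totalSize;

    public string Path { get; }

    public CachedEntry(string path, Func<string, long> computeTotalSize)
    {
        Path = path;
        // The delegate runs on first access to TotalSize, then the result is reused.
        _totalSize = new Lazy<long>(() => computeTotalSize(path));
    }

    public long TotalSize => _totalSize.Value;
}
```

If the directories can change after all, a cached value like this goes stale, which is the same problem as the changing hash key mentioned above.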

Just my few cents.
