don't need dictionary's keys - hash table?
Question
Hello,
I am using some very large dictionaries with keys that are long strings
(urls). For a large dictionary these keys start to take up a
significant amount of memory. I do not need access to these keys -- I
only need to be able to retrieve the value associated with a certain
key, so I do not want to have the keys stored in memory. Could I just
hash() the url strings first and use the resulting integer as the key?
I think what I'm after here is more like a traditional hash table. If I
do it this way am I going to get the memory savings I am after? Will
the hash function always generate unique keys? Also, would the same
technique work for a set?
Any other thoughts or considerations are appreciated.
Thank You.
Recommended answer
kdot...@gmail.com wrote:
Hello,
I am using some very large dictionaries with keys that are long strings
(urls). For a large dictionary these keys start to take up a
significant amount of memory. I do not need access to these keys -- I
only need to be able to retrieve the value associated with a certain
key, so I do not want to have the keys stored in memory. Could I just
hash() the url strings first and use the resulting integer as the key?
I think what I'm after here is more like a traditional hash table. If I
do it this way am I going to get the memory savings I am after? Will
the hash function always generate unique keys? Also, would the same
technique work for a set?
I just realized that of course the hash is not always going to be
unique, so this wouldn't really work. And it seems a hash table would
still need to store the keys (as strings) so that string comparisons
can be done when a collision occurs. I guess there's no avoiding
storing the keys?
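One way to avoid keeping the full URL strings is to key the dictionary on a fixed-size digest instead of on hash(). The following is only a minimal sketch, not something proposed in the thread: the `url_key` helper and the `DigestDict` wrapper are hypothetical names, and SHA-1 is an arbitrary choice of digest. Collisions are still possible in principle, just far less likely than with hash(), and the original URLs can no longer be recovered from the table.

```python
import hashlib


def url_key(url: str) -> bytes:
    """Return a fixed-size 20-byte SHA-1 digest to use in place of the URL."""
    return hashlib.sha1(url.encode("utf-8")).digest()


class DigestDict:
    """Hypothetical wrapper: maps URLs to values but stores only digests."""

    def __init__(self):
        self._data = {}

    def __setitem__(self, url, value):
        # Only the digest is kept; the URL string itself is not stored.
        self._data[url_key(url)] = value

    def __getitem__(self, url):
        return self._data[url_key(url)]

    def __contains__(self, url):
        return url_key(url) in self._data

    def __len__(self):
        return len(self._data)


table = DigestDict()
table["http://example.com/a/very/long/path?with=query&and=more"] = 1
table["http://example.com/another/long/path"] = 2
```

Each digest is still a Python object with its own overhead, so this is only likely to pay off when the URLs are much longer than the 20-byte digest.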
kd*****@gmail.com wrote:
Will the hash function always generate unique keys?
no. hash() is designed for dictionaries (hash tables), not for use as a
cryptographic hash.
depending on your application, a Bloom filter might be good enough:
http://en.wikipedia.org/wiki/Bloom_filter
(see the links section for a Python implementation)
</F>
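To illustrate the Bloom filter suggestion, here is a rough sketch of one, assuming only set-style membership tests are needed (a Bloom filter cannot return an associated value). The class name, the default sizes, and the use of salted SHA-256 to derive bit positions are all illustrative assumptions, not taken from the linked implementations. It never reports a stored item as missing, but it can occasionally report an item as present that was never added.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: fixed memory, no stored keys."""

    def __init__(self, num_bits=8 * 1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per position, packed into a bytearray.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True only if every derived bit is set; may be a false positive.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


seen = BloomFilter()
urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
for url in urls:
    seen.add(url)
```

Memory use is set by `num_bits` alone and does not grow with the length or number of URLs; the false-positive rate rises as more items are added.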
kd*****@gmail.com wrote:
I just realized that of course the hash is not always going to be
unique, so this wouldn't really work. And it seems a hash table would
still need to store the keys (as strings) so that string comparisons
can be done when a collision occurs.
btw, Python's dictionary type *is* a highly-optimized implementation of
a "traditional hash table".
</F>
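A small sketch of what that means in practice: because the dictionary keeps the keys, it can still tell entries apart when their hashes collide. The `CollidingKey` class below is a hypothetical example whose instances all report the same hash, forcing every lookup to fall back on an equality comparison.

```python
class CollidingKey:
    """Hypothetical key type whose instances all share one hash value."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        # Deliberately constant, so every key collides with every other.
        return 1

    def __eq__(self, other):
        return isinstance(other, CollidingKey) and self.name == other.name


d = {CollidingKey("a"): "first", CollidingKey("b"): "second"}

# Both entries survive, and each lookup finds the right value,
# because dict compares the stored keys after the hashes match.
first = d[CollidingKey("a")]
second = d[CollidingKey("b")]
```

If the dictionary stored only the hash values, the second assignment would have overwritten the first.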