如何实现像v8这样的强大的哈希表 [英] How to implement a robust hash table like v8

查看:176
本文介绍了如何实现像v8这样的强大的哈希表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

寻找学习如何实现一个哈希表

我希望它能够:

p>


  1. 有效解决冲突

  2. strong>和

  3. 大小无限制(至少在原则上,就像v8对象一样,最大为系统内存大小)。 >

在我的研究和帮助下,有很多方法可以解决散列表中的冲突。 v8的方式是二次探测





 <$ c $> /> Quadratic_probing#Algorithm_to_insert_key_in_hash_tablerel =nofollow noreferrer> wikipedia algorithm 在JavaScript中实现二次探测c> var i = 0 
var SIZE = 10000
var key = getKey(arbitraryString)
var hash = key%SIZE
if(hashtable [hash]){$ b $ (i i ++
hash =(key + i * i)%SIZE
if(!hashtable [hash])break
if(i = = SIZE)抛出new Error('Hashtable full。')
}
hashtable [hash] = key
} else {
hashtable [hash] = key
}

维基百科条目中缺少的元素包括:


  1. 如何计算散列 getKey(arbitraryString)。希望了解v8如何做到这一点(不一定完全相同,只是沿着相同的路线)。不精通C语言,它看起来像键是一个对象,并且散列是一个32位整数。不确定是否 lookup-cache.h

  2. 如何使其成为动态的,以便 SIZE 约束可以被移除。





    $ b $ V8允许你指定你自己的Shape对象在哈希表中使用:

      //哈希表类用Shape来参数化。 
    // Shape必须是具有以下接口的类:
    // class ExampleShape {
    // public:
    // //指示密钥是否与其他匹配。
    //静态布尔IsMatch(Key key,Object * other);
    // //返回键的散列值。
    // static uint32_t散列(Isolate * isolate,Key key);
    // //返回对象的散列值。
    // static uint32_t HashForObject(Isolate * isolate,Object * object);
    // //将键转换为对象。
    // static inline Handle< Object> AsHandle(Isolate * isolate,Key key);
    // //前缀大小表示后备存储开始
    // //中的元素数量。
    //静态const int kPrefixSize = ..;
    // //元素大小表示每个条目的元素数量。
    // static const int kEntrySize = ..;
    // //指示IsMatch是否可以处理其他的the_hole(a
    // //删除的条目)。
    // static const bool kNeedsHoleCheck = ..;
    //};

    但是不知道密钥是什么,以及它们如何将密钥转换为哈希,所以密钥均匀分布并且哈希函数不仅仅是一个hello-world的例子。



    问题是,如何实现像V8这样的快速哈希表,可以有效地解决冲突并且是无限的在尺寸方面。它不一定非常像V8,但具有上述功能。



    就空间效率而言,一种天真的方法可以做到 var array = new Array(10000),它会在填充之前吃掉一堆内存。不知道v8如何处理它,但如果你多次执行 var x = {} ,它不会为未使用的键分配一堆内存,不知何故动态的。



    我坚持在这里本质上:

      var m = require('node-murmurhash')

    函数HashTable(){
    this.array = new Array(10000)
    }

    HashTable。 prototype.key = function(value){
    //不知道键是否实际是this,或者
    //从.set函数计算出的最终结果
    //如果是这样,如何存储。
    return m(value)
    }

    HashTable.prototype.set = function(value){
    var key = this.key(value)
    var array = this.array
    //不知道如何摆脱这个约束。
    var SIZE = 10000
    var hash = key%SIZE
    var i = 0
    if(array [hash]){
    while(i< SIZE){
    i ++
    hash(key + i * i)%SIZE
    if(!array [hash])break $ b $ if(i == SIZE)throw new Error('Hashtable ')
    }
    array [hash] = value
    } else {
    array [hash] = value
    }
    }

    HashTable.prototype.get = function(index){
    return this.array [index]
    }


    解决方案

    这是一个非常广泛的问题,我不确定你想要什么答案。 (如何实现...?听起来像你只是想让别人为你做你的工作,请具体说明一下。)


    如何计算哈希值

    任何哈希函数都可以。我已经指出V8在你问的另一个问题中的实现;但你真的有很大的自由。


    不确定lookup-cache.h是否重要。


    不,它是无关的。


    如何使它动态化可以删除SIZE约束。


    将表的当前大小存储为变量,跟踪散列表中元素的数量,并在使用的时隙百分比超过给定阈值时增加表格(您在那里有一个时空折衷:较低的负载因素(如50%)会导致较少的冲突,但使用更多的内存,较高的因素(如80%)使用较少的内存,慢病例)。我开始时的容量是可能需要的最少条目数量的估计值,并以2倍的步长增长(例如32 - > 64 - > 128 - >等等)。


    在哪里存储最终的哈希值,

    在JavaScript中,您不能在字符串(或基本类型)上存储其他属性。你可以在边上使用一个 Map (或者object),但是如果你打算这么做,那么你可以使用 哈希表,而不用费力去实现你自己的东西。


    以及如何多次计算它。

    >

    这个很简单:再次调用你的哈希函数; - )

    lockquote

    我只想要一个函数 getUniqueString(string)


    关于这个:

      var table = new Map(); 
    var max = 0;

    函数getUniqueString(字符串){
    var unique = table.get(string);
    if(unique === undefined){
    unique =(++ max).toString();
    table.set(string,unique);
    }
    返回唯一;

    $ / code>

    对于更好的封装,您可以定义一个具有 max 作为属性。


    Looking to learn how to implement a hash table in a decent way in JavaScript.

    I would like for it to be able to:

    1. Efficiently resolve collisions,
    2. Be space efficient, and
    3. Be unbounded in size (at least in principle, like v8 objects are, up to the size of the system memory).

    From my research and help from SO, there are many ways to resolve collisions in hash tables. The way v8 does it is Quadratic probing:

    The wikipedia algorithm implementing quadratic probing in JavaScript looks something like this:

    var i = 0
    var SIZE = 10000
    var key = getKey(arbitraryString)
    var hash = key % SIZE
    if (hashtable[hash]) {
      while (i < SIZE) {
        i++
        hash = (key + i * i) % SIZE
        if (!hashtable[hash]) break
        if (i == SIZE) throw new Error('Hashtable full.')
      }
      hashtable[hash] = key
    } else {
      hashtable[hash] = key
    }
    

    The elements that are missing from the wikipedia entry are:

    1. How to compute the hash getKey(arbitraryString). Hoping to learn how v8 does this (not necessarily an exact replica, just along the same lines). Not being proficient in C it looks like the key is an object, and the hash is a 32 bit integer. Not sure if the lookup-cache.h is important.
    2. How to make it dynamic so the SIZE constraint can be removed.
    3. Where to store the final hash, and how to compute it more than once.

    V8 allows you to specify your own "Shape" object to use in the hash table:

    // The hash table class is parameterized with a Shape.
    // Shape must be a class with the following interface:
    //   class ExampleShape {
    //    public:
    //     // Tells whether key matches other.
    //     static bool IsMatch(Key key, Object* other);
    //     // Returns the hash value for key.
    //     static uint32_t Hash(Isolate* isolate, Key key);
    //     // Returns the hash value for object.
    //     static uint32_t HashForObject(Isolate* isolate, Object* object);
    //     // Convert key to an object.
    //     static inline Handle<Object> AsHandle(Isolate* isolate, Key key);
    //     // The prefix size indicates number of elements in the beginning
    //     // of the backing storage.
    //     static const int kPrefixSize = ..;
    //     // The Element size indicates number of elements per entry.
    //     static const int kEntrySize = ..;
    //     // Indicates whether IsMatch can deal with other being the_hole (a
    //     // deleted entry).
    //     static const bool kNeedsHoleCheck = ..;
    //   };
    

    But not sure what the key is and how they convert that key to the hash so keys are evenly distributed and the hash function isn't just a hello-world example.

    The question is, how to implement a quick hash table like V8 that can efficiently resolve collisions and is unbounded in size. It doesn't have to be exactly like V8 but have the features outlined above.

    In terms of space efficiency, a naive approach would do var array = new Array(10000), which would eat up a bunch of memory until it was filled out. Not sure how v8 handles it, but if you do var x = {} a bunch of times, it doesn't allocate a bunch of memory for unused keys, somehow it is dynamic.

    I'm stuck here essentially:

    var m = require('node-murmurhash')
    
    function HashTable() {
      this.array = new Array(10000)
    }
    
    HashTable.prototype.key = function(value){
      // not sure if the key is actually this, or
      // the final result computed from the .set function,
      // and if so, how to store that.
      return m(value)
    }
    
    HashTable.prototype.set = function(value){
      var key = this.key(value)
      var array = this.array
      // not sure how to get rid of this constraint.
      var SIZE = 10000
      var hash = key % SIZE
      var i = 0
      if (array[hash]) {
        while (i < SIZE) {
          i++
          hash = (key + i * i) % SIZE
          if (!array[hash]) break
          if (i == SIZE) throw new Error('Hashtable full.')
        }
        array[hash] = value
      } else {
        array[hash] = value
      }
    }
    
    HashTable.prototype.get = function(index){
      return this.array[index]
    }
    

    解决方案

    This is a very broad question, and I'm not sure what exactly you want an answer to. ("How to implement ...?" sounds like you just want someone to do your work for you. Please be more specific.)

    How to compute the hash

    Any hash function will do. I've pointed out V8's implementation in the other question you've asked; but you really have a lot of freedom here.

    Not sure if the lookup-cache.h is important.

    Nope, it's unrelated.

    How to make it dynamic so the SIZE constraint can be removed.

    Store the table's current size as a variable, keep track of the number of elements in your hash table, and grow the table when the percentage of used slots exceeds a given threshold (you have a space-time tradeoff there: lower load factors like 50% give fewer collisions but use more memory, higher factors like 80% use less memory but hit more slow cases). I'd start with a capacity that's an estimate of "minimum number of entries you'll likely need", and grow in steps of 2x (e.g. 32 -> 64 -> 128 -> etc.).

    Where to store the final hash,

    That one's difficult: in JavaScript, you can't store additional properties on strings (or primitives in general). You could use a Map (or object) on the side, but if you're going to do that anyway, then you might as well use that as the hash table, and not bother implementing your own thing on top.

    and how to compute it more than once.

    That one's easy: invoke your hashing function again ;-)

    I just want a function getUniqueString(string)

    How about this:

    var table = new Map();
    var max = 0;
    
    function getUniqueString(string) {
      var unique = table.get(string);
      if (unique === undefined) {
        unique = (++max).toString();
        table.set(string, unique);
      }
      return unique;
    }
    

    For nicer encapsulation, you could define an object that has table and max as properties.

    这篇关于如何实现像v8这样的强大的哈希表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆