Optimal HashSet Initialization (Scala | Java)


Question

I'm writing an AI to solve a "Maze of Life" puzzle. Storing states in a HashSet slows everything down; the search runs faster without a set of explored states. I'm fairly confident my node (state storage) class implements equals and hashCode correctly, since tests show the HashSet doesn't add duplicate states. I may need to rework the hashCode function, but I believe the slowdown comes from the HashSet rehashing and resizing.

I've tried setting the initial capacity to a very large number, but it's still extremely slow:

 val initCapacity = java.lang.Math.pow(initialGrid.width*initialGrid.height,3).intValue()
 val frontier = new QuickQueue[Node](initCapacity)

Here is the QuickQueue code:

class QuickQueue[T](capacity: Int) {

val hashSet = new HashSet[T](capacity)
val queue = new Queue[T]
    //methods below

For more info, here is the hash function. I store the grid values as bytes in two arrays and access them using tuples:

override def hashCode(): Int = {
  var sum = Math.pow(grid.goalCoords._1, grid.goalCoords._2).toInt
  for (y <- 0 until grid.height) {
     for (x <- 0 until grid.width) {
        sum += Math.pow(grid((x, y)).doubleValue(), x.toDouble).toInt
     }
     sum += Math.pow(sum, y).toInt
  }
  return sum
}

Any suggestions on how to set up a HashSet that won't slow things down? Or perhaps another way to remember explored states?

P.S. Using java.util.HashSet, even with the initial capacity set, it takes 80 seconds versus under 7 seconds without the set.

Recommended answer

Okay, for starters, please replace

override def hashCode(): Int =

with

override lazy val hashCode: Int = 

so you don't calculate (grid.height*grid.width) floating-point powers every time you need to access the hash code. That should speed things up by an enormous amount.
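As a rough illustration of the caching effect, here is a minimal sketch. The Node class, its cells field, and the computations counter are hypothetical stand-ins rather than the asker's actual code, and the approach assumes the state is never mutated after construction, since a cached hash would otherwise go stale.

```scala
// Hypothetical state class; `cells` stands in for the asker's grid storage.
final class Node(val cells: Vector[Byte]) {
  // Counts how often the hash body runs (for illustration only).
  var computations = 0

  // Overriding the hashCode method with a lazy val caches the result:
  // the body runs on first access and is never evaluated again.
  override lazy val hashCode: Int = {
    computations += 1
    cells.hashCode
  }

  // equals must stay consistent with hashCode.
  override def equals(other: Any): Boolean = other match {
    case that: Node => this.cells == that.cells
    case _          => false
  }
}
```

However many times a hash set asks such a node for its hash code, the counter stays at 1 after the first request.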

Then, unless you somehow rely upon close cells having close hash codes, don't reinvent the wheel. Use scala.util.hashing.MurmurHash3.seqHash or something similar to calculate your hash. This should speed your hash up by another factor of 20 or so. (Still keep the lazy val.)
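One possible shape for this, sketched under assumptions: GridState, cells, goalX, and goalY are hypothetical names, and the grid is assumed to be a single byte array that is not mutated after construction. It uses MurmurHash3.bytesHash for the byte array and seqHash to fold in the goal coordinates.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical immutable state: grid bytes in row-major order plus goal coordinates.
// The array must not be modified after it is passed in, or the cached hash goes stale.
final class GridState(private val cells: Array[Byte], val goalX: Int, val goalY: Int) {

  // Computed once on first use, then cached.
  override lazy val hashCode: Int = {
    val gridHash = MurmurHash3.bytesHash(cells)
    MurmurHash3.seqHash(Seq(gridHash, goalX, goalY))
  }

  // Arrays compare by reference in Scala, so compare contents explicitly.
  override def equals(other: Any): Boolean = other match {
    case that: GridState =>
      goalX == that.goalX && goalY == that.goalY &&
        java.util.Arrays.equals(cells, that.cells)
    case _ => false
  }
}
```

With this in place, a java.util.HashSet[GridState] treats two states with identical grid contents and goal as the same element, without any floating-point work per lookup.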

Then you only have overhead from the required set operations. Right now, unless you have a lot of 0x0 grids, you are spending the overwhelming majority of your time waiting for math.pow to give you a result (and risking everything becoming Double.PositiveInfinity or 0.0, depending on how big the values are, which creates hash collisions that slow things down still further).
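To make the collision risk concrete, here is a small sketch. The cellTerm helper is a hypothetical function that mirrors the per-cell term in the question's hashCode (cell value raised to the column index); it is not part of the original code.

```scala
object PowTerms {
  // Mirrors the question's per-cell term: value raised to the column index,
  // then truncated to Int. Double-to-Int conversion saturates at the Int bounds.
  def cellTerm(value: Byte, x: Int): Int =
    Math.pow(value.toDouble, x.toDouble).toInt

  def main(args: Array[String]): Unit = {
    // 50^7 and 60^9 both exceed the Int range, so both collapse to Int.MaxValue.
    println(cellTerm(50, 7) == cellTerm(60, 9))
    // Infinity converts to the same saturated value.
    println(Double.PositiveInfinity.toInt == Int.MaxValue)
    // A zero cell contributes nothing in any column past the first.
    println(cellTerm(0, 3) == cellTerm(0, 8))
  }
}
```

All three comparisons print true: very different cells produce identical terms, which is how distinct states end up in the same bucket.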
