共享大型不可变对象的工厂/缓存策略 [英] Factory/Caching strategy for sharing large immutable objects

查看:60
本文介绍了共享大型不可变对象的工厂/缓存策略的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的问题很像之前的帖子Optimal HashSet Initialization (Scala | Java),我想使用 HashSet 进行加速(目前我使用的是 Set),但 HashSet 没有展示其(恒定时间)优势.

My problem is much like the previous post Optimal HashSet Initialization (Scala | Java), where I want to use HashSet for speedups (currently i am using Set) but the HashSet does not exhibit its (Constant time)advantages.

对于提到的解决方案:

您可以通过实习来最大限度地减少平等的成本.这意味着你通过工厂方法获取类的新对象,检查请求的新对象是否已经存在,如果存在,返回对现有对象的引用.如果你断言每个这种类型的对象是以这种方式构造的,你知道有每个不同对象和equals只有一个实例变成相当于对象标识,这是一个廉价的引用比较(在 Scala 中为 eq).

You can minimize the cost of equals by interning. This means that you acquire new objects of the class through a factory method, which checks whether the requested new object already exists, and if so, returns a reference to the existing object. If you assert that every object of this type is constructed in this way you know that there is only one instance of each distinct object and equals becomes equivalent to object identity, which is a cheap reference comparison (eq in Scala).

但是,我不太确定检查的有效方法是什么

However, I am not quite sure what's the efficient way to check

请求的新对象是否已经存在

whether the requested new object already exists

对于大对象(例如带有hashmap参数的case类对象,其他一些对象结构......等)

for large objects (e.g. objects of case class with parameter of hashmap, some other object structures...etc)

通过比较这些复杂的字段中的每一个并没有给出太多的性能优势,不是吗?或者如果是,还有其他方法吗?

By comparing each of those complicated fields do not give out much performance advantage, isn't it? Or if it is, are there other ways?

另外,我也很困惑如何制作

In addition, I'm also confused that how to make

等于变成相当于对象标识,这是一个廉价的引用比较(在 Scala 中为 eq).

equals becomes equivalent to object identity, which is a cheap reference comparison (eq in Scala).

在代码中.

我认为上面提到的意图技术基本上是一个对象缓存.因此,我参考了小不可变缓存策略中提到的技术Java 中的对象?.但是,我仍然不知道大型对象的有效方法是什么.

The intening technique mentioned above, I think, is basically an object cache. Therefore, I reference to the technique mentioned in the post Caching strategy for small immutable objects in Java?. However, I still do not see what's the efficient way for large objects.

为方便起见,我引用了帖子中的缓存技术(在 Java 中),并用 /// 表示我的想法和问题:

For convenience, I quoted the caching technique (in Java) from the post with /// denoting my thoughts and questions:

private static final int N_POINTS = 10191; 
private static final Point[] POINTS = new Point[N_POINTS];

public static Point of(int x, int y, int z) {
    int h = hash(x,y,z); ///  I can use hash code of each complicated field to construct the value
    int index = (h & 0x7fffffff) % N_POINTS;
    Point p = POINTS[index];
    if (p != null && p.x == x && p.y == y && p.z == z) /// Not working for large objects?
       return p;
    return POINTS[index] = new Point(x,y,z);
} 

总而言之,为大对象实现高效缓存策略的最佳实践是什么,以便我可以在 Scala 中利用 HashSet?

To summarize, what's the best practice to implement efficient caching strategy for large objects, so that I can take advantage of HashSet in Scala?

谢谢,

推荐答案

interning 的目标是使 equals 方法能够使用 引用相等 实现,如下所示:this eq that(或 this == that 在 Java 中).很明显,此实现将具有最佳运行时特性比较一些字段集的传统 equals.

The goal of interning is to enable the equals method to be implemented using reference equality as: this eq that (or this == that in Java). It's clear that this implementation will have optimal run-time characteristics over the more traditional equals that compares some set of fields.

这种比较只有在只有一个实例时才有效每个唯一对象"由对象的某些字段集确定.

This comparison is only effective if there is one and only one instance of each "unique object" as determined by some set of fields of the object.

实习生-ing 仅在以下情况下有效实习生操作的前期成本可以通过对 equals 的调用(可能很多)的最小成本完全抵消,由 HashMap 驱动.

Intern-ing is effective only in so far as the up-front cost of the intern operation can be fully offset by the minimized cost of the (possibly many) calls to equals, driven by HashMap.

正如您所指出的,这个 intern 可能需要一个潜在的昂贵的缓存机制:有运行时开销(执行检查)以及内存开销(缓存的大小).

As you've noted, this intern-ing may require a potentially costly caching mechanism: there is a run time overhead (performing the check) and a memory overhead (the size of the cache).

  • 最直接的缓存方式是使用 HashMap 和传统的 equals.hashCode 应该是懒惰的;缓存它的结果,所以不需要重新计算.可能需要考虑线程问题.

  • The most straight forward way to cache is with HashMap, and a traditional equals. hashCode should be lazy; caching it's result so it doesn't need to be recomputed. Threading issues may need to be considered.

实现这种缓存的一种方法是使用 trie,也许在每个节点上用哈希表实现,其中每个级别"对应于对象中的一个字段(第一级 - 字段 1、第二级、字段 2 等),用于用于建立唯一性的一组字段."

One way to implement such a cache is to use a trie, perhaps implemented with a hash table at each node, and where each "level" corresponds to a field in the object (first level - field 1, second level, field 2, etc...) for the "set of fields used to establish uniqueness."

还有其他可行的方法来实现这种缓存.请原谅我避免进一步讨论此类问题,而是让我提出避免处理该问题的方法.

There are other viable ways to implement such a cache. Forgive me for avoiding any further discussion of such, and allow me instead to present ways to avoid dealing with the issue.

声明:您很可能通过以下方式获得足够有效的结果使用快速散列(并在内部缓存它),equals 的传统"实现,并从 足够的最小尺寸

Claim: You're likely to have sufficiently effective results by using a fast hash (and caching it internally), a "traditional" implementation of equals, and by starting with a HashMap or HashSet of sufficient minimal size

理想情况下,哈希表中的冲突很少,并且对 equals 的调用次数最少.

Ideally there are few collisions in the hash table, and the number of calls to equals is minimal.

[此方法假定唯一定义对象的字段"是不可变的.如果情况不正确,可以进行适当的调整以进行补偿.]

[This method assumes that the "fields that uniquely define the object" are immutable. Suitable adjustments can be made to compensate if this is not true.]

构建&缓存对应于唯一对象实例的 private unique: String.例如,对于一些简单的对象,以下内容可能是唯一的:

Build & cache a private unique: String that corresponds to the unique object instance. For example, the following may be unique for some simple objects:

用于建立唯一性的字段集"的字符串值的串联,以逗号分隔.

The concatenation of the string values of the "set of fields used to establish uniqueness," separated by commas.

了解您的对象/字段特征将有助于确定如何创建这样一个唯一的字符串.

An understanding of your object/field characteristics will help determine how such a unique string can be created.

有了这样的值,我们就可以避免单独的intern/caching机制并通过同时实现 equalshashCode 来保留许多好处就这个 unique 字符串而言:

With such a value, we can avoid the separate intern/caching mechanism and retain much of the benefits by implementing both equals and hashCode in terms of this unique string:

def equals(thatObj: Any) = thatObj match {
    case that : MyType => unique.equals(that.unique)
    case _             => false
  }

def hashCode() = unique.hashCode

选项 2 的替代方案:

使用 String.intern 并调整 equals 以利用它:

Use String.intern and adjust the equals to take advantage of it:

private val unique = buildUnique().intern

def equals(thatObj: Any) = thatObj match {
    case that : MyType => unique.eq(that.unique) // In Java: unique == that.unique
    case _             => false
  }

这篇关于共享大型不可变对象的工厂/缓存策略的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆