如何在 Swift 中为 Int 数组(自定义字符串结构)实现 Hashable 协议 [英] How to implement the Hashable Protocol in Swift for an Int array (a custom string struct)

查看:49
本文介绍了如何在 Swift 中为 Int 数组(自定义字符串结构)实现 Hashable 协议的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在制作一个类似于 String 的结构,除了它只处理 Unicode UTF-32 标量值.因此,它是一个 UInt32 数组.(有关更多背景信息,请参阅这个问题.)

I am making a structure that acts like a String, except that it only deals with Unicode UTF-32 scalar values. Thus, it is an array of UInt32. (See this question for more background.)

我希望能够使用我的自定义 ScalarString 结构作为字典中的键.例如:

I want to be able to use my custom ScalarString struct as a key in a dictionary. For example:

var suffixDictionary = [ScalarString: ScalarString]() // Unicode key, rendered glyph value

// populate dictionary
suffixDictionary[keyScalarString] = valueScalarString
// ...

// check if dictionary contains Unicode scalar string key
if let renderedSuffix = suffixDictionary[unicodeScalarString] {
    // do something with value
}

问题

为了做到这一点,ScalarString 需要实现 哈希协议.我以为我可以做这样的事情:

Problem

In order to do that, ScalarString needs to implement the Hashable Protocol. I thought I would be able to do something like this:

struct ScalarString: Hashable {

    private var scalarArray: [UInt32] = []

    var hashValue : Int {
        get {
            return self.scalarArray.hashValue // error
        }
    }
}

func ==(left: ScalarString, right: ScalarString) -> Bool {
    return left.hashValue == right.hashValue
}

但后来我发现 Swift 数组 没有 hashValue.

but then I discovered that Swift arrays don't have a hashValue.

文章实现可哈希协议的策略Swift 有很多很棒的想法,但我没有看到任何看起来在这种情况下会很好用的想法.具体来说,

The article Strategies for Implementing the Hashable Protocol in Swift had a lot of great ideas, but I didn't see any that seemed like they would work well in this case. Specifically,

  • 对象属性(数组是没有hashValue)
  • ID 属性(不确定如何才能很好地实现)
  • 公式(似乎任何用于 32 位整数字符串的公式都会占用大量处理器,并且会出现大量整数溢出)
  • ObjectIdentifier(我使用的是结构体,而不是类)
  • 从 NSObject 继承(我使用的是结构体,而不是类)
  • Object property (array is does not have hashValue)
  • ID property (not sure how this could be implemented well)
  • Formula (seems like any formula for a string of 32 bit integers would be processor heavy and have lots of integer overflow)
  • ObjectIdentifier (I'm using a struct, not a class)
  • Inheriting from NSObject (I'm using a struct, not a class)

以下是我阅读的其他一些内容:

Here are some other things I read:

Swift 字符串有一个 hashValue 属性,所以我知道这是可能的.

Swift Strings have a hashValue property, so I know it is possible to do.

如何为我的自定义结构创建 hashValue?

How would I create a hashValue for my custom structure?

更新 1: 我想做一些不涉及转换为 String 然后使用 StringhashValue 的事情.我制作自己的结构的全部目的是避免进行大量的 String 转换.String 从某处获取它的 hashValue.好像我可以用同样的方法得到它.

Update 1: I would like to do something that does not involve converting to String and then using String's hashValue. My whole point for making my own structure was so that I could avoid doing lots of String conversions. String gets it's hashValue from somewhere. It seems like I could get it using the same method.

更新 2: 我一直在研究其他上下文中字符串哈希码算法的实现.不过,我在知道哪个最好并用 Swift 表达它们时有点困难.

Update 2: I've been looking into the implementation of string hash codes algorithms from other contexts. I'm having a little difficulty knowing which is best and expressing them in Swift, though.

  • Java hashCode algorithm
  • C algorithms
  • hash function for string (SO question and answers in C)
  • Hashing tutorial (Virginia Tech Algorithm Visualization Research Group)
  • General Purpose Hash Function Algorithms

更新 3

我不想导入任何外部框架,除非这是处理这些事情的推荐方式.

I would prefer not to import any external frameworks unless that is the recommended way to go for these things.

我使用 DJB 哈希函数提交了一个可能的解决方案.

I submitted a possible solution using the DJB Hash Function.

推荐答案

更新

Martin R 写道:

Swift 4.1 开始,编译器可以合成 EquatableHashable对于类型一致性自动,如果所有成员都符合Equatable/Hashable (SE0185).从 Swift 4.2 开始,高质量的哈希combiner 内置于 Swift 标准库 (SE-0206) 中.

As of Swift 4.1, the compiler can synthesize Equatable and Hashable for types conformance automatically, if all members conform to Equatable/Hashable (SE0185). And as of Swift 4.2, a high-quality hash combiner is built-in into the Swift standard library (SE-0206).

因此不再需要定义自己的散列函数,声明一致性就足够了:

Therefore there is no need anymore to define your own hashing function, it suffices to declare the conformance:

struct ScalarString: Hashable, ... {

    private var scalarArray: [UInt32] = []

    // ... }

因此,下面的答案需要重写(再次).在此之前,请参阅上面链接中 Martin R 的回答.

提交我的代码审查的原始答案.

Hashable 协议允许您使用您的自定义类或结构作为字典键.为了实现这个协议,你需要

The Hashable protocol allows you to use your custom class or struct as a dictionary key. In order to implement this protocol you need to

  1. 实施 Equatable 协议(Hashable 继承自 Equatable)
  2. 返回一个计算出的hashValue
  1. Implement the Equatable protocol (Hashable inherits from Equatable)
  2. Return a computed hashValue

这些要点来自文档中给出的公理:

These points follow from the axiom given in the documentation:

x == y 意味着 x.hashValue == y.hashValue

其中 xy 是某种类型的值.

where x and y are values of some Type.

为了实现 Equatable 协议,您需要定义您的类型如何使用 ==(等价)运算符.在您的示例中,可以这样确定等效性:

In order to implement the Equatable protocol, you define how your type uses the == (equivalence) operator. In your example, equivalence can be determined like this:

func ==(left: ScalarString, right: ScalarString) -> Bool {
    return left.scalarArray == right.scalarArray
}

== 函数是全局的,所以它在你的类或结构之外.

The == function is global so it goes outside of your class or struct.

您的自定义类或结构还必须有一个计算的 hashValue 变量.一个好的散列算法将提供范围广泛的散列值.但是需要注意的是,您不需要保证哈希值都是唯一的.当两个不同的值具有相同的哈希值时,这称为哈希冲突.当发生碰撞时,它需要一些额外的工作(这就是为什么需要一个良好的分布),但一些碰撞是可以预料的.据我了解, == 函数完成了额外的工作.(更新:看起来像== 可以完成所有的工作.)

Your custom class or struct must also have a computed hashValue variable. A good hash algorithm will provide a wide range of hash values. However, it should be noted that you do not need to guarantee that the hash values are all unique. When two different values have identical hash values, this is called a hash collision. It requires some extra work when there is a collision (which is why a good distribution is desirable), but some collisions are to be expected. As I understand it, the == function does that extra work. (Update: It looks like == may do all the work.)

有多种方法可以计算哈希值.例如,您可以执行一些简单的操作,例如返回数组中的元素数.

There are a number of ways to calculate the hash value. For example, you could do something as simple as returning the number of elements in the array.

var hashValue: Int {
    return self.scalarArray.count
} 

每当两个数组具有相同数量的元素但不同的值时,这将导致哈希冲突.NSArray 显然使用了这种方法.

This would give a hash collision every time two arrays had the same number of elements but different values. NSArray apparently uses this approach.

DJB 哈希函数

处理字符串的常见散列函数是 DJB 散列函数.这是我将使用的,但请查看其他一些此处.

A common hash function that works with strings is the DJB hash function. This is the one I will be using, but check out some others here.

一个 Swift 实现由@MartinR 提供如下:

A Swift implementation provided by @MartinR follows:

var hashValue: Int {
    return self.scalarArray.reduce(5381) {
        ($0 << 5) &+ $0 &+ Int($1)
    }
}

这是我原始实现的改进版本,但让我也包括旧的扩展形式,对于不熟悉 reduce.这是等效的,我相信:

This is an improved version of my original implementation, but let me also include the older expanded form, which may be more readable for people not familiar with reduce. This is equivalent, I believe:

var hashValue: Int {

    // DJB Hash Function
    var hash = 5381

    for(var i = 0; i < self.scalarArray.count; i++)
    {
        hash = ((hash << 5) &+ hash) &+ Int(self.scalarArray[i])
    }

    return hash
} 

&+ 运算符允许 Int 溢出并重新开始处理长字符串.

The &+ operator allows Int to overflow and start over again for long strings.

我们已经查看了各个部分,但现在让我展示与 Hashable 协议相关的整个示例代码.ScalarString 是问题中的自定义类型.当然,这对不同的人来说会有所不同.

We have looked at the pieces, but let me now show the whole example code as it relates to the Hashable protocol. ScalarString is the custom type from the question. This will be different for different people, of course.

// Include the Hashable keyword after the class/struct name
struct ScalarString: Hashable {

    private var scalarArray: [UInt32] = []

    // required var for the Hashable protocol
    var hashValue: Int {
        // DJB hash function
        return self.scalarArray.reduce(5381) {
            ($0 << 5) &+ $0 &+ Int($1)
        }
    }
}

// required function for the Equatable protocol, which Hashable inheirits from
func ==(left: ScalarString, right: ScalarString) -> Bool {
    return left.scalarArray == right.scalarArray
}

其他有用的阅读

  • 哪种哈希算法最适合唯一性和速度?
  • 溢出运算符
  • 为什么 5381 和 33 如此重要在djb2算法中?
  • 如何处理哈希冲突?
  • 非常感谢 Martin R 在 Code Review 中的表现.我的重写主要基于他的回答.如果您觉得这对您有帮助,请给他点赞.

    A big thanks to Martin R over in Code Review. My rewrite is largely based on his answer. If you found this helpful, then please give him an upvote.

    Swift 现在是开源的,因此可以从 源代码.它似乎比我在这里给出的答案更复杂,我没有花时间对其进行全面分析.随意自己做.

    Swift is open source now so it is possible to see how hashValue is implemented for String from the source code. It appears to be more complex than the answer I have given here, and I have not taken the time to analyze it fully. Feel free to do so yourself.

    这篇关于如何在 Swift 中为 Int 数组(自定义字符串结构)实现 Hashable 协议的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆