Slow Swift Array and String Performance

Question

Here are two very similar Levenshtein distance algorithms.

Swift implementation: https://gist.github.com/bgreenlee/52d93a1d8fa1b8c1f38b

Objective-C implementation: https://gist.github.com/boratlibre/1593632

The Swift one is dramatically slower than the Objective-C implementation. I've spent a couple of hours trying to make it faster, but it seems that Swift array and string manipulation is just not as fast as in Objective-C.

On 2000 random string calculations, the Swift implementation is about 100 (!!!) times slower than the Objective-C one.

Honestly, I have no idea what could be wrong, because even this part of the Swift code

func levenshtein(aStr: String, bStr: String) -> Int {
// create character arrays
let a = Array(aStr)
let b = Array(bStr)
...

is already slower than the whole calculation in Objective-C.

Does anyone know how to speed up the Swift calculation?

Thanks in advance!

Update

After applying all the suggested improvements, the Swift code looks like this, and it is still 4 times slower than Objective-C in the Release configuration.

import Foundation

class Array2D {
    let cols: Int, rows: Int
    let matrix: UnsafeMutablePointer<Int>

    init(cols: Int, rows: Int) {
        self.cols = cols
        self.rows = rows
        // Allocate exactly cols * rows Ints and set every one of them to zero
        matrix = UnsafeMutablePointer<Int>.allocate(capacity: cols * rows)
        matrix.initialize(repeating: 0, count: cols * rows)
    }

    deinit {
        // The buffer is managed manually, so it has to be freed explicitly
        matrix.deallocate()
    }

    subscript(col: Int, row: Int) -> Int {
        get {
            return matrix[cols * row + col]
        }
        set {
            matrix[cols * row + col] = newValue
        }
    }

    func colCount() -> Int {
        return self.cols
    }

    func rowCount() -> Int {
        return self.rows
    }
}

extension String {
    func levenshteinDistanceFromStringSwift(comparingString: NSString) -> Int {
        let aStr = self
        let bStr = comparingString

//        let a = Array(aStr.unicodeScalars)
//        let b = Array(bStr.unicodeScalars)

        // Work on NSString so that characters are compared as UTF-16 code units
        let a = aStr as NSString
        let b = bStr

        // A range like 1...0 traps at runtime, so handle empty strings up front
        guard a.length > 0 else { return b.length }
        guard b.length > 0 else { return a.length }

        let dist = Array2D(cols: a.length + 1, rows: b.length + 1)

        for i in 1...a.length {
            dist[i, 0] = i
        }

        for j in 1...b.length {
            dist[0, j] = j
        }

        for i in 1...a.length {
            for j in 1...b.length {
                if a.character(at: i - 1) == b.character(at: j - 1) {
                    dist[i, j] = dist[i - 1, j - 1]  // noop
                } else {
                    dist[i, j] = min(
                        dist[i - 1, j] + 1,     // deletion
                        dist[i, j - 1] + 1,     // insertion
                        dist[i - 1, j - 1] + 1  // substitution
                    )
                }
            }
        }

        return dist[a.length, b.length]
    }
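    func levenshteinDistanceFromStringObjC(comparingString: String) -> Int {
        let aStr = self as NSString
        let bStr = comparingString
        // It is really strange, but I had to link Objective-C because of the dramatically slow Swift performance.
        // Calls -compareWithWord:matchGain:missingCost: from the Objective-C gist via the bridging header;
        // the Swift spelling below assumes the usual Swift 3+ import of that selector.
        return aStr.compare(withWord: bStr, matchGain: 0, missingCost: 1)
    }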

}

malloc?? NSString?? And in the end, still a 4x slowdown? Does anybody need Swift anymore?

Answer

There are multiple reasons why the Swift code is slower than the Objective-C code. I made a very simple test case that compares two fixed strings 100 times:

  • Objective-C code: 0.026 seconds
  • Swift code: 3.14 seconds

The first reason is that a Swift Character represents an "extended grapheme cluster", which can contain several Unicode code points (a flag emoji, for example, consists of two). This makes splitting a string into characters slow. An Objective-C NSString, on the other hand, stores its contents as a sequence of UTF-16 code units.
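To make the difference concrete, here is a small sketch (the flag string is just an illustrative input) that views the same text as Characters, as Unicode scalars, and as UTF-16 code units:

```swift
// A German flag: two regional-indicator scalars (U+1F1E9, U+1F1EA) forming one grapheme cluster
let flag = "\u{1F1E9}\u{1F1EA}"

let characters = Array(flag)              // [Character]: extended grapheme clusters
let scalars = Array(flag.unicodeScalars)  // [Unicode.Scalar]: code points
let codeUnits = Array(flag.utf16)         // [UInt16]: the representation NSString works with

print(characters.count)  // 1
print(scalars.count)     // 2
print(codeUnits.count)   // 4 (each scalar lies outside the BMP and needs a surrogate pair)
```

Building the `[Character]` array requires grapheme-breaking logic on top of decoding the scalars, which is the extra work the answer refers to.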

If you replace

let a = Array(aStr)
let b = Array(bStr)

with

let a = Array(aStr.utf16)
let b = Array(bStr.utf16)

so that the Swift code also works on UTF-16 sequences, the time goes down to 1.88 seconds.
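One consequence to keep in mind: comparing UTF-16 code units is plain integer comparison, the same kind of comparison NSString's `character(at:)` gives you, so it no longer applies Unicode canonical equivalence the way Character comparison does. A minimal sketch with a precomposed and a decomposed "é" (illustrative inputs):

```swift
let precomposed = "\u{E9}"   // é as a single scalar (U+00E9)
let decomposed = "e\u{301}"  // e followed by a combining acute accent (U+0301)

// Character comparison follows canonical equivalence
let sameAsCharacters = Array(precomposed) == Array(decomposed)

// UTF-16 comparison is element-wise UInt16 equality
let sameAsUTF16 = Array(precomposed.utf16) == Array(decomposed.utf16)

print(sameAsCharacters)  // true
print(sameAsUTF16)       // false
```

For inputs that are already normalized this makes no difference to the distance, but for mixed input the two versions can disagree.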

The allocation of the 2-dimensional array is also slow. It is faster to allocate a single one-dimensional array. I found a simple Array2D class here: http://blog.trolieb.com/trouble-multidimensional-arrays-swift/

class Array2D {
    let cols: Int, rows: Int
    var matrix: [Int]

    init(cols: Int, rows: Int) {
        self.cols = cols
        self.rows = rows
        // One contiguous, zero-filled buffer instead of an array of row arrays
        matrix = Array(repeating: 0, count: cols * rows)
    }

    subscript(col: Int, row: Int) -> Int {
        get {
            return matrix[cols * row + col]
        }
        set {
            matrix[cols * row + col] = newValue
        }
    }

    func colCount() -> Int {
        return self.cols
    }

    func rowCount() -> Int {
        return self.rows
    }
}
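The row-major mapping the Array2D class uses, `cols * row + col`, can be checked against a nested `[[Int]]` grid in a few lines. This is only a sketch with arbitrary sizes: the nested grid ends up with a separate buffer for each row (copy-on-write splits the shared initial row on first write), while the flat grid lives in a single buffer.

```swift
let cols = 4, rows = 3

// Nested layout: an outer array plus, after writing, one separate [Int] buffer per row
var nested = [[Int]](repeating: [Int](repeating: 0, count: cols), count: rows)

// Flat layout: one contiguous buffer of cols * rows elements
var flat = [Int](repeating: 0, count: cols * rows)

// Fill both with the same values, using the row-major index for the flat one
for row in 0..<rows {
    for col in 0..<cols {
        nested[row][col] = row * 10 + col
        flat[cols * row + col] = row * 10 + col
    }
}

// Slicing the flat buffer row by row reproduces the nested grid
let rebuilt = (0..<rows).map { row in Array(flat[(cols * row)..<(cols * (row + 1))]) }
print(rebuilt == nested)  // true
```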

Using that class in your code:

func levenshtein(aStr: String, bStr: String) -> Int {
    let a = Array(aStr.utf16)
    let b = Array(bStr.utf16)

    // A range like 1...0 traps at runtime, so handle empty strings up front
    guard !a.isEmpty else { return b.count }
    guard !b.isEmpty else { return a.count }

    let dist = Array2D(cols: a.count + 1, rows: b.count + 1)

    for i in 1...a.count {
        dist[i, 0] = i
    }

    for j in 1...b.count {
        dist[0, j] = j
    }

    for i in 1...a.count {
        for j in 1...b.count {
            if a[i - 1] == b[j - 1] {
                dist[i, j] = dist[i - 1, j - 1]  // noop
            } else {
                dist[i, j] = min(
                    dist[i - 1, j] + 1,     // deletion
                    dist[i, j - 1] + 1,     // insertion
                    dist[i - 1, j - 1] + 1  // substitution
                )
            }
        }
    }

    return dist[a.count, b.count]
}

With this change, the time for the test case goes down to 0.84 seconds.

The last bottleneck I found in the Swift code is the custom min() function. The Swift standard library has a built-in min() function that is faster, so simply removing the custom function from the Swift code reduces the time for the test case to 0.04 seconds, which is almost as good as the Objective-C version.
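The custom function itself is not reproduced here, so the following uses a hypothetical stand-in (`variadicMin` is an invented name) to show the general shape of such a helper next to the standard library call. A variadic parameter arrives as an `[Int]`, so each call may build an array and fold over it unless the optimizer removes that work; nested two-argument `min` calls need no argument array at all.

```swift
// Hypothetical hand-written variadic helper (not the gist's actual code):
// the arguments arrive as an [Int] and are folded with reduce.
func variadicMin(_ numbers: Int...) -> Int {
    return numbers.reduce(numbers[0]) { $0 < $1 ? $0 : $1 }
}

let deletion = 3, insertion = 5, substitution = 2

let viaCustom = variadicMin(deletion, insertion, substitution)
// Standard library min(_:_:_:_:) for three or more arguments
let viaStdlib = min(deletion, insertion, substitution)
// Two nested two-argument calls
let viaNested = min(min(deletion, insertion), substitution)

print(viaCustom, viaStdlib, viaNested)  // 2 2 2
```

All three give the same result; any speed difference should be measured in a Release build, since it depends on what the optimizer manages to inline.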

Addendum: Using Unicode scalars seems to be even slightly faster:

let a = Array(aStr.unicodeScalars)
let b = Array(bStr.unicodeScalars)

and it has the advantage of working correctly with characters that UTF-16 encodes as surrogate pairs, such as emoji: each of them is one Unicode scalar but two UTF-16 code units.
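To illustrate the surrogate-pair point, here is a sketch of the same dynamic-programming algorithm written generically over arrays of any `Equatable` element. It is a variation rather than the answer's code: it keeps only two rows of the matrix instead of the full Array2D. Deleting one emoji costs two edits when counted in UTF-16 code units but one when counted in Unicode scalars:

```swift
// Levenshtein distance over arrays of any Equatable element type,
// keeping only the previous and current rows of the DP matrix.
func levenshtein<T: Equatable>(_ a: [T], _ b: [T]) -> Int {
    if a.isEmpty { return b.count }
    if b.isEmpty { return a.count }

    var previous = Array(0...b.count)  // distances from the empty prefix of a
    var current = [Int](repeating: 0, count: b.count + 1)

    for i in 1...a.count {
        current[0] = i
        for j in 1...b.count {
            if a[i - 1] == b[j - 1] {
                current[j] = previous[j - 1]           // match
            } else {
                current[j] = min(previous[j] + 1,      // deletion
                                 current[j - 1] + 1,   // insertion
                                 previous[j - 1] + 1)  // substitution
            }
        }
        swap(&previous, &current)  // the finished row becomes "previous"
    }
    return previous[b.count]
}

let withEmoji = "a\u{1F600}b"  // U+1F600 is a surrogate pair in UTF-16
let without = "ab"

let byUTF16 = levenshtein(Array(withEmoji.utf16), Array(without.utf16))
let byScalars = levenshtein(Array(withEmoji.unicodeScalars), Array(without.unicodeScalars))

print(levenshtein(Array("kitten"), Array("sitting")))  // 3
print(byUTF16, byScalars)                              // 2 1
```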
