Slow Swift Arrays and Strings performance
Question
Here are two pretty similar Levenshtein distance algorithms.
Swift implementation: https://gist.github.com/bgreenlee/52d93a1d8fa1b8c1f38b
and Objective-C implementation: https://gist.github.com/boratlibre/1593632
The Swift one is dramatically slower than the Objective-C implementation. I've spent a couple of hours trying to make it faster, but it seems that Swift array and String manipulation is not as fast as in Objective-C.
On 2000 random String calculations, the Swift implementation is about 100(!!!) times slower than the Objective-C one.
Honestly speaking, I have no idea what could be wrong, because even this part of the Swift code
    func levenshtein(aStr: String, bStr: String) -> Int {
        // create character arrays
        let a = Array(aStr)
        let b = Array(bStr)
        ...
is slower than the whole Objective-C algorithm.
Does anyone know how to speed up the Swift calculations?
Thanks in advance!
Update
After all the suggested improvements, the Swift code looks like this. It is still 4 times slower than the Objective-C version in the release configuration.
import Foundation

class Array2D {
    var cols:Int, rows:Int
    var matrix:UnsafeMutablePointer<Int>

    init(cols:Int, rows:Int) {
        self.cols = cols
        self.rows = rows
        matrix = UnsafeMutablePointer<Int>(malloc(UInt(cols * rows) * UInt(sizeof(Int))))
        // half-open range: the buffer holds exactly cols*rows elements
        for i in 0..<cols*rows {
            matrix[i] = 0
        }
    }

    subscript(col:Int, row:Int) -> Int {
        get {
            return matrix[cols * row + col] as Int
        }
        set {
            matrix[cols*row+col] = newValue
        }
    }

    func colCount() -> Int {
        return self.cols
    }

    func rowCount() -> Int {
        return self.rows
    }
}

extension String {
    func levenshteinDistanceFromStringSwift(comparingString: NSString) -> Int {
        let aStr = self
        let bStr = comparingString
        // let a = Array(aStr.unicodeScalars)
        // let b = Array(bStr.unicodeScalars)
        let a:NSString = aStr
        let b:NSString = bStr
        var dist = Array2D(cols: a.length + 1, rows: b.length + 1)
        for i in 1...a.length {
            dist[i, 0] = i
        }
        for j in 1...b.length {
            dist[0, j] = j
        }
        for i in 1...a.length {
            for j in 1...b.length {
                if a.characterAtIndex(i-1) == b.characterAtIndex(j-1) {
                    dist[i, j] = dist[i-1, j-1] // noop
                } else {
                    dist[i, j] = min(
                        dist[i-1, j] + 1, // deletion
                        dist[i, j-1] + 1, // insertion
                        dist[i-1, j-1] + 1 // substitution
                    )
                }
            }
        }
        return dist[a.length, b.length]
    }

    func levenshteinDistanceFromStringObjC(comparingString: String) -> Int {
        let aStr = self
        let bStr = comparingString
        // It is really strange, but I have to link Objective-C because of the dramatically slow Swift performance
        return aStr.compareWithWord(bStr, matchGain: 0, missingCost: 1)
    }
}
malloc?? NSString?? And in the end it is still 4 times slower? Does anybody need Swift anymore?
Recommended answer
There are multiple reasons why the Swift code is slower than the Objective-C code. I made a very simple test case by comparing two fixed strings 100 times.
- Objective-C code: 0.026 seconds
- Swift code: 3.14 seconds
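A test case of this kind can be sketched as follows. This is only an illustration in current Swift syntax: the helper name `measure` and the stand-in workload are assumptions, not the harness that produced the numbers above. Timings are only meaningful in a release (optimized) build.

```swift
import Foundation

// Hypothetical helper: runs `body` `iterations` times and
// returns the elapsed wall-clock time in seconds.
func measure(iterations: Int, _ body: () -> Void) -> TimeInterval {
    let start = Date()
    for _ in 0..<iterations {
        body()
    }
    return Date().timeIntervalSince(start)
}

// Usage: the closure body is a stand-in for a real distance computation.
var checksum = 0
let elapsed = measure(iterations: 100) {
    checksum += "kitten".utf16.count
}
print("elapsed: \(elapsed) s")
```

Accumulating a result such as `checksum` keeps the optimizer from discarding the measured work.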
The first reason is that a Swift Character represents an "extended grapheme cluster", which can contain several Unicode code points (e.g. "flags"). This makes the decomposition of a string into characters slow. Objective-C's NSString, on the other hand, stores strings as a sequence of UTF-16 code units.
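The difference between the three views of a string can be seen on a flag emoji. This small sketch uses current Swift syntax, and the variable names are only illustrative:

```swift
// A flag emoji built from two regional-indicator scalars (U+1F1E9, U+1F1EA).
let flag = "\u{1F1E9}\u{1F1EA}"

let characterCount = flag.count              // extended grapheme clusters
let scalarCount = flag.unicodeScalars.count  // Unicode code points
let utf16Count = flag.utf16.count            // UTF-16 code units

print(characterCount, scalarCount, utf16Count) // prints "1 2 4"
```

Finding the grapheme cluster boundaries is the extra work that iterating over Characters has to do and that the `utf16` and `unicodeScalars` views avoid.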
If you replace
    let a = Array(aStr)
    let b = Array(bStr)
with
    let a = Array(aStr.utf16)
    let b = Array(bStr.utf16)
so that the Swift code works on UTF-16 sequences as well, then the time goes down to 1.88 seconds.
The allocation of the 2-dimensional array is also slow. It is faster to allocate a single one-dimensional array. I found a simple Array2D class here:
http://blog.trolieb.com/trouble-multidimensional-arrays-swift/
class Array2D {
    var cols:Int, rows:Int
    var matrix: [Int]

    init(cols:Int, rows:Int) {
        self.cols = cols
        self.rows = rows
        matrix = Array(count:cols*rows, repeatedValue:0)
    }

    subscript(col:Int, row:Int) -> Int {
        get {
            return matrix[cols * row + col]
        }
        set {
            matrix[cols*row+col] = newValue
        }
    }

    func colCount() -> Int {
        return self.cols
    }

    func rowCount() -> Int {
        return self.rows
    }
}
Using that class in your code:
func levenshtein(aStr: String, bStr: String) -> Int {
    let a = Array(aStr.utf16)
    let b = Array(bStr.utf16)
    var dist = Array2D(cols: a.count + 1, rows: b.count + 1)
    for i in 1...a.count {
        dist[i, 0] = i
    }
    for j in 1...b.count {
        dist[0, j] = j
    }
    for i in 1...a.count {
        for j in 1...b.count {
            if a[i-1] == b[j-1] {
                dist[i, j] = dist[i-1, j-1] // noop
            } else {
                dist[i, j] = min(
                    dist[i-1, j] + 1, // deletion
                    dist[i, j-1] + 1, // insertion
                    dist[i-1, j-1] + 1 // substitution
                )
            }
        }
    }
    return dist[a.count, b.count]
}
With this change, the time in the test case goes down to 0.84 seconds.
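For readers on a current toolchain, the same idea can be sketched in modern Swift syntax (the code in this post uses an older syntax). This is only a sketch, not a tuned implementation: it inlines the flat row-major buffer instead of using the Array2D class, and it uses half-open ranges so that empty strings do not trap, which a closed range such as `1...0` would.

```swift
// Levenshtein distance over UTF-16 code units, backed by one flat buffer.
func levenshtein(_ aStr: String, _ bStr: String) -> Int {
    let a = Array(aStr.utf16)
    let b = Array(bStr.utf16)
    let cols = a.count + 1
    let rows = b.count + 1

    // dist[row * cols + col] is the distance between the first `col`
    // units of a and the first `row` units of b.
    var dist = [Int](repeating: 0, count: cols * rows)
    for i in 0..<cols { dist[i] = i }          // first row
    for j in 0..<rows { dist[j * cols] = j }   // first column

    for j in 1..<rows {
        for i in 1..<cols {
            if a[i - 1] == b[j - 1] {
                dist[j * cols + i] = dist[(j - 1) * cols + (i - 1)]
            } else {
                dist[j * cols + i] = 1 + min(
                    dist[j * cols + (i - 1)],       // deletion
                    dist[(j - 1) * cols + i],       // insertion
                    dist[(j - 1) * cols + (i - 1)]  // substitution
                )
            }
        }
    }
    return dist[b.count * cols + a.count]
}

print(levenshtein("kitten", "sitting")) // prints "3"
```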
The last bottleneck that I found in the Swift code is the custom min() function. The Swift standard library has a built-in min() function which is faster. Just removing the custom function from the Swift code reduces the time for the test case to 0.04 seconds, which is almost as good as the Objective-C version.
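The custom function itself is not shown in this post, so the helper below (`customMin`) is a hypothetical stand-in for that kind of function, written in current Swift syntax. The sketch only illustrates that the standard library's min() is a drop-in replacement for it; it makes no claim about where the time difference comes from.

```swift
// Hypothetical custom helper (the original one is not shown in this post):
// a variadic minimum built on reduce. Assumes at least one argument.
func customMin(_ numbers: Int...) -> Int {
    return numbers.reduce(numbers[0]) { $0 < $1 ? $0 : $1 }
}

let deletion = 4, insertion = 2, substitution = 3

let viaCustom = customMin(deletion, insertion, substitution)
// Standard-library min, which accepts two or more Comparable arguments.
let viaBuiltin = min(deletion, insertion, substitution)

print(viaCustom, viaBuiltin) // prints "2 2"
```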
Addendum: Using Unicode scalars seems to be even slightly faster:
    let a = Array(aStr.unicodeScalars)
    let b = Array(bStr.unicodeScalars)
and has the advantage that it works correctly with characters that UTF-16 encodes as surrogate pairs, such as emoji.
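The effect on the result can be seen with a small sketch in current Swift syntax. The generic function `editDistance` is an assumption made for illustration; it keeps only one row of the table at a time, which is a space optimization and not the Array2D approach used above.

```swift
// Edit distance over any two arrays of Equatable elements,
// keeping a single row of the distance table.
func editDistance<T: Equatable>(_ a: [T], _ b: [T]) -> Int {
    var row = Array(0...a.count)       // distances to the empty prefix of b
    for j in 0..<b.count {
        var diagonal = row[0]          // previous row, previous column
        row[0] = j + 1
        for i in 0..<a.count {
            let above = row[i + 1]     // previous row, same column
            if a[i] == b[j] {
                row[i + 1] = diagonal
            } else {
                row[i + 1] = 1 + min(above, row[i], diagonal)
            }
            diagonal = above
        }
    }
    return row[a.count]
}

let plain = "a"
let emoji = "\u{1F600}"   // one code point, two UTF-16 code units

let utf16Distance = editDistance(Array(plain.utf16), Array(emoji.utf16))
let scalarDistance = editDistance(Array(plain.unicodeScalars), Array(emoji.unicodeScalars))

print(utf16Distance, scalarDistance) // prints "2 1"
```

Replacing one letter with one emoji counts as two edits on UTF-16 code units but as a single substitution on Unicode scalars.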