Swift Beta性能:对数组进行排序 [英] Swift Beta performance: sorting arrays

查看:74
本文介绍了Swift Beta性能:对数组进行排序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在Swift Beta中实现一种算法,并注意到性能非常差.深入研究后,我意识到瓶颈之一就是对数组进行排序一样简单.相关部分在这里:

I was implementing an algorithm in Swift Beta and noticed that the performance was very poor. After digging deeper I realized that one of the bottlenecks was something as simple as sorting arrays. The relevant part is here:

let n = 1000000
var x =  [Int](repeating: 0, count: n)
for i in 0..<n {
    x[i] = random()
}
// start clock here
let y = sort(x)
// stop clock here

在C ++中,类似的操作在我的计算机上需要 0.06s .

In C++, a similar operation takes 0.06s on my computer.

在Python中,它花费的时间为 0.6s (没有技巧,只需y = sorted(x)即可获取整数列表).

In Python, it takes 0.6s (no tricks, just y = sorted(x) for a list of integers).

在Swift中,如果我使用以下命令进行编译,则需要 6s :

In Swift it takes 6s if I compile it with the following command:

xcrun swift -O3 -sdk `xcrun --show-sdk-path --sdk macosx`

如果我使用以下命令进行编译,则最多需要 88s :

And it takes as much as 88s if I compile it with the following command:

xcrun swift -O0 -sdk `xcrun --show-sdk-path --sdk macosx`

在Xcode中使用发布"与调试"构建的时间类似.

Timings in Xcode with "Release" vs. "Debug" builds are similar.

这是怎么了?与C ++相比,我可以理解一些性能损失,但与纯Python相比,却不能将性能降低10倍.

What is wrong here? I could understand some performance loss in comparison with C++, but not a 10-fold slowdown in comparison with pure Python.

天气注意到,将-O3更改为-Ofast使得此代码的运行几乎与C ++版本一样快!但是,-Ofast大大改变了语言的语义-在我的测试中,它禁用了对整数溢出和数组索引溢出的检查.例如,使用-Ofast,以下Swift代码将无提示地运行而不会崩溃(并打印出一些垃圾):

weather noticed that changing -O3 to -Ofast makes this code run almost as fast as the C++ version! However, -Ofast changes the semantics of the language a lot — in my testing, it disabled the checks for integer overflows and array indexing overflows. For example, with -Ofast the following Swift code runs silently without crashing (and prints out some garbage):

let n = 10000000
print(n*n*n*n*n)
let x =  [Int](repeating: 10, count: n)
print(x[n])

所以-Ofast不是我们想要的; Swift的全部要点是我们拥有安全网.当然,安全网会对性能产生一定的影响,但不应使程序慢100倍.请记住,Java已经检查了数组的边界,在典型情况下,速度下降的幅度远小于2.在Clang和GCC中,我们有-ftrapv来检查(有符号的)整数溢出,但速度并不是那么慢,或者.

So -Ofast is not what we want; the whole point of Swift is that we have the safety nets in place. Of course, the safety nets have some impact on the performance, but they should not make the programs 100 times slower. Remember that Java already checks for array bounds, and in typical cases, the slowdown is by a factor much less than 2. And in Clang and GCC we have got -ftrapv for checking (signed) integer overflows, and it is not that slow, either.

因此产生一个问题:如何在Swift中获得合理的性能而又不失去安全网?

Hence the question: how can we get reasonable performance in Swift without losing the safety nets?

我做了一些基准测试,其中的循环非常简单

Edit 2: I did some more benchmarking, with very simple loops along the lines of

for i in 0..<n {
    x[i] = x[i] ^ 12345678
}

(这里有xor操作,以便我可以更容易地在汇编代码中找到相关的循环.我尝试选择一种易于发现但在不需要的意义上也是无害的"操作与整数溢出相关的所有检查.)

(Here the xor operation is there just so that I can more easily find the relevant loop in the assembly code. I tried to pick an operation that is easy to spot but also "harmless" in the sense that it should not require any checks related to integer overflows.)

同样,-O3-Ofast之间的性能存在巨大差异.所以我看了一下汇编代码:

Again, there was a huge difference in the performance between -O3 and -Ofast. So I had a look at the assembly code:

  • 使用-Ofast,我得到了我期望的东西.相关部分是一个包含5条机器语言指令的循环.

  • With -Ofast I get pretty much what I would expect. The relevant part is a loop with 5 machine language instructions.

使用-O3,我得到了超出我最疯狂的想象力的东西.内部循环跨越88行汇编代码.我没有尝试了解所有内容,但是最可疑的部分是"callq _swift_retain"的13个调用和"callq _swift_release"的另外13个调用.也就是说,在内部循环中 26个子例程调用

With -O3 I get something that was beyond my wildest imagination. The inner loop spans 88 lines of assembly code. I did not try to understand all of it, but the most suspicious parts are 13 invocations of "callq _swift_retain" and another 13 invocations of "callq _swift_release". That is, 26 subroutine calls in the inner loop!

在评论中,Ferruccio要求提供不依赖于内置函数(例如sort)的公平基准.我认为以下程序是一个很好的示例:

Edit 3: In comments, Ferruccio asked for benchmarks that are fair in the sense that they do not rely on built-in functions (e.g. sort). I think the following program is a fairly good example:

let n = 10000
var x = [Int](repeating: 1, count: n)
for i in 0..<n {
    for j in 0..<n {
        x[i] = x[j]
    }
}

没有算术运算,因此我们不必担心整数溢出.我们唯一要做的就是大量数组引用.结果就在这里-与-Ofast相比,Swift -O3损失了将近500倍:

There is no arithmetic, so we do not need to worry about integer overflows. The only thing that we do is just lots of array references. And the results are here—Swift -O3 loses by a factor almost 500 in comparison with -Ofast:

  • C ++ -O3: 0.05 s
  • C ++ -O0:0.4秒
  • Java: 0.2秒
  • 带有PyPy的Python:0.5秒
  • Python: 12秒
  • 快速-Ofast:0.05 s
  • 快速-O3: 23秒
  • 迅速-O0:443秒
  • C++ -O3: 0.05 s
  • C++ -O0: 0.4 s
  • Java: 0.2 s
  • Python with PyPy: 0.5 s
  • Python: 12 s
  • Swift -Ofast: 0.05 s
  • Swift -O3: 23 s
  • Swift -O0: 443 s

(如果您担心编译器可能会完全优化无意义的循环,则可以将其更改为例如x[i] ^= x[j],并添加输出x[0]的print语句.这不会改变任何内容;计时将是非常相似.)

(If you are concerned that the compiler might optimize out the pointless loops entirely, you can change it to e.g. x[i] ^= x[j], and add a print statement that outputs x[0]. This does not change anything; the timings will be very similar.)

是的,这里的Python实现是一个愚蠢的纯Python实现,带有一个ints列表,并嵌套了for循环.它应该比未优化的Swift慢得多. Swift和数组索引似乎严重破坏了某些东西.

And yes, here the Python implementation was a stupid pure Python implementation with a list of ints and nested for loops. It should be much slower than unoptimized Swift. Something seems to be seriously broken with Swift and array indexing.

这些问题(以及其他一些性能问题)似乎已在Xcode 6 beta 5中得到修复.

Edit 4: These issues (as well as some other performance issues) seems to have been fixed in Xcode 6 beta 5.

对于排序,我现在有以下时间安排:

For sorting, I now have the following timings:

  • c ++ -O3:0.06 s
  • swiftc -Ofast:0.1 s
  • swiftc -O:0.1秒
  • swiftc:4 s

对于嵌套循环:

  • c ++ -O3:0.06 s
  • swiftc -Ofast:0.3 s
  • swiftc -O:0.4秒
  • swiftc:540 s

似乎不再有理由使用不安全的-Ofast(又称-Ounchecked);普通的-O会产生同样好的代码.

It seems that there is no reason anymore to use the unsafe -Ofast (a.k.a. -Ounchecked); plain -O produces equally good code.

推荐答案

tl; dr在此基准测试中,Swift 1.0使用默认的发布优化级别[-O]与C一样快.

tl;dr Swift 1.0 is now as fast as C by this benchmark using the default release optimisation level [-O].

这是Swift Beta中的就地快速排序:

Here is an in-place quicksort in Swift Beta:

func quicksort_swift(inout a:CInt[], start:Int, end:Int) {
    if (end - start < 2){
        return
    }
    var p = a[start + (end - start)/2]
    var l = start
    var r = end - 1
    while (l <= r){
        if (a[l] < p){
            l += 1
            continue
        }
        if (a[r] > p){
            r -= 1
            continue
        }
        var t = a[l]
        a[l] = a[r]
        a[r] = t
        l += 1
        r -= 1
    }
    quicksort_swift(&a, start, r + 1)
    quicksort_swift(&a, r + 1, end)
}

与C语言相同:

void quicksort_c(int *a, int n) {
    if (n < 2)
        return;
    int p = a[n / 2];
    int *l = a;
    int *r = a + n - 1;
    while (l <= r) {
        if (*l < p) {
            l++;
            continue;
        }
        if (*r > p) {
            r--;
            continue;
        }
        int t = *l;
        *l++ = *r;
        *r-- = t;
    }
    quicksort_c(a, r - a + 1);
    quicksort_c(l, a + n - l);
}

两项工作:

var a_swift:CInt[] = [0,5,2,8,1234,-1,2]
var a_c:CInt[] = [0,5,2,8,1234,-1,2]

quicksort_swift(&a_swift, 0, a_swift.count)
quicksort_c(&a_c, CInt(a_c.count))

// [-1, 0, 2, 2, 5, 8, 1234]
// [-1, 0, 2, 2, 5, 8, 1234]

两者都在与编写的程序相同的程序中调用.

Both are called in the same program as written.

var x_swift = CInt[](count: n, repeatedValue: 0)
var x_c = CInt[](count: n, repeatedValue: 0)
for var i = 0; i < n; ++i {
    x_swift[i] = CInt(random())
    x_c[i] = CInt(random())
}

let swift_start:UInt64 = mach_absolute_time();
quicksort_swift(&x_swift, 0, x_swift.count)
let swift_stop:UInt64 = mach_absolute_time();

let c_start:UInt64 = mach_absolute_time();
quicksort_c(&x_c, CInt(x_c.count))
let c_stop:UInt64 = mach_absolute_time();

这会将绝对时间转换为秒:

This converts the absolute times to seconds:

static const uint64_t NANOS_PER_USEC = 1000ULL;
static const uint64_t NANOS_PER_MSEC = 1000ULL * NANOS_PER_USEC;
static const uint64_t NANOS_PER_SEC = 1000ULL * NANOS_PER_MSEC;

mach_timebase_info_data_t timebase_info;

uint64_t abs_to_nanos(uint64_t abs) {
    if ( timebase_info.denom == 0 ) {
        (void)mach_timebase_info(&timebase_info);
    }
    return abs * timebase_info.numer  / timebase_info.denom;
}

double abs_to_seconds(uint64_t abs) {
    return abs_to_nanos(abs) / (double)NANOS_PER_SEC;
}

以下是编译器优化级别的摘要:

Here is a summary of the compiler's optimazation levels:

[-Onone] no optimizations, the default for debug.
[-O]     perform optimizations, the default for release.
[-Ofast] perform optimizations and disable runtime overflow checks and runtime type checks.

以秒为单位的时间, [-Onone] n = 10_000 :

Time in seconds with [-Onone] for n=10_000:

Swift:            0.895296452
C:                0.001223848

这是Swift为 n = 10_000 内置的sort():

Here is Swift's builtin sort() for n=10_000:

Swift_builtin:    0.77865783

此处是 [-O] ,表示 n = 10_000 :

Swift:            0.045478346
C:                0.000784666
Swift_builtin:    0.032513488

如您所见,Swift的性能提高了20倍.

As you can see, Swift's performance improved by a factor of 20.

按照 mweathers的答案,设置 [-Ofast ] 发挥真正作用,导致这些时间为 n = 10_000 :

As per mweathers' answer, setting [-Ofast] makes the real difference, resulting in these times for n=10_000:

Swift:            0.000706745
C:                0.000742374
Swift_builtin:    0.000603576

对于 n = 1_000_000 :

Swift:            0.107111846
C:                0.114957179
Swift_sort:       0.092688548

为进行比较,它与 n = 1_000_000 [-Onone] :

Swift:            142.659763258
C:                0.162065333
Swift_sort:       114.095478272

因此,在开发的这个阶段,没有优化的Swift几乎比本基准测试中的C慢1000倍.另一方面,两个编译器都设置为[-Ofast],即使实际上不比C稍好,Swift的实际性能也至少要好.

So Swift with no optimizations was almost 1000x slower than C in this benchmark, at this stage in its development. On the other hand with both compilers set to [-Ofast] Swift actually performed at least as well if not slightly better than C.

已经指出,[-Ofast]更改了语言的语义,使其潜在地不安全.这就是Apple在Xcode 5.0发行说明中指出的内容:

It has been pointed out that [-Ofast] changes the semantics of the language, making it potentially unsafe. This is what Apple states in the Xcode 5.0 release notes:

LLVM中提供了新的优化级别-Ofast,可以进行积极的优化. -Ofast放宽了一些保守的限制,大多数情况下对于浮点操作是安全的,这些限制对于大多数代码而言都是安全的.它可以从编译器中获得显着的高性能优势.

A new optimization level -Ofast, available in LLVM, enables aggressive optimizations. -Ofast relaxes some conservative restrictions, mostly for floating-point operations, that are safe for most code. It can yield significant high-performance wins from the compiler.

他们全都主张.无论这是否明智,我都不能说,但是据我所知,如果您不执行高精度浮点运算并且您确信没有整数或整数,那么在发行版中使用[-Ofast]似乎是足够合理的.程序中可能发生数组溢出.如果您确实需要高性能的溢出检查/精确算术,请立即选择另一种语言.

They all but advocate it. Whether that's wise or not I couldn't say, but from what I can tell it seems reasonable enough to use [-Ofast] in a release if you're not doing high-precision floating point arithmetic and you're confident no integer or array overflows are possible in your program. If you do need high performance and overflow checks / precise arithmetic then choose another language for now.

BETA 3更新:

n = 10_000 [-O] :

Swift:            0.019697268
C:                0.000718064
Swift_sort:       0.002094721

一般来说,Swift的速度要快一些,看起来Swift的内置排序已经发生了很大变化.

Swift in general is a bit faster and it looks like Swift's built-in sort has changed quite significantly.

最终更新:

[-Onone] :

Swift:   0.678056695
C:       0.000973914

[-O] :

Swift:   0.001158492
C:       0.001192406

[-未选中] :

Swift:   0.000827764
C:       0.001078914

这篇关于Swift Beta性能:对数组进行排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆