Swift performance: map() and reduce() vs for loops


Question

I'm writing some performance-critical code in Swift. After implementing all the optimizations I could think of, and profiling the application in Instruments, I came to realize that the vast majority of CPU cycles are spent performing map() and reduce() operations on arrays of Floats. So, just to see what would happen, I replaced all instances of map and reduce with good old-fashioned for loops. And to my amazement... the for loops were much, much faster!

A bit puzzled by this, I decided to perform some rough benchmarks. In one test, I had map return an array of Floats after performing some simple arithmetic like so:

// Populate array with 1,000,000,000 random numbers
var array = [Float](count: 1_000_000_000, repeatedValue: 0)
for i in 0..<array.count {
    array[i] = Float(random())
}
let start = NSDate()
// Construct a new array, with each element from the original multiplied by 5
let output = array.map({ (element) -> Float in
    return element * 5
})
// Log the elapsed time
let elapsed = NSDate().timeIntervalSinceDate(start)
print(elapsed)

The equivalent for loop implementation:

var output = [Float]()
for element in array {
    output.append(element * 5)
}

Average execution time for map: 20.1 seconds. Average execution time for the for loop: 11.2 seconds. Results were similar using Integers instead of Floats.
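One confound worth noting in that comparison: the hand-rolled version starts with an empty array, so append() reallocates as output grows, whereas map can size its result up front. A minimal variation (in the same Swift dialect as the code above) that reserves capacity first removes that difference:

```swift
// Same hand-rolled loop, but with the output's capacity reserved up
// front so append() never has to reallocate mid-loop
var output = [Float]()
output.reserveCapacity(array.count)
for element in array {
    output.append(element * 5)
}
```

Even with this adjustment the loop may still win, but it makes the benchmark measure the closure overhead rather than allocation behavior.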

I created a similar benchmark to test the performance of Swift's reduce. This time, reduce and for loops achieved nearly the same performance when summing the elements of one large array. But when I ran the test in a loop 100,000 times, like this:

// Populate array with 1,000,000 random numbers
var array = [Float](count: 1_000_000, repeatedValue: 0)
for i in 0..<array.count {
    array[i] = Float(random())
}
let start = NSDate()
// Perform operation 100,000 times
for _ in 0..<100_000 {
    let sum = array.reduce(0, combine: {$0 + $1})
}
// Log the elapsed time
let elapsed = NSDate().timeIntervalSinceDate(start)
print(elapsed)

vs:

for _ in 0..<100_000 {
    var sum: Float = 0
    for element in array {
        sum += element
    }
}

The reduce method takes 29 seconds while the for loop takes (apparently) 0.000003 seconds.

Naturally I'm ready to disregard that last test as the result of a compiler optimization, but I think it may give some insight into how the compiler optimizes differently for loops vs Swift's built-in array methods. Note that all tests were performed with -Os optimization on a 2.5 GHz i7 MacBook Pro. Results varied depending on array size and number of iterations, but for loops always outperformed the other methods by at least 1.5x, sometimes up to 10x.

I'm a bit perplexed about Swift's performance here. Shouldn't the built-in Array methods be faster than the naive approach for performing such operations? Maybe somebody with more low-level knowledge than I can shed some light on the situation.

Answer


Shouldn't the built-in Array methods be faster than the naive approach for performing such operations? Maybe somebody with more low-level knowledge than I can shed some light on the situation.

I just want to attempt to address this part of the question and more from the conceptual level (with little understanding of the nature of Swift's optimizer on my part) with a "not necessarily". It's coming more from a background in compiler design and computer architecture than deep-rooted knowledge of the nature of Swift's optimizer.

Calling Overhead

With functions like map and reduce accepting functions as inputs, it places, to put it one way, a greater strain on the optimizer. The natural temptation in such a case, short of some very aggressive optimization, is to constantly branch back and forth between the implementation of, say, map, and the closure you provided, and likewise to transmit data across these disparate branches of code (typically through registers and the stack).

That kind of branching/calling overhead is very difficult for the optimizer to eliminate, especially given the flexibility of Swift's closures (not impossible, but conceptually quite difficult). C++ optimizers can inline function object calls, but doing so requires far more restrictions and code-generation machinery: the compiler effectively has to generate a whole new set of instructions for map for each type of function object you pass in, with the explicit aid of the programmer in the form of a function template used for the code generation.

So it shouldn't be a great surprise to find that your hand-rolled loops can perform faster -- they put far less strain on the optimizer. I have seen some people claim that these higher-order functions should be able to go faster because the vendor can do things like parallelize the loop, but effectively parallelizing the loop would first require the kind of information that typically allows the optimizer to inline the nested function calls to the point where they become as cheap as the hand-rolled loops. Otherwise the function/closure implementation you pass in is effectively opaque to functions like map/reduce: they can only call it and pay the overhead of doing so, and they cannot parallelize it, since they cannot assume anything about its side effects and thread-safety.
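To make that opacity concrete, here is a hypothetical hand-written stand-in for map, in the same Swift dialect as the code above (myMap is an illustrative name, not a standard function):

```swift
// Hypothetical stand-in for map. The `transform` parameter is an
// opaque function value here: unless the optimizer can specialize
// and inline it, every element costs an indirect call.
func myMap(input: [Float], transform: Float -> Float) -> [Float] {
    var result = [Float]()
    result.reserveCapacity(input.count)
    for element in input {
        result.append(transform(element))  // indirect call per element
    }
    return result
}

let input: [Float] = [1, 2, 3]
let scaled = myMap(input, transform: { $0 * 5 })
```

From inside myMap's body, nothing is known about what transform does, which is exactly why the library function cannot safely reorder or parallelize the calls.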

Of course this is all conceptual -- Swift may be able to optimize these cases in the future, or it may already be able to do so now (see -Ofast as a commonly-cited way to make Swift go faster at the cost of some safety). But it does place a heavier strain on the optimizer, at the very least, to use these kinds of functions over the hand-rolled loops, and the time differences you're seeing in the first benchmark seem to reflect the kind of differences one might expect with this additional calling overhead. Best way to find out is to look at the assembly and try various optimization flags.
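For instance, one way to compare what the compiler actually emits under different flags (the flag spellings below are from Swift 2-era toolchains, where -Ofast had become -Ounchecked; adjust for your version, and substitute your own source file for the hypothetical bench.swift):

```shell
# Emit assembly for the benchmark at two optimization levels and diff them
xcrun swiftc -Os -emit-assembly bench.swift -o bench_Os.s
xcrun swiftc -Ounchecked -emit-assembly bench.swift -o bench_Ounchecked.s
diff bench_Os.s bench_Ounchecked.s | less
```

Seeing whether the closure body was inlined into the loop, or whether the benchmark loop survives at all, usually settles these questions faster than timing alone.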

Standard Functions

That's not to discourage the use of such functions. They express intent more concisely, and they can boost productivity. Relying on them could also allow your codebase to get faster in future versions of Swift without any involvement on your part. But they aren't necessarily always going to be faster -- it is a good general rule to expect a higher-level library function that more directly expresses what you want to do to be faster, but there are always exceptions to the rule (best discovered in hindsight with a profiler in hand, since it's far better to err on the side of trust than distrust here).

Artificial Benchmarks

As for your second benchmark, it is almost certainly a result of the compiler optimizing away code that has no side effects that affect user output. Artificial benchmarks have a tendency to be notoriously misleading because of what optimizers do to eliminate irrelevant side effects (essentially, side effects that don't affect user output). So when a benchmark's time seems too good to be true, be careful that it isn't the result of the optimizer merely skipping all the work you actually wanted to measure. At the very least, you want your tests to output some final result gathered from the computation.
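A minimal way to keep the work observable, in the same Swift dialect as the question's benchmarks (reusing its array, with total as an illustrative name), is to fold every pass into a value that is eventually printed:

```swift
// Fold each pass's result into a running total so no iteration is
// free of observable side effects
var total: Float = 0
for _ in 0..<100_000 {
    var sum: Float = 0
    for element in array {
        sum += element
    }
    total += sum
}
// Printing the total makes the work feed a user-visible output,
// so the optimizer cannot elide the loops wholesale
print(total)
```

With the result consumed this way, the 0.000003-second timing should disappear and the two approaches can be compared on equal footing.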
