Why is Seq.iter 2x faster than a for loop when the target is x64?
Question
Disclaimer: this is a micro-benchmark. If the topic bothers you, please do not reply with quotes such as "premature optimization is evil".
The examples are Release builds targeting x64, using .NET 4.5, Visual Studio 2012 and F# 3.0, run on Windows 7 x64.
After profiling, I narrowed down the bottleneck in one of my applications, which leads me to this question:
If there is no loop inside the for in loop or Seq.iter, the two clearly run at similar speed (update2 vs update4).
If there is a loop inside the for in loop or Seq.iter, Seq.iter seems to be about 2x as fast as for in (update vs update3). Strange? (When run in fsi, the two are similar.)
If the build targets AnyCPU and runs on x64, there is no difference in time. So the question becomes: why does Seq.iter (update3) run 2x faster when the target is x64?
update:  00:00:11.4250483 // 2x as much as update3, why?
update2: 00:00:01.4447233
update3: 00:00:06.0863791
update4: 00:00:01.4939535
Source code:
open System.Diagnostics
open System

[<EntryPoint>]
let main argv =
    let pool = seq {1 .. 1000000}
    let ret = Array.zeroCreate 100

    // for in loop with a nested loop
    let update pool =
        for x in pool do
            for y in 1 .. 200 do
                ret.[2] <- x + y

    // for in loop without a nested loop
    let update2 pool =
        for x in pool do
            //for y in 1 .. 100 do
            ret.[2] <- x

    // Seq.iter with a nested loop
    let update3 pool =
        pool
        |> Seq.iter (fun x ->
            for y in 1 .. 200 do
                ret.[2] <- x + y)

    // Seq.iter without a nested loop
    let update4 pool =
        pool
        |> Seq.iter (fun x ->
            //for y in 1 .. 100 do
            ret.[2] <- x)

    // Runs the selected variant 50 times over the pool
    let test n =
        let run =
            match n with
            | 1 -> update
            | 2 -> update2
            | 3 -> update3
            | 4 -> update4
            | _ -> invalidArg "n" "expected a value from 1 to 4"
        for i in 1 .. 50 do
            run pool

    let sw = new Stopwatch()
    sw.Start()
    test(1)
    sw.Stop()
    Console.WriteLine(sw.Elapsed)
    sw.Restart()
    test(2)
    sw.Stop()
    Console.WriteLine(sw.Elapsed)
    sw.Restart()
    test(3)
    sw.Stop()
    Console.WriteLine(sw.Elapsed)
    sw.Restart()
    test(4)
    sw.Stop()
    Console.WriteLine(sw.Elapsed)
    0 // return an integer exit code
Recommended answer
This is not a complete answer, but I hope it helps you get further.
I can reproduce the behaviour using the same configuration. Here is a simpler example for profiling:
open System

let test1() =
    let ret = Array.zeroCreate 100
    let pool = seq {1 .. 1000000}
    for x in pool do
        for _ in 1..50 do
            for y in 1..200 do
                ret.[2] <- x + y

let test2() =
    let ret = Array.zeroCreate 100
    let pool = seq {1 .. 1000000}
    Seq.iter (fun x ->
        for _ in 1..50 do
            for y in 1..200 do
                ret.[2] <- x + y) pool

let time f =
    let sw = new Diagnostics.Stopwatch()
    sw.Start()
    let result = f()
    sw.Stop()
    Console.WriteLine(sw.Elapsed)
    result

[<EntryPoint>]
let main argv =
    time test1
    time test2
    0
In this example, Seq.iter and for x in pool are each executed only once, but there is still a 2x time difference between test1 and test2:
00:00:06.9264843
00:00:03.6834886
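Each figure above comes from a single call, so run-to-run variance and one-time JIT compilation are folded into the number. A minimal sketch of a harness that does one untimed warm-up call and then times several runs follows. The name timeRuns and its runs parameter are hypothetical and not part of the original answer, and the sketch assumes the measured function takes and returns unit.

```fsharp
open System
open System.Diagnostics

/// Hypothetical helper: calls f once untimed (warm-up, so JIT compilation
/// of f is not measured), then times `runs` further calls individually.
let timeRuns (runs: int) (f: unit -> unit) : TimeSpan list =
    f ()
    [ for _ in 1 .. runs do
        let sw = Stopwatch.StartNew()
        f ()
        sw.Stop()
        yield sw.Elapsed ]

// Example usage with a trivial workload; substitute test1 or test2 here.
let elapsed = timeRuns 3 (fun () -> ignore (Seq.sum (seq { 1 .. 1000 })))
elapsed |> List.iter (fun t -> printfn "%O" t)
```

If the 2x gap persists across all timed runs, start-up cost can be ruled out as the explanation.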
Their IL is very similar, so compiler optimization is not the cause. It seems that the x64 JIT compiler fails to optimize test1, although it manages to do so for test2. Interestingly, if I refactor the nested for loops in test1 into a function, JIT optimization succeeds again:
let body (ret: _ []) x =
    for _ in 1..50 do
        for y in 1..200 do
            ret.[2] <- x + y

let test3() =
    let ret = Array.zeroCreate 100
    let pool = {1..1000000}
    for x in pool do
        body ret x

// 00:00:03.7012302
When I disable JIT optimization using the technique described here, the execution times of these functions are comparable.
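The technique the answer links to is not reproduced on this page. As a separate illustration, and not necessarily what the answer used, .NET lets optimization be switched off for a single method with MethodImplOptions.NoOptimization. The sketch below applies it to a copy of test1. The name test1NoOpt is hypothetical, and the pool is shrunk to 1000 elements (an assumption made only to keep the sketch quick to run).

```fsharp
open System.Runtime.CompilerServices

// Hypothetical variant of test1: the attribute asks the JIT not to
// optimize this one method. The pool is reduced to 1000 elements here.
[<MethodImpl(MethodImplOptions.NoOptimization)>]
let test1NoOpt () =
    let ret : int[] = Array.zeroCreate 100
    let pool = seq { 1 .. 1000 }
    for x in pool do
        for _ in 1 .. 50 do
            for y in 1 .. 200 do
                ret.[2] <- x + y
    ret.[2] // value of the last write, returned so the work is observable

printfn "%d" (test1NoOpt ())
```

As far as I understand, the attribute only affects the method it decorates. The lambda passed to Seq.iter in test2 is compiled to a separate closure method, so decorating test2 the same way would likely not cover its loop body.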
Why the x64 JIT compiler fails in this particular example, I don't know. You can disassemble the optimized jitted code and compare the ASM instructions line by line. Maybe someone with good ASM knowledge can spot the differences.
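Since the effect appears only under the x64 JIT, it may be worth confirming at run time that the process is really 64-bit before comparing disassembly. An AnyCPU build can run as a 32-bit process, depending on project settings. A minimal sketch using System.Environment follows; the helper name describeProcess is hypothetical.

```fsharp
open System

/// Hypothetical helper: reports the bitness of the current process and OS.
let describeProcess () =
    sprintf "64-bit process: %b, 64-bit OS: %b, pointer size: %d bytes"
        Environment.Is64BitProcess
        Environment.Is64BitOperatingSystem
        IntPtr.Size

printfn "%s" (describeProcess ())
```

Printing this line at the start of main in both the x64 and AnyCPU builds shows which JIT produced each set of timings.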