Why is Seq.iter 2x faster than a for loop when the target is x64?


Question

Disclaimer: this is a micro-benchmark. Please do not reply with quotes such as "premature optimization is evil" if the topic bothers you.

The examples are Release builds targeting x64, built with .NET 4.5, Visual Studio 2012 and F# 3.0, and run on Windows 7 x64.

After profiling, I narrowed down the bottleneck in one of my applications, which leads me to this question:

If there is no inner loop inside the for ... in loop or Seq.iter, the two clearly run at similar speed. (update2 vs update4)

If there is an inner loop inside the for ... in loop or Seq.iter, Seq.iter appears to be 2x as fast as for ... in. (update vs update3) Strange? (When run in fsi, the two are similar.)

If the build targets anycpu and runs on x64, there is no difference in time. So the question becomes: why does Seq.iter (update3) get a 2x speedup only when the target is x64?

update:  00:00:11.4250483 // 2x as much as update3, why?
update2: 00:00:01.4447233
update3: 00:00:06.0863791
update4: 00:00:01.4939535

Source code:

open System.Diagnostics
open System

[<EntryPoint>]
let main argv = 
    let pool = seq {1 .. 1000000}

    let ret = Array.zeroCreate 100

    let update pool =
        for x in pool do
            for y in 1 .. 200 do
                ret.[2] <- x + y

    let update2 pool =
        for x in pool do
            //for y in 1 .. 100 do
                ret.[2] <- x


    let update3 pool =
        pool
            |> Seq.iter (fun x ->
                                  for y in 1 .. 200 do
                                      ret.[2] <- x + y)

    let update4 pool =
        pool
            |> Seq.iter (fun x ->
                                  //for y in 1 .. 100 do
                                      ret.[2] <- x)


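    let test n =
        let run = match n with
                  | 1 -> update
                  | 2 -> update2
                  | 3 -> update3
                  | 4 -> update4
                  // Reject anything else so the match is complete (avoids the incomplete-match warning).
                  | _ -> invalidArg "n" "expected a value from 1 to 4"
        for i in 1 .. 50 do
            run pool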

    let sw = new Stopwatch()
    sw.Start()
    test(1)
    sw.Stop()
    Console.WriteLine(sw.Elapsed)

    sw.Restart()
    test(2)
    sw.Stop()
    Console.WriteLine(sw.Elapsed)

    sw.Restart()
    test(3)
    sw.Stop()
    Console.WriteLine(sw.Elapsed)

    sw.Restart()
    test(4)
    sw.Stop()
    Console.WriteLine(sw.Elapsed)
    0 // return an integer exit code

Recommended answer

This isn't a complete answer, but I hope it helps you go a step further.

I can reproduce the behaviour using the same configuration. Here is a simpler example for profiling:

open System

let test1() =
    let ret = Array.zeroCreate 100
    let pool = {1 .. 1000000}    
    for x in pool do
        for _ in 1..50 do
            for y in 1..200 do
                ret.[2] <- x + y

let test2() =
    let ret = Array.zeroCreate 100
    let pool = {1 .. 1000000}    
    Seq.iter (fun x -> 
        for _ in 1..50 do
            for y in 1..200 do
                ret.[2] <- x + y) pool

let time f =
    let sw = new Diagnostics.Stopwatch()
    sw.Start()
    let result = f() 
    sw.Stop()
    Console.WriteLine(sw.Elapsed)
    result

[<EntryPoint>]
let main argv =
    time test1
    time test2
    0

In this example, Seq.iter and for x in pool are each executed only once, yet there is still a 2x time difference between test1 and test2:

00:00:06.9264843
00:00:03.6834886

Their IL is very similar, so the F# compiler's optimization isn't the problem. It seems that the x64 JIT compiler fails to optimize test1, although it manages to do so with test2. Interestingly, if I refactor the nested for loops in test1 into a separate function, JIT optimization succeeds again:

let body (ret: _ []) x =
    for _ in 1..50 do
        for y in 1..200 do
            ret.[2] <- x + y

let test3() =
    let ret = Array.zeroCreate 100
    let pool = {1..1000000}    
    for x in pool do
        body ret x

// 00:00:03.7012302

When I disable JIT optimization using the technique described here, the execution times of these functions are comparable.

Why the x64 JIT compiler fails in this particular example, I don't know. You can disassemble the optimized jitted code and compare the ASM instructions line by line. Maybe someone with good ASM knowledge can spot the differences.
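When comparing runs like these, it helps to confirm that the process really is 64-bit and to repeat each measurement a few times. The following is only a minimal sketch, written in script style for `dotnet fsi` or a compiled program. The names `bestOf`, `viaFor` and `viaIter` are hypothetical helpers that are not part of the original answer, and the workloads are scaled-down stand-ins for test1 and test2, so the timings they print are not expected to reproduce the 2x gap.

```fsharp
open System
open System.Diagnostics

// Hypothetical helper (not from the original answer): run `f` several times
// and keep the fastest elapsed time, so one-off costs such as JIT compilation
// of the function do not dominate the measurement.
let bestOf (runs: int) (f: unit -> unit) : TimeSpan =
    let mutable best = TimeSpan.MaxValue
    for _ in 1 .. runs do
        let sw = Stopwatch.StartNew()
        f ()
        sw.Stop()
        if sw.Elapsed < best then best <- sw.Elapsed
    best

// Scaled-down stand-in for test1: nested loop inside a for ... in loop.
let viaFor () =
    let ret : int[] = Array.zeroCreate 100
    let pool = seq { 1 .. 1000 }
    for x in pool do
        for y in 1 .. 200 do
            ret.[2] <- x + y

// Scaled-down stand-in for test2: the same nested loop inside Seq.iter.
let viaIter () =
    let ret : int[] = Array.zeroCreate 100
    let pool = seq { 1 .. 1000 }
    Seq.iter (fun x ->
        for y in 1 .. 200 do
            ret.[2] <- x + y) pool

// Report whether this process is 64-bit, which determines the JIT in use.
printfn "64-bit process: %b (pointer size: %d bytes)" Environment.Is64BitProcess IntPtr.Size
printfn "for loop: %O" (bestOf 5 viaFor)
printfn "Seq.iter: %O" (bestOf 5 viaIter)
```

To investigate the real case, replace `viaFor` and `viaIter` with test1 and test2 from above and build in Release mode for the target you want to inspect.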
