F# PSeq.iter does not seem to be using all cores


Question

I've been doing some computationally intensive work in F#. Functions like Array.Parallel.map, which use the .NET Task Parallel Library, have sped up my code dramatically for really quite minimal effort.

However, due to memory concerns, I remade a section of my code so that it can be lazily evaluated inside a sequence expression (this means I have to store and pass less information). When it came time to evaluate, I used:

// processor and memory intensive task, results are not stored
let calculations : seq<Calculation> =  seq { ...yield one thing at a time... }

// extract results from calculations for summary data
PSeq.iter someFuncToExtractResults calculations

Instead of:

// processor and memory intensive task, storing these results is an unnecessary task
let calculations : Calculation[] = ...do all the things...

// extract results from calculations for summary data
Array.Parallel.map someFuncToExtractResults calculations 

When using any of the Array.Parallel functions, I can clearly see all the cores on my computer kick into gear (~100% CPU usage). However, the extra memory required means the program never finishes.

When I run the program with the PSeq.iter version, there's only about 8% CPU usage (and minimal RAM usage).

So: is there some reason why the PSeq version runs so much slower? Is it because of the lazy evaluation? Is there some magic "be parallel" stuff I am missing?

Thanks

Other resources: source code implementations of both (they seem to use different parallel libraries in .NET):

https://github.com/fsharp/fsharp/blob/master/src/fsharp/FSharp.Core/array.fs

https://github.com/fsharp/powerpack/blob/master/src/FSharp.PowerPack.Parallel.Seq/pseq.fs

Added more detail in the code samples and details below:

Code:

  • Seq

// processor and memory intensive task, results are not stored
let calculations : seq<Calculation> =  
    seq { 
        for index in 0 .. data.Length - 1 do
            yield calculationFunc data.[index]
    }

// extract results from calculations for summary data (different module)
PSeq.iter someFuncToExtractResults calculations

  • Array

    // processor and memory intensive task, storing these results is an unnecessary task
    let calculations : Calculation[] =
        Array.Parallel.map calculationFunc data
    
    // extract results from calculations for summary data (different module)
    Array.Parallel.map someFuncToExtractResults calculations 
    

  • Details:

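    • The version that stores an intermediate array runs quickly (as far as it gets before crashing), in under 10 minutes, but uses ~70GB of RAM before it crashes (64GB physical, the rest paged)
    • The seq version takes over 34 minutes and uses a fraction of the RAM (only around 30GB)
    • I'm calculating about a billion values. A billion doubles (64 bits each) comes to 7.4505806GB. The data takes more complex forms than that, and there are a few unnecessary copies I'm cleaning up, hence the current massive RAM usage.
    • Yes, the architecture isn't great; the lazy evaluation is the first part of my attempt to optimize the program and/or batch the data up into smaller chunks
    • With a smaller dataset, both chunks of code output the same results.
    • @pad, I tried what you suggested. PSeq.iter seemed to work properly (all cores active) when fed the Calculation[], but there is still the matter of RAM (it eventually crashed)
    • Both the summary part of the code and the calculation part are CPU intensive (mainly because of the large data sets)
    • With the Seq version, I aim to parallelize just once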

    Recommended answer

    Based on your updated information, I'm shortening my answer to just the relevant part. You just need this instead of what you currently have:

    let result = data |> PSeq.map (calculationFunc >> someFuncToExtractResults)
    

    And this will work the same whether you use PSeq.map or Array.Parallel.map.
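A note on why composing the two functions matters: in the question's Seq version, calculationFunc runs while the sequence is being enumerated, and a lazy sequence is enumerated one item at a time, so only someFuncToExtractResults is handed to the parallel workers. Composing the two functions puts both steps inside the parallel call. The sketch below illustrates this with Array.Parallel.map, since it needs only FSharp.Core. The bodies of calculationFunc and someFuncToExtractResults, and the contents of data, are hypothetical stand-ins because the question does not show the real ones.

```fsharp
// Hypothetical stand-in for the question's expensive, memory-hungry step.
let calculationFunc (x: int) : float[] =
    Array.init 1000 (fun i -> sqrt (float (x + i)))

// Hypothetical stand-in for the summary step: reduces a large
// intermediate value to a single small result.
let someFuncToExtractResults (calc: float[]) : float =
    Array.sum calc

// Assumed input; the real data set is far larger.
let data = [| 0 .. 9999 |]

// Two passes: the first map materialises every intermediate float[]
// before the second map starts, so all of them are alive at once.
let twoPass =
    data
    |> Array.Parallel.map calculationFunc
    |> Array.Parallel.map someFuncToExtractResults

// One fused pass: each worker computes an intermediate float[],
// summarises it immediately, and keeps only the summary.
let fused =
    data
    |> Array.Parallel.map (calculationFunc >> someFuncToExtractResults)

printfn "same results: %b" (twoPass = fused)
```

In the fused version, the number of intermediate values alive at any moment is roughly bounded by the number of worker threads rather than by the length of data.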

    However, your real problem is not going to be solved. This problem can be stated as: once the degree of parallel work needed to reach 100% CPU usage is reached, there is not enough memory to support the processes.

    Can you see how this will not be solved? You can either process things sequentially (less CPU efficient, but memory efficient) or you can process things in parallel (more CPU efficient, but runs out of memory).

    The options then are:

    1. Change the degree of parallelism to be used by these functions to something that won't blow your memory:

    let result = data 
                 |> PSeq.withDegreeOfParallelism 2 
                 |> PSeq.map (calculationFunc >> someFuncToExtractResults)
    

    2. Change the underlying logic for calculationFunc >> someFuncToExtractResults so that it is a single, more efficient function that streams data through to results. Without knowing more detail, it's not simple to see how this could be done. But internally, some lazy loading is probably possible.
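One possible middle ground between these two options, which is not something the answer itself proposes, is to keep the input lazy and process it in fixed-size batches, parallelizing within each batch. The sketch below uses only FSharp.Core (Seq.chunkBySize and Array.Parallel.map). The function bodies, the input sequence, and the chunkSize value are all assumptions made for illustration.

```fsharp
// Hypothetical stand-ins for the question's functions.
let calculationFunc (x: int) : float[] =
    Array.init 1000 (fun i -> sqrt (float (x + i)))

let someFuncToExtractResults (calc: float[]) : float =
    Array.sum calc

// Assumed lazy input: items are produced on demand, never all at once.
let data : seq<int> = seq { for i in 0 .. 9999 -> i }

// Assumed value; tune it so that one batch of work fits comfortably in RAM.
let chunkSize = 512

let summaries : float[] =
    data
    // Lazily groups the input into int[] batches of at most chunkSize items.
    |> Seq.chunkBySize chunkSize
    // Each batch is processed in parallel; only the small summaries survive.
    |> Seq.collect (fun chunk ->
        chunk |> Array.Parallel.map (calculationFunc >> someFuncToExtractResults))
    |> Seq.toArray

printfn "computed %d summaries" summaries.Length
```

Only one batch of input is materialized at a time, so a smaller chunkSize trades some parallel throughput for a lower memory ceiling, much like lowering the degree of parallelism in option 1.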
