Performance considerations of Haskell FFI / C?


Problem description

If using Haskell as a library called from my C program, what is the performance impact of making calls into it? For instance, if I have a problem-world data set of, say, 20kB of data, and I want to run something like:

// Go through my 1000 actors and have them make a decision based on
// HaskellCode() function, which is compiled Haskell I'm accessing through
// the FFI.  As an argument, send in the SAME 20kB of data to EACH of these
// function calls, and some actor specific data
// The 20kB constant data defines the environment and the actor specific
// data could be their personality or state
for(i = 0; i < 1000; i++)
   actor[i].decision = HaskellCode(20kB of data here, actor[i].personality);

What's going to happen here - is it going to be possible for me to keep that 20kB of data as a global immutable reference somewhere that is accessed by the Haskell code, or must I create a copy of that data each time through?

The concern is that this data could be larger, much larger - I also hope to write algorithms that act on much larger sets of data, using the same pattern of immutable data being used by several calls of the Haskell code.

Also, I'd like to parallelize this, like dispatch_apply() in GCD or Parallel.ForEach(..) in C#. My rationale for parallelizing outside of Haskell is that I know I will always be operating on many separate function calls, i.e. 1000 actors, so using fine-grained parallelization inside the Haskell function is no better than managing it at the C level. Is running FFI Haskell instances thread-safe, and how do I achieve this - do I need to initialize a Haskell instance every time I kick off a parallel run? (Seems slow if I must...) How do I achieve this with good performance?

Solution

what is the performance impact of making calls into it

Assuming you start the Haskell runtime up only once (like this), on my machine, making a function call from C into Haskell, passing an Int back and forth across the boundary, takes about 80,000 cycles (31,000 ns on my Core 2) -- determined experimentally via the rdtsc instruction.
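
For concreteness, here is a minimal sketch of the Haskell side of that pattern; the module name Decision and the function haskell_decision are illustrative, not from the question. The C program calls hs_init() once before the actor loop and hs_exit() once at the end (both declared in HsFFI.h), and in between calls the exported function like any ordinary C function:

{-# LANGUAGE ForeignFunctionInterface #-}
module Decision where

import Foreign.C.Types (CInt)

-- From C this is callable as an ordinary int -> int function; GHC emits the
-- prototype in the generated Decision_stub.h.
foreign export ccall haskell_decision :: CInt -> CInt

-- Placeholder decision logic; the real function would also consult the
-- shared environment data (see the ByteString sketch further down).
haskell_decision :: CInt -> CInt
haskell_decision personality = personality * 2 + 1

At roughly 31 µs per crossing, 1000 such calls cost on the order of 30 ms of pure boundary overhead per pass, which is worth keeping in mind when deciding how much work each call should do.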

is it going to be possible for me to keep that 20kB of data as a global immutable reference somewhere that is accessed by the Haskell code

Yes, that is certainly possible. If the data really is immutable, then you get the same result whether you:

  • thread the data back and forth across the language boundary by marshalling;
  • pass a reference to the data back and forth;
  • or cache it in an IORef on the Haskell side.

Which strategy is best? It depends on the data type. The most idiomatic way would be to pass a reference to the C data back and forth, treating it as a ByteString or Vector on the Haskell side.
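
As a sketch of that strategy (the module and function names below are assumptions, not anything from the question): the C side hands the 20kB buffer to Haskell once as a pointer plus a length, Haskell wraps it zero-copy with unsafePackCStringLen and caches it in a top-level IORef, and the later per-actor calls read the cached ByteString instead of re-marshalling the environment. The C buffer must stay alive and unmodified for as long as the Haskell side uses it.

{-# LANGUAGE ForeignFunctionInterface #-}
module Environment where

import qualified Data.ByteString as BS
import Data.ByteString.Unsafe (unsafePackCStringLen)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Foreign.C.Types (CChar, CInt)
import Foreign.Ptr (Ptr)
import System.IO.Unsafe (unsafePerformIO)

-- Global cache for the shared environment (the IORef option from the list
-- above); NOINLINE is the usual idiom for a top-level IORef.
{-# NOINLINE envRef #-}
envRef :: IORef BS.ByteString
envRef = unsafePerformIO (newIORef BS.empty)

-- Called once from C with the 20kB environment buffer.
foreign export ccall set_environment :: Ptr CChar -> CInt -> IO ()
set_environment :: Ptr CChar -> CInt -> IO ()
set_environment ptr len = do
  env <- unsafePackCStringLen (ptr, fromIntegral len)  -- wraps, does not copy
  writeIORef envRef env

-- Called once per actor; reads the cached environment, no re-marshalling.
foreign export ccall haskell_decision :: CInt -> IO CInt
haskell_decision :: CInt -> IO CInt
haskell_decision personality = do
  env <- readIORef envRef
  -- placeholder: real code would inspect env to make the decision
  pure (fromIntegral (BS.length env) + personality)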

I'd like to parallelize this

I'd strongly recommend inverting the control then, and doing the parallelization from the Haskell runtime -- it'll be much more robust, as that path has been heavily tested.

Regarding thread safety, it is apparently safe to make parallel calls to foreign exported functions running in the same runtime -- though I'm fairly sure no one has tried this in order to gain parallelism. Calls into the runtime acquire a capability, which is essentially a lock, so multiple calls may block, reducing your chances for parallelism. In the multicore case (e.g. -N4 or so) your results may differ, since multiple capabilities are available, but this is almost certainly a bad way to improve performance.

Again, making many parallel function calls from Haskell via forkIO is a better documented, better tested path, with less overhead than doing the work on the C side, and probably less code in the end.

Just make a single call into your Haskell function, which in turn will do the parallelism via many Haskell threads. Easy!
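
Concretely, under the assumption that each actor's input and output can be represented as a plain C int array (all names here are illustrative), that single exported call could look like the sketch below: it reads the inputs, forks one lightweight Haskell thread per actor, and writes the decisions back for C to pick up. Build the Haskell side with -threaded and pass something like +RTS -N4 via the argv given to hs_init, so the threads are spread over multiple capabilities.

{-# LANGUAGE ForeignFunctionInterface #-}
module ParallelDecisions where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)
import Foreign.C.Types (CInt)
import Foreign.Marshal.Array (peekArray, pokeArray)
import Foreign.Ptr (Ptr)

-- C passes an array of n personalities and an output array for n decisions;
-- Haskell forks one green thread per actor and joins them via MVars.
foreign export ccall decide_all :: Ptr CInt -> Ptr CInt -> CInt -> IO ()
decide_all :: Ptr CInt -> Ptr CInt -> CInt -> IO ()
decide_all personalities decisions n = do
  ps <- peekArray (fromIntegral n) personalities
  vars <- forM ps $ \p -> do
    v <- newEmptyMVar
    _ <- forkIO (putMVar v $! decide p)  -- $! forces the work in the forked thread
    pure v
  results <- mapM takeMVar vars
  pokeArray decisions results

-- Placeholder per-actor decision; it would read the shared environment as in
-- the earlier sketch.
decide :: CInt -> CInt
decide personality = personality * 2 + 1

For 1000 actors this is one FFI crossing instead of 1000, which also amortizes the per-call overhead measured above.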
