FFTW plan creation using OpenMP

Question

I am trying to perform several FFTs in parallel. I am using FFTW and OpenMP. Each FFT is different, so I'm not relying on FFTW's built-in multithreading (which I know uses OpenMP).

int m;

// assume:
// int numberOfColumns = 100;
// int numberOfRows = 100;

#pragma omp parallel for default(none) private(m) shared(numberOfColumns, numberOfRows)//  num_threads(4)
    for(m = 0; m < 36; m++){

        // create pointers
        double          *inputTest;
        fftw_complex    *outputTest;
        fftw_plan       testPlan;

        // preallocate vectors for FFTW
         outputTest = (fftw_complex*)fftw_malloc(sizeof(fftw_complex)*numberOfRows*numberOfColumns);
         inputTest  = (double *)fftw_malloc(sizeof(double)*numberOfRows*numberOfColumns);

         // confirm that preallocation worked
         if (inputTest == NULL || outputTest == NULL){
             logger_.log_error("\t\t FFTW memory not allocated on m = %i", m);
         }

         // EDIT: insert data into inputTest
         inputTest = someDataSpecificToThisIteration(m); // same size for all m

        // create FFTW plan
        #pragma omp critical (make_plan)
        {
            testPlan = fftw_plan_dft_r2c_2d(numberOfRows, numberOfColumns, inputTest, outputTest, FFTW_ESTIMATE);
        }

         // confirm that plan was created correctly
         if (testPlan == NULL){
             logger_.log_error("\t\t failed to create plan on m = %i", m);
         }

        // execute plan
         fftw_execute(testPlan);

        // clean up
         fftw_free(inputTest);
         fftw_free(outputTest);
         fftw_destroy_plan(testPlan);

    }// end parallelized for loop

This all works fine. However, if I remove the critical construct from around the plan creation (fftw_plan_dft_r2c_2d), my code fails. Can someone explain why? fftw_plan_dft_r2c_2d isn't really an "orphan", right? Is it because two threads might both try to hit the numberOfRows or numberOfColumns memory location at the same time?

Recommended answer

It's pretty much all written in the FFTW documentation on thread safety:

... but some care must be taken because the planner routines share data (e.g. wisdom and trigonometric tables) between calls and plans.

The upshot is that the only thread-safe (re-entrant) routine in FFTW is fftw_execute (and the new-array variants thereof). All other routines (e.g. the planner) should only be called from one thread at a time. So, for example, you can wrap a semaphore lock around any calls to the planner; even more simply, you can just create all of your plans from one thread. We do not think this should be an important restriction (FFTW is designed for the situation where the only performance-sensitive code is the actual execution of the transform), and the benefits of shared data between plans are great.

In a typical FFT application, plans are constructed seldom, so it doesn't really matter if you have to synchronise their creation. In your case you don't need to create a new plan at each iteration unless the dimensions of the data change. You would rather do the following:

#pragma omp parallel default(none) private(m) shared(numberOfColumns, numberOfRows)
{
   // create pointers
   double          *inputTest;
   fftw_complex    *outputTest;
   fftw_plan       testPlan;

   // only fftw_execute() is thread-safe, so allocate and plan one thread at a time
   #pragma omp critical (make_plan)
   {
      // preallocate vectors for FFTW
      outputTest = (fftw_complex*)fftw_malloc(sizeof(fftw_complex)*numberOfRows*numberOfColumns);
      inputTest  = (double *)fftw_malloc(sizeof(double)*numberOfRows*numberOfColumns);

      // create FFTW plan (once per thread)
      testPlan = fftw_plan_dft_r2c_2d(numberOfRows, numberOfColumns, inputTest, outputTest, FFTW_ESTIMATE);
   }

   // confirm that preallocation and planning worked
   // (m is not yet assigned here, so it is left out of the message)
   if (inputTest == NULL || outputTest == NULL || testPlan == NULL){
      logger_.log_error("\t\t FFTW buffers or plan not created");
   }

   #pragma omp for
   for (m = 0; m < 36; m++) {
      // copy the data for iteration m into inputTest here
      // (fill the existing buffer; do not reassign the pointer)

      // execute plan
      fftw_execute(testPlan);
   }

   // clean up; destroying a plan is not thread-safe either
   #pragma omp critical (make_plan)
   {
      fftw_destroy_plan(testPlan);
      fftw_free(inputTest);
      fftw_free(outputTest);
   }
}

Now a plan is created only once in each thread, and the serialisation overhead is amortised over every subsequent call to fftw_execute(). If you are running on a NUMA system (e.g. a multi-socket AMD64 or Intel (post-)Nehalem system), you should enable thread binding in order to achieve maximum performance.
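Thread binding is usually switched on from the environment, for example by setting the standard OpenMP variables OMP_PROC_BIND=true and OMP_PLACES=cores before launching the program. It can also be requested per region with the `proc_bind` clause (OpenMP 4.0 and later). The sketch below illustrates the clause; `bound_team_size` is a hypothetical helper invented for this example, and the most suitable policy (`close` or `spread`) depends on the machine.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: runs one parallel region that requests "spread"
 * binding and returns the number of threads in its team. */
int bound_team_size(void)
{
    int team = 1; /* serial fallback when compiled without OpenMP */

#ifdef _OPENMP
    /* Policy inherited from the environment (e.g. from OMP_PROC_BIND). */
    printf("inherited proc_bind policy: %d\n", (int)omp_get_proc_bind());
#endif

    /* proc_bind(spread) asks the runtime to distribute the threads
     * across the available places, e.g. across sockets. */
    #pragma omp parallel proc_bind(spread)
    {
        #pragma omp single
        {
#ifdef _OPENMP
            team = omp_get_num_threads();
#endif
        }
    }

    assert(team >= 1);
    return team;
}
```

With binding in effect, each thread tends to stay near the memory it first touched, which is where the per-thread FFTW buffers end up on a NUMA machine.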
