Thread safety while looping with OpenMP


Problem description

I'm working on a small Collatz conjecture calculator using C++ and GMP, and I'm trying to implement parallelism on it using OpenMP, but I'm coming across issues regarding thread safety. As it stands, attempting to run the code will yield this:

*** Error in `./collatz': double free or corruption (fasttop): 0x0000000001140c40 ***
*** Error in `./collatz': double free or corruption (fasttop): 0x00007f4d200008c0 ***
[1]    28163 abort (core dumped)  ./collatz

Here is the code that reproduces the behavior:

#include <iostream>
#include <gmpxx.h>

mpz_class collatz(mpz_class n) {
    if (mpz_odd_p(n.get_mpz_t())) {
        n *= 3;
        n += 1;
    } else {
        n /= 2;
    }
    return n;
}

int main() {
    mpz_class x = 1;
#pragma omp parallel
    while (true) {
        //std::cout << x.get_str(10);
        while (true) {
            if (mpz_cmp_ui(x.get_mpz_t(), 1)) break;
            x = collatz(x);
        }
        x++;
        //std::cout << " OK" << std::endl;
    }
}

Given that I did not get this error when I uncommented the outputs to screen (which are slow), I assume the issue at hand has to do with thread safety, and in particular with concurrent threads trying to increment x at the same time.

Am I correct in my assumptions? How can I fix this and make it safe to run?

Answer

I assume what you want to do is check whether the Collatz conjecture holds for all numbers. The program you posted is wrong on many levels, both serially and in parallel.

if (mpz_cmp_ui(x.get_mpz_t(), 1)) break;

means that it will break when x != 1. If you replace it with the correct 0 == mpz_cmp_ui, the code will just continue to test 2 over and over again. You have to have two variables anyway: one for the outer loop that represents what you want to check, and one for the inner loop performing the check. It's easier to get this right if you make a function for that:

void check_collatz(mpz_class n) {
    while (n != 1) {
        n = collatz(n);
    }
}

int main() {
    mpz_class x = 1;
    while (true) {
        std::cout << x.get_str(10);
        check_collatz(x);
        x++;
    }
}

The while (true) loop is hard to reason about and to parallelize, so let's make an equivalent for loop:

for (mpz_class x = 1;; x++) {
    check_collatz(x);
}

Now we can talk about parallelizing the code. The basis for OpenMP parallelization is a worksharing construct; you cannot just slap #pragma omp parallel on a while loop. Fortunately, you can easily mark certain canonical for loops with #pragma omp parallel for. For that, however, you cannot use mpz_class as the loop variable, and you must specify an end for the loop:

#pragma omp parallel for
for (long check = 1; check <= std::numeric_limits<long>::max(); check++)
{
    check_collatz(check);
}

Note that check is implicitly private: each thread working on the loop has its own copy. OpenMP will also take care of distributing the work [1 ... 2^63] among the threads. When a thread calls check_collatz, a new, private mpz_class object is created for it.

Now you might notice that repeatedly creating a new mpz_class object in each loop iteration is costly (memory allocation). You can reuse it (by breaking up check_collatz again) and create a thread-private mpz_class working object. For this, you split the combined parallel for into separate parallel and for pragmas:

#include <gmpxx.h>
#include <iostream>
#include <limits>

// Avoid copying objects by taking and modifying a reference
void collatz(mpz_class& n)
{
    if (mpz_odd_p(n.get_mpz_t()))
    {
        n *= 3;
        n += 1;
    }
    else
    {
        n /= 2;
    }
}

int main()
{
#pragma omp parallel
    {
        mpz_class x;
#pragma omp for
        for (long check = 1; check <= std::numeric_limits<long>::max(); check++)
        {
            // Note: The structure of this fits perfectly in a for loop.
            for (x = check; x != 1; collatz(x));
        }
    }
}

Note that declaring x inside the parallel region makes sure it is implicitly private and properly initialized. You should prefer that to declaring it outside and marking it private: the latter often leads to confusion, because explicitly private variables from the outside scope are uninitialized.

You might complain that this only checks the first 2^63 numbers. Just let it run. This gives you enough time to master OpenMP to expert level and write your own custom worksharing for GMP objects.

You were concerned about having extra objects for each thread. This is essential for good performance, and you cannot solve the problem efficiently with locks/critical sections/atomics: you would have to protect each and every read and write to your only relevant variable, and there would be no parallelism left.

Note: The huge for loop will likely have a load imbalance. So some threads will probably finish a few centuries earlier than the others. You could fix that with dynamic scheduling, or smaller static chunks.

For academic purposes, here is one idea of how to implement the worksharing directly on GMP objects:

#pragma omp parallel
    {
        // Note: this is not a "parallel" loop;
        // these are just separate loops over distinct strided sequences of numbers
        int nthreads = omp_get_num_threads();
        mpz_class check = 1;
        // we already checked those in the other program
        check += std::numeric_limits<long>::max();
        check += omp_get_thread_num();
        mpz_class x;
        for (; ; check += nthreads)
        {
            // Note: The structure of this fits perfectly in a for loop.
            for (x = check; x != 1; collatz(x));
        }
    }

