OpenMP not waiting for all threads to finish before the C program ends


Question

I have the following problem: my C program must count the number of occurrences of each word from a list of words in a text file.

I use OpenMP for this, and the program, in theory, has the correct logic. When I put some printf calls inside the for loop, the result of the program is correct and always the same.

When I remove the printf calls, the result is incorrect and its value changes with each execution. Given this scenario, I think the cause is related to execution time. With the printf calls the execution time increases, so there is time for all threads to finish counting and the program works correctly. Without them, the execution time drops sharply (to 0.000893 ms), which seems to leave no time for all threads to finish their calculations, so the program prints a different result on each run.

The parallelised code is as follows:

#pragma omp parallel for schedule(dynamic) num_threads(threadNumber) private(word, wordExists) shared(keyWordsOcurrences)
for (line = 0; line < NUM_LINES; line++)
{
    // divides the line into words separated by space
    word = strtok(lines[line], " ");
    while (word != NULL)
    {
        // checks if the word being read is one of the monitored words
        wordExists = checkWordOcurrences(word);
        if (wordExists)
        {
            #pragma omp critical
            keyWordsOcurrences[wordExists - 1] += 1;
        }
        word = strtok(NULL, " ");
    }
}

The checkWordOcurrences function called in the loop is where I put the printf that makes my code work properly on every execution (by increasing the execution time).

int checkWordOcurrences(char *word)
{
    int res = 0;
    int i;

    for (i = 0; i < QTD_WORDS; i++)
    {
        // **this is the almighty Printf that makes everything work properly, and without it things stop working :(**
        printf("palavra %d %s - palavra 2 %s \n", i, keyWords[i], word);
        // compares current word with monitored words
        if (!strcmp(keyWords[i], word))
        {
            // if it's monitored word, returns its index (+1 because the first word has index 0 and the return type is checked as true or false)
            res = i + 1;
        }
    }

    // returns word index or 0, if current word is not monitored
    return res;
}

Can someone explain to me what may be happening and/or how to solve it?

Recommended answer

There is an implicit barrier at the end of the OpenMP for construct (unless nowait is specified) and at the end of each parallel region, so it is not possible for the program to finish before all the threads have finished their assigned work.
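A minimal sketch of what that barrier guarantees, assuming a hypothetical helper completed_iterations that is not part of the original program: every iteration marks its own slot, and the serial code after the loop runs only once all threads have finished, so it should always see all n marks.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not from the original program.
 * Marks one slot per iteration inside a parallel loop, then counts the
 * marks in serial code that runs only after the loop's implicit barrier.
 * Returns the number of marked slots, or -1 if allocation fails. */
int completed_iterations(int n)
{
    assert(n > 0);
    int *done = calloc((size_t)n, sizeof *done);
    if (done == NULL)
        return -1;

    int i;
    #pragma omp parallel for schedule(dynamic)
    for (i = 0; i < n; i++)
        done[i] = 1;              /* each iteration writes only its own slot */

    /* Every thread has passed the implicit barrier by this point. */
    int total = 0;
    for (i = 0; i < n; i++)
        total += done[i];

    free(done);
    return total;
}
```

Calling completed_iterations(1000) should return 1000 on every run, with or without OpenMP enabled, which is why the varying results in the question point to something other than threads being cut short.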

The problem is most likely caused by the use of strtok. It is not a thread-safe function, because the current search position is kept in hidden state inside the C library. When one thread is in the middle of tokenising a string and another thread calls strtok(lines[line], " ");, that call overwrites the pointer to the string being searched, and every other thread calling strtok(NULL, " "); now tokenises the newly set string instead of the one it was working on. It is a classic case of a data race.
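The hidden state can be observed even without threads. The sketch below uses a hypothetical helper, continue_after_second_start, that starts tokenising one string, starts a second one, and then asks strtok to continue; the continuation comes from the second string, which is the same thing that happens when another thread's call lands in between.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper for illustration only.
 * Starts tokenising a, then starts tokenising b, then asks strtok to
 * continue. Returns the token produced by that continuation. */
const char *continue_after_second_start(char *a, char *b)
{
    assert(a != NULL && b != NULL);
    strtok(a, " ");            /* hidden state now points into a */
    strtok(b, " ");            /* hidden state replaced: now points into b */
    return strtok(NULL, " ");  /* continues in b, although a was started first */
}
```

With a holding "alpha beta" and b holding "one two", the helper returns "two" rather than "beta". In the threaded program the interleaving depends on timing, which would explain why the printf calls appear to change the outcome.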

The solution is to use strtok_r instead, the reentrant POSIX variant that stores its position in a pointer supplied by the caller:

#pragma omp parallel for schedule(dynamic) num_threads(threadNumber) private(word, wordExists) shared(keyWordsOcurrences)
for (line = 0; line < NUM_LINES; line++)
{
    char *saveptr;
    // divides the line into words separated by space
    word = strtok_r(lines[line], " ", &saveptr);
    while (word != NULL)
    {
        // checks if the word being read is one of the monitored words
        wordExists = checkWordOcurrences(word);
        if (wordExists)
        {
            #pragma omp critical
            keyWordsOcurrences[wordExists - 1] += 1;
        }
        word = strtok_r(NULL, " ", &saveptr);
    }
}

On a separate note, critical is a comparatively heavyweight synchronisation construct, typically implemented with locks. Simple increments such as keyWordsOcurrences[wordExists - 1] += 1; can be protected with atomic updates instead, which are usually much faster:

if (wordExists)
{
    #pragma omp atomic update
    keyWordsOcurrences[wordExists - 1] += 1;
}
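Putting strtok_r and the atomic update together, a self-contained version of the loop might look like the sketch below. The function count_keywords, its parameters and the MAX_LINE buffer size are assumptions made for illustration; the original program uses globals (lines, keyWords, keyWordsOcurrences) whose definitions are not shown. The feature-test macro is there because strtok_r is POSIX rather than ISO C.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes strtok_r on POSIX systems */
#include <assert.h>
#include <string.h>

#define MAX_LINE 128  /* assumed line buffer size for this sketch */

/* Hypothetical stand-in for the question's loop.
 * Counts how often each of keys[0..n_keys-1] occurs in the given lines.
 * The lines are modified in place, because strtok_r writes terminators. */
void count_keywords(char lines[][MAX_LINE], int n_lines,
                    const char *keys[], int n_keys, int counts[])
{
    assert(lines != NULL && keys != NULL && counts != NULL);

    int line;
    #pragma omp parallel for schedule(dynamic)
    for (line = 0; line < n_lines; line++)
    {
        char *saveptr;  /* tokeniser state, private to this iteration */
        char *word = strtok_r(lines[line], " ", &saveptr);
        while (word != NULL)
        {
            for (int k = 0; k < n_keys; k++)
            {
                if (strcmp(keys[k], word) == 0)
                {
                    /* a single shared increment: atomic is sufficient */
                    #pragma omp atomic update
                    counts[k] += 1;
                    break;
                }
            }
            word = strtok_r(NULL, " ", &saveptr);
        }
    }
}
```

Declaring word and saveptr inside the loop body makes them private to each thread automatically, so no private clause is needed in this variant.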

If QTD_WORDS isn't a very large number, you may also use array reduction, which gives each thread its own copy of the array and sums the copies at the end:

#pragma omp parallel for schedule(dynamic) num_threads(threadNumber) \
                         private(word, wordExists) \
                         reduction(+:keyWordsOcurrences[0:QTD_WORDS])
for (line = 0; line < NUM_LINES; line++)
{
    char *saveptr;
    // divides the line into words separated by space
    word = strtok_r(lines[line], " ", &saveptr);
    while (word != NULL)
    {
        // checks if the word being read is one of the monitored words
        wordExists = checkWordOcurrences(word);
        if (wordExists)
        {
            keyWordsOcurrences[wordExists - 1] += 1;
        }
        word = strtok_r(NULL, " ", &saveptr);
    }
}

Array reduction for C and C++ is a relatively new OpenMP feature, though, and requires a compiler that supports OpenMP 4.5. It is possible to do it by hand for older compilers, but that goes well beyond the scope of the original question.
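For reference only, one common way to do such a reduction by hand is sketched below. It is an illustration under assumptions, not part of the original answer: the helper tally_hits, its hits[] input (a key index per word, or -1 for no match) and the MAX_KEYS bound are all hypothetical. Each thread accumulates into a private array and merges it into the shared one once, inside a critical section.

```c
#include <assert.h>
#include <string.h>

#define MAX_KEYS 64  /* assumed upper bound on tracked words for this sketch */

/* Hypothetical helper: adds each hits[i] (a key index, or -1 for "no match")
 * into counts[], using per-thread partial counts merged once per thread. */
void tally_hits(const int hits[], int n_hits, int counts[], int n_keys)
{
    assert(n_keys > 0 && n_keys <= MAX_KEYS);

    #pragma omp parallel
    {
        int local[MAX_KEYS];             /* private: declared inside the region */
        memset(local, 0, sizeof local);

        int i;
        #pragma omp for schedule(dynamic) nowait
        for (i = 0; i < n_hits; i++)
            if (hits[i] >= 0 && hits[i] < n_keys)
                local[hits[i]] += 1;     /* thread-local, no synchronisation */

        /* Merge this thread's partial counts into the shared array. */
        #pragma omp critical
        for (int k = 0; k < n_keys; k++)
            counts[k] += local[k];
    }
}
```

The critical section is entered once per thread rather than once per matching word, so its cost stays small as long as n_keys is modest.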
