使用readdir在for循环中并行递归 [英] Parallelizing recursion in a for-loop using readdir

查看:130
本文介绍了使用readdir在for循环中并行递归的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想并行化一个C程序,该程序使用OpenMP和C递归计算目录及其子目录的大小.

I'd like to parallelize a C-program which recursively calculates the size of a directory and its sub-directories, using OpenMP and C.

我的问题是,当我使用opendir进入目录并使用readdir遍历子目录时,只能逐个访问它们,直到到达最后一个子目录.一切都可以很好地按顺序进行.

My issue is, that when I get into a directory using opendir, and I iterate through the sub-directories using readdir I can only access them one by one until I've reached the last sub-directory. It all works well sequentially.

但是,在并行化程序时,我认为将子目录的数量分成一半(甚至更小的分​​区)并使用OpenMp Tasks遍历子目录是有意义的.

When parallelizing the program, however, I think it would make sense to split the number of sub-directories in half (or even smaller partitions) and recurse through the sub-directories using OpenMp Tasks.

显然,由于for循环的结构,我不能简单地将问题大小(=子目录数)分成两半,而这样的循环不能使用#pragma omp for并行化.

Obviously I can't simply split the problem size (= number of sub-directories) in half, because of the structure of the for-loop, and loops like this cannot be parallelized using #pragma omp for.

有人对如何将此功能拆分为任务有想法吗?任何帮助将不胜感激.

Does anybody have an idea on how to split this function into tasks? Any help would be greatly appreciated.

这是我的一些代码(我删除了我认为与该问题不相关的部分.)

This is some of my code (I've removed parts I do not deem relevant for this question.)

int calculate_folder_size(const char *path) {

    struct stat sb;

    if (S_ISREG(sb.st_mode)) { // if it's a file, not a directory (base case)
        return sb.st_size;
    }

    DIR *folder = opendir(path);

    struct dirent *element;
    size_t size = 4096;

    for (element = readdir(folder); element != NULL; element = readdir(folder)) {

   //(...)
            if (element->d_type == DT_DIR) {
                // recursive call of calculate_folder_size
                size += calculate_folder_size(name);
            } else {
             //(...)
            }
        }
    }
    closedir(folder);
    return size;
}

推荐答案

您需要一个支持OpenMP任务的现代编译器,该编译器将从等式中删除Visual C ++.只要您使用的是这样的编译器,您要做的就是将对calculate_folder_size()的递归调用转换为OpenMP任务:

You need a modern compiler with support for OpenMP tasks, which removes Visual C++ from the equation. Provided you are using such a compiler, all you need to do is to turn the recursive invocations to calculate_folder_size() into OpenMP tasks:

int calculate_folder_size(const char *path) {
    struct stat sb;

    if (S_ISREG(sb.st_mode)) { // if it's a file, not a directory (base case)
        return sb.st_size;
    }

    DIR *folder = opendir(path);

    struct dirent *element;
    size_t size = 4096;

    for (element = readdir(folder); element != NULL; element = readdir(folder)) {
        //(...)
        if (element->d_type == DT_DIR) {
            // Make sure the task receives a copy of the path
            char *priv_name = strdup(name); // (1)
            // recursive call of calculate_folder_size
            //               (2)
            #pragma omp task shared(size) firstprivate(priv_name)
            {
                // (3)
                #pragma omp atomic update
                size += calculate_folder_size(priv_name);
                free(priv_name); // (4)
            }
        } else {
            //(...)
        }
    }
    // (5)
    #pragma omp taskwait

    closedir(folder);
    return size;
}

这里的重要部分是:

  1. 您需要向任务传递一个名称参数,该名称参数将一直存在并保留其值,直到任务执行完毕为止(可能在将来的任何时间).因此,您需要制作name的副本,例如使用strdup(3).

该任务应记住priv_name的当前值,因为它会在循环的下一次迭代期间改变.因此,对priv_name进行firstprivate处理.它还需要能够在父上下文中修改size,因此也可以对其进行修改.

The task should remember the current value of priv_name since it will change during the next iteration of the loop. Therefore, the firstprivate treatment of priv_name. It also needs to be able to modify size in the parent context, hence shared for it.

由于所有任务都在父作用域中更新相同的size变量,因此需要使用atomic update保护访问.

Since all tasks are updating the same size variable in the parent scope, the access needs to be protected with atomic update.

不再需要私人名称,而必须将其丢弃.

The private name is no longer needed and must be disposed.

父任务应该先等待所有子任务先完成,然后再返回size.

The parent task should wait for all child tasks to finish first before returning size.

必须在并行区域内调用此函数才能并行执行其工作:

This function must be called from within a parallel region in order to do its job in parallel:

int size;

#pragma omp parallel
#pragma omp single
size = calculate_folder_size("/some/path");

最好限制事物仍然并行运行的并行度的深度.我把它留给你弄清楚如何:)

It might be a good idea to limit the depth of parallelism at which things still run in parallel. I leave it to you to figure it out how :)

这篇关于使用readdir在for循环中并行递归的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆