Need help parallelizing the C++ code using openmp


Problem description


I am new to OpenMP. If anyone could suggest changes to make this program run faster, it would be very helpful. In this program, values.txt is a file containing about 60 crore (600 million) lines, each line holding a 32-bit arrangement of sixteen 0's and sixteen 1's.
Explanation:


It is hard to explain here, but I will try.
I have generated all of the 32-bit permutations of sixteen 0's and sixteen 1's, one per line, in a text file, values.txt.
For example:
00000000000000001111111111111111
00000000000000010111111111111111
00000000000000011011111111111111
00000000000000011101111111111111
00000000000000011110111111111111
00000000000000011111011111111111
00000000000000011111101111111111
00000000000000011111110111111111
00000000000000011111111011111111
and so on.
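As a side note, the listing above can be reproduced without a text file. The following is a minimal sketch, assuming each line is packed into a `std::uint32_t` with the leftmost character as the most significant bit; the names `binomial` and `next_same_popcount` are illustrative and not part of the original program. It counts the lines and steps from one value to the next in the order shown above, using the bit trick commonly known as Gosper's hack.

```cpp
#include <cstdint>

// Number of ways to place k ones in n positions, i.e. how many lines
// the file contains for n = 32, k = 16.
std::uint64_t binomial(unsigned n, unsigned k) {
    std::uint64_t result = 1;
    for (unsigned i = 1; i <= k; ++i) {
        // After step i, result equals C(n - k + i, i), so the division is exact.
        result = result * (n - k + i) / i;
    }
    return result;
}

// Gosper's hack: the next larger value with the same number of set bits.
// Precondition: v != 0 and v is not already the largest such 32-bit value.
std::uint32_t next_same_popcount(std::uint32_t v) {
    std::uint32_t lowest = v & (~v + 1u);  // isolate the lowest set bit
    std::uint32_t ripple = v + lowest;     // carry through the lowest run of 1s
    return (((ripple ^ v) >> 2) / lowest) | ripple;
}
```

Starting from 0x0000FFFF (the first line), `next_same_popcount` yields 0x00017FFF and then 0x0001BFFF, which are the second and third lines of the listing, and `binomial(32, 16)` gives the exact line count behind the "60 crore" figure.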

Consider each line of the text file to be a Boolean function.
I need to check the reversibility of this function over a domain.

For this, I take the first line of the text file and store it in a column matrix of dimension 32x1, matrix a[][].

Inside the nested for loops, I am basically generating the domain values in the form of 3x3 matrices, for which I need to check the reversibility of the function.
I created a matrix g[][] of dimension 3x3 that stores the binary representation of every number from 0 to 2^9 - 1. For example,
for 0, matrix g would look like:
0 0 0
0 0 0
0 0 0

for 1, matrix g would be:

0 0 0
0 0 0
0 0 1

for 2, matrix g would be:

0 0 0
0 0 0
0 1 0

and so on up to 2^9 - 1.
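The mapping from a number to its g matrix can be sketched as follows. This is only an illustration: the `Grid` alias and the function name `to_grid` are assumptions, and the bit order (most significant of the 9 bits in g[0][0]) is chosen to match the examples for 0, 1 and 2 above.

```cpp
#include <array>

using Grid = std::array<std::array<int, 3>, 3>;

// Unpack the low 9 bits of n into a 3x3 grid, most significant bit first,
// so that bit 8 lands in g[0][0] and bit 0 lands in g[2][2].
Grid to_grid(unsigned n) {
    Grid g{};
    for (int i = 0; i < 9; ++i) {
        g[i / 3][i % 3] = (n >> (8 - i)) & 1u;
    }
    return g;
}
```

With this layout, `to_grid(1)` sets only g[2][2] and `to_grid(2)` sets only g[2][1], as in the matrices shown above.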

For each matrix generated above, from 0 to 2^9 - 1, I compute a new matrix u[][] of dimension 3x3 based on my function.
This is done by reading 5 values for each element of the matrix: the element itself and its four neighbours.

For example, consider the g matrix to be
0 0 0
0 1 1
1 0 0

I pick up the first element, i.e. g[0][0], and compute a new value for it using five adjacent values (top value, left value, the element itself, right value, value below), namely g[2][0], g[0][2], g[0][0], g[0][1], g[1][0]; the top and left neighbours wrap around to the opposite edge. These 5 bits together represent a binary number. I calculate its decimal equivalent, and that decimal value is the row number of matrix a[][] whose value I use to update u[0][0].
I repeat the above process for every element of g and finally have a 3x3 u matrix.
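The neighbour lookup described above can be written once for all nine cells instead of nine hand-expanded lines. The sketch below assumes wrap-around at every edge (which is what the g[2][0] and g[0][2] neighbours of g[0][0] imply); the names `Grid`, `neighbourhood_index` and `apply_rule` are illustrative.

```cpp
#include <array>

using Grid = std::array<std::array<int, 3>, 3>;

// 5-bit index for cell (r, c): top, left, self, right, below, in that order
// from most to least significant bit. Neighbours wrap around the edges.
int neighbourhood_index(const Grid& g, int r, int c) {
    int top   = g[(r + 2) % 3][c];
    int left  = g[r][(c + 2) % 3];
    int self  = g[r][c];
    int right = g[r][(c + 1) % 3];
    int below = g[(r + 1) % 3][c];
    return top * 16 + left * 8 + self * 4 + right * 2 + below;
}

// Apply the 32-entry truth table a (one line of values.txt) to every cell of g.
Grid apply_rule(const std::array<int, 32>& a, const Grid& g) {
    Grid u{};
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 3; ++c) {
            u[r][c] = a[neighbourhood_index(g, r, c)];
        }
    }
    return u;
}
```

For the example matrix above, the five bits for g[0][0] are 1, 0, 0, 0, 0, so the index is 16 and u[0][0] takes the value of a[16].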

This complete process was for one matrix, namely the matrix corresponding to 0.
In the same way, for every g[][] matrix from 0 to 2^9 - 1, I create a u matrix, 2^9 matrices in all.

If at any point matrix u[][] turns out to be the same for two different matrices g[][], I abort the check for that function, read the next line of the text file, and begin the above process again; i.e., I am not interested in functions that result in duplicate matrices. If all of the 2^9 matrices turn out to be different, I write the corresponding function (the line from the text file) into another text file.
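The duplicate test amounts to checking that the map from g to u is injective over all 512 inputs. One possible single-threaded sketch is shown below; it is not the original program. It assumes the function is packed into a `std::uint32_t` with bit k holding a[k], and it numbers the cells with g[0][0] as bit 0, which differs from the bit order used elsewhere in this post but does not change whether two outputs collide. A `std::bitset<512>` stands in for the `set<string>`, and one flat loop stands in for the three nested ones.

```cpp
#include <bitset>
#include <cstdint>

// Bit i of a 9-bit state is cell i in row-major order (cell 0 = g[0][0]).
// Bit k of rule is the output for neighbourhood index k (assumed packing).
unsigned step(std::uint32_t rule, unsigned state) {
    unsigned next = 0;
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 3; ++c) {
            auto cell = [&](int rr, int cc) { return (state >> (rr * 3 + cc)) & 1u; };
            // top, left, self, right, below with wrap-around
            unsigned idx = cell((r + 2) % 3, c) * 16 + cell(r, (c + 2) % 3) * 8
                         + cell(r, c) * 4 + cell(r, (c + 1) % 3) * 2
                         + cell((r + 1) % 3, c);
            next |= ((rule >> idx) & 1u) << (r * 3 + c);
        }
    }
    return next;
}

// True if no two of the 512 input grids map to the same output grid.
bool is_reversible(std::uint32_t rule) {
    std::bitset<512> seen;
    for (unsigned state = 0; state < 512; ++state) {
        unsigned out = step(rule, state);
        if (seen[out]) return false;  // duplicate output: abort early
        seen[out] = true;
    }
    return true;
}
```

Under this packing, 0xF0F0F0F0 is the function that copies each cell unchanged, so `is_reversible(0xF0F0F0F0u)` is true, while the all-zero function maps every grid to the zero grid and is rejected on the second input.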

So, summing up, I need to create a total of 60 crore * 2^9 matrices for the overall computation.


The thing is that, for a particular function from the text file, the 2^9 matrices are calculated independently of each other. If I could somehow parallelize them, I would reduce the computation time greatly, and that is where I need help.
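One option worth considering, offered as a sketch and not as a measured result, is to parallelize across functions instead of across the 2^9 matrices of a single function: the inner loop is short, exits early, and shares one duplicate set, whereas different functions do not interact at all. In the code below, `filter_rules` and its `check` parameter are hypothetical names; `check` stands for whatever thread-safe per-function test is used. The pragma takes effect only when the compiler is run with OpenMP enabled (for example `-fopenmp` on GCC or Clang); otherwise the loop simply runs serially.

```cpp
#include <cstdint>
#include <vector>

// Evaluate check(rule) for every rule; keep[i] is 1 when rules[i] passes.
// Each iteration writes only its own slot, so no locking is needed.
// std::vector<char> is used instead of std::vector<bool>, whose packed bits
// would make concurrent writes to neighbouring elements a data race.
template <typename Check>
std::vector<char> filter_rules(const std::vector<std::uint32_t>& rules, Check check) {
    std::vector<char> keep(rules.size(), 0);
    const long long n = static_cast<long long>(rules.size());
    // Dynamic scheduling: functions rejected early finish much sooner than
    // functions that survive all 512 inputs, so static chunks would be uneven.
    #pragma omp parallel for schedule(dynamic, 1024)
    for (long long i = 0; i < n; ++i) {
        keep[i] = check(rules[i]) ? 1 : 0;
    }
    return keep;
}
```

Writing the surviving functions to the output file after the loop, in index order, keeps the file I/O out of the parallel region. Whether this gives a worthwhile speed-up depends on how much of the run time is spent parsing the input file, which is worth measuring first.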

I hope you got the method.
Also, I used three nested loops only because this is a sample program. In the actual problem I do not need to calculate 2^9 matrices but a total of 2^128 matrices. My actual g matrix would be of order 16x8, and so would the u matrix. Since there is no standard 128-bit data type, and OpenMP only supports loops over the standard integer types, I thought of using nested loops.

Updated program:

#include <algorithm>
#include <fstream>
#include <iostream>
#include <string>
#include <math.h>
using namespace std;
#include <boost/multiprecision/cpp_int.hpp>
using namespace boost::multiprecision;
#include <boost/lexical_cast.hpp>
#include <cctype>
#include <boost/assign/list_of.hpp>
#include <set>
#include <stdint.h>
#include <omp.h>
#define convertToString(x) #x
using namespace boost::assign;

int main()
{
    int xyz=0;
//    omp_set_dynamic(0);
//    omp_set_num_threads(1);
//    Dec2Bin in;

    ifstream infile;
    infile.open("values.txt");
    ofstream outfile;
    outfile.open("haha.txt");
    short a[32][1];
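    string STRING;
    // Test the read itself: checking eof() before reading would process a
    // spurious empty line at the end of the file and make STRING.at(i) throw.
    while(getline(infile,STRING))
    {
        if(STRING.size()<32) continue; // skip blank or truncated lines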
        set<string> SET;
        int count=0;


        for(int i=0;i<32;i++)
        {
                a[i][0]=STRING.at(i)-'0';
        }


        int g[9];
        int u[9];
        char buffer[10];
        buffer[9] = 0;
        uint16_t f = 0;

        int max = 1 << 3; // 2^3 iterations per loop level, 2^9 in total; avoids floating-point pow


        for(int r=0;r<max && count!=1;r++)
        {
           for(int s=0;s<max && count!=1;s++)
           {
              for(int t=0;t<max && count!=1;t++)
              {
                for(int i = 0; i < 9; ++i)
                {
                   g[i] = (f & (1 << (8 - i))) != 0;
                }
                ++f;

                u[0]=a[(g[6]*2*2*2*2)+(g[2]*2*2*2)+(g[0]*2*2)+(g[1]*2)+(g[3]*1)][0];
                u[1]=a[(g[7]*2*2*2*2)+(g[0]*2*2*2)+(g[1]*2*2)+(g[2]*2)+(g[4]*1)][0];
                u[2]=a[(g[8]*2*2*2*2)+(g[1]*2*2*2)+(g[2]*2*2)+(g[0]*2)+(g[5]*1)][0];
                u[3]=a[(g[0]*2*2*2*2)+(g[5]*2*2*2)+(g[3]*2*2)+(g[4]*2)+(g[6]*1)][0];
                u[4]=a[(g[1]*2*2*2*2)+(g[3]*2*2*2)+(g[4]*2*2)+(g[5]*2)+(g[7]*1)][0];
                u[5]=a[(g[2]*2*2*2*2)+(g[4]*2*2*2)+(g[5]*2*2)+(g[3]*2)+(g[8]*1)][0];
                u[6]=a[(g[3]*2*2*2*2)+(g[8]*2*2*2)+(g[6]*2*2)+(g[7]*2)+(g[0]*1)][0];
                u[7]=a[(g[4]*2*2*2*2)+(g[6]*2*2*2)+(g[7]*2*2)+(g[8]*2)+(g[1]*1)][0];
                u[8]=a[(g[5]*2*2*2*2)+(g[7]*2*2*2)+(g[8]*2*2)+(g[6]*2)+(g[2]*1)][0];

                
                for(int i = 0; i < 9; ++i)
                {
                   buffer[i] = '0' + u[i];
                }
                if(!SET.insert(::std::string(buffer)).second)
                {
                   count = 1;
                }
             }
          }
        }

        if(count==0)
        {
           /* xyz++;
            if(xyz>3)
            break; */
            outfile<<STRING<<"\n";
            cout<<STRING<<"\n";
        }


    }
        infile.close();
        outfile.close();
        return 0;
    }





What I have tried:

I tried hard to parallelize this code and ended up with some modifications, but I still feel I am not getting the desired timing.

Recommended answer


Before using threads, you should start by optimizing your code.

Just a few optimizations:
- Everywhere, you use convertToString(0), replace with '0'. Same for convertToString(1) and '1'.

Wait ...
you have 3 nested loops
for(int r=0;r<(int)pow(2,3);r++)
{
    for(int s=0;s<(int)pow(2,3);s++)
    {

        for(int t=0;t<(int)pow(2,3);t++)
        {





Why do you have this? The 3 variables r, s and t are not used inside the loops. I can't see any reason for this.

I think a complete rewrite is in order.

By the way: how much do you actually gain from multi-threading?

[Update]
I think you are going about this the wrong way.
1) If your project matches a known problem, give its name and a reference.
2) You should explain the real project and show the real code.
3) For now, forget about OpenMP.
4) Get the real code optimized for 1 thread.
5) Then you can think about multi-threading.

Expect only a few hints. This kind of optimization for multi-threading is a professional job, and professionals get paid for this kind of thing (and not peanuts), because not even every professional can do it.
The question by itself is out of the scope of a quick answer; it takes hours to analyze what you have done and devise the right optimizations.

[Update]
There are more optimizations possible, but you have a bigger problem.
No computer in the world can keep track of 2^128 matrices; there is not enough memory, HDDs included, computer farms included, data centers included.
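A back-of-the-envelope estimate makes the scale concrete. The figures below assume, purely for illustration, a single bit of bookkeeping per matrix and a rate of 10^9 matrices processed per second; neither number comes from the question.

```latex
\begin{align*}
2^{128} &\approx 3.4 \times 10^{38} \text{ matrices} \\
\text{storage at 1 bit each} &= 2^{125} \text{ bytes} \approx 4.3 \times 10^{37} \text{ bytes} \\
\text{time at } 10^{9} \text{ per second} &\approx 3.4 \times 10^{29} \text{ s} \approx 10^{22} \text{ years}
\end{align*}
```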

To verify, try to run a full-sized program with an input value that satisfies the conditions. The program will crash with an out-of-memory error.

