从传递FORTRAN阵列产生尖点coo_matrix [英] Generating CUSP coo_matrix from passed FORTRAN arrays

查看:327
本文介绍了从传递FORTRAN阵列产生尖点coo_matrix的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我工作的尖点解算器集成到现有的FORTRAN code。作为第一步,我只是想从FORTRAN一对整型数组,并(在FORTRAN实际* 4)的浮点将被用于构建然后打印COO格式尖点矩阵通过。

到目前为止,我已经能够按照这个线程,并得到了一切编译和链接:<一个href=\"http://stackoverflow.com/questions/24462580/unresolved-references-using-ifort-with-nvcc-and-cusp\">Unresolved使用带有NVCC,牙尖IFORT引用

不幸的是,该方案显然是发送垃圾推向风口浪尖矩阵,并结束了以下错误而崩溃:

  $。/ fort_cusp_test
 测试1 2 3
稀疏矩阵&LT; 1339222572,1339222572&GT;与1339222568项
的libc ++ abi.dylib:型推力未捕获的异常::系统:: SYSTEM_ERROR终止:无效的参数程序接收到的信号SIGABRT:进程中止的信号。回溯跟踪此错误:
#0 0x10ff86ff6
#1 0x10ff86593
#2 0x7fff8593ff19
中止陷阱:6

在code对CUDA和Fortran来源如下:

cusp_runner.cu

 的#include&LT;&stdio.h中GT;
#包括LT&;风口浪尖/ coo_matrix.h&GT;
#包括LT&;&iostream的GT;
#包括LT&;风口浪尖/克雷洛夫/ cg.h&GT;
#包括LT&;风口浪尖/ print.h&GT;#如果定义(__ CPLUSPLUS)
为externC{
#万一无效test_coo_mat_print_为(int * row_i,为int * col_j,浮* val_v,整数N,INT NNZ){   //包裹推力:: device_ptr原始输入指针
   推力:: device_ptr&LT; INT&GT; wrapped_device_I(row_i);
   推力:: device_ptr&LT; INT&GT; wrapped_device_J(col_j);
   推力:: device_ptr&LT;浮动&GT; wrapped_device_V(val_v);   //使用array1d_view来包装各个阵列
   的typedef typename的风口浪尖:: array1d_view&LT;推力:: device_ptr&LT; INT&GT; &GT; DeviceIndexArrayView;
   的typedef typename的风口浪尖:: array1d_view&LT;推力:: device_ptr&LT;浮动&GT; &GT; DeviceValueArrayView;   DeviceIndexArrayView row_indices(wrapped_device_I,wrapped_device_I + N);
   DeviceIndexArrayView column_indices(wrapped_device_J,wrapped_device_J + NNZ);
   DeviceValueArrayView值(wrapped_device_V,wrapped_device_V + NNZ);   //结合array1d_views到coo_matrix_view
   风口浪尖的typedef :: coo_matrix_view&LT; D​​eviceIndexArrayView,DeviceIndexArrayView,DeviceValueArrayView&GT; DeviceView;   //构造从array1d_views coo_matrix_view
   DeviceView A(N,N,NNZ,row_indices,column_indices,价值观);   风口浪尖::打印(A);
}
#如果定义(__ CPLUSPLUS)
}
#万一

fort_cusp_test.f90

 程序fort_cuda_test   隐无接口
   子程序test_coo_mat_print_(row_i,col_j,val_v,N,NNZ)绑定(C)
      使用时,内在:: ISO_C_BINDING,ONLY:c_int的,C_FLOAT
      隐无
      整数(c_int的):: N,NNZ,row_i(:),col_j(:)
      实(C_FLOAT):: val_v(:)
   结束子程序test_coo_mat_print_
结束接口   整数* 4 N
   整数* 4 NNZ   整数* 4,目标:: rowI(9),colJ(9)
   真正的* 4,目标:: VALV(9)   整数* 4,指针:: row_i(:)
   整数* 4,指针:: col_j(:)
   真正的* 4,指针:: val_v(:)   N = 3
   NNZ = 9
   rowI =(/ 1,1,1,2,2,2,3,3,3 /)
   colJ =(/ 1,2,3,1,2,3,1,2,3 /)
   VALV =(/ 1,2,3,4,5,6,7,8,9 /)   row_i =&GT; rowI
   col_j =&GT; colJ
   val_v =&GT; VALV   写(*,*)测试1 2 3   调用test_coo_mat_print_(row_i,col_j,val_v,N,NNZ)程序结束fort_cuda_test

如果你想尝试一下自己,这是我的(而不雅)生成文件:

 测试:
   NVCC -Xcompiler = - 子卡-shared cusp_runner.cu -o cusp_runner.so -I /开发商/ NVIDIA / CUDA-6.5 /有/风口浪尖
   gfortran -c fort_cusp_test.f90
   gfortran fort_cusp_test.o cusp_runner.so -L /开发商/ NVIDIA / CUDA-6.5 / lib目录-lcudart -o fort_cusp_test清洁:
   RM * *的.o。所以

库路径将需要改变作为当然适当

任何人都可以点我如何在需要的阵列正确传递从FORTRAN code正确的方向?


在接口块取出,在除C函数,我可以看到阵列正确传递开始的print语句,但n和NNZ导致问题。
我得到以下的输出:

  $ ./fort_cusp_test
 测试1 2 3
N:1509677596,NNZ:1509677592
     我,row_i,col_j,val_v
     0,1,1,1.0000e + 00
     1,1,2,2.0000E + 00
     2,1,3,3.0000e + 00
     3,2,1,4.0000e + 00
     4,2,2,5.0000e + 00
     5,2,3,6.0000e + 00
     6,3,1,7.0000e + 00
     7,3,2,8.0000e + 00
     8,3,3,9.0000e + 00
     9,0,32727,0.0000E + 00
    ...
    等等
    ...
    计划接收信号SIGSEGV:段错误 - 无效的内存引用。回溯跟踪此错误:
#0 0x105ce7ff6
#1 0x105ce7593
#2 0x7fff8593ff19
#3 0x105c780a2
#4 0x105c42dbc
#5 0x105c42df4
段故障:11

fort_cusp_test

 接口
       子程序test_coo_mat_print_(row_i,col_j,val_v,N,NNZ)绑定(C)
          使用时,内在:: ISO_C_BINDING,ONLY:c_int的,C_FLOAT
          隐无
          整数(c_int的),价值:: N,NNZ
          整数(c_int的):: row_i(:),col_j(:)
          实(C_FLOAT):: val_v(:)
       结束子程序test_coo_mat_print_
    结束接口       整数* 4 N
       整数* 4 NNZ       整数* 4,目标:: rowI(9),colJ(9)
       真正的* 4,目标:: VALV(9)       整数* 4,指针:: row_i(:)
       整数* 4,指针:: col_j(:)
       真正的* 4,指针:: val_v(:)       N = 3
       NNZ = 9
       rowI =(/ 1,1,1,2,2,2,3,3,3 /)
       colJ =(/ 1,2,3,1,2,3,1,2,3 /)
       VALV =(/ 1,2,3,4,5,6,7,8,9 /)       row_i =&GT; rowI
       col_j =&GT; colJ
       val_v =&GT; VALV       写(*,*)测试1 2 3       调用test_coo_mat_print_(rowI,colJ,VALV,N,NNZ)    程序结束fort_cuda_test

cusp_runner.cu

 的#include&LT;&stdio.h中GT;
    #包括LT&;风口浪尖/ coo_matrix.h&GT;
    #包括LT&;&iostream的GT;
    //#包括LT&;风口浪尖/克雷洛夫/ cg.h&GT;
    #包括LT&;风口浪尖/ print.h&GT;    #如果定义(__ CPLUSPLUS)
    为externC{
    #万一    无效test_coo_mat_print_为(int * row_i,为int * col_j,浮* val_v,整数N,INT NNZ){       的printf(N:%D,NNZ数:%d \\ N,N,NNZ);       的printf(%6S,%6S,%6S,%12S的\\ n,我,row_i,col_j,val_v);
       的for(int i = 0; I&LT; N;我++){
          的printf(%6D,%6d中,%6d中,%12.4e \\ N,我,row_i [I],col_j [I],val_v [I]);
       }
       如果(假){
       //包裹推力:: device_ptr原始输入指针
       推力:: device_ptr&LT; INT&GT; wrapped_device_I(row_i);
       推力:: device_ptr&LT; INT&GT; wrapped_device_J(col_j);
       推力:: device_ptr&LT;浮动&GT; wrapped_device_V(val_v);       //使用array1d_view来包装各个阵列
       的typedef typename的风口浪尖:: array1d_view&LT;推力:: device_ptr&LT; INT&GT; &GT; DeviceIndexArrayView;
       的typedef typename的风口浪尖:: array1d_view&LT;推力:: device_ptr&LT;浮动&GT; &GT; DeviceValueArrayView;       DeviceIndexArrayView row_indices(wrapped_device_I,wrapped_device_I + N);
       DeviceIndexArrayView column_indices(wrapped_device_J,wrapped_device_J + NNZ);
       DeviceValueArrayView值(wrapped_device_V,wrapped_device_V + NNZ);       //结合array1d_views到coo_matrix_view
       风口浪尖的typedef :: coo_matrix_view&LT; D​​eviceIndexArrayView,DeviceIndexArrayView,DeviceValueArrayView&GT; DeviceView;       //构造从array1d_views coo_matrix_view
       DeviceView A(N,N,NNZ,row_indices,column_indices,价值观);       风口浪尖::打印(A); }
    }
    #如果定义(__ CPLUSPLUS)
    }
    #万一


解决方案

有两种方法从Fortran的参数传递到C程序:首先是要使用的接口块(在现代Fortran语言的新方法),而第二不使用接口模块(甚至的Fortran77一个老办法有效)。

首先,下面是关于使用接口块的第一个方法。由于C例程期望接收C指针(row_i,col_j和val_v),我们需要通过地址从侧面的Fortran这些变量。要做到这一点,我们必须使用星号(*),而不是冒号(:)在接口块,如下所示。 (如果我们使用冒号,那么这告诉Fortran编译送Fortran指针指向的对象[1]的地址,这是不期望的行为。)另外,在C程序n和NNZ被声明为值(不是指针) ,接口块需要具有用于这些变量VALUE属性,使得Fortran编译发送n和NNZ而不是他们的地址的值。总之,在这第一种方法中,C和Fortran程序是这样的:

  Fortran例程:
...
接口
    子程序test_coo_mat_print_(row_i,col_j,val_v,N,NNZ)绑定(C)
        使用时,内在:: ISO_C_BINDING,ONLY:c_int的,C_FLOAT
        隐无
        整数(c_int的):: row_i(*),col_j(*)
        实(C_FLOAT):: val_v(*)
        整数(c_int的),价值:: N,NNZ!见注[2]下面还
    结束子程序test_coo_mat_print_
结束接口
...
调用test_coo_mat_print_(rowI,colJ,VALV,N,NNZ)C例程:
无效test_coo_mat_print_为(int * row_i,为int * col_j,浮* val_v,整数N,INT NNZ)


以下是关于无接口块第二种方法。在这种方法中,首先删除接口块和指针数组完全,改变的Fortran code如下:

  Fortran例程:整数rowI(9),colJ(9)中,n,NNZ !!没有目标属性必要
真正VALV(9)! ...设置rowI和前面一样...调用test_coo_mat_print(rowI,colJ,VALV,N,NNZ)! _被丢弃

和C例程如下:

 无效test_coo_mat_print_为(int * row_i,为int * col_j,浮* val_v,为int * N_,为int * nnz_)
{
    INT N = * N_,NNZ = * nnz_;    的printf(%D \\ N,N,NNZ);
    对于(INT K = 0; K&LT; 9; k ++){
        的printf(%D%10.6f \\ n,row_i [K],col_j [K],val_v [K]);
    }    //现在去推...
}

注意N_和nnz_中被声明为C例程的指针,因为没有接口块,Fortran编译器始终将实际参数的地址到C例程。还要注意,在上面的C例程的row_i等内容印制,以确保参数正确传递。如果打印出来的值是正确的,那么我想这个问题会更容易在推力例程(包括如何传似n和NNZ大小信息)的电话。

[1]宣布为Fortran指针真正的,指针::一(:)实际上重新presents像一个数组的视图类(在C ++术语),这是从实际指向的数据不同。这里需要的是发送实际数据的地址,此数组视图对象不是地址。另外,在接口块中的星号(一(*))重新presents假定大小阵列,它是用于使一个阵列中的Fortran一个老方法。在这种情况下,该阵列的第一个元素的地址被传递,如预期

[2]如果n和NNZ被声明为在C程序指针(如第二种方法),那么这个值属性应的的连接,因为C程序想要的地址实际参数,而不是它们的值。

I am working on integrating CUSP solvers into an existing FORTRAN code. As a first step, I am simply trying to pass in a pair of integer arrays and a float (real*4 in FORTRAN) from FORTRAN which will be used to construct and then print a COO format CUSP matrix.

So far, I have been able to follow this thread and got everything to compile and link: Unresolved references using IFORT with nvcc and CUSP

Unfortunately, the program is apparently sending garbage into the CUSP matrix and ends up crashing with the following error:

$./fort_cusp_test
 testing 1 2 3
sparse matrix <1339222572, 1339222572> with 1339222568 entries
libc++abi.dylib: terminating with uncaught exception of type thrust::system::system_error: invalid argument

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0  0x10ff86ff6
#1  0x10ff86593
#2  0x7fff8593ff19
Abort trap: 6

The code for the cuda and fortran sources are as follows:

cusp_runner.cu

#include <stdio.h>
#include <cusp/coo_matrix.h>
#include <iostream>
#include <cusp/krylov/cg.h>
#include <cusp/print.h>

#if defined(__cplusplus)
extern "C" {
#endif

void test_coo_mat_print_(int * row_i, int * col_j, float * val_v, int n, int nnz ) {

   //wrap raw input pointers with thrust::device_ptr
   thrust::device_ptr<int> wrapped_device_I(row_i);
   thrust::device_ptr<int> wrapped_device_J(col_j);
   thrust::device_ptr<float> wrapped_device_V(val_v);

   //use array1d_view to wrap individual arrays
   typedef typename cusp::array1d_view< thrust::device_ptr<int> > DeviceIndexArrayView;
   typedef typename cusp::array1d_view< thrust::device_ptr<float> > DeviceValueArrayView;

   DeviceIndexArrayView row_indices(wrapped_device_I, wrapped_device_I + n);
   DeviceIndexArrayView column_indices(wrapped_device_J, wrapped_device_J + nnz);
   DeviceValueArrayView values(wrapped_device_V, wrapped_device_V + nnz);

   //combine array1d_views into coo_matrix_view
   typedef   cusp::coo_matrix_view<DeviceIndexArrayView,DeviceIndexArrayView,DeviceValueArrayView> DeviceView;

   //construct coo_matrix_view from array1d_views
   DeviceView A(n,n,nnz,row_indices,column_indices,values);

   cusp::print(A);
}
#if defined(__cplusplus)
}
#endif

fort_cusp_test.f90

program fort_cuda_test

   implicit none

interface
   subroutine test_coo_mat_print_(row_i,col_j,val_v,n,nnz) bind(C)
      use, intrinsic :: ISO_C_BINDING, ONLY: C_INT,C_FLOAT
      implicit none
      integer(C_INT) :: n, nnz, row_i(:), col_j(:)
      real(C_FLOAT) :: val_v(:)
   end subroutine test_coo_mat_print_
end interface

   integer*4   n
   integer*4   nnz

   integer*4, target :: rowI(9),colJ(9)
   real*4, target :: valV(9)

   integer*4, pointer ::   row_i(:)
   integer*4, pointer ::   col_j(:)
   real*4, pointer ::   val_v(:)

   n     =  3
   nnz   =  9
   rowI =  (/ 1, 1, 1, 2, 2, 2, 3, 3, 3/)
   colJ =  (/ 1, 2, 3, 1, 2, 3, 1, 2, 3/)
   valV =  (/ 1, 2, 3, 4, 5, 6, 7, 8, 9/)

   row_i => rowI
   col_j => colJ
   val_v => valV

   write(*,*) "testing 1 2 3"

   call test_coo_mat_print_(row_i,col_j,val_v,n,nnz)

end program fort_cuda_test

If you want to try it yourself, here is my (rather inelegant) makefile:

Test:
   nvcc -Xcompiler="-fPIC" -shared cusp_runner.cu -o cusp_runner.so -I/Developer/NVIDIA/CUDA-6.5/include/cusp
   gfortran -c fort_cusp_test.f90
   gfortran fort_cusp_test.o cusp_runner.so -L/Developer/NVIDIA/CUDA-6.5/lib -lcudart -o fort_cusp_test

clean:
   rm *.o *.so

The library paths will need to be changed as appropriate of course.

Can anyone point me in the right direction for how to properly pass in the needed arrays from the fortran code?


with the interface block removed and addition of a print statement at the start of the C function I can see that the arrays are being passed correctly, but n and nnz are causing problems. I get the following output:

$ ./fort_cusp_test
 testing 1 2 3
n: 1509677596, nnz: 1509677592
     i,  row_i,  col_j,        val_v
     0,      1,      1,   1.0000e+00
     1,      1,      2,   2.0000e+00
     2,      1,      3,   3.0000e+00
     3,      2,      1,   4.0000e+00
     4,      2,      2,   5.0000e+00
     5,      2,      3,   6.0000e+00
     6,      3,      1,   7.0000e+00
     7,      3,      2,   8.0000e+00
     8,      3,      3,   9.0000e+00
     9,      0,  32727,   0.0000e+00
    ...
    etc
    ...
    Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x105ce7ff6
#1  0x105ce7593
#2  0x7fff8593ff19
#3  0x105c780a2
#4  0x105c42dbc
#5  0x105c42df4
Segmentation fault: 11

fort_cusp_test

    interface
       subroutine test_coo_mat_print_(row_i,col_j,val_v,n,nnz) bind(C)
          use, intrinsic :: ISO_C_BINDING, ONLY: C_INT,C_FLOAT
          implicit none
          integer(C_INT),value :: n, nnz
          integer(C_INT) :: row_i(:), col_j(:)
          real(C_FLOAT) :: val_v(:)
       end subroutine test_coo_mat_print_
    end interface

       integer*4   n
       integer*4   nnz

       integer*4, target :: rowI(9),colJ(9)
       real*4, target :: valV(9)

       integer*4, pointer ::   row_i(:)
       integer*4, pointer ::   col_j(:)
       real*4, pointer ::   val_v(:)

       n     =  3
       nnz   =  9
       rowI =  (/ 1, 1, 1, 2, 2, 2, 3, 3, 3/)
       colJ =  (/ 1, 2, 3, 1, 2, 3, 1, 2, 3/)
       valV =  (/ 1, 2, 3, 4, 5, 6, 7, 8, 9/)

       row_i => rowI
       col_j => colJ
       val_v => valV

       write(*,*) "testing 1 2 3"

       call test_coo_mat_print_(rowI,colJ,valV,n,nnz)

    end program fort_cuda_test

cusp_runner.cu

   #include <stdio.h>
    #include <cusp/coo_matrix.h>
    #include <iostream>
    // #include <cusp/krylov/cg.h>
    #include <cusp/print.h>

    #if defined(__cplusplus)
    extern "C" {
    #endif

    void test_coo_mat_print_(int * row_i, int * col_j, float * val_v, int n, int nnz ) {

       printf("n: %d, nnz: %d\n",n,nnz);

       printf("%6s, %6s, %6s, %12s \n","i","row_i","col_j","val_v");
       for(int i=0;i<n;i++) {
          printf("%6d, %6d, %6d, %12.4e\n",i,row_i[i],col_j[i],val_v[i]);
       }
       if ( false ) {
       //wrap raw input pointers with thrust::device_ptr
       thrust::device_ptr<int> wrapped_device_I(row_i);
       thrust::device_ptr<int> wrapped_device_J(col_j);
       thrust::device_ptr<float> wrapped_device_V(val_v);

       //use array1d_view to wrap individual arrays
       typedef typename cusp::array1d_view< thrust::device_ptr<int> > DeviceIndexArrayView;
       typedef typename cusp::array1d_view< thrust::device_ptr<float> > DeviceValueArrayView;

       DeviceIndexArrayView row_indices(wrapped_device_I, wrapped_device_I + n);
       DeviceIndexArrayView column_indices(wrapped_device_J, wrapped_device_J + nnz);
       DeviceValueArrayView values(wrapped_device_V, wrapped_device_V + nnz);

       //combine array1d_views into coo_matrix_view
       typedef cusp::coo_matrix_view<DeviceIndexArrayView,DeviceIndexArrayView,DeviceValueArrayView> DeviceView;

       //construct coo_matrix_view from array1d_views
       DeviceView A(n,n,nnz,row_indices,column_indices,values);

       cusp::print(A); }
    }
    #if defined(__cplusplus)
    }
    #endif

解决方案

There are two methods for passing arguments from Fortran to C routines: The first is to use the interface block (a new approach in modern Fortran), and the second is not to use the interface block (an old approach valid even for Fortran77).

First, the following is about the first method for using the interface block. Because the C routine expects to receive C pointers (row_i, col_j, and val_v), we need to pass the address for those variables from the Fortran side. To do so, we have to use asterisk (*) rather than colon(:) in the interface block, as shown below. (If we use colon, then this tells the Fortran compiler to send the address of Fortran pointer objects [1], which is not the desired behavior.) Also, since n and nnz in the C routine are declared as values (not pointers), the interface block needs to have the VALUE attribute for these variables, so that the Fortran compiler sends the value of n and nnz rather than their addresses. To summarize, in this first approach, the C and Fortran routines look like this:

Fortran routine:
...
interface
    subroutine test_coo_mat_print_(row_i,col_j,val_v,n,nnz) bind(C)
        use, intrinsic :: ISO_C_BINDING, ONLY: C_INT,C_FLOAT
        implicit none
        integer(C_INT) :: row_i(*), col_j(*)
        real(C_FLOAT) :: val_v(*)
        integer(C_INT), value :: n, nnz     !! see note [2] below also
    end subroutine test_coo_mat_print_
end interface
...
call test_coo_mat_print_( rowI, colJ, valV, n, nnz )

C routine:
void test_coo_mat_print_ (int * row_i, int * col_j, float * val_v, int n, int nnz ) 


The following is about the second approach without the interface block. In this approach, first delete the interface block and array pointers entirely, and change the Fortran code as follows

Fortran routine:

integer  rowI( 9 ), colJ( 9 ), n, nnz     !! no TARGET attribute necessary
real     valV( 9 )

! ...set rowI etc as above...

call test_coo_mat_print ( rowI, colJ, valV, n, nnz )   !! "_" is dropped

and the C routine as follows

void test_coo_mat_print_ ( int* row_i, int* col_j, float* val_v, int* n_, int* nnz_ )
{
    int n = *n_, nnz = *nnz_;

    printf( "%d %d \n", n, nnz );
    for( int k = 0; k < 9; k++ ) {
        printf( "%d %d %10.6f \n", row_i[ k ], col_j[ k ], val_v[ k ] );
    }

    // now go to thrust...
}

Note that n_ and nnz_ are declared as pointers in the C routine, because without the interface block, the Fortran compiler always sends the address of actual arguments to the C routine. Also note that in the above C routine, the contents of row_i etc are printed to make sure that the arguments are passed correctly. If the printed values are correct, then I guess the problem will be more likely in the call of thrust routines (including how to pass the size info like n and nnz).

[1] Fortran pointers declared as "real, pointer :: a(:)" actually represents something like an array view class (in C++ jargon), which is different from the actual data pointed. What is needed here is to send the address of actual data, not the address of this array-view object. Also, the asterisk in the interface block (a(*)) represents an assumed-size array, which is an old method for passing an array in Fortran. In this case the address of the first element of the array is passed, as expected.

[2] If n and nnz are declared as pointers in the C routine (as in the second approach), then this VALUE attribute should not be attached, because the C routine wants the address of actual arguments rather than their values.

这篇关于从传递FORTRAN阵列产生尖点coo_matrix的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆