How to allocate arrays of arrays in structure with CUDA Fortran?


Problem description


With CUDA, I'm trying to allocate arrays inside a structure, but I'm running into a problem and I don't know why. Here is a short program (stored in a file called struct.cuf) that reproduces it. I'm compiling with PGI version 16.10, using the following options: -O3 -Mcuda=cc60 -tp=x64 struct.cuf -o struct_out

module structure

type mytype
 integer :: alpha,beta,gamma
 real,dimension(:),pointer :: a
end type mytype

type mytypeDevice
 integer :: alpha,beta,gamma
 real,dimension(:),pointer,device :: a
end type mytypeDevice

end module structure

program main
 use cudafor
 use structure

 type(mytype) :: T(3)
 type(mytypeDevice),device :: T_Device(3)

 ! For the host
 do i=1,3
  allocate(T(i)%a(10))
 end do
 T(1)%a=1; T(2)%a=2; T(3)%a=3

 ! For the device
 print *, 'Everything from now is ok'
 do i=1,3
  allocate(T_Device(i)%a(10))
 end do
 !do i=1,3
 ! T_Device(i)%a=T(i)%a
 !end do

end program main

The output:

 Everything from now is ok
Segmentation fault     

What am I doing wrong here?

The only working solution I have found is to store the values in separate arrays and transfer those to the GPU, but that is very "heavy", especially if I use a lot of structures like mytype.

EDIT: The code has been modified to use Vladimir F's solution. If I remove the device attribute from the T_Device(3) declaration, the allocation seems to work, and so does assigning the values (the commented lines below the allocation). But I need the device attribute on T_Device(3), because I'm going to use it in kernels.

Thanks!

Solution

The problem here is how you have declared T_Device. To use host-side allocation, you first populate a host memory copy of the device structure, and then copy it to device memory. Declared without the device attribute, this:

type(mytypeDevice) :: T_Device(3)

do i=1,3
 allocate(T_Device(i)%a(10))
end do

will work correctly: the structure itself then lives in host memory, while each a member, which keeps its device attribute, is still allocated on the device. This is a very standard design pattern in C++ based CUDA code, and the principle here is identical.
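The two steps described above (populate a host copy of the structure, then copy it to device memory) might look roughly like the sketch below. It is only an illustration under assumptions, not part of the original answer: the name T_Host and the buffer tmp are hypothetical, and the final whole-array assignment T_Device = T_Host assumes that the compiler accepts derived-type assignment between host and device variables of the same type, which is worth checking against your compiler version.

```fortran
module structure
  ! Type definitions sit in the specification part of the module.
  type mytypeDevice
    integer :: alpha, beta, gamma
    real, dimension(:), pointer, device :: a
  end type mytypeDevice
end module structure

program main
  use cudafor
  use structure
  implicit none

  ! T_Host is a hypothetical name: the structures live in host memory,
  ! but each member "a" refers to device memory.
  type(mytypeDevice)         :: T_Host(3)
  ! Device-resident copy of the structures, intended for use in kernels.
  type(mytypeDevice), device :: T_Device(3)
  real    :: tmp(10)
  integer :: i

  do i = 1, 3
    ! Host-side allocation of device memory for the member array.
    allocate(T_Host(i)%a(10))
    ! Fill a host buffer, then copy it into the device array.
    tmp = real(i)
    T_Host(i)%a = tmp
  end do

  ! Assumption: whole-array derived-type assignment from host to device
  ! is supported; this copies the structures, including the references
  ! to the already allocated device arrays.
  T_Device = T_Host
end program main
```

With this layout, all allocation is done through the host-resident structures, and the array carrying the device attribute is only written as a whole once its members are ready.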
