CUDA Function Won't Execute For Loop on Python with Numba

Problem description

I'm trying to run a simple update loop of a simulation on the GPU. Basically there are a bunch of "creatures" represented by circles that in each update loop will move and then there will be a check of whether any of them intersect.

import numpy as np
import math
from numba import cuda


@cuda.jit('void(float32[:], float32[:], float32[:], uint8[:], float32[:], float32[:], float32, uint32, uint32)')
def update(p_x, p_y, radii, types, velocities, max_velocities, acceleration, num_creatures, cycles):
    for c in range(cycles):
        for i in range(num_creatures):
            velocities[i] = velocities[i] + acceleration
            if velocities[i] > max_velocities[i]:
                velocities[i] = max_velocities[i]
            p_x[i] = p_x[i] + (math.cos(1.0) * velocities[i])
            p_y[i] = p_y[i] + (math.sin(1.0) * velocities[i])
        for i in range(num_creatures):
            for j in range(i, num_creatures):
                delta_x = p_x[j] - p_x[i]
                delta_y = p_y[j] - p_y[i]
                distance_squared = (delta_x * delta_x) + (delta_y * delta_y)
                sum_of_radii = radii[types[i]] + radii[types[i]]
                if distance_squared < sum_of_radii * sum_of_radii:
                    pass


acceleration = .1
creature_radius = 10
spacing = 20
food_radius = 3

max_num_creatures = 1500
num_creatures = 0
max_num_food = 500
num_food = 0
max_num_entities = max_num_creatures + max_num_food
num_entities = 0
cycles = 1


p_x = np.empty((max_num_entities, 1), dtype=np.float32)
p_y = np.empty((max_num_entities, 1), dtype=np.float32)
radii = np.array([creature_radius, creature_radius, food_radius], dtype=np.float32)
types = np.empty((max_num_entities, 1), dtype=np.uint8)

velocities = np.empty((max_num_creatures, 1), dtype=np.float32)
max_velocities = np.empty((max_num_creatures, 1), dtype=np.float32)
# types:
# male - 0
# female - 1
# food - 2
for x in range(1, 800 // spacing):
    for y in range(1, 600 // spacing):
        if num_creatures % 2 == 0:
            types[num_creatures] = 0
        else:
            types[num_creatures] = 1
        p_x[num_creatures] = x * spacing
        p_y[num_creatures] = y * spacing
        max_velocities[num_creatures] = 5
        num_creatures += 1


device_p_x = cuda.to_device(p_x)
device_p_y = cuda.to_device(p_y)
device_radii = cuda.to_device(radii)
device_types = cuda.to_device(types)
device_velocities = cuda.to_device(velocities)
device_max_velocities = cuda.to_device(max_velocities)
update(device_p_x, device_p_y, device_radii, device_types, device_velocities, device_max_velocities,
        acceleration, num_creatures, cycles)
print(device_p_x.copy_to_host()[0])

The 1.0 in math.cos and math.sin is just a placeholder for the directions of the individual creatures. I have a surrounding loop that executes `cycles` times. If I remove it and leave only the block of code that moves the creatures, neither p_x, p_y nor velocities change, even if I add a constant to them. Why not?

Recommended answer

There are at least two problems:


  1. You aren't initializing velocities:

velocities = np.empty((max_num_creatures, 1), dtype=np.float32)

we can fix that for a trivial test with:

velocities = np.ones((max_num_creatures, 1), dtype=np.float32)
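A small NumPy-only sketch of why this matters: `np.empty` allocates a buffer without writing to it, so the kernel's first `velocities[i] + acceleration` would start from whatever bytes happened to be there. `np.ones` matches the trivial test above. `np.zeros` (creatures starting at rest) is an assumption about the intended start state, not something the question specifies.

```python
import numpy as np

max_num_creatures = 1500

# np.empty only allocates; the contents are whatever was already in memory.
uninitialized = np.empty((max_num_creatures, 1), dtype=np.float32)

# Explicitly initialized alternatives with well-defined starting values.
velocities_ones = np.ones((max_num_creatures, 1), dtype=np.float32)
# Assumed "at rest" start state; not taken from the question.
velocities_zeros = np.zeros((max_num_creatures, 1), dtype=np.float32)

print(velocities_ones[:3].ravel())
print(velocities_zeros[:3].ravel())
```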


  2. This isn't the correct array shape:

    p_x = np.empty((max_num_entities, 1), dtype=np.float32)
                   ^^^^^^^^^^^^^^^^^^^^^
    

    to match your kernel signature:

    @cuda.jit('void(float32[:], float32[:], float32[:], uint8[:], float32[:], float32[:], float32, uint32, uint32)')
                    ^^^^^^^^^^
    

    we can fix that with:

    p_x = np.empty(max_num_entities, dtype=np.float32)
    

    and likewise for p_y, types, velocities, and max_velocities. (I imagine some change may possibly be in order also for radii, but it's not entirely clear what you intend with that, since it appears you want a multi-dimensional array, but are accessing it in-kernel as a single-dimensional array, AFAICT. Furthermore, that section of your kernel code is a do-nothing, so it is more or less irrelevant for the problem at hand).
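To make the mismatch concrete, this NumPy-only sketch compares the dimensionality of the two allocations. `float32[:]` in the signature denotes a 1-D array, so `ndim` has to be 1. The `reshape(-1)` call is shown as one possible way to flatten an existing column array; it is an alternative to re-allocating and is not part of the original answer.

```python
import numpy as np

max_num_entities = 2000  # 1500 creatures + 500 food, as in the question

# Shape used in the question: a 2-D column array.
column = np.empty((max_num_entities, 1), dtype=np.float32)
# Shape that a float32[:] kernel argument expects: 1-D.
flat = np.empty(max_num_entities, dtype=np.float32)

print(column.shape, column.ndim)
print(flat.shape, flat.ndim)

# Flatten an existing contiguous (N, 1) array to 1-D; this returns a view.
flattened = column.reshape(-1)
print(flattened.shape, flattened.ndim)
```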

    When I make those changes, I get what appears to be rational output:

    $ cat t9.py
    import numpy as np
    import math
    from numba import cuda
    
    
    @cuda.jit('void(float32[:], float32[:], float32[:], uint8[:], float32[:], float32[:], float32, uint32, uint32)')
    def update(p_x, p_y, radii, types, velocities, max_velocities, acceleration, num_creatures, cycles):
        for c in range(cycles):
            for i in range(num_creatures):
                velocities[i] = velocities[i] + acceleration
                if velocities[i] > max_velocities[i]:
                    velocities[i] = max_velocities[i]
                p_x[i] = p_x[i] + (math.cos(1.0) * velocities[i])
                p_y[i] = p_y[i] + (math.sin(1.0) * velocities[i])
            for i in range(num_creatures):
                for j in range(i, num_creatures):
                    delta_x = p_x[j] - p_x[i]
                    delta_y = p_y[j] - p_y[i]
                    distance_squared = (delta_x * delta_x) + (delta_y * delta_y)
                    sum_of_radii = radii[types[i]] + radii[types[i]]
                    if distance_squared < sum_of_radii * sum_of_radii:
                        pass
    
    
    acceleration = .1
    creature_radius = 10
    spacing = 20
    food_radius = 3
    
    max_num_creatures = 1500
    num_creatures = 0
    max_num_food = 500
    num_food = 0
    max_num_entities = max_num_creatures + max_num_food
    num_entities = 0
    cycles = 1
    
    
    p_x = np.empty(max_num_entities, dtype=np.float32)
    p_y = np.empty(max_num_entities, dtype=np.float32)
    radii = np.array([creature_radius, creature_radius, food_radius], dtype=np.float32)
    types = np.empty(max_num_entities, dtype=np.uint8)
    
    velocities = np.ones(max_num_creatures, dtype=np.float32)
    max_velocities = np.empty(max_num_creatures, dtype=np.float32)
    # types:
    # male - 0
    # female - 1
    # food - 2
    for x in range(1, 800 // spacing):
        for y in range(1, 600 // spacing):
            if num_creatures % 2 == 0:
                types[num_creatures] = 0
            else:
                types[num_creatures] = 1
            p_x[num_creatures] = x * spacing
            p_y[num_creatures] = y * spacing
            max_velocities[num_creatures] = 5
            num_creatures += 1
    
    
    device_p_x = cuda.to_device(p_x)
    device_p_y = cuda.to_device(p_y)
    device_radii = cuda.to_device(radii)
    device_types = cuda.to_device(types)
    device_velocities = cuda.to_device(velocities)
    device_max_velocities = cuda.to_device(max_velocities)
    update(device_p_x, device_p_y, device_radii, device_types, device_velocities, device_max_velocities,
            acceleration, num_creatures, cycles)
    print(device_p_x.copy_to_host())
    $ python t9.py
    [  2.05943317e+01   2.05943317e+01   2.05943317e+01 ...,   3.64769361e-11
       1.52645868e-19   1.80563260e+28]
    $
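As a sanity check on the leading value, the first cycle can be reproduced on the host. This NumPy sketch is only a stand-in for the kernel, under stated assumptions: one cycle, the movement step only, and just three creatures. It gives roughly 20.594 for the first creature, i.e. 20 + cos(1.0) * 1.1. The odd values at the tail of the printed array are the unused slots of `p_x`, which `np.empty` never initialized.

```python
import math
import numpy as np

acceleration = 0.1
num_creatures = 3  # a few creatures are enough for the check

# The first column of creatures sits at x = 1 * spacing = 20.
p_x = np.full(num_creatures, 20.0, dtype=np.float32)
velocities = np.ones(num_creatures, dtype=np.float32)
max_velocities = np.full(num_creatures, 5.0, dtype=np.float32)

# One cycle of the movement step, vectorized on the host:
# accelerate, clamp to the maximum velocity, then advance along x.
velocities = np.minimum(velocities + acceleration, max_velocities)
p_x = p_x + math.cos(1.0) * velocities

print(p_x[0])
```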
    

    Also note that you are currently launching only one block of one thread, but I assume that is not pertinent to your question for now.
