Why is half-precision complex float arithmetic not supported in Python and CUDA?


Question

NumPy has complex64, corresponding to two float32's.

But it also has float16's, yet no complex32.

How come? I have signal-processing calculations involving FFTs where I think I'd be fine with complex32, but I don't see how to get there. In particular, I was hoping for a speedup on an NVidia GPU with cupy.
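
To make the gap concrete: a minimal sketch (plain NumPy; the variable names are ours) showing that no complex32 dtype exists, and one possible workaround of storing half-precision real/imaginary parts and widening to complex64 only for the arithmetic:

```python
import numpy as np

# NumPy ships complex64 (2 x float32) and complex128 (2 x float64),
# but nothing below that.
assert np.dtype(np.complex64).itemsize == 8   # bytes
assert not hasattr(np, "complex32")           # no such dtype

# Workaround sketch: keep real and imaginary parts as float16 for
# storage, and widen only when computing (e.g. an FFT).
rng = np.random.default_rng(0)
re16 = rng.random(1024).astype(np.float16)
im16 = rng.random(1024).astype(np.float16)

z = re16.astype(np.float32) + 1j * im16.astype(np.float32)  # complex64
spectrum = np.fft.fft(z)
```

Note that this only saves memory, not compute: the FFT itself still runs at full precision (numpy.fft upcasts internally; scipy.fft can at least preserve complex64).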

However, it seems that float16 is slower on the GPU rather than faster.

Why is half precision unsupported and/or overlooked?

Also related: why don't we have complex integers, as this may also present an opportunity for speedup?

Answer

This issue has been raised in the CuPy repo for some time:

https://github.com/cupy/cupy/issues/3370

But there's no concrete work plan yet; most of the effort is still exploratory.

One of the reasons it's not trivial to work out is that there's no numpy.complex32 dtype we can directly import (note that all of CuPy's dtypes are just aliases of NumPy's), so there'd be problems whenever a device-host transfer is requested. The other issue is that no native mathematical functions exist for complex32 on either CPU or GPU, so we'd need to write them all ourselves to handle casting, ufuncs, and so on. The linked issue points to a NumPy discussion, and my impression is that a complex32 dtype is not currently being considered there.
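
Until something native lands, a user-side emulation might look like the following sketch: a structured dtype of two float16 fields, plus explicit cast helpers. The `complex32` name and the helper functions here are hypothetical illustrations, not a NumPy or CuPy API:

```python
import numpy as np

# Hypothetical "complex32": a 4-byte structured dtype holding
# half-precision real and imaginary parts.
complex32 = np.dtype([("re", np.float16), ("im", np.float16)])

def to_complex64(a):
    """Widen an emulated-complex32 array to a native complex64 array."""
    return a["re"].astype(np.float32) + 1j * a["im"].astype(np.float32)

def from_complex64(z):
    """Narrow a complex64 array into the emulated-complex32 storage dtype."""
    out = np.empty(z.shape, dtype=complex32)
    out["re"] = z.real.astype(np.float16)
    out["im"] = z.imag.astype(np.float16)
    return out

z = (np.arange(4) + 1j * np.arange(4)).astype(np.complex64)
packed = from_complex64(z)      # 4 bytes/element vs. 8 for complex64
roundtrip = to_complex64(packed)
```

This buys the halved storage and transfer size, but every arithmetic operation still has to go through the cast helpers, which is exactly the "write it all ourselves" cost described above.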

