Why can GPU do matrix multiplication faster than CPU?


Question

I've been using a GPU for a while without questioning it, but now I'm curious.

Why can a GPU do matrix multiplication much faster than a CPU? Is it because of parallel processing? But I didn't write any parallel processing code. Does it happen automatically by itself?

Any intuition / high-level explanation would be appreciated!

Answer

How do you parallelize the computations?

A GPU can run far more parallel computations than a CPU can. Look at this example of adding two vectors of, say, 1M elements.

Using a CPU, suppose the maximum number of threads you can run is 100 (you can actually run many more, but let's assume 100 for the moment):

In a typical multi-threading example, you would parallelize the additions across all threads. This is what I mean:

c[0] = a[0] + b[0]       # thread 0, first pass
c[1] = a[1] + b[1]       # thread 1, first pass
# ... threads 2-99 handle c[2] through c[99] ...
c[100] = a[100] + b[100] # thread 0, second pass
c[101] = a[101] + b[101] # thread 1, second pass

We can do this because the value of c[0] depends on nothing except a[0] and b[0]. Each addition is independent of the others, so the task is easy to parallelize.

As the example shows, 100 different elements are added simultaneously, saving time: adding all the elements takes 1M / 100 = 10,000 steps instead of 1M sequential ones.

Now consider a modern GPU, which keeps thousands of threads in flight at once (2048 is roughly the limit of resident threads per streaming multiprocessor on many NVIDIA architectures, and a GPU contains many such multiprocessors). All of those threads can perform independent operations simultaneously, hence the speedup.
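To make that concrete, here is a minimal CUDA kernel sketch for the vector addition above; the name vecAdd and the bounds check are illustrative, not from the original answer. Each GPU thread computes exactly one element of c:

// Minimal sketch: one GPU thread adds one pair of elements.
// With 1M elements and thousands of threads in flight, the additions
// run in large parallel batches instead of one after another.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique global index of this thread
    if (i < n)                // threads past the end of the array do nothing
        c[i] = a[i] + b[i];   // the same line as the pseudocode above
}

A host-side launch for this kernel is sketched at the end of the answer.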

The same applies to your case of matrix multiplication. Each element of the output matrix is an independent dot product, so the computations can be parallelized: the GPU's many threads are grouped into blocks, and each thread can be assigned one output element (a naive version is sketched below). With so many computations running in parallel, the multiplication finishes quickly.
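Here is what that naive mapping could look like as a CUDA kernel sketch for C = A * B with square N x N row-major matrices; the names are illustrative, and production libraries such as cuBLAS use far more sophisticated tiled kernels:

// Naive sketch: one thread per output element of C = A * B (N x N, row-major).
// Each C[row][col] is an independent dot product of a row of A with a column
// of B, so all N*N of them can be computed concurrently.
__global__ void matMul(const float *A, const float *B, float *C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // output row owned by this thread
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // output column
    if (row < N && col < N) {
        float sum = 0.0f;
        for (int k = 0; k < N; ++k)                   // dot product along the shared dimension
            sum += A[row * N + k] * B[k * N + col];
        C[row * N + col] = sum;
    }
}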

But I didn't write any parallel processing code for my GTX1080! Does the GPU do it by itself?

Almost every machine-learning framework uses parallelized implementations of all the operations it supports. This is achieved with CUDA programming, NVIDIA's API for parallel computation on NVIDIA GPUs. You don't write that code explicitly; it is all done at a low level, and you may not even notice.

To be clear, this does not mean that a C++ program you wrote will automatically be parallelized just because you have a GPU. It won't; you need to write it using CUDA (a sketch of what that hand-written code could look like follows below), and only then will it run in parallel. But most ML frameworks already include such CUDA code, so it is not required on your end.
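For illustration, here is a hedged sketch of the explicit host-side CUDA code you would otherwise write by hand, launching the vecAdd kernel sketched earlier (cudaMallocManaged assumes a GPU with unified-memory support). This is the boilerplate that frameworks hide from you:

#include <cuda_runtime.h>

int main() {
    const int n = 1 << 20;                 // ~1M elements, as in the example above
    const size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);          // unified memory, visible to both CPU and GPU
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // enough blocks to cover n
    vecAdd<<<blocks, threadsPerBlock>>>(a, b, c, n);  // launch roughly 1M GPU threads
    cudaDeviceSynchronize();               // wait for the GPU to finish before reading c

    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return 0;
}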

