What happens if your kernel code requires more threads than your graphics card can handle?


Question

What happens if kernel code requires more threads than the graphics card can handle?

What happens if I create more threads than my graphics card can handle?

How does Visual Studio 2012 C++ AMP handle this?

Will the runtime provide some kind of queueing mechanism?
For example: when one thread finishes, the next waiting thread can go ahead and do its job.

Will it hurt performance?
For example: if there is no queueing mechanism, all created threads will start at the same time and compete for all kinds of hardware resources.

I don't know how Visual Studio 2012 is designed to work in this situation, but suppose my Nvidia GeForce 560 Ti can only handle 12,288 threads (384 cores * 32 threads per core) and my kernel code creates 20,000 threads. I would then expect VS 2012 to put the extra ~8,000 threads in a queue and, whenever one of the working threads finishes, release a waiting thread from the queue so it can start running the kernel code. How does VS 2012 handle this?

Recommended answer

The scheduling is managed by the GPU. No matter how many threads you plan to execute on the GPU, they might not all run at the *same* time, because GPU resources are limited.

When you schedule threads on the GPU, they are first divided into blocks of threads (or groups, if you prefer; C++ AMP calls them tiles). Those blocks are then assigned to available cores on a block-by-block basis. If no cores are idle, the remaining blocks simply wait to be scheduled.
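The block-by-block dispatch described above can be modelled on the host as a small queue simulation. This is only a sketch under stated assumptions, not how any particular GPU or the C++ AMP runtime is implemented: the function name `schedule_blocks`, the idea of a fixed number of execution "slots", and the per-block durations are all hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical host-side model of block scheduling (not a real GPU API).
// durations[i] is the assumed run time of block i in arbitrary time units;
// slots is the assumed number of blocks the device can run at once.
// Returns the time at which each block starts executing.
std::vector<int> schedule_blocks(const std::vector<int>& durations, int slots) {
    assert(slots > 0);

    // Min-heap holding, for every slot, the time at which it becomes idle.
    std::priority_queue<int, std::vector<int>, std::greater<int>> free_at;
    for (int s = 0; s < slots; ++s) {
        free_at.push(0);
    }

    std::vector<int> start;
    start.reserve(durations.size());
    for (int d : durations) {
        // The next waiting block goes to whichever slot frees up first.
        int t = free_at.top();
        free_at.pop();
        start.push_back(t);
        free_at.push(t + d);
    }
    return start;
}
```

With two slots and block durations {3, 1, 2, 1}, the first two blocks start immediately, the third starts as soon as the short second block finishes, and the fourth waits until a slot is free again. No thread is rejected; the extra blocks just start later.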

Actually, the number you mentioned, 12,288, is a maximum. Your GPU is not dedicated to computing, so you won't reach that number, because the GPU also needs resources for graphics work. So, back to your question: VS 2012 doesn't have to care about this; it is the GPU that takes care of it.

What's more, thanks to the block-by-block scheduling, your code gets the benefit of transparent scalability. Say you have 20,000 threads divided into 100 blocks of 200 threads each. On a low-end card that can execute 20 blocks at the same time, the work takes 5 rounds to finish. On a high-end card that can execute 50 blocks at the same time, it takes 2 rounds.
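The arithmetic in that example is a pair of ceiling divisions. A minimal sketch follows, assuming every block takes roughly the same time so that the work proceeds in whole rounds. The helper names `blocks_for_threads` and `rounds_needed` are made up for this illustration and are not part of C++ AMP.

```cpp
#include <cassert>

// Hypothetical helper: number of blocks needed to cover total_threads
// when each block holds threads_per_block threads (rounded up).
constexpr int blocks_for_threads(int total_threads, int threads_per_block) {
    return (total_threads + threads_per_block - 1) / threads_per_block;
}

// Hypothetical helper: number of rounds needed when the card can run
// concurrent_blocks blocks at the same time (rounded up).
constexpr int rounds_needed(int total_blocks, int concurrent_blocks) {
    return (total_blocks + concurrent_blocks - 1) / concurrent_blocks;
}

// The figures from the answer: 20,000 threads in blocks of 200.
static_assert(blocks_for_threads(20000, 200) == 100, "100 blocks");
static_assert(rounds_needed(100, 20) == 5, "low-end card: 5 rounds");
static_assert(rounds_needed(100, 50) == 2, "high-end card: 2 rounds");
```

The same kernel launch yields 5 rounds on the slower card and 2 on the faster one without any change to the code, which is what transparent scalability means here.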

The number of threads is not a limit that prevents your code from executing; just schedule as many as you need. Usually this will be hundreds of thousands of threads. The memory resources each thread uses might be a limit, though, so pay attention to that.
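One way to see why memory, rather than thread count, can become the limit is to estimate how many blocks fit into a fixed on-chip memory budget at once. The sketch below is a simplified model with hypothetical numbers. The function name `resident_blocks`, the 48 KiB budget, and the cap of 8 blocks are assumptions chosen for illustration, not figures taken from the answer or from any specific card.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model: how many blocks can be resident at the same time,
// given an on-chip memory budget, the memory one block needs, and an
// assumed hardware cap on resident blocks.
// A result of 0 means a single block does not fit at all.
long resident_blocks(long budget_bytes, long bytes_per_block, long hw_max_blocks) {
    assert(bytes_per_block > 0);
    return std::min(hw_max_blocks, budget_bytes / bytes_per_block);
}
```

Under these assumptions, a block that needs 16 KiB out of a 48 KiB budget allows 3 resident blocks, a block that needs 1 KiB is limited only by the cap of 8, and a block that needs 64 KiB cannot be placed at all. Heavier per-thread memory use therefore means fewer blocks running at the same time and more rounds overall.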

