How can I use 100% of VRAM on a secondary GPU from a single process on Windows 10?


Problem description

This is on a Windows 10 computer with no monitor attached to the Nvidia card. I've included output from nvidia-smi showing > 5.04G was available.

Here is the TensorFlow code asking it to allocate just slightly more than I had seen previously (I want this to be as close as possible to memory fraction = 1.0):

import tensorflow as tf

config = tf.ConfigProto()
# config.gpu_options.allow_growth = True  # (disabled: allocate up front instead of on demand)
config.gpu_options.per_process_gpu_memory_fraction = 0.84
config.log_device_placement = True
sess = tf.Session(config=config)
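As a sanity check on what that fraction buys: per_process_gpu_memory_fraction appears to be applied to the card's total memory (not the free memory), and the 5411658752-byte request in the log further down is consistent with 0.84 of the 6 GiB total rounded down to a 256-byte boundary. This is my reading of the observed numbers, not an official formula:

```python
GIB = 1024 ** 3
total = 6 * GIB                            # totalMemory reported by TensorFlow
request = int(total * 0.84) // 256 * 256   # fraction of total, 256-byte aligned
print(request)  # 5411658752, matching the CUDA_ERROR_OUT_OF_MEMORY log line
```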

Just before running the above lines in a Jupyter notebook I ran nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 376.51                 Driver Version: 376.51                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106... WDDM  | 0000:01:00.0     Off |                  N/A |
|  0%   27C    P8     5W / 120W |     43MiB /  6144MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Output from TF, after it successfully detects 5.01GiB free, shows "failed to allocate 5.04G (5411658752 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY" (you need to scroll to the right to see it below):

2017-12-17 03:53:13.959871: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7845
pciBusID: 0000:01:00.0
totalMemory: 6.00GiB freeMemory: 5.01GiB
2017-12-17 03:53:13.960006: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
2017-12-17 03:53:13.961152: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 5.04G (5411658752 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1
2017-12-17 03:53:14.151073: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\direct_session.cc:299] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1

My best guess is that some policy in an Nvidia user-level DLL is preventing use of all of the memory (perhaps to allow for attaching a monitor?).

If that theory is correct, I'm looking for any user-accessible knob to turn that off on Windows 10. If I'm on the wrong track, any help pointing me in the right direction is appreciated.

I realized I did not include this bit of research: the following code in TensorFlow indicates that stream_exec is 'telling' TensorFlow that only 5.01GiB is free. This is the primary reason for my current theory that some Nvidia component is preventing the allocation. (However, I could be misunderstanding which component implements the instantiated stream_exec.)

auto stream_exec = executor.ValueOrDie();
int64 free_bytes;
int64 total_bytes;
if (!stream_exec->DeviceMemoryUsage(&free_bytes, &total_bytes)) {
  // Logs internally on failure.
  free_bytes = 0;
  total_bytes = 0;
}
const auto& description = stream_exec->GetDeviceDescription();
int cc_major;
int cc_minor;
if (!description.cuda_compute_capability(&cc_major, &cc_minor)) {
  // Logs internally on failure.
  cc_major = 0;
  cc_minor = 0;
}
LOG(INFO) << "Found device " << i << " with properties: "
          << "\nname: " << description.name() << " major: " << cc_major
          << " minor: " << cc_minor
          << " memoryClockRate(GHz): " << description.clock_rate_ghz()
          << "\npciBusID: " << description.pci_bus_id() << "\ntotalMemory: "
          << strings::HumanReadableNumBytes(total_bytes)
          << " freeMemory: " << strings::HumanReadableNumBytes(free_bytes);
}


EDIT #2:

The following thread suggests that Windows 10 prevents general use of VRAM on secondary video cards used for compute by reserving a percentage of it up front:
https://social.technet.microsoft.com/Forums/windows/en-US/15b9654e-5da7-45b7-93de-e8b63faef064/windows-10-does-not-let-cuda-applications-to-all-vram-on-secondary-graphics-cards?forum=win10itprohardware

That thread seems hard to believe.

Updated the title to more clearly be a question. Feedback indicates this may be better filed as a bug with Microsoft or Nvidia, and I am pursuing other avenues to get it addressed. However, I don't want to assume it cannot be resolved directly.

Further experiments indicate that the issue I am hitting is specific to a large allocation from a single process. All of the VRAM can be used when another process comes into play.

The failure here is an allocation failure, and according to the nvidia-smi output above 43MiB is in use (perhaps by the system?), but not by an identifiable process. The failure I'm seeing is for a single monolithic allocation, which under a typical allocation model requires a contiguous address space. So the pertinent questions may be: what is causing that 43MiB to be used, and is it placed in the address space such that 5.01GiB is the largest contiguous region available?
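One way to probe the contiguity theory empirically would be to binary-search for the largest single allocation that succeeds. The sketch below is generic: try_alloc is a hypothetical stand-in for a real probe (e.g. a cudaMalloc of n bytes followed by an immediate free), simulated here with a fixed limit so the search logic can be shown on its own:

```python
def largest_alloc(try_alloc, lo=0, hi=6 * 1024**3, step=1 << 20):
    """Binary-search (to `step` granularity) for the largest n with try_alloc(n) True."""
    while hi - lo > step:
        mid = (lo + hi) // 2
        if try_alloc(mid):
            lo = mid   # mid bytes fit; search higher
        else:
            hi = mid   # mid bytes failed; search lower
    return lo

# Stand-in probe: pretend allocations up to ~5.01 GiB succeed.
LIMIT = 5 * 1024**3 + 10 * 2**20
print(largest_alloc(lambda n: n <= LIMIT))
```

Running this against a real CUDA probe instead of the lambda would show whether the ceiling is a hard reservation or fragmentation around that 43MiB.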

Recommended answer

It is clearly not possible for now: Windows Display Driver Model (WDDM) 2.x has a defined limit, and no process can override it (legally).

Assuming you have already played with the "Prefer maximum performance" power-management setting, that will let you push it to at most 92%.
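For what it's worth, the numbers in the question suggest a ceiling lower than 92% on this machine: stream_executor reported 5.01 GiB free out of 6.00 GiB total, i.e. roughly 83.5% usable. A quick check using the figures quoted above:

```python
GIB = 1024 ** 3
total, free = 6.00 * GIB, 5.01 * GIB    # figures from the TensorFlow log above
print(f"usable fraction: {free / total:.1%}")  # → usable fraction: 83.5%
```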

If you'd like to know more about WDDM 2.x, this covers it in detail:

https://docs.microsoft.com/en-us/windows-hardware/drivers/display/what-s-new-for-windows-threshold-display-drivers--wddm-2-0-
