Google Cloud AI Platform Notebook instance won't use GPU with Jupyter


Question

I'm using the pre-built AI Platform Jupyter Notebook instances to train a model with a single Tesla K80 card. The issue is that I don't believe the model is actually training on the GPU.

nvidia-smi returns the following during training:

No running processes found

Note the "No running processes found" message, yet GPU utilization ("Volatile GPU-Util") is 100%. Something seems strange...

...And the training is excruciatingly slow.

A few days ago, I was having trouble with the GPU not being released after each notebook run. When that happened I would get an OOM (out of memory) error. I had to go into the console every time, find the PID of the process holding the GPU, and run kill -9 on it before re-running the notebook. Today, however, I can't get the GPU to run at all. It never shows a running process.

I've tried 2 different GCP AI Platform Notebook instances (both of the available TensorFlow version options) with no luck. Am I missing something with these "pre-built" instances?

(Screenshot: pre-built AI Platform Notebook section)

Just to clarify, I did not build my own instance and then set up access to Jupyter notebooks. Instead, I used the built-in Notebook instance option under the AI Platform submenu.

Do I still need to configure a setting somewhere or install a library to keep using, or to reset, my chosen GPU? I was under the impression that the virtual machine was already loaded with the Nvidia stack and should be plug and play with GPUs.

Any ideas?

Here is a full video of the issue, as requested: https://www.youtube.com/watch?v=N5Zx_ZrrtKE&feature=youtu.be

Recommended answer

Generally speaking, you'll want to debug issues like this using the smallest piece of code that reproduces the error. That rules out many possible causes of the problem you're seeing.

In this case, you can check whether your GPUs are being used by running this code (copied from the TensorFlow 2.0 GPU instructions):

import tensorflow as tf
print("GPU Available: ", tf.test.is_gpu_available())

tf.debugging.set_log_device_placement(True)

# Create some tensors
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)

print(c)

Running it on the same TF 2.0 Notebook gives me the output:

GPU Available:  True
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)

The "Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0" line shows that the operation ran on the GPU.
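To check placement from code rather than by reading the log, one option is to look at the `device` attribute of an eager tensor, which holds a string in the same format as the log line. The helper below is a minimal sketch that only parses such a string; `device_type` is a hypothetical name, not a TensorFlow function.

```python
def device_type(device_string):
    """Return the device type ("GPU", "CPU", ...) from a TensorFlow device string.

    Expects strings such as "/job:localhost/replica:0/task:0/device:GPU:0".
    Returns None when the string has no "device:" component.
    """
    for part in device_string.split("/"):
        if part.startswith("device:"):
            fields = part.split(":")  # e.g. ["device", "GPU", "0"]
            if len(fields) >= 2:
                return fields[1]
    return None
```

In the notebook, calling `device_type(c.device)` on the `c` tensor from the snippet would be the natural use; a result other than "GPU" would suggest the op did not run on the GPU.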

Similarly, if you need more evidence, running nvidia-smi gives the output:

jupyter@tf2:~$ nvidia-smi
Tue Jul 30 00:59:58 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   36C    P0    58W / 149W |  10900MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      7852      C   /usr/bin/python3                           10887MiB |
+-----------------------------------------------------------------------------+
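The same process table can be read in machine-readable form with nvidia-smi's `--query-compute-apps` option, which is handy for checking repeatedly during training. The following is a rough sketch, assuming the CSV output looks like `7852, /usr/bin/python3, 10887 MiB`; the function names `parse_compute_apps` and `gpu_processes` are hypothetical.

```python
import csv
import subprocess


def parse_compute_apps(csv_text):
    """Parse CSV text from
    `nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader`
    into a list of dicts. The exact output layout is an assumption."""
    rows = []
    for fields in csv.reader(csv_text.splitlines(), skipinitialspace=True):
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        pid, name, used = (f.strip() for f in fields)
        rows.append({"pid": int(pid), "name": name, "used_memory": used})
    return rows


def gpu_processes():
    """Run nvidia-smi and return the processes currently holding GPU memory."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=pid,process_name,used_memory",
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_compute_apps(result.stdout)
```

If `gpu_processes()` returns an empty list while training is supposedly running, no process is holding GPU memory, which is what the process table in the question indicates.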

So why isn't your code using the GPU? You're using a library someone else wrote, probably for tutorial purposes. Most likely those library functions are doing something that causes the CPU to be used instead of the GPU.

You'll want to debug that code directly.
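One thing worth ruling out while debugging, offered as a possibility rather than a diagnosis of this particular library, is whether something sets the `CUDA_VISIBLE_DEVICES` environment variable to an empty string or `-1`, which hides all GPUs from CUDA applications. A minimal sketch (`visible_gpu_setting` is a hypothetical helper name):

```python
import os


def visible_gpu_setting(environ=None):
    """Classify the CUDA_VISIBLE_DEVICES setting.

    Returns "unset" if the variable is absent, "hidden" if it is empty or
    "-1" (no GPUs visible), and "restricted" for any other value.
    """
    env = os.environ if environ is None else environ
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return "unset"
    if value.strip() in ("", "-1"):
        return "hidden"
    return "restricted"
```

Calling `visible_gpu_setting()` in the notebook after importing the library in question would show whether the import changed the variable. A result of "hidden" would explain training falling back to the CPU.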
