How to make the GPU visible to the ML runtime environment on Databricks?
Question
I am trying to run some TensorFlow (2.2) example code on a Databricks GPU cluster (p2.xlarge) with the following environment:
6.6 ML, spark 2.4.5, GPU, Scala 2.11
Keras version : 2.2.5
nvidia-smi
NVIDIA-SMI 440.64.00 Driver Version: 440.64.00 CUDA Version: 10.2
However, I do not want to run shell commands every time the Databricks GPU cluster is restarted.
So I installed TensorFlow from the Databricks libraries UI with:
tensorflow==2.2.*
I did not specify whether the package is for GPU or CPU; I assumed it targets the GPU by default.
I found that the Python 3 code runs only on the CPU, not on the GPU.
import tensorflow as tf
physical_devices = tf.config.list_physical_devices()
# physical_devices: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:XLA_CPU:0', device_type='XLA_CPU'), PhysicalDevice(name='/physical_device:XLA_GPU:0', device_type='XLA_GPU')]
visible_devices = tf.config.get_visible_devices()
# visible_devices: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
tf.test.gpu_device_name()  # returns an empty string
# is_built_with_cuda: True
# is_built_with_gpu_support: True
# is_built_with_rocm: False
# is_built_with_xla: True
# get_soft_device_placement: True
I tried to make the 'XLA_GPU' device visible to the ML runtime:
# https://www.tensorflow.org/api_docs/python/tf/config/set_visible_devices
# set GPU visible for TF runtime
physical_devices = tf.config.list_physical_devices('XLA_GPU')
try:
    # enable first GPU
    tf.config.set_visible_devices(physical_devices[0], 'XLA_GPU')  # exception here !!!
    logical_devices = tf.config.list_logical_devices('XLA_CPU')
    # Logical device was created for first GPU
    assert len(logical_devices) == len(physical_devices)
except:
    # Invalid device or cannot modify virtual devices once initialized.
    print('Invalid device or cannot modify virtual devices once initialized.')
However, this raises an exception.
How can I enable the GPU so that TF code can run on it?
Thanks
Recommended answer
Install tensorflow-gpu instead of tensorflow, so that the GPU build is requested explicitly. You won't need to edit the code, because the package is still imported under the name tensorflow. Since TensorFlow 2.1 the plain tensorflow pip package also includes GPU support, so if the GPU is still not visible afterwards, check that the CUDA and cuDNN versions on the cluster match what the installed TensorFlow build expects.
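After changing the installed package, it may help to confirm whether TensorFlow actually sees a plain 'GPU' device (not only 'XLA_GPU') and, if so, to restrict it to the first one. The following is only a minimal sketch, assuming TensorFlow 2.x; the helper names gpu_names, make_first_gpu_visible and report are hypothetical and chosen for illustration. It uses the 'GPU' device type, and it catches the ValueError and RuntimeError that tf.config.set_visible_devices is documented to raise for an invalid device or an already initialized runtime.

```python
def gpu_names(devices):
    """Return the names of devices whose device_type is exactly 'GPU'.

    'XLA_GPU' entries are deliberately ignored: in the question they show up
    as physical devices even though no plain GPU device is visible.
    """
    return [d.name for d in devices if d.device_type == "GPU"]


def make_first_gpu_visible(config):
    """Restrict TensorFlow to the first GPU.

    `config` is expected to be the tf.config module (passed in as a parameter
    so the helper can also be exercised with a stand-in object).
    Returns the selected device, or None if no GPU exists or the change fails.
    """
    gpus = config.list_physical_devices("GPU")
    if not gpus:
        return None
    try:
        config.set_visible_devices(gpus[0], "GPU")
    except (ValueError, RuntimeError) as err:
        # ValueError: the device is not valid.
        # RuntimeError: the runtime was already initialized.
        print(f"Could not change visible devices: {err}")
        return None
    return gpus[0]


def report():
    """Print what TensorFlow can see; needs TensorFlow to be installed."""
    import tensorflow as tf  # imported lazily so the helpers above work without it

    print("TensorFlow version:", tf.__version__)
    print("GPU devices:", gpu_names(tf.config.list_physical_devices()))
    print("Selected GPU:", make_first_gpu_visible(tf.config))
```

In a notebook cell, report() would be called right after the import and before any other TensorFlow call, since changing visible devices is rejected once the runtime has been initialized. If "GPU devices" comes back empty, the problem lies with the installed build or the CUDA libraries rather than with device visibility.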