Is it unsafe to run multiple TensorFlow processes on the same GPU?


Question

I only have one GPU (Titan X Pascal, 12 GB VRAM), and I would like to train multiple models in parallel on that GPU.

I encapsulated my model in a single Python program (called model.py) and included code in model.py to restrict VRAM usage (based on this example). I was able to run up to 3 instances of model.py concurrently on my GPU, with each instance taking a little less than 33% of my VRAM. Mysteriously, when I tried with 4 models I received an error:

2017-09-10 13:27:43.714908: E tensorflow/stream_executor/cuda/cuda_dnn.cc:371] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2017-09-10 13:27:43.714973: E tensorflow/stream_executor/cuda/cuda_dnn.cc:338] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-09-10 13:27:43.714988: F tensorflow/core/kernels/conv_ops.cc:672] Check failed: stream->parent()->GetConvolveAlgorithms(conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
Aborted (core dumped)

I later saw on the TensorFlow GitHub that people seem to think it is unsafe to run more than one TensorFlow process per GPU. Is this true, and if so, why? And why was I able to run 3 TensorFlow processes on the same GPU but not 4?

Recommended answer

In short: yes, it is safe to run multiple processes on the same GPU (as of May 2017). It was previously unsafe to do so.

Link to the TensorFlow source code that confirms this.
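For reference, the per-process VRAM cap described in the question could look something like the sketch below. The code the asker actually used is not shown in the thread, so this is only an illustration. It assumes the TensorFlow 1.x API (`tf.GPUOptions`, `tf.ConfigProto`), which was current in 2017. In TensorFlow 2.x the same classes live under `tf.compat.v1`. The helper names `memory_fraction_per_process` and `make_session_config` and the `reserve` parameter are hypothetical. Whether the failure with 4 processes came from oversubscribed memory is not established in the thread. The sketch only shows one way to keep the fractions from summing to more than the whole card.

```python
def memory_fraction_per_process(num_processes, reserve=0.05):
    """Return the share of GPU memory each process may claim.

    `reserve` is a hypothetical safety margin left unallocated so that
    the per-process fractions sum to less than 1.0.
    """
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    if not 0.0 <= reserve < 1.0:
        raise ValueError("reserve must be in [0, 1)")
    return (1.0 - reserve) / num_processes


def make_session_config(fraction):
    """Build a TF 1.x session config capped at `fraction` of GPU memory."""
    # Imported lazily so the helper above can be used without TensorFlow.
    import tensorflow as tf

    # per_process_gpu_memory_fraction is a field of GPUOptions in TF 1.x.
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=fraction)
    return tf.ConfigProto(gpu_options=gpu_options)


if __name__ == "__main__":
    # Print the cap each process would receive for 1 to 4 processes.
    for n in (1, 2, 3, 4):
        print(n, "process(es):", round(memory_fraction_per_process(n), 4))
```

Each instance of model.py would then create its session with `tf.Session(config=make_session_config(fraction))`, using the same process count in every instance.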
