Running multiple TensorFlow sessions concurrently

Problem description

I am trying to run several sessions of TensorFlow concurrently on a CentOS 7 machine with 64 CPUs. My colleague reports that he can use the following two blocks of code to produce a parallel speedup on his machine using 4 cores:

mnist.py

import numpy as np
import input_data
from PIL import Image
import tensorflow as tf
import time


def main(randint):
    print 'Set new seed:', randint
    np.random.seed(randint)
    tf.set_random_seed(randint)
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

    # Setting up the softmax architecture
    x = tf.placeholder("float", [None, 784])
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    y = tf.nn.softmax(tf.matmul(x, W) + b)

    # Setting up the cost function
    y_ = tf.placeholder("float", [None, 10])
    cross_entropy = -tf.reduce_sum(y_*tf.log(y))
    train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

    # Initialize the variables and create a session limited to a single
    # inter-op and intra-op thread, so that each process sticks to one core
    init = tf.initialize_all_variables()
    sess = tf.Session(
        config=tf.ConfigProto(
            inter_op_parallelism_threads=1,
            intra_op_parallelism_threads=1
        )
    )
    sess.run(init)

    for i in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

    print sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})

if __name__ == "__main__":
    t1 = time.time()
    main(0)
    t2 = time.time()
    print "time spent: {0:.2f}".format(t2 - t1)

parallel.py

import multiprocessing
import numpy as np

import mnist
import time

t1 = time.time()

# Launch three training runs in separate processes, each with its own seed
p1 = multiprocessing.Process(target=mnist.main, args=(np.random.randint(10000000),))
p2 = multiprocessing.Process(target=mnist.main, args=(np.random.randint(10000000),))
p3 = multiprocessing.Process(target=mnist.main, args=(np.random.randint(10000000),))
p1.start()
p2.start()
p3.start()

# Wait for all three to finish before measuring the total wall time
p1.join()
p2.join()
p3.join()
t2 = time.time()
print "time spent: {0:.2f}".format(t2 - t1)

In particular, he says that he observes:

Running a single process took: 39.54 seconds
Running three processes took: 54.16 seconds

However, when I run the code:

python mnist.py
==> Time spent: 5.14

python parallel.py 
==> Time spent: 37.65

As you can see, I get a significant slowdown by using multiprocessing whereas my colleague does not. Does anyone have any insight as to why this could be occurring and what can be done to fix it?

EDIT

Here is some example output. Notice that loading the data seems to occur in parallel, but training the individual models looks very sequential in the output (which can be verified by watching CPU usage in top while the program executes; see the monitoring sketch after the log below).

#$ python parallel.py 
Set new seed: 9672406
Extracting MNIST_data/train-images-idx3-ubyte.gz
Set new seed: 4790824
Extracting MNIST_data/train-images-idx3-ubyte.gz
Set new seed: 8011659
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 1
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 1
0.9136
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 1
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 1
0.9149
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 1
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 1
0.8931
time spent: 41.36
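As an aside (my addition, not from the original post): rather than eyeballing top, one can sample per-child CPU usage from the parent process with the third-party psutil package, which is assumed to be installed here; the helper name monitor_children is made up for illustration. Each child sitting near 100% of a core would indicate genuinely concurrent training.

import time

import psutil  # third-party package, assumed installed (pip install psutil)


def monitor_children(interval=1.0, duration=10.0):
    # Sample the CPU usage of every child of the current process
    me = psutil.Process()
    end = time.time() + duration
    while time.time() < end:
        for child in me.children(recursive=True):
            # cpu_percent(None) measures usage since the previous call,
            # so the very first sample per child reads as 0.0
            print "pid %d: %.1f%%" % (child.pid, child.cpu_percent(None))
        time.sleep(interval)

Calling monitor_children() between the start() and join() calls in parallel.py would show whether the three workers actually overlap in time.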

ANOTHER EDIT

To confirm that the issue lies with TensorFlow rather than with multiprocessing itself, I replaced the contents of main in mnist.py with a big loop as follows:

def main(randint):
    # CPU-bound busy loop with no TensorFlow involved
    c = 0
    for i in xrange(100000000):
        c += i

This gives the output:

#$ python mnist.py
==> time spent: 5.16
#$ python parallel.py 
==> time spent: 4.86

Hence I think the problem here is not with multiprocessing itself.

Answer

Comment from the OP (user1936768):

I have good news: it turns out that, on my system at least, my trial programs didn't execute long enough for the other instances of TF to start up. When I put a longer-running example program in main, I do indeed see concurrent computations.
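To illustrate (my addition, not part of the quoted comment): a longer run can be produced simply by raising the iteration count in the training loop of mnist.py; the 10000 below is an arbitrary choice, not a figure from the original post.

    # A longer training loop amortizes TensorFlow's per-process startup
    # cost; 10000 steps is an arbitrary choice for illustration
    for i in range(10000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

With a run of that length, per-process startup time becomes a small fraction of the total, and the three processes overlap for most of their execution.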
