Tensorflow crashes when using sess.run()


Problem Description


I'm using TensorFlow 0.8.0 with Python v2.7. My IDE is PyCharm and my OS is Linux Ubuntu 14.04.


I'm noticing that the following code causes my computer to freeze and/or crash:

# you will need these files!
# https://www.kaggle.com/c/digit-recognizer/download/train.csv
# https://www.kaggle.com/c/digit-recognizer/download/test.csv

import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
import matplotlib.cm as cm

# read in the image data from the csv file
# the format is:    label  pixel0  pixel1 ... pixel783  (there are 42,000 rows like this)
data = pd.read_csv('../train.csv')
labels = data.iloc[:,:1].values.ravel()  # shape = (42000,)
labels_count = np.unique(labels).shape[0]  # = 10
images = data.iloc[:,1:].values   # shape = (42000, 784)
images = images.astype(np.float64)
image_size = images.shape[1]
image_width = image_height = np.sqrt(image_size).astype(np.int32)  # since these images are square... height = width


# turn all the gray-pixel image-values into percentages of 255
# a 1.0 means a pixel is 100% black, and 0.0 would be a pixel that is 0% black (or white)
images = np.multiply(images, 1.0/255)


# create oneHot vectors from the label #s
oneHots = tf.one_hot(labels, labels_count, 1, 0)  #shape = (42000, 10)


#split up the training data even more (into validation and train subsets)
VALIDATION_SIZE = 3167

validationImages = images[:VALIDATION_SIZE]
validationLabels = labels[:VALIDATION_SIZE]

trainImages = images[VALIDATION_SIZE:]
trainLabels = labels[VALIDATION_SIZE:]






# -------------  Building the NN -----------------

# set up our weights (or kernels?) and biases for each pixel
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(.1, shape=shape, dtype=tf.float32)
    return tf.Variable(initial)


# convolution
def conv2d(x, W):
    return tf.nn.conv2d(x, W, [1,1,1,1], 'SAME')

# pooling
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')


# placeholder variables
# images
x = tf.placeholder('float', shape=[None, image_size])
# labels
y_ = tf.placeholder('float', shape=[None, labels_count])



# first convolutional layer
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

# turn shape(40000,784)  into   (40000,28,28,1)
image = tf.reshape(trainImages, [-1, image_width, image_height, 1])
image = tf.cast(image, tf.float32)
# print (image.get_shape()) # =>(40000,28,28,1)




h_conv1 = tf.nn.relu(conv2d(image, W_conv1) + b_conv1)
# print (h_conv1.get_shape()) # => (40000, 28, 28, 32)
h_pool1 = max_pool_2x2(h_conv1)
# print (h_pool1.get_shape()) # => (40000, 14, 14, 32)





# second convolutional layer
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
#print (h_conv2.get_shape()) # => (40000, 14,14, 64)
h_pool2 = max_pool_2x2(h_conv2)
#print (h_pool2.get_shape()) # => (40000, 7, 7, 64)




# densely connected layer
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

# (40000, 7, 7, 64) => (40000, 3136)
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])

h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
#print (h_fc1.get_shape()) # => (40000, 1024)





# dropout
keep_prob = tf.placeholder('float')
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
print h_fc1_drop.get_shape()


#readout layer for deep neural net
W_fc2 = weight_variable([1024,labels_count])
b_fc2 = bias_variable([labels_count])
print b_fc2.get_shape()
mull= tf.matmul(h_fc1_drop, W_fc2)
print mull.get_shape()
print
mull2 = mull + b_fc2
print mull2.get_shape()

y = tf.nn.softmax(mull2)



# dropout
keep_prob = tf.placeholder('float')
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)


sess = tf.Session()
sess.run(tf.initialize_all_variables())

print sess.run(mull[0,2])

The line that causes the crash:

print sess.run(mull[0,2])


This is basically one location in a very big 2d array. Something about the sess.run is causing it. I'm also getting a script issue popup... some sort of google script (think maybe it's tensorflow?). I can't copy the link because my computer is completely frozen.

Recommended Answer


I suspect the problem arises because mull[0, 2]—despite its small apparent size—depends on a very large computation, including multiple convolutions, max-poolings, and a large matrix multiplication; and therefore either your computer becomes fully loaded for a long period of time, or it runs out of memory. (You should be able to tell which by running top and checking what resources are used by the python process in which you are running TensorFlow.)
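To make that cost concrete, here is a minimal, self-contained illustration (the shapes are hypothetical choices that mirror the question's final matmul, not taken from the original post): in graph mode, slicing an op's output does not shrink the work, because the slice depends on the full result.

import numpy as np
import tensorflow as tf

# Hypothetical shapes mirroring the question: the single sliced value
# still forces the entire (40000, 10) product to be computed.
big = tf.constant(np.random.rand(40000, 1024).astype(np.float32))
w = tf.constant(np.random.rand(1024, 10).astype(np.float32))
product = tf.matmul(big, w)   # the full result is evaluated...
one_value = product[0, 2]     # ...and only then sliced

sess = tf.Session()
print sess.run(one_value)     # pays for the whole matmul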


The amount of computation is so large because your TensorFlow graph is defined in terms of the entire training dataset, trainImages, which contains 40000 images:

image = tf.reshape(trainImages, [-1, image_width, image_height, 1])
image = tf.cast(image, tf.float32)


Instead, it would be more efficient to define your network in terms of a tf.placeholder() to which you can feed individual training examples, or mini-batches of examples. See the documentation on feeding for more information. In particular, since you are only interested in the 0th row of mull, you only need to feed the 0th example from trainImages and perform computation on it to produce the necessary values. (In your current program, the results for all other examples are also being computed, and then discarded in the final slice operator.)
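A minimal sketch of that restructuring, assuming the layers from the question are rebuilt starting from the existing x placeholder rather than from trainImages (the single-example evaluation and the 50-image batch below are illustrative choices, not from the original post):

# build the image tensor from the placeholder instead of the whole dataset
image = tf.reshape(x, [-1, image_width, image_height, 1])  # x is already float32
# ... then define h_conv1 through mull exactly as before, starting from `image` ...

sess = tf.Session()
sess.run(tf.initialize_all_variables())

# evaluate mull for just the first training example; keep_prob must be
# fed because mull depends on the dropout layer
print sess.run(mull[0, 2], feed_dict={x: trainImages[:1], keep_prob: 1.0})

# the same pattern handles mini-batches instead of all 40000 images
batch = trainImages[0:50]
print sess.run(mull, feed_dict={x: batch, keep_prob: 1.0})

With this structure, each sess.run() call only processes the examples that are actually fed in, so memory use stays proportional to the batch size rather than to the whole training set.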

