Keras + TensorFlow optimization stalls

Problem description

I installed Theano (TH), TensorFlow (TF), and Keras. Basic testing seems to indicate that they work with the GPU (GTX 1070), CUDA 8.0, and cuDNN 5.1.

If I run the cifar10_cnn.py Keras example with TH as the backend, it seems to work fine, taking ~18s/epoch. If I run it with TF, then almost every time (it has worked occasionally, but I can't reproduce it) the optimization stalls with acc=0.1 after every epoch. It is as if the weights were not being updated.

This is a shame because the TF backend takes ~10s/epoch (even on the very few occasions it worked). I'm using Conda and I am very new to Python. If it helps, "conda list" seems to show two versions of some of the packages.

If you have any clues, please let me know. Thanks. Output below:

python cifar10_cnn.py

Using TensorFlow backend.

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally

X_train shape: (50000, 32, 32, 3)

50000 train samples

10000 test samples

Using real-time data augmentation.

Epoch 1/200

I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 

name: GeForce GTX 1070

major: 6 minor: 1 memoryClockRate (GHz) 1.7845

pciBusID 0000:01:00.0

Total memory: 7.92GiB

Free memory: 7.60GiB

I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 

I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 

I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)

50000/50000 [==============================] - 11s - loss: 2.3029 - acc: 0.0999 - val_loss: 2.3026 - val_acc: 0.1000

Epoch 2/200

50000/50000 [==============================] - 10s - loss: 2.3028 - acc: 0.0980 - val_loss: 2.3026 - val_acc: 0.1000

Epoch 3/200

50000/50000 [==============================] - 10s - loss: 2.3028 - acc: 0.0992 - val_loss: 2.3026 - val_acc: 0.1000

Epoch 4/200

50000/50000 [==============================] - 10s - loss: 2.3028 - acc: 0.0980 - val_loss: 2.3026 - val_acc: 0.1000

Epoch 5/200

13184/50000 [======>.......................] - ETA: 7s - loss: 2.3026 - acc: 0.1044^CTraceback (most recent call last):

Recommended answer

It looks to me like the model is just guessing at random: there are 10 classes and it is right 10% of the time. The only thing I can think of is that your learning rate is a bit too high. I have seen models with a high learning rate sometimes converge and sometimes fail to converge. I think Theano currently performs more optimizations on the backend, so that may be affecting things slightly. Try lowering the learning rate by a factor of 10 and see if it converges.
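As a quick sanity check on the "random guessing" reading, the loss in the log can be compared with the cross-entropy of a model that assigns equal probability to all 10 classes. The snippet below is only an illustrative calculation; the variable names are made up for this example and do not come from cifar10_cnn.py.

```python
import math

num_classes = 10  # CIFAR-10 has 10 classes

# Cross-entropy of a model that predicts probability 1/num_classes for every class.
chance_loss = -math.log(1.0 / num_classes)

# Expected accuracy when every class is equally likely to be picked.
chance_acc = 1.0 / num_classes

print(f"chance-level loss: {chance_loss:.4f}")
print(f"chance-level accuracy: {chance_acc:.4f}")
```

The chance-level loss is ln(10), roughly 2.3026, which matches the val_loss of 2.3026 and val_acc of 0.1000 reported in every epoch of the log. That is consistent with a network whose outputs carry no information about the input.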
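To illustrate why a factor of 10 in the learning rate can make the difference between converging and not converging, here is a toy sketch of plain gradient descent on a one-dimensional quadratic. It is not the CIFAR-10 model, and the curvature, starting point, and learning rates are arbitrary values chosen for the demonstration.

```python
def gradient_descent(lr, curvature=10.0, w0=1.0, steps=50):
    """Minimise f(w) = 0.5 * curvature * w**2 with plain gradient descent."""
    w = w0
    for _ in range(steps):
        grad = curvature * w   # derivative of f at w
        w = w - lr * grad      # each step multiplies w by (1 - lr * curvature)
    return w


high_lr = 0.25          # 1 - lr * curvature = -1.5, so |w| grows every step
low_lr = high_lr / 10   # 1 - lr * curvature = 0.75, so |w| shrinks every step

w_high = gradient_descent(high_lr)
w_low = gradient_descent(low_lr)

print(f"lr={high_lr}: final |w| = {abs(w_high):.3e}")
print(f"lr={low_lr}: final |w| = {abs(w_low):.3e}")
```

With the higher rate the iterate moves away from the minimum at w = 0, while the rate that is 10 times smaller approaches it. In a Keras script the corresponding change would be made on the optimizer object passed to `model.compile` (the argument is named `lr` in older Keras releases and `learning_rate` in newer ones); the value used in the asker's copy of cifar10_cnn.py is not shown in the question.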
