Skipping nonexistent or corrupt files in Tensorflow


Problem description

I have some files that include image filepaths and features, and some of the images may be missing or corrupt. I'm wondering how to robustly handle errors, by skipping these images and removing them from the queue.

I notice that simply catching the error and continuing will cause the queue to output the same image, so it will repeatedly error out on the same image. Is there a way to dequeue the image on error?

Also, I have a 'tf.Print()' statement to log the filename, but the 'Result:' line in my log shows that the valid image was processed with no corresponding print output. Why does 'tf.Print()' only print the name of the nonexistent file, not the correctly processed file?

Below is a small example, with the same error-handling code as my larger program:

Code:

#!/usr/bin/python3

import tensorflow as tf

example_filename = 'example.csv'
max_iterations = 20

### Create the graph ###
filename_container_queue = tf.train.string_input_producer([ example_filename ])
filename_container_reader = tf.TextLineReader()

_, filename_container_contents = filename_container_reader.read(filename_container_queue)
image_filenames = tf.decode_csv(filename_container_contents, [ tf.constant('', shape=[1], dtype=tf.string) ])

# decode_jpeg only works on a single image at a time
image_filename_batch = tf.train.shuffle_batch([ image_filenames ], batch_size=1, capacity=100, min_after_dequeue=0)
image_filename = tf.reshape(image_filename_batch, [1])

image_filenames_queue = tf.train.string_input_producer(image_filename)
image_reader = tf.WholeFileReader()
_, image_contents = image_reader.read(image_filenames_queue)
image = tf.image.decode_jpeg(tf.Print(image_contents, [ image_filename ]), channels=3)

counter = tf.count_up_to(tf.Variable(tf.constant(0)), max_iterations)

result_op = tf.reduce_mean(tf.image.convert_image_dtype(image, tf.float32), [0,1]) # Output average Red, Green, Blue values.

init_op = tf.initialize_all_variables()

### Run the graph ###
print("Running graph")
with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    sess.run([ init_op ])
    n = 0
    try:
        while not coord.should_stop():
            try:
                result, n = sess.run([ result_op, counter ])
                print("Result:", result)
            except tf.errors.NotFoundError as e:
                print("Skipping file due to image not existing")
                # coord.request_stop(e) <--- We only want to skip, not stop the entire process.
    except tf.errors.OutOfRangeError as e:
        print('Done training -- epoch limit reached after %d iterations' % n)
        coord.request_stop(e)
    finally:
        coord.request_stop()
        coord.join(threads)

Data:

example.csv contains:

/home/mburge/Pictures/junk/109798.jpg
nonexistent.jpg

Program Output:

I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
Running graph
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties: 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.8475
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 6.83GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
I tensorflow/core/kernels/logging_ops.cc:79] [nonexistent.jpg]
Result: [ 0.33875707  0.39879724  0.28882763]
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
W tensorflow/core/framework/op_kernel.cc:968] Not found: nonexistent.jpg
         [[Node: ReaderRead_1 = ReaderRead[_class=["loc:@WholeFileReader", "loc:@input_producer_1"], _device="/job:localhost/replica:0/task:0/cpu:0"](WholeFileReader, input_producer_1)]]
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Done training -- epoch limit reached after 0 iterations

Solution

You can manually define a dequeue op:

filename_deq = image_filenames_queue.dequeue()

and later, if you find a problem with reading a file, dequeue that file from the filename queue:

except tf.errors.NotFoundError as e:
    print("Skipping file due to image not existing")
    sess.run(filename_deq)
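
A different way to get the same effect, not part of the answer above, is to screen the filename list before it ever reaches the input pipeline. The sketch below is plain Python and does not touch TensorFlow; `looks_like_jpeg` and `filter_image_list` are hypothetical helper names chosen for illustration. It assumes that checking for a readable file beginning with the two-byte JPEG start marker is a good enough first pass. A truncated or otherwise damaged file can still pass this check, so decode-time errors would still need handling.

```python
import os
import tempfile

JPEG_SOI = b"\xff\xd8"  # JPEG streams begin with this Start-Of-Image marker


def looks_like_jpeg(path):
    """Return True if `path` can be opened and starts with the JPEG marker."""
    try:
        with open(path, "rb") as f:
            return f.read(2) == JPEG_SOI
    except OSError:  # missing file, directory, permission problem, ...
        return False


def filter_image_list(paths):
    """Split `paths` into (usable, skipped) without raising on bad entries."""
    usable, skipped = [], []
    for path in paths:
        (usable if looks_like_jpeg(path) else skipped).append(path)
    return usable, skipped


if __name__ == "__main__":
    # Small demonstration with throwaway files (hypothetical names).
    workdir = tempfile.mkdtemp()
    good = os.path.join(workdir, "good.jpg")
    with open(good, "wb") as f:
        f.write(JPEG_SOI + b"\xff\xe0 placeholder bytes, not a full image")
    missing = os.path.join(workdir, "nonexistent.jpg")

    usable, skipped = filter_image_list([good, missing])
    print("usable:", usable)
    print("skipped:", skipped)
```

The usable list could then be written back to a cleaned copy of example.csv, so the graph only ever sees paths that exist at the time of the check.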
