How to *correctly* read data from csv's into TensorFlow


Question


I came across this SO post showing how to set up the code to read in CSV files using a queue. However, each time I run it, I run into an error. I've tried debugging it, but can't figure out what the error means. Can anyone help me out?

The code I'm using is almost verbatim what was posted in the above post:

import tensorflow as tf

dataset = '/Users/hdadmin/Data/actions/testing.csv'

def file_len(fname):
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1

def read_from_csv(filename_queue):
  reader = tf.TextLineReader(skip_header_lines=1)
  _, csv_row = reader.read(filename_queue)
  record_defaults = [[0],[0],[0],[0],[0]]
  colHour,colQuarter,colAction,colUser,colLabel = tf.decode_csv(csv_row, record_defaults=record_defaults)
  features = tf.pack([colHour,colQuarter,colAction,colUser])  
  label = tf.pack([colLabel])  
  return features, label

def input_pipeline(batch_size, num_epochs=None):
  filename_queue = tf.train.string_input_producer([dataset], num_epochs=num_epochs, shuffle=True)  
  example, label = read_from_csv(filename_queue)
  min_after_dequeue = 1000
  capacity = min_after_dequeue + 3 * batch_size
  example_batch, label_batch = tf.train.shuffle_batch(
      [example, label], batch_size=batch_size, capacity=capacity,
      min_after_dequeue=min_after_dequeue)
  return example_batch, label_batch

file_length = file_len(dataset) - 1
examples, labels = input_pipeline(file_length, 1)

with tf.Session() as sess:
  tf.initialize_all_variables().run()

  # start populating filename queue
  coord = tf.train.Coordinator()
  threads = tf.train.start_queue_runners(coord=coord)

  try:
    while not coord.should_stop():
      example_batch, label_batch = sess.run([examples, labels])
      print(example_batch)
  except tf.errors.OutOfRangeError:
    print('Done training, epoch reached')
  finally:
    coord.request_stop()

  coord.join(threads) 

The error I'm getting is:

E tensorflow/core/client/tensor_c_api.cc:485] Attempting to use uninitialized value input_producer/limit_epochs/epochs
     [[Node: input_producer/limit_epochs/CountUpTo = CountUpTo[T=DT_INT64, _class=["loc:@input_producer/limit_epochs/epochs"], limit=1, _device="/job:localhost/replica:0/task:0/cpu:0"](input_producer/limit_epochs/epochs)]]
E tensorflow/core/client/tensor_c_api.cc:485] RandomShuffleQueue '_2_shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 10000, current size 0)
     [[Node: shuffle_batch = QueueDequeueMany[_class=["loc:@shuffle_batch/random_shuffle_queue"], component_types=[DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](shuffle_batch/random_shuffle_queue, shuffle_batch/n)]]
Done training, epoch reached
E tensorflow/core/client/tensor_c_api.cc:485] FIFOQueue '_0_input_producer' is closed and has insufficient elements (requested 1, current size 0)
     [[Node: ReaderRead = ReaderRead[_class=["loc:@TextLineReader", "loc:@input_producer"], _device="/job:localhost/replica:0/task:0/cpu:0"](TextLineReader, input_producer)]]
E tensorflow/core/client/tensor_c_api.cc:485] Queue '_2_shuffle_batch/random_shuffle_queue' is already closed.
     [[Node: shuffle_batch/random_shuffle_queue_Close = QueueClose[_class=["loc:@shuffle_batch/random_shuffle_queue"], cancel_pending_enqueues=false, _device="/job:localhost/replica:0/task:0/cpu:0"](shuffle_batch/random_shuffle_queue)]]
Traceback (most recent call last):
  File "csv_test.py", line 49, in <module>
    coord.join(threads) 
  File "/usr/local/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 357, in join
    six.reraise(*self._exc_info_to_raise)
  File "/usr/local/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/queue_runner.py", line 185, in _run
    sess.run(enqueue_op)
  File "/usr/local/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 382, in run
    run_metadata_ptr)
  File "/usr/local/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 655, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 723, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 743, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.FailedPreconditionError: Attempting to use uninitialized value input_producer/limit_epochs/epochs
     [[Node: input_producer/limit_epochs/CountUpTo = CountUpTo[T=DT_INT64, _class=["loc:@input_producer/limit_epochs/epochs"], limit=1, _device="/job:localhost/replica:0/task:0/cpu:0"](input_producer/limit_epochs/epochs)]]
Caused by op u'input_producer/limit_epochs/CountUpTo', defined at:
  File "csv_test.py", line 31, in <module>
    examples, labels = input_pipeline(file_length, 1)
  File "csv_test.py", line 21, in input_pipeline
    filename_queue = tf.train.string_input_producer([dataset], num_epochs=num_epochs, shuffle=True)
  File "/usr/local/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 194, in string_input_producer
    summary_name="fraction_of_%d_full" % capacity)
  File "/usr/local/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 133, in input_producer
    input_tensor = limit_epochs(input_tensor, num_epochs)
  File "/usr/local/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 84, in limit_epochs
    counter = epochs.count_up_to(num_epochs)
  File "/usr/local/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 577, in count_up_to
    return state_ops.count_up_to(self._variable, limit=limit)
  File "/usr/local/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 127, in count_up_to
    result = _op_def_lib.apply_op("CountUpTo", ref=ref, limit=limit, name=name)
  File "/usr/local/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 703, in apply_op
    op_def=op_def)
  File "/usr/local/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2310, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1232, in __init__
    self._traceback = _extract_stack()

I made up data comprised of five columns to match with the example. It's something along the lines of:

"v1","v2","v3","v4","v5"
1,1,1,3,10
4,2,1,10,8
1,4,1,9,3
3,3,1,1,5
3,4,1,4,3
3,2,1,5,8
1,1,1,9,7
4,1,1,4,9
2,3,1,8,4

Thanks ahead of time.

Solution

I think what you are missing is the initialization of *local* variables (e.g. input_producer/limit_epochs/epochs). Local variables are not covered by tf.initialize_all_variables(), which, by the way, is quite confusing.
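If you want to see this for yourself, comparing the two variable collections makes it visible. This is a minimal sketch, assuming the same TF 0.x-era API as in the question, where tf.all_variables() and tf.local_variables() are the accessors for the global and local collections:

# The epochs counter created by num_epochs=... lives in the LOCAL_VARIABLES
# collection, so it shows up in tf.local_variables() but not tf.all_variables().
print([v.name for v in tf.all_variables()])
print([v.name for v in tf.local_variables()])  # contains input_producer/limit_epochs/epochs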

You can add the following initialization operation that will initialize everything at once:

init_op = tf.group(tf.initialize_all_variables(),
                   tf.initialize_local_variables())

and then:

sess.run(init_op)
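For reference, here is a minimal sketch of the session block from the question with that fix applied (same TF 0.x-era API; only the initialization lines change):

with tf.Session() as sess:
  # Initialize global *and* local variables; the num_epochs counter
  # (input_producer/limit_epochs/epochs) is a local variable, so
  # tf.initialize_all_variables() alone misses it.
  init_op = tf.group(tf.initialize_all_variables(),
                     tf.initialize_local_variables())
  sess.run(init_op)

  # start populating filename queue
  coord = tf.train.Coordinator()
  threads = tf.train.start_queue_runners(coord=coord)

  try:
    while not coord.should_stop():
      example_batch, label_batch = sess.run([examples, labels])
      print(example_batch)
  except tf.errors.OutOfRangeError:
    print('Done training, epoch reached')
  finally:
    coord.request_stop()

  coord.join(threads)

(In later TensorFlow 1.x releases the same pair is spelled tf.global_variables_initializer() and tf.local_variables_initializer().)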
