Tensorflow: using an input-pipeline (.csv) as a dictionary for training


Problem description

I'm trying to train a model on a .csv dataset (5008 columns, 533 rows). I'm using a textreader to parse the data into two tensors, one holding the data to train on [example] and one holding the correct labels [label]:

def read_my_file_format(filename_queue):
    reader = tf.TextLineReader()
    key, record_string = reader.read(filename_queue)
    record_defaults = [[0.5] for row in range(5008)]

    # decode_csv returns one tensor per column; the first 5007 columns
    # are the features and the last column is the label.
    columns = tf.decode_csv(record_string, record_defaults=record_defaults)
    example = tf.stack(columns[:-1])
    label = columns[-1]
    return example, label

def input_pipeline(filenames, batch_size, num_epochs=None):
    filename_queue = tf.train.string_input_producer(filenames, num_epochs=num_epochs, shuffle=True)
    example, label = read_my_file_format(filename_queue)
    min_after_dequeue = 10000
    capacity = min_after_dequeue + 3 * batch_size
    example_batch, label_batch = tf.train.shuffle_batch([example, label], batch_size=batch_size, capacity=capacity, min_after_dequeue=min_after_dequeue)
    return example_batch, label_batch
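For intuition, what `read_my_file_format` does per record can be sketched in plain Python (a hypothetical helper, not part of TensorFlow): split one CSV line into floats, fall back to the 0.5 default for empty fields, and treat the last column as the label.

```python
def parse_csv_line(line, default=0.5):
    """Mirror what tf.decode_csv + tf.stack do for one record:
    empty fields get the default value, the last column is the label."""
    fields = [float(f) if f.strip() else default
              for f in line.split(",")]
    return fields[:-1], fields[-1]

features, label = parse_csv_line("1.0,2.0,,4.0")
# features -> [1.0, 2.0, 0.5], label -> 4.0
```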

This part is working, when executing something like:

with tf.Session() as sess:
    ex_b, l_b = input_pipeline(["Tensorflow_vectors.csv"], 10, 1)
    print("Test: ",ex_b)

The result is `Test: Tensor("shuffle_batch:0", shape=(10, 5007), dtype=float32)`

So far this seems fine to me. Next I've created a simple model consisting of two hidden layers (512 and 256 nodes, respectively). Where things go wrong is when I'm trying to train the model:

batch_x, batch_y = input_pipeline(["Tensorflow_vectors.csv"], batch_size)
_, cost = sess.run([optimizer, cost], feed_dict={x: batch_x.eval(), y: batch_y.eval()})

I've based this approach on this example that uses the MNIST database. However, when I execute this, even with just batch_size = 1, TensorFlow hangs. If I leave out the .eval() calls that should pull the actual data out of the tensors, I get the following response:

TypeError: The value of a feed cannot be a tf.Tensor object. Acceptable feed values include Python scalars, strings, lists, or numpy ndarrays.

Now this I can understand, but I don't understand why the program hangs when I do include the .eval() calls, and I haven't been able to find any information about this issue.

I've included the most recent version of my entire script here. The program still hangs even though I implemented (as far as I know, correctly) the solution that was offered by vijay m.

Recommended answer

As the error says, you are trying to feed a tensor through feed_dict. You have defined an input_pipeline queue, and you can't pass it via feed_dict. The proper way to pass the data to the model and train it is shown in the code below:

# A queue which will return batches of inputs
batch_x, batch_y = input_pipeline(["Tensorflow_vectors.csv"], batch_size)

# Feed it to your neural network model:
# every time this is evaluated, it will pull data from the queue.
logits = neural_network(batch_x, batch_y, ...)

# Define cost and optimizer
cost = ...
optimizer = ...

# Evaluate the graph in a session:
with tf.Session() as sess:
    init_op = ...
    sess.run(init_op)

    # Start the queue runners
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    # Loop through the data and train; store the result in a separate
    # variable so the `cost` op is not overwritten by its value.
    for step in range(num_steps):  # num_steps: however many training steps you want
        _, cost_val = sess.run([optimizer, cost])

    coord.request_stop()
    coord.join(threads)
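The key point of the answer, pulling fresh batches inside the training loop instead of materializing them for feed_dict, can be sketched framework-free with a plain generator standing in for the shuffle_batch queue (the names here are hypothetical):

```python
from itertools import islice

def batch_stream(rows, batch_size):
    """Yield successive (features, labels) batches from an iterable of rows,
    playing the role of the queue: each iteration pulls fresh data."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield [r[:-1] for r in chunk], [r[-1] for r in chunk]

rows = [[0.1, 0.2, 1.0], [0.3, 0.4, 0.0], [0.5, 0.6, 1.0]]
for features, labels in batch_stream(rows, batch_size=2):
    pass  # each iteration trains on one freshly pulled batch
# two batches total: one of size 2, then a final one of size 1
```

The design point is the same as in the TF queue version: the loop body never sees the whole dataset, it just asks the pipeline for the next batch.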
