How to *actually* read CSV data in TensorFlow?


Problem description

I'm relatively new to the world of TensorFlow, and pretty perplexed by how you'd actually read CSV data into usable example/label tensors in TensorFlow. The example from the TensorFlow tutorial on reading CSV data is pretty fragmented and only gets you part of the way to being able to train on CSV data.

Here's my code that I've pieced together, based off that CSV tutorial:

from __future__ import print_function
import tensorflow as tf

def file_len(fname):
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1

filename = "csv_test_data.csv"

# setup text reader
file_length = file_len(filename)
filename_queue = tf.train.string_input_producer([filename])
reader = tf.TextLineReader(skip_header_lines=1)
_, csv_row = reader.read(filename_queue)

# setup CSV decoding
record_defaults = [[0],[0],[0],[0],[0]]
col1,col2,col3,col4,col5 = tf.decode_csv(csv_row, record_defaults=record_defaults)

# turn features back into a tensor
features = tf.stack([col1,col2,col3,col4])

print("loading, " + str(file_length) + " line(s)\n")
with tf.Session() as sess:
  tf.initialize_all_variables().run()

  # start populating filename queue
  coord = tf.train.Coordinator()
  threads = tf.train.start_queue_runners(coord=coord)

  for i in range(file_length):
    # retrieve a single instance
    example, label = sess.run([features, col5])
    print(example, label)

  coord.request_stop()
  coord.join(threads)
  print("\ndone loading")

And here is a brief example from the CSV file I'm loading - pretty basic data - 4 feature columns, and 1 label column:

0,0,0,0,0
0,15,0,0,0
0,30,0,0,0
0,45,0,0,0

All the code above does is print each example from the CSV file, one by one, which, while nice, is pretty darn useless for training.

What I'm struggling with here is how you'd actually turn those individual examples, loaded one-by-one, into a training dataset. For example, here's a notebook I was working on from the Udacity Deep Learning course. I basically want to take the CSV data I'm loading, and plop it into something like train_dataset and train_labels:

def reformat(dataset, labels):
  dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
  # Map 2 to [0.0, 1.0, 0.0 ...], 3 to [0.0, 0.0, 1.0 ...]
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)
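For a CSV this small, one workaround (not TensorFlow-specific; the `load_csv` helper below is my own, hypothetical name) is to read the whole file into numpy arrays up front and one-hot the labels the same way `reformat` does:

```python
import io

import numpy as np

def load_csv(path_or_buffer, num_labels):
    """Hypothetical helper: read rows of 4 feature columns + 1 label
    column into (dataset, one-hot labels), mirroring reformat() above."""
    # Pass skiprows=1 here if the real file has a header row.
    data = np.loadtxt(path_or_buffer, delimiter=",")
    dataset = data[:, :4].astype(np.float32)
    labels = data[:, 4].astype(np.int64)
    # Map label k to a one-hot row, e.g. 0 -> [1.0, 0.0, ...]
    one_hot = (np.arange(num_labels) == labels[:, None]).astype(np.float32)
    return dataset, one_hot

# The four sample rows from above, fed in as an in-memory file:
csv_text = "0,0,0,0,0\n0,15,0,0,0\n0,30,0,0,0\n0,45,0,0,0\n"
train_dataset, train_labels = load_csv(io.StringIO(csv_text), num_labels=2)
print(train_dataset.shape, train_labels.shape)  # (4, 4) (4, 2)
```

From there the arrays drop straight into a training loop like the Udacity one; the same call on held-out files would give valid/test splits.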

I've tried using tf.train.shuffle_batch, like this, but it just inexplicably hangs:

  for i in range(file_length):
    # retrieve a single instance
    example, label = sess.run([features, colRelevant])
    example_batch, label_batch = tf.train.shuffle_batch([example, label], batch_size=file_length, capacity=file_length, min_after_dequeue=10000)
    print(example, label)
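For what it's worth, the hang is likely because tf.train.shuffle_batch is a graph-construction call: invoking it inside the run loop builds a brand-new queue on every iteration, after start_queue_runners has already run, and with min_after_dequeue=10000 a queue fed from a 4-line file can never accumulate enough rows to release a batch. A sketch of the build-the-op-once pattern, using toy constants via slice_input_producer instead of the file reader (TF 1.x queue API, via the compat shim):

```python
import numpy as np
import tensorflow.compat.v1 as tf  # queue runners are TF 1.x-era APIs

tf.disable_eager_execution()

# Toy stand-ins for the decoded CSV columns (features, col5) above.
all_features = np.array([[0, 0, 0, 0],
                         [0, 15, 0, 0],
                         [0, 30, 0, 0],
                         [0, 45, 0, 0]], dtype=np.int32)
all_labels = np.array([0, 0, 0, 0], dtype=np.int32)

# slice_input_producer yields one (feature_row, label) pair per dequeue.
feature, label = tf.train.slice_input_producer(
    [tf.constant(all_features), tf.constant(all_labels)])

# Built ONCE, as part of the graph -- and with min_after_dequeue small
# enough that the queue can actually fill from this few rows.
example_batch, label_batch = tf.train.shuffle_batch(
    [feature, label], batch_size=4, capacity=100, min_after_dequeue=2)

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    ex, lab = sess.run([example_batch, label_batch])  # one shuffled batch
    print(ex.shape, lab.shape)
    coord.request_stop()
    coord.join(threads)
```

Each sess.run on example_batch then dequeues a full shuffled batch, instead of a single row.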

So to sum up, here are my questions:
