Converting TensorFlow tutorial to work with my own data


Question

This is a follow-on from my last question, Converting from Pandas dataframe to TensorFlow tensor object.

I'm now on the next step and need some more help. I'm trying to replace this line of code

batch = mnist.train.next_batch(100)

with a replacement for my own data. I've found this answer on StackOverflow: Where does next_batch in the TensorFlow tutorial batch_xs, batch_ys = mnist.train.next_batch(100) come from? But I don't understand:

1) Why .next_batch() doesn't work on my tensor. Am I creating it incorrectly?

2) How to implement the pseudocode that was given in the answer to the question on .next_batch()

I currently have two tensor objects, one with the parameters I wish to use to train the model (dataVar_tensor) and one with the correct result (depth_tensor). I obviously need to keep their relationship to keep the correct response with the correct parameters.
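If your data already fits in memory as NumPy arrays, you don't strictly need the queue machinery in the answer below: a minimal next_batch replacement can be written by hand. This is a sketch assuming dataVar_tensor and depth_tensor have first been converted back to plain NumPy arrays (the names and toy data here are illustrative, not from the question):

```python
import numpy as np

def next_batch(features, labels, batch_size):
    # sample the same random row indices from both arrays,
    # so each feature row stays paired with its label
    idx = np.random.choice(len(features), batch_size, replace=False)
    return features[idx], labels[idx]

# toy stand-ins: 10 rows of 4 features, and matching one-hot labels
features = np.arange(40, dtype=np.float32).reshape(10, 4)
labels = np.eye(3)[np.arange(10) % 3]

batch_x, batch_y = next_batch(features, labels, batch_size=3)
```

Each call draws a fresh random batch; because the same indices are used to select from both arrays, the feature/label pairing is preserved.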

Please can you take some time to help me understand what's going on and to replace this line of code?

Many thanks

Answer

I stripped off the non-relevant stuff so as to preserve the formatting and indentation. Hopefully it should be clear now. The following code reads a CSV file in batches of N lines (N specified in a constant at the top). Each line contains a date (first cell), then a list of floats (480 cells) and a one-hot vector (3 cells). The code then simply prints the batches of these dates, floats, and one-hot vectors as it reads them. The place where it prints them is normally where you'd actually run your model and feed these in place of the placeholder variables.
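For concreteness, a row in that layout has 1 + 480 + 3 = 484 cells. A made-up example row could be built like this (the values are placeholders, not real data):

```python
# hypothetical CSV row: one date cell, 480 feature cells, then a one-hot label
row = "2017-01-01," + ",".join(["0.5"] * 480) + ",0,1,0"
```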

Just keep in mind that here it reads each line as a string, and then converts the specific cells within that line into floats, simply because the first cell is easier to read as a string. If all your data is numeric, then simply set the defaults to a float/int rather than 'a' and get rid of the code that converts strings to floats. It's not needed otherwise!
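In the all-numeric case the change is small: give every cell a float default instead of 'a', and tf.decode_csv then yields float tensors directly, so the tf.string_to_number calls can be dropped. A sketch of just the defaults (assuming the string date column is gone):

```python
TS = 480  # number of feature cells
TL = 3    # width of the one-hot label

# one default per cell; a float default makes decode_csv parse floats directly
rDefaults = [[0.0] for _ in range(TS + TL)]
```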

I've put some comments in to clarify what it's doing. Let me know if something is unclear.

import tensorflow as tf

fileName = 'YOUR_FILE.csv'

try_epochs = 1
batch_size = 3

TD = 1 # this is my date-label for each row, for internal purposes
TS = 480 # this is the list of features, 480 in this case
TL = 3 # this is one-hot vector of 3 representing the label

# set defaults to something (TF requires defaults for the number of cells you are going to read)
rDefaults = [['a'] for row in range((TD+TS+TL))]

# function that reads the input file, line-by-line
def read_from_csv(filename_queue):
    reader = tf.TextLineReader(skip_header_lines=False) # no header row in my file
    _, csv_row = reader.read(filename_queue) # read one line
    data = tf.decode_csv(csv_row, record_defaults=rDefaults) # use defaults for this line (in case of missing data)
    dateLbl = tf.slice(data, [0], [TD]) # first cell is my 'date-label', for internal purposes
    features = tf.string_to_number(tf.slice(data, [TD], [TS]), tf.float32) # cells 2..481 are the features
    label = tf.string_to_number(tf.slice(data, [TD+TS], [TL]), tf.float32) # the remaining 3 cells are the one-hot label
    return dateLbl, features, label

# function that packs each read line into batches of specified size
def input_pipeline(fName, batch_size, num_epochs=None):
    filename_queue = tf.train.string_input_producer(
        [fName],
        num_epochs=num_epochs,
        shuffle=True)  # this refers to multiple files, not line items within files
    dateLbl, features, label = read_from_csv(filename_queue)
    min_after_dequeue = 10000 # min of where to start loading into memory
    capacity = min_after_dequeue + 3 * batch_size # max of how much to load into memory
    # this packs the above lines into a batch of size you specify:
    dateLbl_batch, feature_batch, label_batch = tf.train.shuffle_batch(
        [dateLbl, features, label], 
        batch_size=batch_size,
        capacity=capacity,
        min_after_dequeue=min_after_dequeue)
    return dateLbl_batch, feature_batch, label_batch

# these are the date label, features, and label:
dateLbl, features, labels = input_pipeline(fileName, batch_size, try_epochs)

with tf.Session() as sess:

    tf.global_variables_initializer().run()
    tf.local_variables_initializer().run()

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    try:
        while not coord.should_stop():
            # load date-label, features, and label:
            dateLbl_batch, feature_batch, label_batch = sess.run([dateLbl, features, labels])      

            print(dateLbl_batch)
            print(feature_batch)
            print(label_batch)
            print('----------')

    except tf.errors.OutOfRangeError:
        print("Done looping through the file")

    finally:
        coord.request_stop()

    coord.join(threads)
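With batch_size = 3, the three arrays printed above come back with the NumPy shapes below, which is what your placeholders would need to accept. These are dummy stand-ins with the right shapes, not output from the reader:

```python
import numpy as np

batch_size, TS, TL = 3, 480, 3

# shapes as yielded by the shuffle_batch pipeline above
dateLbl_batch = np.array([[b"2017-01-01"], [b"2017-01-02"], [b"2017-01-03"]])
feature_batch = np.zeros((batch_size, TS), dtype=np.float32)
label_batch = np.zeros((batch_size, TL), dtype=np.float32)

# in a real model, the print() calls would be replaced with something like
# (x, y_, and train_step are hypothetical names for your placeholders/op):
# sess.run(train_step, feed_dict={x: feature_batch, y_: label_batch})
```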

