How to use Tensorflow dataset API with training and validation sets


Problem description


Simple task at hand: run training for N epochs, computing exact validation accuracy after each epoch. The epoch size can either be equal to the full training set or to some predefined number of iterations. During validation, every validation set input has to be evaluated exactly once.

What would be the best way to mix together one-shot iterators, initializable iterators, and/or string handles for this task?

Here is a scaffold of how I see it working:

def build_training_dataset():
    pass

def build_validation_dataset():
    pass

def construct_train_op(dataset):
    pass

def magic(training_dataset, validation_dataset):
    pass

USE_CUSTOM_EPOCH_SIZE = True
CUSTOM_EPOCH_SIZE = 60
MAX_EPOCHS = 100


training_dataset = build_training_dataset()
validation_dataset = build_validation_dataset()


# Magic goes here to build a nice one-instance dataset
dataset = magic(training_dataset, validation_dataset)

train_op = construct_train_op(dataset)

# Run N epochs in which the training dataset is traversed, followed by the
# validation dataset.
with tf.Session() as sess:
    for epoch in range(MAX_EPOCHS):

        # train
        if USE_CUSTOM_EPOCH_SIZE:
            for _ in range(CUSTOM_EPOCH_SIZE):
                sess.run(train_op)
        else:
            while True:
                # I guess something like this:
                try:
                    sess.run(train_op)
                except tf.errors.OutOfRangeError:
                    break # we are done with the epoch

        # validation
        validation_predictions = []
        while True:
            try:
                validation_predictions = np.append(validation_predictions, sess.run(train_op)) # but for validation this time
            except tf.errors.OutOfRangeError:
                print('epoch %d finished with accuracy: %f' % (epoch, validation_predictions.mean()))
                break 

Solution

Since the solution is a lot messier than I expected, it comes in two pieces:

0) Auxiliary code shared by both examples:

import numpy as np
import tensorflow as tf

USE_CUSTOM_EPOCH_SIZE = True
CUSTOM_EPOCH_SIZE = 60
MAX_EPOCHS = 100

TRAIN_SIZE = 500
VALIDATION_SIZE = 145
BATCH_SIZE = 64


def construct_train_op(batch):
    return batch


def build_train_dataset():
    return tf.data.Dataset.range(TRAIN_SIZE) \
        .map(lambda x: x + tf.random_uniform([], -10, 10, tf.int64)) \
        .batch(BATCH_SIZE)

def build_test_dataset():
    return tf.data.Dataset.range(VALIDATION_SIZE) \
        .batch(BATCH_SIZE)
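
Note that construct_train_op is a stub that simply returns its batch, so sess.run(train_op) below yields the raw batch values; in a real model it would build an actual training step (and a separate prediction op for validation).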

1) For an epoch equal to the train dataset size:

# datasets construction
training_dataset = build_train_dataset()
validation_dataset = build_test_dataset()

# handle construction. The handle lets us feed data from different datasets by passing a parameter in feed_dict
handle = tf.placeholder(tf.string, shape=[])
iterator = tf.data.Iterator.from_string_handle(handle, training_dataset.output_types, training_dataset.output_shapes)
next_element = iterator.get_next()

train_op = construct_train_op(next_element)

training_iterator = training_dataset.make_initializable_iterator()
validation_iterator = validation_dataset.make_initializable_iterator()

with tf.Session() as sess:
    training_handle = sess.run(training_iterator.string_handle())
    validation_handle = sess.run(validation_iterator.string_handle())

    for epoch in range(MAX_EPOCHS):
        #train
        sess.run(training_iterator.initializer)
        total_in_train = 0
        while True:
            try:
                train_output = sess.run(train_op, feed_dict={handle: training_handle})
                total_in_train += len(train_output)
            except tf.errors.OutOfRangeError:
                assert total_in_train == TRAIN_SIZE
                break # we are done with the epoch

        # validation
        validation_predictions = []
        sess.run(validation_iterator.initializer)
        while True:
            try:
                pred = sess.run(train_op, feed_dict={handle: validation_handle})
                validation_predictions = np.append(validation_predictions, pred)
            except tf.errors.OutOfRangeError:
                assert len(validation_predictions) == VALIDATION_SIZE
                print('Epoch %d finished with accuracy: %f' % (epoch, np.mean(validation_predictions)))
                break
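
The handle placeholder is what ties this together: Iterator.from_string_handle builds a single next_element op whose source is chosen at sess.run time by the string handle passed in feed_dict, so one graph serves both datasets. Re-running training_iterator.initializer rewinds the training set for the next epoch, and exhausting either iterator raises tf.errors.OutOfRangeError.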

2) For a custom epoch size:

# datasets construction
training_dataset = build_train_dataset().repeat() # CHANGE 1
validation_dataset = build_test_dataset()

# handle construction. The handle lets us feed data from different datasets by passing a parameter in feed_dict
handle = tf.placeholder(tf.string, shape=[])
iterator = tf.data.Iterator.from_string_handle(handle, training_dataset.output_types, training_dataset.output_shapes)
next_element = iterator.get_next()


train_op = construct_train_op(next_element)

training_iterator = training_dataset.make_one_shot_iterator() # CHANGE 2
validation_iterator = validation_dataset.make_initializable_iterator()

with tf.Session() as sess:
    training_handle = sess.run(training_iterator.string_handle())
    validation_handle = sess.run(validation_iterator.string_handle())

    for epoch in range(MAX_EPOCHS):
        #train
        # CHANGE 3: no initialization and no try/except needed
        for _ in range(CUSTOM_EPOCH_SIZE): 
            train_output = sess.run(train_op, feed_dict={handle: training_handle})


        # validation
        validation_predictions = []
        sess.run(validation_iterator.initializer)
        while True:
            try:
                pred = sess.run(train_op, feed_dict={handle: validation_handle})
                validation_predictions = np.append(validation_predictions, pred)
            except tf.errors.OutOfRangeError:
                assert len(validation_predictions) == VALIDATION_SIZE
                print('Epoch %d finished with accuracy: %f' % (epoch, np.mean(validation_predictions)))
                break
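
For reference, on TensorFlow 2.x with eager execution (the default) the same pattern needs no handles or explicit iterators at all. A minimal sketch, assuming the dataset builders above are ported to the 2.x API (e.g. tf.random_uniform becomes tf.random.uniform):

import numpy as np
import tensorflow as tf

for epoch in range(MAX_EPOCHS):
    # train: a tf.data.Dataset is a plain Python iterable in eager mode,
    # so the epoch boundary is simply the end of the for loop
    for batch in build_train_dataset():
        train_output = batch  # stand-in for a real training step

    # validation: iterating a freshly built dataset visits every element exactly once
    validation_predictions = np.concatenate(
        [batch.numpy() for batch in build_test_dataset()])
    assert len(validation_predictions) == VALIDATION_SIZE
    print('Epoch %d finished with accuracy: %f'
          % (epoch, np.mean(validation_predictions)))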
