Tensorflow v1.10+ why is an input serving receiver function needed when checkpoints are made without it?


Problem description


I'm in the process of adapting my model to TensorFlow's estimator API.

I recently asked a question regarding early stopping based on validation data, where, in addition to stopping early, the best model at that point should be exported.

It seems that my understanding of what a model export is, and what a checkpoint is, is incomplete.

Checkpoints are made automatically. From my understanding, the checkpoints are sufficient for the estimator to start "warm" - either using pre-trained weights or the weights saved just before an error (e.g. if you experienced a power outage).

What is nice about checkpoints is that I do not have to write any code besides what is necessary for a custom estimator (namely, input_fn and model_fn).

While, given an initialized estimator, one can just call its train method to train the model, in practice this method is rather lackluster. Often one would like to do several things:

  1. compare the network periodically to a validation dataset to ensure you are not over-fitting
  2. stop the training early if over-fitting occurs
  3. save the best model whenever the network finishes (either by hitting the specified number of training steps or by the early stopping criteria).

To someone new to the "high level" estimator API, a lot of low-level expertise seems to be required (e.g. for the input_fn), as it is not straightforward how to get the estimator to do this.

With some light code reworking, #1 can be achieved by using tf.estimator.TrainSpec and tf.estimator.EvalSpec with tf.estimator.train_and_evaluate.
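
For concreteness, this is roughly what I mean (a sketch based on my reading of the docs; my_estimator is the estimator used below and the input function names are mine):

train_spec = tf.estimator.TrainSpec(input_fn=my_train_input_fn, max_steps=10000)
eval_spec = tf.estimator.EvalSpec(input_fn=my_eval_input_fn, throttle_secs=60)
tf.estimator.train_and_evaluate(my_estimator, train_spec, eval_spec)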

In the previous question, user @GPhilo clarified how #2 can be achieved by using a semi-unintuitive function from tf.contrib:

tf.contrib.estimator.stop_if_no_decrease_hook(my_estimator, 'my_metric_to_monitor', 10000)

(unintuitive as "the early stopping is not triggered according to the number of non-improving evaluations, but to the number of non-improving evals in a certain step range").
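
Put together with the sketch above (again, only my understanding, reusing the same names), the hook gets attached to the TrainSpec:

early_stopping = tf.contrib.estimator.stop_if_no_decrease_hook(my_estimator, 'my_metric_to_monitor', 10000)
train_spec = tf.estimator.TrainSpec(input_fn=my_train_input_fn, max_steps=10000, hooks=[early_stopping])
tf.estimator.train_and_evaluate(my_estimator, train_spec, eval_spec)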

@GPhilo - noting that it is unrelated to #2 - also answered how to do #3 (as requested in the original post). Yet, I do not understand what an input_serving_fn is, why it is needed, or how to make it.

This is further confusing to me as no such function is needed to make checkpoints, or for the estimator to start "warm" from the checkpoint.

So my questions are:

  • what is the difference between a checkpoint and an exported best model?
  • what exactly is a serving input receiver function and how to write one? (I have spent a bit of time reading over the tensorflow docs and do not find it sufficient to understand how I should write one, and why I even have to).
  • how can I train my estimator, save the best model, and then later load it.

To aid in answering my question I am providing this Colab document.

This self-contained notebook produces some dummy data, saves it in TF Records, has a very simple custom estimator via model_fn, and trains this model with an input_fn that uses the TF Record files. Thus it should be sufficient for someone to explain to me what placeholders I need to make for the input serving receiver function and how I can accomplish #3.

Update

@GPhilo, foremost, I cannot overstate my appreciation for your thoughtful consideration and care in aiding me (and hopefully others) to understand this matter.

My "goal" (motivating me to ask this question) is to try and build a reusable framework for training networks so I can just pass a different build_fn and go (plus have the quality of life features of exported model, early stopping, etc).

An updated (based off your answers) Colab can be found here.

After several readings of your answer, I now have some further points of confusion:

1.

the way you provide input to the inference model is different than the one you use for the training

Why? To my understanding the data input pipeline is not:

load raw —> process —> feed to model

But rather:

Load raw —> pre process —> store (perhaps as tf records)
# data processing has nothing to do with feeding data to the model?
Load processed —> feed to model

In other words, it is my understanding (perhaps wrongly) that the point of a tf Example / SequenceExample is to store a complete singular datum entity ready to go - no other processing needed other than reading from the TFRecord file.


Thus there can be a difference between the training / evaluation input_fn and the inference one (e.g. reading from a file vs eager / interactive evaluation of in-memory data), but the data format is the same (except that for inference you might want to feed only 1 example rather than a batch…)

I agree that the "input pipeline is not part of the model itself". However, in my mind, and I am apparently wrong in thinking so, with the estimator I should be able to feed it a batch for training and a single example (or batch) for inference.

An aside: "When evaluating, you don't need the gradients and you need a different input function. ", the only difference (at least in my case) is the files from which you reading?

  2. I am familiar with that TF Guide, but I have not found it useful because it is unclear to me what placeholders I need to add and what additional ops need to be added to convert the data.

What if I train my model with records and want to run inference with just dense tensors?

Tangentially, I find the example in the linked guide subpar, given the tf record interface requires the user to define multiple times how to write to / extract features from a tf record file in different contexts. Further, given that the TF team has explicitly stated they have little interest in documenting tf records, any documentation built on top of it, to me, is therefore equally unenlightening.

  3. Regarding tf.estimator.export.build_raw_serving_input_receiver_fn. What is the placeholder called? Input? Could you perhaps show the analog of tf.estimator.export.build_raw_serving_input_receiver_fn by writing the equivalent serving_input_receiver_fn?

  4. Regarding your example serving_input_receiver_fn with the input images: how do you know to call the features 'images' and the receiver tensor 'input_data'? Is that (the latter) standard?

  5. How do I name an export with signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY?

Solution

What is the difference between a checkpoint and an exported best model?

A checkpoint is, at its minimum, a file containing the values of all the variables of a specific graph taken at a specific time point. By specific graph I mean that when loading back your checkpoint, what TensorFlow does is loop through all the variables defined in your graph (the one in the session you're running) and search for a variable in the checkpoint file that has the same name as the one in the graph. For resuming training, this is ideal because your graph will always look the same between restarts.
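
As a minimal sketch of that resuming behaviour (assuming the same model_fn and model_dir as the original run; the names here are illustrative, not from your colab):

est = tf.estimator.Estimator(model_fn=model_fn, model_dir='my_model_dir', params=params)
est.train(input_fn=train_input_fn, steps=1000)  # picks up from the latest checkpoint found in 'my_model_dir', if any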

An exported model serves a different purpose. The idea of an exported model is that, once you're done training, you want to get something you can use for inference that doesn't contain all the (heavy) parts that are specific to training (some examples: gradient computation, global step variable, input pipeline, ...). Moreover, and this is the key point, typically the way you provide input to the inference model is different than the one you use for the training. For training, you have an input pipeline that loads, preprocesses and feeds data to your network. This input pipeline is not part of the model itself and may have to be altered for inference. This is a key point when operating with Estimators.

Why do I need a serving input receiver function?

To answer this I'll first take a step back. Why do we need input functions at all, and what are they? TF's Estimators, while perhaps not as intuitive as other ways to model networks, have a great advantage: they clearly separate model logic from input processing logic by means of input functions and model functions.

A model lives in 3 different phases: Training, Evaluation and Inference. For the most common use-cases (or at least, all I can think of at the moment), the graph running in TF will be different in all these phases. The graph is the combination of input preprocessing, model and all the machinery necessary to run the model in the current phase.

A few examples to hopefully clarify further: When training, you need gradients to update the weights, an optimizer that runs the training step, metrics of all kinds to monitor how things are going, an input pipeline that grabs data from the training set, etc. When evaluating, you don't need the gradients and you need a different input function. When you are inferencing, all you need is the forward part of the model and again the input function will be different (no tf.data.* stuff but typically just a placeholder).

Each of these phases in Estimators has its own input function. You're familiar with the training and evaluation ones, the inference one is simply your serving input receiver function. In TF lingo, "serving" is the process of packing a trained model and using it for inference (there's a whole TensorFlow serving system for large-scale operation but that's beyond this question and you most likely won't need it anyhow).

Time to quote a TF guide on the topic:

During training, an input_fn() ingests data and prepares it for use by the model. At serving time, similarly, a serving_input_receiver_fn() accepts inference requests and prepares them for the model. This function has the following purposes:

  • To add placeholders to the graph that the serving system will feed with inference requests.
  • To add any additional ops needed to convert data from the input format into the feature Tensors expected by the model.

Now, the serving input function specification depends on how you plan on sending input to your graph.

If you're going to pack the data in a (serialized) tf.Example (which is similar to one of the records in your TFRecord files), your serving input function will have a string placeholder (that's for the serialized bytes of the example) and will need a specification of how to interpret the example in order to extract its data. If this is the way you want to go, I invite you to have a look at the example in the linked guide above; it essentially shows how you set up the specification of how to interpret the example and parse it to obtain the input data.
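
As a side note, not from the guide itself: TF also ships a canned helper for this route, tf.estimator.export.build_parsing_serving_input_receiver_fn, which sets up the string placeholder and the parsing for you given a feature spec. A sketch, with an illustrative single fixed-length feature (adapt the spec to how your Examples were written):

feature_spec = {'input_data': tf.FixedLenFeature([64], tf.float32)}
serving_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)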

If, instead, you're planning on directly feeding input to the first layer of your network you still need to define a serving input function, but this time it will only contain a placeholder that will be plugged directly into the network. TF offers a function that does just that: tf.estimator.export.build_raw_serving_input_receiver_fn.

So, do you actually need to write your own input function? If all you need is a placeholder, no. Just use build_raw_serving_input_receiver_fn with the appropriate parameters. If you need fancier preprocessing, then yes, you might need to write your own. In that case, it would look something like this:

def serving_input_receiver_fn():
  """For the sake of the example, let's assume your input to the network will be a 28x28 grayscale image that you'll then preprocess as needed"""
  input_images = tf.placeholder(dtype=tf.uint8,
                                shape=[None, 28, 28, 1],
                                name='input_images')
  # here you do all the operations you need on the images before they can be fed to the net
  # (e.g., normalizing, reshaping, etc). As a stand-in we just cast to float and scale to [0, 1];
  # replace this with your real preprocessing. "images" is the resulting tensor.
  images = tf.cast(input_images, tf.float32) / 255.0

  features = {'input_data': images}  # this is the dict that is then passed as the "features" parameter to your model_fn
  receiver_tensors = {'input_data': input_images}  # as far as I understand, this is needed to map the input to a name you can retrieve later
  return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)

How can I train my estimator, save the best model, and then later load it?

Your model_fn takes the mode parameter so that you can build the model conditionally. In your colab, you always have an optimizer, for example. This is wrong, as the optimizer should only be there for mode == tf.estimator.ModeKeys.TRAIN.

Secondly, your build_fn has an "outputs" parameter that is meaningless. This function should represent your inference graph, take as input only the tensors you'll feed to it at inference time, and return the logits/predictions. I'll thus assume the outputs parameter is not there, as the build_fn signature should be def build_fn(inputs, params).

Moreover, you define your model_fn to take features as a tensor. While this can be done, it both limits you to having exactly one input and complicates things for the serving_fn (you can't use the canned build_raw_... but need to write your own and return a TensorServingInputReceiver instead). I'll choose the more generic solution and assume your model_fn is as follows (I omit the variable scope for brevity, add it as necessary):

def model_fn(features, labels, mode, params): 
  my_input = features["input_data"]
  my_input.set_shape(I_SHAPE(params['batch_size']))

  # output of the network
  onet = build_fn(my_input, params)  # build_fn takes the input tensor(s), per the build_fn(inputs, params) signature
  predicted_labels = tf.nn.sigmoid(onet)
  predictions = {'labels': predicted_labels, 'logits': onet}
  export_outputs = { # see EstimatorSpec's docs to understand what this is and why it's necessary.
       'labels': tf.estimator.export.PredictOutput(predicted_labels),
       'logits': tf.estimator.export.PredictOutput(onet) 
  } 
  # NOTE: export_outputs can also be used to save models as "SavedModel"s during evaluation.

  # HERE is where the common part of the graph between training, inference and evaluation stops.
  if mode == tf.estimator.ModeKeys.PREDICT:
    # return early and avoid adding the rest of the graph that has nothing to do with inference.
    return  tf.estimator.EstimatorSpec(mode=mode, 
                                       predictions=predictions, 
                                       export_outputs=export_outputs)

  labels.set_shape(O_SHAPE(params['batch_size']))      

  # calculate loss 
  loss = loss_fn(onet, labels)

  # add optimizer only if we're training
  if mode == tf.estimator.ModeKeys.TRAIN:
    optimizer = tf.train.AdagradOptimizer(learning_rate=params['learning_rate'])
  # some metrics used both in training and eval
  mae = tf.metrics.mean_absolute_error(labels=labels, predictions=predicted_labels, name='mae_op')
  mse = tf.metrics.mean_squared_error(labels=labels, predictions=predicted_labels, name='mse_op')
  metrics = {'mae': mae, 'mse': mse}
  tf.summary.scalar('mae', mae[1])
  tf.summary.scalar('mse', mse[1])

  if mode == tf.estimator.ModeKeys.EVAL:
    return tf.estimator.EstimatorSpec(mode, loss=loss, eval_metric_ops=metrics, predictions=predictions, export_outputs=export_outputs)

  if mode == tf.estimator.ModeKeys.TRAIN:
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op, eval_metric_ops=metrics, predictions=predictions, export_outputs=export_outputs)

Now, to set up the exporting part, after your call to train_and_evaluate has finished:

1) Define your serving input function:

serving_fn = tf.estimator.export.build_raw_serving_input_receiver_fn(
    {'input_data': tf.placeholder(tf.float32, [None, #YOUR_INPUT_SHAPE_HERE (without batch size)#])})

2) Export the model to some folder

est.export_savedmodel('my_directory_for_saved_models', serving_fn)

This will save the current state of the estimator to wherever you specified. If you want a specific checkpoint, load it before calling export_savedmodel. This will save in "my_directory_for_saved_models" a prediction graph with the trained parameters that the estimator had when you called the export function.
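
One way to export a specific checkpoint (the file prefix below is illustrative) is to pass it explicitly via the checkpoint_path argument of export_savedmodel:

est.export_savedmodel('my_directory_for_saved_models', serving_fn,
                      checkpoint_path='my_model_dir/model.ckpt-12345')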

Finally, you might want to freeze the graph (look up freeze_graph.py) and optimize it for inference (look up optimize_for_inference.py and/or transform_graph), obtaining a frozen *.pb file you can then load and use for inference as you wish.
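
If you don't need a frozen graph, you can also load the exported SavedModel back directly for inference, for example with tf.contrib.predictor. A sketch, assuming the timestamped export subfolder created above and the 'input_data' key from the serving input function (the folder name and batch variable are illustrative):

from tensorflow.contrib import predictor

predict_fn = predictor.from_saved_model('my_directory_for_saved_models/1534433044')
result = predict_fn({'input_data': some_numpy_batch})  # the keys of `result` mirror the outputs of the default serving signature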


Edit: Adding answers to the new questions in the update

Sidenote:

My "goal" (motivating me to ask this question) is to try and build a reusable framework for training networks so I can just pass a different build_fn and go (plus have the quality of life features of exported model, early stopping, etc).

By all means, if you manage, please post it on GitHub somewhere and link it to me. I've been trying to get just the same thing up and running for a while now and the results are not quite as good as I'd like them to be.

Question 1:

In other words, it is my understanding (perhaps wrongly) that the point of a tf Example / SequenceExample is to store a complete singular datum entity ready to go - no other processing needed other than reading from the TFRecord file.

Actually, this is typically not the case (although your way is in theory perfectly fine too). You can see TFRecords as an (awfully documented) way to store a dataset in a compact way. For image datasets, for example, a record typically contains the compressed image data (as in, the bytes composing a jpeg/png file), its label and some meta information. Then the input pipeline reads a record, decodes it, preprocesses it as needed and feeds it to the network. Of course, you can move the decoding and preprocessing before the generation of the TFRecord dataset and store the ready-to-feed data in the examples, but the size blowup of your dataset will be huge.
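
To make that concrete, here is a sketch of such a pipeline, assuming records holding a jpeg-encoded 'image' bytes feature and an int64 'label' (the feature names, file name and sizes are illustrative):

def train_input_fn():
  def _parse(record):
    parsed = tf.parse_single_example(record, {
        'image': tf.FixedLenFeature([], tf.string),
        'label': tf.FixedLenFeature([], tf.int64)})
    image = tf.image.decode_jpeg(parsed['image'], channels=1)  # decode the compressed bytes
    image = tf.image.resize_images(image, [28, 28]) / 255.0    # preprocess as needed
    return {'input_data': image}, parsed['label']

  dataset = tf.data.TFRecordDataset('train.tfrecord')
  return dataset.map(_parse).shuffle(1000).batch(32)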

The specific preprocessing pipeline is one example of what changes between phases (for example, you might have data augmentation in the training pipeline, but not in the others). Of course, there are cases in which these pipelines are the same, but in general this is not true.

About the aside:

"When evaluating, you don't need the gradients and you need a different input function. ", the only difference (at least in my case) is the files from which you reading?

In your case that may be. But again, assume you're using data augmentation: You need to disable it (or, better, don't have it at all) during eval and this alters your pipeline.

Question 2: What if I train my model with records and want to run inference with just dense tensors?

This is precisely why you separate the pipeline from the model. The model takes as input a tensor and operates on it. Whether that tensor is a placeholder or is the output of a subgraph that converts it from an Example to a tensor, that's a detail that belongs to the framework, not to the model itself.

The splitting point is the model input. The model expects a tensor (or, in the more generic case, a dict of name:tensor items) as input and uses that to build its computation graph. Where that input comes from is decided by the input functions, but as long as the output of all input functions has the same interface, one can swap inputs as needed and the model will simply take whatever it gets and use it.

So, to recap, assuming you train/eval with Examples and predict with dense tensors, your train and eval input functions will set up a pipeline that reads examples from somewhere, decodes them into tensors and returns those to the model to use as inputs. Your predict input function, on the other hand, just sets up one placeholder per input of your model and returns them to the model, because it assumes you'll put in the placeholders the data ready to be fed to the network.
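
A sketch of that swap, assuming the model_fn above, an already-trained estimator est and a numpy array dense_batch shaped like your training inputs (the names are illustrative); note that for in-memory prediction you don't even need the SavedModel export, a different input function is enough:

predict_input_fn = tf.estimator.inputs.numpy_input_fn(x={'input_data': dense_batch}, shuffle=False)
for pred in est.predict(input_fn=predict_input_fn):
  print(pred['labels'])  # keys match the `predictions` dict returned by model_fn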

Question 3:

You pass the placeholder as a parameter of build_raw_serving_input_receiver_fn, so you choose its name:

tf.estimator.export.build_raw_serving_input_receiver_fn(                                               
    {'images':tf.placeholder(tf.float32, [None,28,28,1], name='input_images')})

Question 4:

There was a mistake in the code (I had mixed up two lines); the dict's key should have been input_data (I amended the code above). The key in the dict has to be the key you use to retrieve the tensor from features in your model_fn. In model_fn the first line is:

my_input = features["input_data"]

hence the key is 'input_data'. As for the key in receiver_tensors, I'm still not quite sure what role that one has, so my suggestion is to try setting a different name than the key in features and check where the name shows up.

Question 5:

I'm not sure I understand; I'll edit this after some clarification.

