Training and Predicting with instance keys


Problem Description

I am able to train my model and use ML Engine for prediction, but my results don't include any identifying information. This works fine when submitting one row at a time for prediction, but when submitting multiple rows I have no way of connecting the predictions back to the original input data. The GCP documentation discusses using instance keys, but I can't find any example code that trains and predicts using an instance key. Taking the GCP census example, how would I update the input functions to pass a unique ID through the graph, ignore it during training, yet return the unique ID with the predictions? Alternatively, if anyone knows of a different example that already uses keys, that would help as well.

From the census estimator sample:

import multiprocessing

import tensorflow as tf
from tensorflow.contrib.learn.python.learn.utils import input_fn_utils


def serving_input_fn():
    feature_placeholders = {
      column.name: tf.placeholder(column.dtype, [None])
      for column in INPUT_COLUMNS
    }

    # The canned estimators expect rank-2 tensors, so expand each
    # [batch] placeholder to [batch, 1].
    features = {
      key: tf.expand_dims(tensor, -1)
      for key, tensor in feature_placeholders.items()
    }

    return input_fn_utils.InputFnOps(
      features,
      None,  # No labels at serving time.
      feature_placeholders
    )


def generate_input_fn(filenames,
                      num_epochs=None,
                      shuffle=True,
                      skip_header_lines=0,
                      batch_size=40):

    def _input_fn():
        files = tf.concat([
          tf.train.match_filenames_once(filename)
          for filename in filenames
        ], axis=0)

        filename_queue = tf.train.string_input_producer(
          files, num_epochs=num_epochs, shuffle=shuffle)
        reader = tf.TextLineReader(skip_header_lines=skip_header_lines)

        _, rows = reader.read_up_to(filename_queue, num_records=batch_size)

        row_columns = tf.expand_dims(rows, -1)
        columns = tf.decode_csv(row_columns, record_defaults=CSV_COLUMN_DEFAULTS)
        features = dict(zip(CSV_COLUMNS, columns))

        # Remove unused columns
        for col in UNUSED_COLUMNS:
            features.pop(col)

        if shuffle:
            features = tf.train.shuffle_batch(
              features,
              batch_size,
              capacity=batch_size * 10,
              min_after_dequeue=batch_size * 2 + 1,
              num_threads=multiprocessing.cpu_count(),
              enqueue_many=True,
              allow_smaller_final_batch=True
            )
        label_tensor = parse_label_column(features.pop(LABEL_COLUMN))
        return features, label_tensor

    return _input_fn

Update: I was able to use the suggested code from this answer below; I just needed to alter it slightly to update the output alternatives in the model_fn_ops instead of just the prediction dict. However, this only works if my serving input function is coded for JSON inputs, similar to this one. My serving input function was previously modeled after the CSV serving input function in the Census Core Sample.

I think my problem is coming from the build_standardized_signature_def function, and even more so from the is_classification_problem function that it calls. The input dict length using the CSV serving function is 1, so this logic ends up using the classification_signature_def, which only displays the scores (which, it turns out, are actually the probabilities), whereas the input dict length is greater than 1 with the JSON serving input function, so the predict_signature_def is used instead, which includes all of the outputs.
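
The single-output behavior follows from that length check. Below is a hedged paraphrase of the selection logic described above (an illustrative sketch, not the actual TensorFlow source; the string constants stand in for TensorFlow's internal prediction keys):

def _is_classification_problem(problem_type, input_tensors, output_tensors):
    # The exporter falls back to a classification signature (classes/scores
    # only) when the model looks like a classifier AND there is exactly one
    # input tensor -- the situation created by the CSV serving input fn.
    classes = output_tensors.get('classes')
    scores = output_tensors.get('scores')
    return ((problem_type == 'classification' or
             (classes is not None and scores is not None)) and
            len(input_tensors) == 1)

With the extra key placeholder in the serving inputs, the input dict length exceeds 1, the check fails, and the exporter uses predict_signature_def, which keeps every output.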

Answer

UPDATE: In version 1.3, the contrib estimators (tf.contrib.learn.DNNClassifier, for example) were changed to inherit from the core estimator class tf.estimator.Estimator, which, unlike its predecessor, hides the model function as a private class member, so you'll need to replace estimator.model_fn in the solution below with estimator._model_fn.
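
Concretely, under 1.3 the delegation line inside the wrapper below becomes the following (note that this reaches into a private member, so it may break in later releases):

        model_fn_ops = estimator._model_fn(
            features=features, labels=labels, mode=mode, params=params)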

Josh's answer points you to the Flowers example, which is a good solution if you want to use a custom estimator. If you want to stick with a canned estimator (e.g. tf.contrib.learn.DNNClassifier), you can wrap it in a custom estimator that adds support for keys. (Note: I think it's likely that canned estimators will gain key support when they move into core.)

KEY = 'key'

def key_model_fn_gen(estimator):
    def _model_fn(features, labels, mode, params):
        key = features.pop(KEY, None)
        model_fn_ops = estimator.model_fn(
            features=features, labels=labels, mode=mode, params=params)
        if key is not None:
            model_fn_ops.predictions[KEY] = key
            # This line makes it so the exported SavedModel will also return the key
            model_fn_ops.output_alternatives[None][1][KEY] = key
        return model_fn_ops
    return _model_fn


my_key_estimator = tf.contrib.learn.Estimator(
    model_fn=key_model_fn_gen(
        tf.contrib.learn.DNNClassifier(model_dir=model_dir...)
    ),
    model_dir=model_dir
)

my_key_estimator can then be used exactly like your DNNClassifier would be used, except it will expect a feature named 'key' from its input_fns (prediction, evaluation, and training).
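
For training and evaluation, that means your input_fn must emit the key too, e.g. by wrapping the result of generate_input_fn above. A hedged sketch of one way to do this (the wrapper name and the row-numbering scheme are illustrative; a real pipeline would more likely read an ID column straight from the CSV):

def add_key_to_input_fn(input_fn):
    def _input_fn():
        features, labels = input_fn()
        # Fabricate int64 keys by numbering the rows in the batch; the
        # wrapped model_fn pops them back out, so training is unaffected.
        batch_size = tf.shape(next(iter(features.values())))[0]
        features[KEY] = tf.to_int64(tf.range(batch_size))
        return features, labels
    return _input_fn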

You will also need to add the corresponding input tensor to the prediction input function of your choice. For example, a new JSON serving input fn would look like:

def json_serving_input_fn():
  inputs = # ... input_dict as before
  inputs[KEY] = tf.placeholder(tf.int64, [None])  # tf.placeholder(dtype, shape)
  features = # ... feature dict made from input_dict as before
  return tf.contrib.learn.InputFnOps(features, None, inputs)

(This differs slightly between 1.2 and 1.3, as tf.contrib.learn.InputFnOps is replaced with tf.estimator.export.ServingInputReceiver, and padding tensors to rank 2 is no longer necessary in 1.3.)
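
For reference, a hedged sketch of the 1.3 equivalent, assuming the placeholder dict is built the same way as above:

def json_serving_input_receiver_fn():
  inputs = # ... input_dict as before
  inputs[KEY] = tf.placeholder(tf.int64, [None])
  # In 1.3 the placeholders can be passed through directly, since padding
  # to rank 2 is no longer necessary.
  return tf.estimator.export.ServingInputReceiver(inputs, inputs)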

Then ML Engine will send a tensor named "key" with your prediction request, which will be passed to your model and passed through with your predictions.
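
For example, each line of a --json-instances file for gcloud prediction would then carry the key alongside the features (the feature names here are hypothetical census columns, abbreviated for illustration):

{"key": 1, "age": 25, "workclass": "Private", ...}
{"key": 2, "age": 42, "workclass": "Self-emp-inc", ...}

The model echoes each "key" back in the corresponding prediction, letting you join results to inputs.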

Edit: Modified key_model_fn_gen to support ignoring missing key values, and added the key for prediction.

