Converting google-cloud-ml github Reddit example from regression to classification and adding keys?


Problem description

I've been trying to adapt the reddit_tft example from the cloud-ml GitHub samples repo to my needs.

However, what I want to use it for is a binary classification problem, and I also want batch prediction to output keys.

So I have made a copy of the tutorial code here and changed it in a few places to add a model type of deep_classifier that uses a DNNClassifier instead of a DNNRegressor.

I've changed the score variable to be

if(score>0,1,0) as score

It's training fine and deploys to Cloud ML, but I'm not sure how to get keys back from my predictions.

I've updated the SQL pulling from BigQuery to include id as example_id here.

It seems the code from the tutorial had some sort of placeholder for example_id, so I'm trying to leverage that.

It all seems to work, but when I get batch predictions all I get is JSON like this:

{"classes": ["0", "1"], "scores": [0.20427155494689941, 0.7957285046577454]}
{"classes": ["0", "1"], "scores": [0.14911963045597076, 0.8508803248405457]}
...

So example_id does not seem to be making it into the serving functions like I need.

I've tried to follow the approach here, which is based on adapting the census example for keys.

I just can't figure out how to finish adapting this reddit example to also output keys in the predictions, as the two examples look a bit different to me in terms of design and the functions being used.

My latest attempt is here, trying to use the approach outlined here.

But this gives the error:

NotFoundError (see above for traceback): /tmp/tmp2jllvb/model.ckpt-1_temp_9530d2c5823d4462be53fa5415e429fd; No such file or directory
     [[Node: save/SaveV2 = SaveV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:ps/replica:0/task:0/device:CPU:0"](save/ShardedFilename, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, dnn/hiddenlayer_0/kernel/part_2/read, dnn/dnn/hiddenlayer_0/kernel/part_2/Adagrad/read, dnn/hiddenlayer_1/kernel/part_2/read, dnn/dnn/hiddenlayer_1/kernel/part_2/Adagrad/read, dnn/input_from_feature_columns/input_layer/subreddit_id_embedding/weights/part_0/read, dnn/dnn/input_from_feature_columns/input_layer/subreddit_id_embedding/weights/part_0/Adagrad/read, dnn/logits/bias/part_0/read, dnn/dnn/logits/bias/part_0/Adagrad/read, global_step)]]

Update 2

My latest attempt and details are here.

I'm now getting an error from tensorflow-transform (run_preprocess.sh works fine in tft 0.1):

File "/usr/local/lib/python2.7/dist-packages/tensorflow_transform/tf_metadata/dataset_schema.py", line 282, in __setstate__
    self._dtype = tf.as_dtype(state['dtype'])
TypeError: string indices must be integers, not str

I have changed things to just use Beam + CSV and avoid tft. Also, I'm now using the approach outlined here for extending the canned estimator to get the key back with the predictions.

However, when following this post to try to get the comments in as features, I'm now running into a new error.

The replica worker 3 exited with a non-zero status of 1. Termination reason: Error.
Traceback (most recent call last):
  [...]
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/estimator/python/estimator/extenders.py", line 87, in new_model_fn
    spec = estimator.model_fn(features, labels, mode, config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 203, in public_model_fn
    return self._call_model_fn(features, labels, mode, config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 694, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/dnn_linear_combined.py", line 520, in _model_fn
    config=config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/dnn_linear_combined.py", line 158, in _dnn_linear_combined_model_fn
    dnn_logits = dnn_logit_fn(features=features, mode=mode)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/dnn.py", line 89, in dnn_logit_fn
    features=features, feature_columns=feature_columns)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/feature_column/feature_column.py", line 226, in input_layer
    with variable_scope.variable_scope(None, default_name=column.name):
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1826, in __enter__
    current_name_scope_name = self._current_name_scope.__enter__()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 4932, in __enter__
    return self._name_scope.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3514, in name_scope
    raise ValueError("'%s' is not a valid scope name" % name)
ValueError: 'Tensor("Slice:0", shape=(?, 20), dtype=int64)_embedding' is not a valid scope name

My repo for this attempt/approach is here. This all runs fine if I just use subreddit as a feature; it's adding the comment feature that seems to cause the problems. Lines 103 to 111 are where I have followed this approach.

I'm not sure from reading the trace what in my code is triggering the error. Any ideas?

Or can anyone point me towards another approach to go from text to bag-of-words (BoW) to an embedding feature in TF?
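One possible reading of the last trace, offered as an assumption rather than a confirmed diagnosis: the rejected name looks like the string form of a Tensor with "_embedding" appended, which would happen if a Tensor (rather than a string feature name) was used as the key of the categorical column wrapped by the embedding column. The sketch below reproduces only the name check in plain Python. The pattern is written to mirror the rule TensorFlow 1.x applies to scope names, and "comment_words" is a hypothetical feature key used for contrast.

```python
import re

# Pattern mirroring the scope-name rule in TensorFlow 1.x (an assumption,
# reproduced here so the check runs without TensorFlow installed).
VALID_SCOPE_NAME = re.compile(r"^[A-Za-z0-9.][A-Za-z0-9_.\-/]*$")


def is_valid_scope_name(name):
    """Return True if `name` passes the scope-name pattern above."""
    return bool(VALID_SCOPE_NAME.match(name))


# A column keyed by a plain string yields a clean derived name.
# "comment_words" is a hypothetical feature key, not taken from the repo.
good_name = "{}_embedding".format("comment_words")

# A column keyed by a Tensor ends up with the Tensor's string form in its name,
# which is the text that appears in the ValueError above.
tensor_repr = 'Tensor("Slice:0", shape=(?, 20), dtype=int64)'
bad_name = "{}_embedding".format(tensor_repr)

print(good_name, is_valid_scope_name(good_name))
print(bad_name, is_valid_scope_name(bad_name))
```

If this reading is right, the fix would be to give the column a string key and supply the sliced Tensor through the features dict under that key.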

Recommended answer

Here's what the code looks like to pass through keys:

import tensorflow as tf  # TF 1.x API; tf.contrib is not available in TF 2.x

KEY_COLUMN = 'example_id'  # assumed key feature name, matching the question's SQL alias

def forward_key_to_export(estimator):
    # Copy the key feature into the predictions dict.
    estimator = tf.contrib.estimator.forward_features(estimator, KEY_COLUMN)

    ## This shouldn't be necessary (I've filed CL/187793590 to update extenders.py with this code)
    config = estimator.config
    def model_fn2(features, labels, mode):
        estimatorSpec = estimator._call_model_fn(features, labels, mode, config=config)
        if estimatorSpec.export_outputs:
            # Re-export the predictions (now including the key) under both signatures.
            for ekey in ['predict', 'serving_default']:
                estimatorSpec.export_outputs[ekey] = \
                    tf.estimator.export.PredictOutput(estimatorSpec.predictions)
        return estimatorSpec
    return tf.estimator.Estimator(model_fn=model_fn2, config=config)
    ##

# Create estimator to train and evaluate
def train_and_evaluate(output_dir):
    estimator = tf.estimator.DNNLinearCombinedRegressor(...)
    estimator = forward_key_to_export(estimator)
    ...
    tf.estimator.train_and_evaluate(estimator, ...)
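
Once the key is forwarded, each batch prediction record should carry it alongside the model outputs, so results can be matched back to the input rows. A minimal sketch of that matching step follows. It assumes one JSON object per output line, a key field named example_id, and a per-class list under a field named probabilities. These field names and the sample records are illustrative assumptions, so check them against your own prediction output.

```python
import json

KEY_COLUMN = "example_id"       # assumed key field name
SCORE_FIELD = "probabilities"   # assumed name of the per-class score list


def index_predictions_by_key(lines, key_column=KEY_COLUMN, score_field=SCORE_FIELD):
    """Map each key to the index of its highest-scoring class."""
    by_key = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        scores = record[score_field]
        best_index = max(range(len(scores)), key=scores.__getitem__)
        by_key[record[key_column]] = best_index
    return by_key


# Invented records for illustration only; not real model output.
sample_lines = [
    '{"example_id": "a1", "probabilities": [0.2, 0.8]}',
    '{"example_id": "b2", "probabilities": [0.9, 0.1]}',
]

predicted = index_predictions_by_key(sample_lines)
print(predicted)
```

Keeping the lookup keyed by example_id means the order of the output files does not matter when joining predictions back to the source rows.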
