Google Cloud ML Engine + Tensorflow: perform preprocessing/tokenization in input_fn()
Problem description
I want to perform basic preprocessing and tokenization within my input function. My data is contained in CSVs in a Google Cloud Storage bucket location (gs://) that I cannot modify. Further, I need to perform any modifications on the input text within my ml-engine package so that the behavior can be replicated at serving time.
My input function follows the basic structure below:
import tensorflow as tf

filename_queue = tf.train.string_input_producer(filenames)
reader = tf.TextLineReader()
_, rows = reader.read_up_to(filename_queue, num_records=batch_size)
text, label = tf.decode_csv(rows, record_defaults=[[""], [""]])
# add logic to filter special characters
# add logic to make all words lowercase
words = tf.string_split(text)  # splits based on whitespace
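The two commented steps can be prototyped outside the graph first. Below is a plain-Python sketch of the intended transformations (the function name and the regex are illustrative choices, not from the original); in the graph itself these would map to TensorFlow string ops.

```python
import re

def preprocess(text):
    """Plain-Python reference for the in-graph preprocessing:
    filter special characters, lowercase, split on whitespace."""
    cleaned = re.sub(r"[^A-Za-z0-9\s]", "", text)  # drop special characters
    return cleaned.lower().split()                 # lowercase + whitespace split

print(preprocess("Hello, World! It's TF."))  # ['hello', 'world', 'its', 'tf']
```

Prototyping in plain Python makes it easy to verify the tokenization rules before committing them to graph ops.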
Are there any options that avoid performing this preprocessing on the entire data set in advance? This post suggests that tf.py_func() can be used to make these transformations; however, it notes that "The drawback is that as it is not saved in the graph, I cannot restore my saved model", so I am not convinced that this will be useful at serving time. If I am defining my own tf.py_func() to do preprocessing, and it is defined in the trainer package that I am uploading to the cloud, will I run into any issues? Are there any alternative options that I am not considering?
Solution

Best practice is to write a function that you call from both the training/eval input_fn and from your serving input_fn.
For example:
def add_engineered(features):
    text = features['text']
    features['words'] = tf.string_split(text)
    return features
Then, in your input_fn, wrap the features you return with a call to add_engineered:
def input_fn():
    features = ...
    label = ...
    return add_engineered(features), label
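The point of this pattern is that training and serving share one code path for feature engineering. A minimal plain-Python stand-in (dict features and str.split in place of tf.string_split, both illustrative) shows that the two paths produce identical engineered features:

```python
def add_engineered(features):
    # stand-in for the TF version: derive 'words' from the raw 'text' feature
    features['words'] = features['text'].split()
    return features

# training/eval path: features come from the parsed CSV rows
train_features = add_engineered({'text': 'the quick brown fox'})

# serving path: features come from the JSON request
serve_features = add_engineered({'text': 'the quick brown fox'})

assert train_features == serve_features  # same engineering in both paths
```

Because the derivation lives in one function, there is no risk of training/serving skew from two diverging implementations.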
and in your serving_input_fn, make sure to similarly wrap the returned features (NOT the feature_placeholders) with a call to add_engineered:
def serving_input_fn():
    feature_placeholders = ...
    features = ...
    return tflearn.utils.input_fn_utils.InputFnOps(
        add_engineered(features),
        None,
        feature_placeholders
    )
Your model would use 'words'. However, your JSON input at prediction time would only need to contain 'text' i.e. the raw values.
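For example, a prediction request body would carry only the raw field; the exact instance shape below is illustrative, but the key point is that 'words' never appears in the request because add_engineered derives it in-graph:

```python
import json

# Only 'text' is sent; 'words' is derived in-graph by add_engineered.
request = {"instances": [{"text": "the quick brown fox"}]}
body = json.dumps(request)
print(body)
```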
Here's a complete working example: