How to use a TFRecord file for batch prediction on GCP AI Platform?
Question
TL;DR: How does Google Cloud AI Platform unpack TFRecord files when doing batch predictions?
I have deployed a trained Keras model to Google Cloud AI Platform, but I'm having trouble with the file format for batch predictions. For training I'm using a tf.data.TFRecordDataset to read a list of TFRecord files, as in the following, which all works fine.
def unpack_tfrecord(record):
    parsed = tf.io.parse_example(record, {
        'chunk': tf.io.FixedLenFeature([128, 2, 3], tf.float32),  # Input
        'class': tf.io.FixedLenFeature([2], tf.int64),  # One-hot classification (binary)
    })
    return (parsed['chunk'], parsed['class'])

files = [str(p) for p in training_chunks_path.glob('*.tfrecord')]
dataset = tf.data.TFRecordDataset(files).batch(32).map(unpack_tfrecord)
model.fit(x=dataset, epochs=train_epochs)
tf.saved_model.save(model, model_save_path)
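For reference, records matching that feature spec could be written along these lines (a minimal sketch; the file name and the random data are placeholders, not from the original post):

```python
import numpy as np
import tensorflow as tf

def make_example(chunk, label):
    # Serialize one (chunk, class) pair as a tf.train.Example proto.
    return tf.train.Example(features=tf.train.Features(feature={
        'chunk': tf.train.Feature(
            float_list=tf.train.FloatList(value=chunk.ravel().tolist())),
        'class': tf.train.Feature(
            int64_list=tf.train.Int64List(value=label)),
    })).SerializeToString()

chunk = np.random.rand(128, 2, 3).astype(np.float32)
with tf.io.TFRecordWriter('example.tfrecord') as writer:
    writer.write(make_example(chunk, [1, 0]))
```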
I upload the saved model to Cloud Storage and create a new model in AI Platform. The AI Platform documentation states that "Batch with gcloud tool [supports] Text file with JSON instance strings or TFRecord file (may be compressed)" (https://cloud.google.com/ai-platform/prediction/docs/overview#prediction_input_data). But when I provide a TFRecord file I get the error:
("'utf-8' codec can't decode byte 0xa4 in position 1: invalid start byte", 8)
My TFRecord file contains a bunch of Protobuf-encoded tf.train.Example records. I'm not providing the unpack_tfrecord function to AI Platform, so I guess it makes sense that it can't unpack them properly, but I have no idea where to go from here. I'm not interested in using the JSON format as the data is too large.
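As an aside, that UTF-8 decode error looks like the service trying to read the file as newline-delimited JSON, which suggests the job was submitted with the default text data format. When submitting with gcloud, the input format can be declared explicitly (a sketch; the job name, model name, paths, and region below are placeholders, and the exact accepted values of --data-format should be checked against gcloud ai-platform jobs submit prediction --help):

```shell
# Submit a batch prediction job, telling AI Platform the input is TFRecord,
# not newline-delimited JSON. All names and paths are placeholders.
gcloud ai-platform jobs submit prediction my_batch_job \
  --model=my_model \
  --input-paths='gs://my-bucket/predict/*.tfrecord' \
  --output-path='gs://my-bucket/predict-out/' \
  --region=us-central1 \
  --data-format=tf-record
```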
Answer
I don't know if this is the best way of going about this, but for TF 2.x you can do something like:
import tensorflow as tf

def make_serving_input_fn():
    # Your feature spec
    feature_spec = {
        'chunk': tf.io.FixedLenFeature([128, 2, 3], tf.float32),
        'class': tf.io.FixedLenFeature([2], tf.int64),
    }

    serialized_tf_examples = tf.keras.Input(
        shape=[], name='input_example_tensor', dtype=tf.string)
    examples = tf.io.parse_example(serialized_tf_examples, feature_spec)

    # Any processing
    processed_chunks = tf.map_fn(
        <PROCESSING_FN>,
        examples['chunk'],  # ?
        dtype=tf.float32)

    return tf.estimator.export.ServingInputReceiver(
        features={<MODEL_FIRST_LAYER_NAME>: processed_chunks},
        receiver_tensors={"input_example_tensor": serialized_tf_examples}
    )

estimator = tf.keras.estimator.model_to_estimator(
    keras_model=model,
    model_dir=<ESTIMATOR_SAVE_DIR>)

estimator.export_saved_model(
    export_dir_base=<WORKING_DIR>,
    serving_input_receiver_fn=make_serving_input_fn)
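If you'd rather avoid the Estimator API entirely, a similar result can be had in plain TF 2.x by exporting a serving signature that parses serialized tf.train.Example strings itself (a minimal sketch; the tiny model, the 'exported_model' path, and the 'prediction' output key are stand-ins, not from the original answer):

```python
import tensorflow as tf

# Stand-in model: replace with your own trained Keras model.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(128, 2, 3)),
    tf.keras.layers.Dense(2, activation='softmax'),
])

feature_spec = {
    'chunk': tf.io.FixedLenFeature([128, 2, 3], tf.float32),
}

@tf.function(input_signature=[tf.TensorSpec([None], tf.string, name='examples')])
def serve_tf_examples(serialized):
    # Parse a batch of serialized tf.train.Example protos, then run the model.
    parsed = tf.io.parse_example(serialized, feature_spec)
    return {'prediction': model(parsed['chunk'])}

tf.saved_model.save(
    model, 'exported_model',
    signatures={'serving_default': serve_tf_examples})
```

With a signature like this, the exported SavedModel accepts raw serialized records directly, which is what the batch prediction service feeds it when the input is a TFRecord file.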