Apply TensorFlow Transform to transform/scale features in production
Question
I followed a guide to write TFRecords, in which I used tf.Transform to preprocess my features. Now I would like to deploy my model, for which I need to apply the same preprocessing function to real live data.
First, suppose I have two features:
features = ['amount', 'age']
I have the transform_fn produced by Apache Beam, residing in working_dir=gs://path-to-transform-fn/
Then I load the transform function using:
tf_transform_output = tft.TFTransformOutput(working_dir)
I thought the easiest way to serve in production was to get a numpy array of processed data and call model.predict() (I am using a Keras model).
To do this, I thought the transform_raw_features() method was exactly what I needed.
However, it seems that after building the schema:
raw_features = {}
for k in features:
    raw_features.update({k: tf.constant(1)})
print(tf_transform_output.transform_raw_features(raw_features))
I get:
AttributeError: 'Tensor' object has no attribute 'indices'
Now, I am assuming this happens because I used tf.VarLenFeature() when I defined the schema for my preprocessing_fn.
def preprocessing_fn(inputs):
    outputs = inputs.copy()
    for key in features:
        # Standardize each feature using dataset-wide statistics.
        outputs[key] = tft.scale_to_z_score(outputs[key])
    return outputs
And I build the metadata using:
RAW_DATA_FEATURE_SPEC = {}
for key in features:
    RAW_DATA_FEATURE_SPEC[key] = tf.VarLenFeature(dtype=tf.float32)
RAW_DATA_METADATA = dataset_metadata.DatasetMetadata(
    dataset_schema.from_feature_spec(RAW_DATA_FEATURE_SPEC))
In short, given a dictionary:
d = {'amount': [50], 'age': [32]}
I would like to apply this transform_fn and scale these values appropriately to input into my model for prediction. This dictionary is exactly the format of my PCollection before the data is processed by the preprocessing_fn() function.
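To make "scale these values appropriately" concrete: tft.scale_to_z_score standardizes each value as (x - mean) / stddev, using statistics computed over the training dataset. The following is only a pure-Python sketch of that arithmetic, not the tf.Transform implementation. The training values are invented for illustration, the helper names fit_z_score and apply_z_score are hypothetical, and the use of the population standard deviation and the zero-variance fallback are assumptions of this sketch.

```python
import statistics


def fit_z_score(values):
    """Compute (mean, std) from training values.

    Using the population standard deviation is an assumption of this sketch.
    """
    return statistics.mean(values), statistics.pstdev(values)


def apply_z_score(x, mean, std):
    """Standardize one value with previously fitted statistics."""
    # Zero variance: fall back to plain centering (a choice made for this sketch).
    if std == 0:
        return x - mean
    return (x - mean) / std


# Hypothetical training data, invented purely for illustration.
training = {'amount': [10.0, 30.0, 50.0], 'age': [22.0, 32.0, 42.0]}
stats = {name: fit_z_score(vals) for name, vals in training.items()}

# The live example from the question, scaled with the *training* statistics.
d = {'amount': [50], 'age': [32]}
scaled = {
    name: [apply_z_score(x, *stats[name]) for x in xs]
    for name, xs in d.items()
}
print(scaled)
```

The point of the sketch is that the statistics are fitted once on training data and then reused unchanged on live data, which is what the saved transform_fn does.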
class BeamProccess():
    def __init__(self):
        # init
        self.run()
    def run(self):
        def preprocessing_fn(inputs):
            # outputs = { 'id' : [list], 'amount': [list], 'age': [list] }
            return outputs
        with beam.Pipeline(options=self.pipe_opt) as p:
            with beam_impl.Context(temp_dir=self.google_cloud_options.temp_location):
                data = p | "read_table" >> beam.io.Read(table_bq) \
                         | "create_data" >> beam.ParDo(ProcessFn())
                # NOTE: `train` is not defined in this snippet; presumably it is derived from `data`.
                transformed_dataset, transform_fn = (
                    (train, RAW_DATA_METADATA) | beam_impl.AnalyzeAndTransformDataset(
                        preprocessing_fn))
                transformed_data, transformed_metadata = transformed_dataset
                transformed_data | "WriteTrainTFRecords" >> tfrecordio.WriteToTFRecord(
                    file_path_prefix=self.JOB_DIR + '/train/data',
                    file_name_suffix='.tfrecord',
                    coder=example_proto_coder.ExampleProtoCoder(transformed_metadata.schema))
                _ = (
                    transform_fn
                    | 'WriteTransformFn' >>
                    transform_fn_io.WriteTransformFn(path=self.JOB_DIR + '/transform/'))
Finally, the ParDo() function is:
class ProcessFn(beam.DoFn):
    def process(self, element):
        yield { 'id' : [list], 'amount': [list], 'age': [list] }
Recommended answer
The problem is with this code snippet:
raw_features = {}
for k in features:
    raw_features.update({k: tf.constant(1)})
print(tf_transform_output.transform_raw_features(raw_features))
In this code you construct a dictionary whose values are dense tensors. As you said, this won't work for a VarLenFeature, which is represented as a SparseTensor (hence the missing 'indices' attribute in the error). Instead of tf.constant, try using tf.placeholder for a FixedLenFeature and tf.sparse_placeholder for a VarLenFeature.
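As a rough sketch of that suggestion, the code below shows one way the pieces might fit together. The helper names to_sparse_components and transform_live_example are hypothetical and not part of tf.Transform. The second function assumes a TF1-style graph and session (accessed through tf.compat.v1), a transform_fn exported in that style, and that every raw feature is a float32 VarLenFeature as in the question. It has not been run against the asker's transform_fn.

```python
def to_sparse_components(batch):
    """Convert a list of variable-length value lists into COO components.

    Returns (indices, values, dense_shape), the three pieces a sparse
    tensor is built from.
    """
    indices, values = [], []
    max_len = 0
    for row, row_values in enumerate(batch):
        for col, value in enumerate(row_values):
            indices.append([row, col])
            values.append(float(value))
        max_len = max(max_len, len(row_values))
    return indices, values, [len(batch), max_len]


def transform_live_example(tf_transform_output, example):
    """Sketch: apply the saved transform graph to one raw example.

    `example` is a dict such as {'amount': [50], 'age': [32]}.
    Assumes TF1-style graph mode; adapt for other setups.
    """
    # Imported lazily so the helper above works without TensorFlow installed.
    import numpy as np
    import tensorflow as tf
    tf1 = tf.compat.v1

    with tf.Graph().as_default():
        # One sparse placeholder per VarLenFeature in the raw feature spec.
        placeholders = {
            name: tf1.sparse_placeholder(tf.float32, shape=[None, None])
            for name in example
        }
        transformed = tf_transform_output.transform_raw_features(placeholders)

        feed = {}
        for name, row_values in example.items():
            indices, values, dense_shape = to_sparse_components([row_values])
            feed[placeholders[name]] = tf1.SparseTensorValue(
                indices=np.array(indices, dtype=np.int64).reshape(-1, 2),
                values=np.array(values, dtype=np.float32),
                dense_shape=np.array(dense_shape, dtype=np.int64))

        with tf1.Session() as sess:
            sess.run(tf1.tables_initializer())
            return sess.run(transformed, feed_dict=feed)


# The pure-Python helper on its own, using the example from the question.
print(to_sparse_components([[50]]))
```

With tf_transform_output = tft.TFTransformOutput(working_dir), calling transform_live_example(tf_transform_output, d) would be expected to return a dictionary of sparse values. Their values arrays could then be stacked into the numpy array passed to model.predict(), though the exact reshaping depends on how the Keras model's inputs are defined.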