How to read a utf-8 encoded binary string in tensorflow?

Problem description

I am trying to convert an encoded byte string back into the original array in the tensorflow graph (using tensorflow operations) in order to make a prediction in a tensorflow model. The array-to-byte conversion is based on this answer, and it is the suggested input for tensorflow model prediction on Google Cloud's ml-engine.

import base64
import numpy as np

def array_request_example(input_array):
    # Cast to float32, serialize to raw bytes, then base64-encode for JSON transport
    input_array = input_array.astype(np.float32)
    byte_string = input_array.tostring()
    string_encoded_contents = base64.b64encode(byte_string)
    return string_encoded_contents.decode('utf-8')
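
For context, here is a hypothetical way this helper could be used to build the JSON body sent to ml-engine (the 'instances'/'audio_bytes'/'b64' keys follow the request shape quoted further down in this question; request_body is just an illustrative name):

import json

# Hypothetical usage: wrap the encoded string in the request structure
# quoted later ({'instances': [{'audio_bytes': {'b64': ...}}]})
audio_array = np.array([1, 2, 3, 4])
request_body = json.dumps(
    {'instances': [{'audio_bytes': {'b64': array_request_example(audio_array)}}]})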

Tensorflow code

import tensorflow as tf

byte_string = tf.placeholder(dtype=tf.string)
audio_samples = tf.decode_raw(byte_string, tf.float32)

audio_array = np.array([1, 2, 3, 4])
bstring = array_request_example(audio_array)
fdict = {byte_string: bstring}
with tf.Session() as sess:
    [tf_samples] = sess.run([audio_samples], feed_dict=fdict)

I have tried using decode_raw and decode_base64, but neither returns the original values.

I have tried setting the out_type of decode_raw to the different possible datatypes and tried altering what data type I am converting the original array to.

So, how would I read the byte array in tensorflow? Thanks :)

The aim behind this is to create the serving input function for a custom Estimator to make predictions using gcloud ml-engine local predict (for testing) and using the REST API for the model stored on the cloud.

The serving input function for the Estimator is

def serving_input_fn():
    feature_placeholders = {'b64': tf.placeholder(dtype=tf.string,
                                                  shape=[None],
                                                  name='source')}
    audio_samples = tf.decode_raw(feature_placeholders['b64'], tf.float32)
    # Dummy function to save space
    power_spectrogram = create_spectrogram_from_audio(audio_samples)
    inputs = {'spectrogram': power_spectrogram}
    return tf.estimator.export.ServingInputReceiver(inputs, feature_placeholders)

JSON request

I use .decode('utf-8') because when attempting to JSON-dump the base64-encoded byte strings I receive this error

raise TypeError(repr(o) + " is not JSON serializable")
TypeError: b'longbytestring'
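
For reference, a minimal reproduction of that error and the workaround (assuming Python 3, where base64.b64encode returns bytes):

import base64
import json

payload = base64.b64encode(b'longbytestring')
# json.dumps({'b64': payload})                  # raises the TypeError above
json.dumps({'b64': payload.decode('utf-8')})    # works once the bytes are decoded to str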

Prediction Errors

When passing the JSON request {'audio_bytes': {'b64': bytestring}} with gcloud local I get the error

PredictionError: Invalid inputs: Expected tensor name: b64, got tensor name: [u'audio_bytes']

So perhaps google cloud local predict does not automatically handle the audio bytes and base64 conversion? Or likely something's wrong with my Estimator setup.

And the request {'instances': [{'audio_bytes': {'b64': bytestring}}]} to the REST API gives

{'error': 'Prediction failed: Error during model execution: AbortionError(code=StatusCode.INVALID_ARGUMENT, details="Input to DecodeRaw has length 793713 that is not a multiple of 4, the size of float
	 [[Node: DecodeRaw = DecodeRaw[_output_shapes=[[?,?]], little_endian=true, out_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_source_0_0)]]")'}

which confuses me as I explicitly define the request to be a float and do the same in the serving input receiver.

Removing audio_bytes from the request and utf-8 encoding the byte strings allows me to get predictions, though when testing the decoding locally, I think the audio is being incorrectly converted from the byte string.

Answer

The answer that you referenced is written assuming you are running the model on CloudML Engine's service. The service actually takes care of the JSON (including UTF-8) and base64 encoding.

To get your code working locally or in another environment, you'll need the following changes:

def array_request_example(input_array):
    input_array = input_array.astype(np.float32)
    return input_array.tostring()

byte_string = tf.placeholder(dtype=tf.string)
audio_samples = tf.decode_raw(byte_string, tf.float32)

audio_array = np.array([1, 2, 3, 4])
bstring = array_request_example(audio_array)
fdict = {byte_string: bstring}
with tf.Session() as sess:
    tf_samples = sess.run([audio_samples], feed_dict=fdict)

That said, based on your code, I suspect you are looking to send data as JSON; you can use gcloud local predict to simulate CloudML Engine's service. Or, if you prefer to write your own code, perhaps something like this:

import base64
import json

import numpy as np
import tensorflow as tf

def array_request_examples(input_arrays):
  """input_arrays is a list (batch) of np arrays."""
  input_arrays = (a.astype(np.float32) for a in input_arrays)
  # Convert each array to a byte string
  bytes_strings = (a.tostring() for a in input_arrays)
  # Base64 encode the data; decode to str so json.dumps can serialize it
  encoded = (base64.b64encode(b).decode('utf-8') for b in bytes_strings)
  # Create a list of instances suitable to send to the service as JSON:
  instances = [{'audio_bytes': {'b64': e}} for e in encoded]
  # Create a JSON request
  return json.dumps({'instances': instances})

def parse_request(request):
  # non-TF to simulate the CloudML Service which does not expect
  # this to be in the submitted graphs.
  instances = json.loads(request)['instances']
  return [base64.b64decode(i['audio_bytes']['b64']) for i in instances]

byte_strings = tf.placeholder(dtype=tf.string, shape=[None])
decode = lambda raw_byte_str: tf.decode_raw(raw_byte_str, tf.float32)
audio_samples = tf.map_fn(decode, byte_strings, dtype=tf.float32)

audio_array = np.array([1, 2, 3, 4])
request = array_request_examples([audio_array])
fdict = {byte_strings: parse_request(request)}
with tf.Session() as sess:
  tf_samples = sess.run([audio_samples], feed_dict=fdict)
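
As a quick sanity check (a minimal sketch assuming NumPy and TensorFlow 1.x, as in the code above), the JSON round trip should hand back the original values:

# tf_samples is a list holding one batch of decoded audio;
# its first row should match the original array cast to float32
print(tf_samples[0][0])  # expected: [1. 2. 3. 4.]
np.testing.assert_array_equal(tf_samples[0][0], audio_array.astype(np.float32))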
