What does Google Cloud ml-engine do when a JSON request contains "_bytes" or "b64"?


Question

The google cloud documentation (see Binary data in prediction input) states:

Your encoded string must be formatted as a JSON object with a single key named b64. The following Python example encodes a buffer of raw JPEG data using the base64 library to make an instance:

{"image_bytes":{"b64": base64.b64encode(jpeg_data)}}

In your TensorFlow model code, you must name the aliases for your input and output tensors so that they end with '_bytes'.
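As a concrete illustration of the quoted documentation, building such an instance in Python 3 might look like the sketch below (the JPEG bytes here are placeholder data, not real image content; note that `base64.b64encode` returns bytes, which must be decoded to a `str` before `json.dumps` can serialize them):

```python
import base64
import json

# Placeholder bytes standing in for real JPEG data read from a file
jpeg_data = b"\xff\xd8\xff\xe0"

# b64encode returns bytes; decode to str so the dict is JSON-serializable
instance = {"image_bytes": {"b64": base64.b64encode(jpeg_data).decode("utf-8")}}
request_body = json.dumps({"instances": [instance]})
print(request_body)
```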

I would like to understand more about how this process works on the google cloud side.

  • Is ml-engine automatically decoding any content after the "b64" string to byte data?

  • When the request has this nested structure, does it only pass the "b64" section to the serving input function and remove the "image_bytes" key?

  • Is each request passed individually to the serving input function, or are they batched?

  • Do we define the input/output aliases in the ServingInputReceiver returned by the serving input function?

I have found no way to create a serving input function that uses this nested structure to define the feature placeholders. I only use "b64" in mine, and I am not sure what ml-engine does on receiving the requests.

Additionally, when predicting locally using gcloud ml-engine local predict, sending the request with the nested structure fails with an unexpected key image_bytes error, since that key is not defined in the serving input function. But when predicting using gcloud ml-engine predict, sending requests with the nested structure works even when the serving input function contains no reference to "image_bytes". gcloud predict also works when leaving out "image_bytes" and passing in just "b64".

Example serving input function

def serving_input_fn():
    # Single input named 'b64' holding the raw byte string for each instance
    feature_placeholders = {'b64': tf.placeholder(dtype=tf.string,
                                                  shape=[None],
                                                  name='source')}
    single_image = tf.decode_raw(feature_placeholders['b64'], tf.float32)
    inputs = {'image': single_image}
    return tf.estimator.export.ServingInputReceiver(inputs, feature_placeholders)

I gave the example using images but I assume the same should apply to all types of data sent as bytes and base64 encoded.

There are a lot of stackoverflow questions that reference the need to include "_bytes", with snippets of information, but I would find it useful if someone could explain in a bit more detail what's going on, as then I wouldn't be so hit-and-miss when formatting requests.

Stackoverflow questions on this topic

How to convert a jpeg image into a json file in Google machine learning

How to correctly make predictions on a jpeg image with cloud-ml

Using Base64 images with Keras and Google Cloud ML

How to read utf-8 encoded binary strings in tensorflow?

Answer

To help clarify some of the questions you have, allow me to start with the basic anatomy of a prediction request:

{"instances": [<instance>, <instance>, ...]}

Where instance is a JSON object (dict/map, I'll use the Python term "dict" hereafter) and the attributes/keys are the names of the inputs with values containing the data for that input.

What the cloud service does (and gcloud ml-engine local predict uses the same underlying libraries as the service) is take the list of dicts (which can be thought of as rows of data) and convert it to a dict of lists (which can be thought of as columnar data containing batches of instances) with the same keys as in the original data. For example,

{"instances": [{"x": 1, "y": "a"}, {"x": 3, "y": "b"}, {"x": 5, "y": "c"}]}

internally becomes

{"x": [1, 3, 5], "y": ["a", "b", "c"]}
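This row-to-column conversion can be sketched in plain Python. To be clear, this is an illustration of the observable behavior, not the service's actual code:

```python
def rows_to_columns(instances):
    """Convert a list of per-instance dicts ("rows") into a dict of
    lists ("columns"), keyed by the same input names."""
    columns = {}
    for instance in instances:
        for key, value in instance.items():
            columns.setdefault(key, []).append(value)
    return columns

rows = [{"x": 1, "y": "a"}, {"x": 3, "y": "b"}, {"x": 5, "y": "c"}]
print(rows_to_columns(rows))
# {'x': [1, 3, 5], 'y': ['a', 'b', 'c']}
```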

The keys in this dict (and hence, in each instance in the original request) must correspond to the keys of the dict passed to the ServingInputReceiver. It should be apparent from this example that the service "batches" all of the data, meaning all of the instances are fed into the graph as a single batch. That's why the outer dimension of the shape of the inputs must be None: it is the batch dimension, and it is not known before a request is made (since each request may have a different number of instances). When exporting a graph to accept the above requests, you might define a function like this:

def serving_input_fn():
  inputs = {'x': tf.placeholder(dtype=tf.int32, shape=[None]),
            'y': tf.placeholder(dtype=tf.string, shape=[None])}
  return tf.estimator.export.ServingInputReceiver(inputs, inputs)

Since JSON does not (directly) support binary data and since TensorFlow has no way of distinguishing "strings" from "bytes", we need to treat binary data specially. First of all, we need the name of said inputs to end in "_bytes" to help differentiate a text string from a byte string. Using the example above, suppose y contained binary data instead of text. We would declare the following:

def serving_input_fn():
  inputs = {'x': tf.placeholder(dtype=tf.int32, shape=[None]),
            'y_bytes': tf.placeholder(dtype=tf.string, shape=[None])}
  return tf.estimator.export.ServingInputReceiver(inputs, inputs)

Notice that the only thing that changed was using y_bytes instead of y as the name of the input.

Next, we need to actually base64 encode the data; anywhere a string would be acceptable, we can instead use an object of the form {"b64": "<base64-encoded data>"}. Adapting the running example, a request might look like:

{
  "instances": [
    {"x": 1, "y_bytes": {"b64": "YQ=="}},
    {"x": 3, "y_bytes": {"b64": "Yg=="}},
    {"x": 5, "y_bytes": {"b64": "Yw=="}}
  ]
}

In this case the service does exactly what it did before, but with one added step: it automatically base64 decodes the string (and "replaces" the {"b64": ...} object with the bytes) before sending it to TensorFlow. So TensorFlow actually ends up with a dict exactly like before:

{"x": [1, 3, 5], "y_bytes": ["a", "b", "c"]}

(Note that the name of the input has not changed.)
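The decoding step can be approximated in a few lines of Python. Again, this is a sketch of the behavior, not the actual service code:

```python
import base64

def decode_b64_objects(instance):
    """Approximate the service-side step: any value of the form
    {"b64": <string>} is replaced by the decoded bytes before the
    data reaches TensorFlow."""
    decoded = {}
    for key, value in instance.items():
        if isinstance(value, dict) and set(value) == {"b64"}:
            decoded[key] = base64.b64decode(value["b64"])
        else:
            decoded[key] = value
    return decoded

print(decode_b64_objects({"x": 1, "y_bytes": {"b64": "YQ=="}}))
# {'x': 1, 'y_bytes': b'a'}
```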

Of course, base64-encoding textual data is kind of pointless; you'd usually do this, e.g., for image data, which can't be sent any other way over JSON, but I hope the above example is sufficient to illustrate the point anyway.

There's another important point to be made: the service supports a type of shorthand. When there is exactly one input to your TensorFlow model, there's no need to incessantly repeat the name of that input in every single object in your list of instances. To illustrate, imagine exporting a model with only x:

def serving_input_fn():
  inputs = {'x': tf.placeholder(dtype=tf.int32, shape=[None])}
  return tf.estimator.export.ServingInputReceiver(inputs, inputs) 

The "long form" request would look like this:

{"instances": [{"x": 1}, {"x": 3}, {"x": 5}]}

Instead, you can send a request in shorthand, like so:

{"instances": [1, 3, 5]}

Note that this applies even for base64 encoded data. So, for instance, if instead of only exporting x, we had only exported y_bytes, we could simplify the requests from:

{
  "instances": [
    {"y_bytes": {"b64": "YQ=="}},
    {"y_bytes": {"b64": "Yg=="}},
    {"y_bytes": {"b64": "Yw=="}}
  ]
}

to:

{
  "instances": [
    {"b64": "YQ=="},
    {"b64": "Yg=="},
    {"b64": "Yw=="}
  ]
}

In many cases this is only a small win, but it definitely aids readability, e.g., when the inputs contain CSV data.
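The shorthand handling can be sketched as follows. The `expand_shorthand` helper is hypothetical, an approximation of what the service appears to do when the model has exactly one input; note that a bare {"b64": ...} object counts as a value and gets wrapped too:

```python
def expand_shorthand(instances, input_name):
    """Wrap bare values in a dict keyed by the model's single input name.
    A dict that already uses the input name is passed through unchanged."""
    return [inst if isinstance(inst, dict) and input_name in inst
            else {input_name: inst}
            for inst in instances]

print(expand_shorthand([1, 3, 5], "x"))
# [{'x': 1}, {'x': 3}, {'x': 5}]
print(expand_shorthand([{"b64": "YQ=="}], "y_bytes"))
# [{'y_bytes': {'b64': 'YQ=='}}]
```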

So, putting it all together to adapt to your specific scenario, here's what your serving function should look like:

def serving_input_fn():
  feature_placeholders = {
      'image_bytes': tf.placeholder(dtype=tf.string, shape=[None], name='source')}
  single_image = tf.decode_raw(feature_placeholders['image_bytes'], tf.float32)
  return tf.estimator.export.ServingInputReceiver(feature_placeholders, feature_placeholders)

Notable differences from your current code:

  • The name of the input is not b64 but image_bytes (it could be anything that ends in _bytes)
  • feature_placeholders is used as both arguments to ServingInputReceiver

And a sample request might look like this:

{
  "instances": [
    {"image_bytes": {"b64": "YQ=="}},
    {"image_bytes": {"b64": "Yg=="}},
    {"image_bytes": {"b64": "Yw=="}}
  ]
}

Or, optionally, in shorthand:

{
  "instances": [
    {"b64": "YQ=="},
    {"b64": "Yg=="},
    {"b64": "Yw=="}
  ]
}

One final note. gcloud ml-engine local predict and gcloud ml-engine predict construct the request based on the contents of the file passed in. It is very important to note that the content of the file is currently not a full, valid request; rather, each line of the --json-instances file becomes one entry in the list of instances. Specifically, in your case, the file will look like this (newlines are meaningful here):

{"image_bytes": {"b64": "YQ=="}}
{"image_bytes": {"b64": "Yg=="}}
{"image_bytes": {"b64": "Yw=="}}

or the equivalent shorthand. gcloud will take each line and construct the actual request shown above.
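The line-by-line construction gcloud performs can be sketched like this (an illustration of the described behavior, not gcloud's actual implementation):

```python
import json

# Each non-empty line of the --json-instances file is parsed as JSON
# and becomes one entry in the "instances" list of the actual request.
file_contents = '{"image_bytes": {"b64": "YQ=="}}\n{"image_bytes": {"b64": "Yg=="}}\n'

instances = [json.loads(line) for line in file_contents.splitlines() if line.strip()]
request = {"instances": instances}
print(json.dumps(request))
```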
