Google Cloud ML Engine Error 429 Out of Memory

Question

I uploaded my model to ML Engine, and when I try to make a prediction I receive the following error:

ERROR: (gcloud.ml-engine.predict) HTTP request failed. Response: {
  "error": {
    "code": 429,
    "message": "Prediction server is out of memory, possibly because model size is too big.",
    "status": "RESOURCE_EXHAUSTED"
  }
}

My model size is 151.1 MB. I have already tried all the actions suggested on the Google Cloud website, such as quantization. Is there a possible solution, or anything else I could do to make it work?

Thanks

Recommended answer

Typically, a model of this size should not cause an out-of-memory (OOM) error. Because TensorFlow does a lot of lazy initialization, some OOMs are not detected until the first request forces the data structures to be initialized. In rare cases, certain graphs can expand about 10x in memory, causing an OOM.

1) Do you see the prediction error consistently? Because of the way TensorFlow schedules nodes, memory usage for the same graph can differ across runs. Run the prediction several times and check whether it returns 429 every time.

2) Please make sure that 151.1 MB is the size of your SavedModel directory.

3) You can also debug peak memory locally, for instance by watching top while running gcloud ml-engine local predict, or by loading the model in a Docker container and monitoring memory usage with docker stats or a similar tool. You can also try TensorFlow Serving for debugging (https://www.tensorflow.org/serving/serving_basic) and post the results.

4) If the memory problem persists, please contact cloudml-feedback@google.com for further assistance. Include your project number and the associated account so the issue can be debugged further.
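
For check 2, one way to confirm the figure is to sum the on-disk size of every file under the SavedModel directory. The following is only a minimal sketch in Python: the function name dir_size_mb is a made-up helper for illustration and is not part of gcloud or TensorFlow.

```python
import os


def dir_size_mb(path):
    """Return the total size of all regular files under `path`, in MB (10**6 bytes)."""
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so that linked files are not counted.
            if not os.path.islink(file_path):
                total_bytes += os.path.getsize(file_path)
    return total_bytes / 1e6
```

Calling dir_size_mb on the exported model directory (the one that holds saved_model.pb and the variables subdirectory) should give a number close to 151.1 if that figure really describes the whole directory and not just one file in it.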

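For check 3, besides top or docker stats, peak memory can also be read from inside a Python process. The sketch below assumes a Unix-like system (the standard-library resource module is not available on Windows); peak_rss_mb and peak_growth_mb are hypothetical helper names, and step stands for whatever callable loads the model and runs one prediction.

```python
import resource
import sys


def peak_rss_mb():
    """Peak resident set size of this process so far, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def peak_growth_mb(step):
    """Run `step()` and return how much the process's peak RSS grew, in MiB.

    Only growth above the previous peak is visible, so run this early in a
    fresh process, before anything else has allocated a lot of memory.
    """
    before = peak_rss_mb()
    step()
    return peak_rss_mb() - before
```

Because of the lazy initialization mentioned in the answer, it is probably more informative to make step load the SavedModel and also issue a first prediction, rather than load the model alone. A result many times larger than the directory size would be consistent with the "graph explodes in memory" case.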