Why does my ML model deployment in Azure Container Instance still fail?


Problem description

I am using Azure Machine Learning Service to deploy an ML model as a web service.

I registered a model and now would like to deploy it as an ACI web service as in the guide.

To do so I define

from azureml.core.webservice import Webservice, AciWebservice
from azureml.core.image import ContainerImage

aciconfig = AciWebservice.deploy_configuration(cpu_cores=4, 
                      memory_gb=32, 
                      tags={"data": "text",  "method" : "NB"}, 
                      description='Predict something')

and

image_config = ContainerImage.image_configuration(execution_script="score.py", 
                      docker_file="Dockerfile",
                      runtime="python", 
                      conda_file="myenv.yml")

and create an image with

image = ContainerImage.create(name = "scorer-image",
                      models = [model],
                      image_config = image_config,
                      workspace = ws
                      )

Image creation succeeds with

Creating image
Image creation operation finished for image scorer-image:5, operation "Succeeded"

Also, troubleshooting the image by running it locally on an Azure VM with

sudo docker run -p 8002:5001 myscorer0588419434.azurecr.io/scorer-image:5

lets me run queries successfully against the local endpoint http://localhost:8002/score.

However, deployment with

service_name = 'scorer-svc'
service = Webservice.deploy_from_image(deployment_config = aciconfig,
                                        image = image,
                                        name = service_name,
                                        workspace = ws)

fails with

Creating service
Running.
FailedACI service creation operation finished, operation "Failed"
Service creation polling reached terminal state, current service state: Transitioning
Service creation polling reached terminal state, unexpected response received. Transitioning

I tried setting a more generous memory_gb in the aciconfig, but to no avail: the deployment stays in a Transitioning state when monitored on the Azure portal.

Also, running service.get_logs() gives me

WebserviceException: Received bad response from Model Management Service: Response Code: 404

What could possibly be the culprit?

Solution

If ACI deployment fails, one solution is to allocate fewer resources, e.g.

aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, 
                  memory_gb=8, 
                  tags={"data": "text",  "method" : "NB"}, 
                  description='Predict something')

While the error messages thrown are not particularly informative, this is actually clearly stated in the documentation:

When a region is under heavy load, you may experience a failure when deploying instances. To mitigate such a deployment failure, try deploying instances with lower resource settings [...]

The documentation also lists the maximum CPU/RAM values available in each region (at the time of writing, a deployment requesting memory_gb=32 would likely fail in all regions because of insufficient resources).
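Since the advice is to retry with lower resource settings, the retry can be automated. The following is only a minimal sketch under stated assumptions: `deploy_with_fallback` and `try_deploy` are hypothetical names that are not part of the Azure ML SDK, and the candidate sizes are illustrative, not documented region limits.

```python
from typing import Callable, Iterable, Optional, Tuple


def deploy_with_fallback(
    try_deploy: Callable[[int, int], bool],
    candidates: Iterable[Tuple[int, int]],
) -> Optional[Tuple[int, int]]:
    """Try (cpu_cores, memory_gb) pairs in the given order.

    `try_deploy` is a hypothetical callable supplied by the caller. It should
    attempt one deployment with the given sizes and return True on success.
    Returns the first pair that deployed, or None if every attempt failed.
    """
    for cpu_cores, memory_gb in candidates:
        if try_deploy(cpu_cores, memory_gb):
            return cpu_cores, memory_gb
    return None


# Illustrative candidate list, ordered from largest to smallest request.
CANDIDATES = [(4, 32), (2, 16), (1, 8)]
```

In practice, `try_deploy` would presumably build the configuration with AciWebservice.deploy_configuration(cpu_cores=..., memory_gb=...), call Webservice.deploy_from_image as in the question, and report whether the service ended up healthy. Ordering the candidates from largest to smallest keeps the most generous setting that the region can currently accommodate.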

After requesting fewer resources, deployment should succeed with

Creating service
Running......................................................
SucceededACI service creation operation finished, operation "Succeeded"
Healthy
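To confirm the final state programmatically, something along the following lines may help. It is a sketch rather than a definitive recipe: `wait_and_report` is a hypothetical helper name, and it assumes `service` is the object returned by Webservice.deploy_from_image, using its wait_for_deployment, state, and get_logs members. Because get_logs can itself fail (as with the 404 in the question), that call is guarded.

```python
def wait_and_report(service) -> str:
    """Wait for the deployment to finish and return the final service state.

    `service` is assumed to behave like an azureml Webservice object.
    """
    try:
        # Blocks until the deployment reaches a terminal state.
        service.wait_for_deployment(show_output=True)
    except Exception as exc:
        print(f"Deployment did not complete cleanly: {exc}")

    state = service.state
    if state != "Healthy":
        # Logs may be unavailable for a service that never started.
        try:
            print(service.get_logs())
        except Exception as exc:
            print(f"Could not fetch logs: {exc}")
    return state
```

Calling `wait_and_report(service)` right after the deployment call returns the state string, so a script can decide whether to retry with smaller settings.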
