An error occurred (InternalFailure) when calling the InvokeEndpoint operation: An exception occurred while sending request to model

Problem Description

I am trying to host an XGBoost model that I have trained locally on an AWS Sagemaker endpoint, but I am receiving the following error when invoking the endpoint:

An error occurred (InternalFailure) when calling the InvokeEndpoint operation (reached max retries: 4): An exception occurred while sending request to model. Please contact customer support regarding request.

The model works as expected locally, and I save it using the following before uploading to S3:

model.fit(args)
model.save_model(model_save_loc)
model_tar_loc = model_save_loc + '.tar.gz'
# Jupyter shell magic: tar stores the path as given, so the model file ends up
# under that same path inside the archive
!tar czvf $model_tar_loc $model_save_loc
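
The upload step itself is not shown in the question; a minimal sketch of pushing the tarball to S3 under the prefix the MultiDataModel reads from might look like this (the bucket and prefix names are hypothetical, not from the question):

import sagemaker

# Hypothetical bucket/prefix; MultiDataModel later serves tarballs from this prefix.
sagemaker_session = sagemaker.Session()

model_data_prefix = "s3://my-bucket/mme-models/"   # assumed value
sagemaker_session.upload_data(
    path=model_tar_loc,        # local model_1.tar.gz produced above
    bucket="my-bucket",        # assumed bucket
    key_prefix="mme-models",   # must match model_data_prefix
)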

I am hosting the model through the MultiDataModel class:

from sagemaker.image_uris import retrieve
from sagemaker.multidatamodel import MultiDataModel

container = retrieve("xgboost", region, "1.3-1")
mme = MultiDataModel(
    name=model_name,
    role=role,
    model_data_prefix=model_data_prefix,
    image_uri=container,
    sagemaker_session=sagemaker_session,
)

predictor = mme.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    endpoint_name=model_name,
)
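
For reference, artifacts can also be registered against the prefix explicitly; a sketch under the assumption that add_model is used with a local tarball (the target name here is hypothetical):

# Copies the artifact under model_data_prefix; the relative path becomes
# the TargetModel name used when invoking the endpoint.
mme.add_model(
    model_data_source=model_tar_loc,    # local path or s3:// URI
    model_data_path="model_1.tar.gz",   # hypothetical target name
)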

The MultiDataModel deploy works as expected with no errors, and if I do:

list(mme.list_models())

It returns the expected list of models:

model_1.tar.gz
model_2.tar.gz
etc..

I invoke the model using the following:

import boto3

runtime_client = boto3.client("runtime.sagemaker")

# TargetModel is the tarball name, relative to model_data_prefix
response = runtime_client.invoke_endpoint(
    EndpointName="model_name", ContentType="text/csv", Body=payload, TargetModel='model_1.tar.gz'
)
result = response["Body"].read().decode("ascii")

I have experimented with various ways of creating the payload, but none of them change the error message.
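
For the built-in XGBoost container with ContentType="text/csv", the body is typically a headerless, comma-separated row (or rows) of features with no label column; a minimal sketch of one way to build such a payload (the feature values are made up):

import io
import numpy as np

# Hypothetical feature vector; the CSV body has no header and no label column.
features = np.array([[0.5, 1.2, 3.4, 0.0]])
buf = io.StringIO()
np.savetxt(buf, features, delimiter=",", fmt="%g")
payload = buf.getvalue()   # e.g. "0.5,1.2,3.4,0\n"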

The local XGBoost model was trained using XGBoost version 1.3.1 (the same version as the container image).
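
A quick sanity check (not shown in the question) is to compare the local library version against the container tag before debugging further:

import xgboost

# Local training version should match the "1.3-1" container retrieved above.
print(xgboost.__version__)   # expected: 1.3.1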

CloudWatch provides only the following:

2021-06-26 10:48:36,865 [INFO ] pool-1-thread-1 ACCESS_LOG - /10.32.0.2:37106 "GET /ping HTTP/1.1" 200 0

There is no way to contact customer support on the basic support plan, as the error message advises.

Recommended Answer

I solved this issue by hosting each model individually on its own endpoint instead of using MultiDataModel, which surfaced more detailed error logs in CloudWatch.
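
A sketch of that single-model debugging approach, reusing the container, role, and session from the question (the artifact URI is hypothetical):

from sagemaker.model import Model

# Deploy one artifact on its own endpoint; failures then surface as
# per-model errors in that endpoint's CloudWatch log stream.
single_model = Model(
    image_uri=container,
    model_data="s3://my-bucket/mme-models/model_1.tar.gz",  # hypothetical URI
    role=role,
    sagemaker_session=sagemaker_session,
)
single_model.deploy(initial_instance_count=1, instance_type=instance_type)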

For me, the error was that my models were saved as:

model-1.tar.gz -> models/model-1

By default, the XGBoost container looks for the model file at the root of the extracted model-1.tar.gz archive, whereas my model ended up inside a subfolder. Moving it up a level solved the issue.
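
A minimal sketch of repackaging so the serialized model sits at the root of the archive instead of in a subfolder (paths follow the example above):

import tarfile

# Store models/model-1 as a top-level entry so the container finds it
# directly under /opt/ml/model after extraction.
with tarfile.open("model-1.tar.gz", "w:gz") as tar:
    tar.add("models/model-1", arcname="model-1")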
