Sagemaker: How do I set content_type in Predictor (SageMaker > 2.0)?


Problem description

I am requesting assistance with the following error.

An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (415) from model with message "Content-type application/octet-stream not supported. Supported content-type is text/csv, text/libsvm"

Here is the relevant code:

import boto3
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

xgboost_hyperparameters = {
        "max_depth":"5",
        "eta":"0.2",
        "gamma":"4",
        "min_child_weight":"6",
        "subsample":"0.7",
        "num_round":"50"
}

xgboost_image = image_uris.retrieve("xgboost", boto3.Session().region_name, version="1")



estimator = Estimator(image_uri=xgboost_image,
                      hyperparameters=xgboost_hyperparameters,
                      role=role,
                      instance_count=1,
                      instance_type='ml.m5.2xlarge',
                      output_path=output_loc,
                      volume_size=5)

from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import CSVDeserializer

train_input = sagemaker.inputs.TrainingInput(s3_data = train_loc, content_type='text/csv',s3_data_type = 'S3Prefix')
valid_input = sagemaker.inputs.TrainingInput(s3_data = validation_loc, content_type='text/csv',s3_data_type = 'S3Prefix')

estimator.CONTENT_TYPE = 'text/csv'
estimator.serializer = CSVSerializer()
estimator.deserializer = None

estimator.fit({'train':train_input, 'validation': valid_input})

# deploy model with data config
from sagemaker.model_monitor import DataCaptureConfig
from time import gmtime, strftime
s3_capture_upload_path = 's3://{}/{}/monitoring/datacapture'.format(bucket, prefix)
model_name = 'project3--model-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
endpoint_name = 'project3-endpoint'
data_capture_configuration = DataCaptureConfig(
                        enable_capture = True,
                        sampling_percentage=100,
                        destination_s3_uri=s3_capture_upload_path  )

deploy = estimator.deploy(initial_instance_count = 1,
                          instance_type = 'ml.m4.xlarge'    ,
                          data_capture_config=data_capture_configuration,
                          model_name=model_name,
                          endpoint_name = endpoint_name
                         )

Then I run into the error when calling the predictor:

from time import sleep
from sagemaker.predictor import Predictor

predictor = Predictor(endpoint_name=endpoint_name)
with open('test.csv', 'r') as f:
    for row in f:
        print(row)
        payload = row.rstrip('\n')
        response = predictor.predict(data=payload[2:])
        sleep(0.5)
print('done!')
 

I looked at these links but haven't found an answer:

  1. https://github.com/aws-samples/reinvent2019-aim362-sagemaker-debugger-model-monitor/blob/master/02_deploy_and_monitor/deploy_and_monitor.ipynb
  2. How to specify content_type in a training job of XGBoost from Sagemaker in Python?
  3. https://github.com/aws/amazon-sagemaker-examples/issues/729

Recommended answer

First, check which SDK version you are using. AWS made breaking changes between 1.x and 2.x. Even worse, the sagemaker SDK on a notebook instance can differ from the one in SageMaker Studio, depending on the region.
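One way to check the installed version from inside the notebook or Studio kernel is sketched below. The helper name `sagemaker_major_version` is made up for illustration; it only reads package metadata and returns None when the SDK is not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def sagemaker_major_version():
    """Return the major version of the installed sagemaker SDK, or None.

    Hypothetical helper for illustration: it reads the installed package
    metadata and does not import the SDK itself.
    """
    try:
        return int(version("sagemaker").split(".")[0])
    except PackageNotFoundError:
        return None


major = sagemaker_major_version()
if major is None:
    print("sagemaker SDK is not installed in this environment")
elif major < 2:
    print("SDK 1.x detected: the 2.x serializer classes are not available")
else:
    print(f"SDK {major}.x detected")
```

Run it in the same kernel that creates the Predictor, since a notebook instance and Studio may have different versions installed.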

Please see How to use Serializer and Deserializer in Sagemaker 2, and note that AWS changed how serialization and deserialization work in 2.x.

The serialization of input data and the deserialization of result data can be configured through initializer arguments.

  • class sagemaker.serializers.CSVSerializer(content_type='text/csv')
  • Please try:

    from sagemaker.serializers import CSVSerializer
    predictor.serializer = CSVSerializer()
    

    Or, by setting the serializer to None, you can take full control of serialization and deserialization in your own code.

    predictor.serializer=None
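For context, in SDK 2.x a Predictor created without a serializer argument falls back to IdentitySerializer, which sends application/octet-stream; that matches the 415 error in the question. The sketch below puts the suggestion together under a few assumptions: the helper names `features_from_row`, `make_csv_predictor`, and `invoke_with_boto3` are made up for illustration, the first CSV column is assumed to be the label (as the question's `payload[2:]` suggests), and the boto3 route is an alternative that is not part of the original answer.

```python
def features_from_row(row: str) -> str:
    """Drop the first CSV column (assumed to be the label) and the line ending."""
    _, _, features = row.rstrip("\r\n").partition(",")
    return features


def make_csv_predictor(endpoint_name: str):
    """Build a Predictor whose requests carry Content-Type text/csv."""
    # Imported lazily so features_from_row works without the SDK installed.
    from sagemaker.predictor import Predictor
    from sagemaker.serializers import CSVSerializer

    return Predictor(endpoint_name=endpoint_name, serializer=CSVSerializer())


def invoke_with_boto3(endpoint_name: str, payload: str) -> bytes:
    """Alternative sketch: call the runtime API and set ContentType yourself."""
    import boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=payload.encode("utf-8"),
    )
    return response["Body"].read()


# Usage against a live endpoint (requires AWS credentials, so left commented out):
# predictor = make_csv_predictor("project3-endpoint")
# with open("test.csv") as f:
#     for row in f:
#         print(predictor.predict(features_from_row(row)))
```

Passing the serializer to the constructor has the same effect as assigning `predictor.serializer = CSVSerializer()` afterwards; either way the content type is taken from the serializer.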
    
