How can I specify content_type in a training job of XGBoost from Sagemaker in Python?
Problem description
I am trying to train a model using the sagemaker library. So far, my code is the following:
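import boto3
import sagemaker
from sagemaker.amazon.amazon_estimator import get_image_uri
# Assumption: training_config is the Airflow helper from SageMaker Python SDK v1
from sagemaker.workflow.airflow import training_config

# Built-in XGBoost image for the current region
container = get_image_uri(boto3.Session().region_name,
                          'xgboost',
                          repo_version='0.90-1')

estimator = sagemaker.estimator.Estimator(container,
                                          role='AmazonSageMaker-ExecutionRole-20190305TXXX',
                                          train_instance_count=1,
                                          train_instance_type='ml.m4.2xlarge',
                                          output_path='s3://antifraud/production/',
                                          hyperparameters={'num_rounds': '400',
                                                           'objective': 'binary:logistic',
                                                           'eval_metric': 'error@0.1'})

# Inputs are given as plain S3 URI strings, so no content type is attached
train_config = training_config(estimator=estimator,
                               inputs={'train': 's3://antifraud/production/train',
                                       'validation': 's3://-antifraud/production/validation'})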
With this code I get an error parsing the hyperparameters. The command prints a JSON configuration to the console. I have been able to run a training job through boto3 using a JSON configuration, so I have worked out that what is missing from the JSON generated by my code is the content_type parameter, which should appear as follows:
"InputDataConfig": [
{
"ChannelName": "train",
"DataSource": {
"S3DataSource": {
"S3DataType": "S3Prefix",
"S3Uri": "s3://antifraud/production/data/train",
"S3DataDistributionType": "FullyReplicated"
}
},
"ContentType": "text/csv",
"CompressionType": "None"
},
{
"ChannelName": "validation",
"DataSource": {
"S3DataSource": {
"S3DataType": "S3Prefix",
"S3Uri": "s3://antifraud/production/validation",
"S3DataDistributionType": "FullyReplicated"
}
},
"ContentType": "text/csv",
"CompressionType": "None"
}
]
I have tried passing content_type = 'text/csv' as a parameter to container, estimator and train_config, and also as an extra key inside the inputs dictionary, with no success. How can I make this work?
Recommended answer
I have solved it using s3_input objects:
s3_input_train = sagemaker.s3_input(
    s3_data='s3://antifraud/production/data/{domain}-{product}-{today}/train_data.csv',
    content_type='text/csv')
s3_input_validation = sagemaker.s3_input(
    s3_data='s3://antifraud/production/data/{domain}-{product}-{today}/validation_data.csv',
    content_type='text/csv')
train_config = training_config(estimator=estimator,
                               inputs={'train': s3_input_train,
                                       'validation': s3_input_validation})
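Before submitting the job, it may help to confirm that every channel in the generated configuration actually carries a ContentType. The following is a minimal sketch in plain Python (no SageMaker calls), assuming the configuration is a dictionary with an InputDataConfig list shaped like the JSON in the question. The helper names channels_missing_content_type and make_channel and the two sample configurations are hypothetical and only for illustration.

```python
from typing import Any, Dict, List, Optional


def channels_missing_content_type(config: Dict[str, Any]) -> List[str]:
    """Return the names of input channels that have no ContentType set."""
    missing = []
    for channel in config.get("InputDataConfig", []):
        if not channel.get("ContentType"):
            missing.append(channel.get("ChannelName", "<unnamed>"))
    return missing


def make_channel(name: str, s3_uri: str,
                 content_type: Optional[str] = None) -> Dict[str, Any]:
    """Build a channel entry shaped like the InputDataConfig in the question."""
    channel: Dict[str, Any] = {
        "ChannelName": name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }
    if content_type is not None:
        channel["ContentType"] = content_type
    return channel


# Hypothetical sample: channels built without a content type,
# mimicking the configuration the question reports as incomplete.
config_without = {"InputDataConfig": [
    make_channel("train", "s3://antifraud/production/train"),
    make_channel("validation", "s3://antifraud/production/validation"),
]}

# Hypothetical sample: the same channels with content_type='text/csv'.
config_with = {"InputDataConfig": [
    make_channel("train", "s3://antifraud/production/train", "text/csv"),
    make_channel("validation", "s3://antifraud/production/validation", "text/csv"),
]}

print(channels_missing_content_type(config_without))  # ['train', 'validation']
print(channels_missing_content_type(config_with))     # []
```

The same check could be run on the dictionary returned by training_config, assuming it exposes an InputDataConfig key as in the console output described in the question. Note also that in version 2 of the SageMaker Python SDK, s3_input was renamed to sagemaker.inputs.TrainingInput, which accepts the same content_type argument.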