How to write/load a machine learning model to/from an S3 bucket through joblib?


Question

I have an ML model that I want to save to an S3 bucket.

from lightgbm.sklearn import LGBMClassifier

# Initialize model
mdl_lightgbm = LGBMClassifier(boosting_type='rf', objective='binary')

# Fit data
mdl_lightgbm.fit(X,Y)
    
# Save model to dictionary
mdl_dict = {'mdl_fitted':mdl_lightgbm}    

For various reasons, I'm storing the fitted model in a dictionary. The idea is to dump/load the model to/from an S3 bucket through joblib.

Recommended answer

Save model to S3

Building on the idea from a related question, the following function lets you save the model to an S3 bucket, or to a local file, through joblib:

import boto3
import joblib
from io import BytesIO

def write_joblib(file, path):
    '''
       Function to write an object as a joblib file to an s3 bucket or local directory.
       Arguments:
       * file: The Python object (e.g. a fitted model or a dict) that you want to save
       * path: an s3 path of the form 's3://bucket_name/key', or a local file path.
    '''

    # Path is an s3 bucket
    if path[:5] == 's3://':
        s3_bucket, s3_key = path.split('/')[2], path.split('/')[3:]
        s3_key = '/'.join(s3_key)
        with BytesIO() as f:
            joblib.dump(file, f)
            f.seek(0)
            boto3.client("s3").upload_fileobj(Bucket=s3_bucket, Key=s3_key, Fileobj=f)
    
    # Path is a local directory 
    else:
        with open(path, 'wb') as f:
            joblib.dump(file, f)

In your example, to save the model to an S3 bucket, call:

write_joblib(mdl_dict, 's3://bucket_name/mdl_dict.joblib')
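The function above splits the `s3://` path into bucket and key by hand with `path.split('/')`. As a minimal sketch of the same step in isolation, the helper below does the split with `str.partition` and rejects malformed paths. The name `split_s3_path` is hypothetical and not part of boto3 or joblib; it is only one way to factor this step out.

```python
def split_s3_path(path):
    """Split 's3://bucket/some/key' into (bucket, key).

    Hypothetical helper for illustration; not part of boto3 or joblib.
    """
    prefix = "s3://"
    if not path.startswith(prefix):
        raise ValueError(f"Not an S3 path: {path!r}")
    # Everything up to the first '/' is the bucket; the rest is the key.
    bucket, _, key = path[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 path needs both a bucket and a key: {path!r}")
    return bucket, key


bucket, key = split_s3_path("s3://bucket_name/models/mdl_dict.joblib")
print(bucket, key)
```

Keys that contain further slashes stay intact, because only the first `/` after the bucket name is used as the separator.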

Additionally, following the idea from another related question, the following function lets you load the model from an S3 bucket or a local file:

def read_joblib(path):
    ''' 
       Function to load a joblib file from an s3 bucket or local directory.
       Arguments:
       * path: an s3 bucket or local directory path where the file is stored
       Outputs:
       * file: Joblib file loaded
    '''

    # Path is an s3 bucket
    if path[:5] == 's3://':
        s3_bucket, s3_key = path.split('/')[2], path.split('/')[3:]
        s3_key = '/'.join(s3_key)
        with BytesIO() as f:
            boto3.client("s3").download_fileobj(Bucket=s3_bucket, Key=s3_key, Fileobj=f)
            f.seek(0)
            file = joblib.load(f)
    
    # Path is a local directory 
    else:
        with open(path, 'rb') as f:
            file = joblib.load(f)
    
    return file

In your case, to load the file from the same S3 bucket, use the following lines of code:

mdl_dict = read_joblib('s3://bucket_name/mdl_dict.joblib')
mdl_lightgbm = mdl_dict['mdl_fitted']
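To check the dump, rewind, upload, download, rewind, load sequence without touching AWS, one option is a small in-memory stand-in for the S3 client. The sketch below is illustrative only: `FakeS3Client` and `round_trip` are hypothetical names, the fake implements just the two methods used in this answer, and `pickle` is swapped in for joblib so the example runs with the standard library alone. Passing `dump=joblib.dump, load=joblib.load` should exercise the same path with joblib.

```python
import pickle
from io import BytesIO


class FakeS3Client:
    """Hypothetical in-memory stand-in for boto3.client("s3"); not part of boto3."""

    def __init__(self):
        self.objects = {}  # maps (bucket, key) -> stored bytes

    def upload_fileobj(self, Fileobj, Bucket, Key):
        # Reads from the current position, which is why the caller must seek(0) first.
        self.objects[(Bucket, Key)] = Fileobj.read()

    def download_fileobj(self, Bucket, Key, Fileobj):
        Fileobj.write(self.objects[(Bucket, Key)])


def round_trip(obj, client, bucket, key, dump=pickle.dump, load=pickle.load):
    """Serialize obj, upload it, download it again, and return the restored copy."""
    with BytesIO() as f:
        dump(obj, f)
        f.seek(0)  # rewind so the upload starts at the first byte
        client.upload_fileobj(Fileobj=f, Bucket=bucket, Key=key)
    with BytesIO() as f:
        client.download_fileobj(Bucket=bucket, Key=key, Fileobj=f)
        f.seek(0)  # rewind so the loader reads from the start
        return load(f)


client = FakeS3Client()
restored = round_trip({"mdl_fitted": [1, 2, 3]}, client, "bucket_name", "mdl_dict.joblib")
print(restored)
```

Dropping either `f.seek(0)` call makes the failure visible: the upload stores an empty object, or the loader hits end-of-file immediately.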
