Google Cloud ML-engine scikit-learn prediction probability 'predict_proba()'


Question

Google Cloud ML-engine supports deploying scikit-learn Pipeline objects. For example, a text classification Pipeline could look like the following,

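from sklearn import naive_bayes
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
classifier = Pipeline([
    ('vect', CountVectorizer()),
    ('clf', naive_bayes.MultinomialNB())])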

The classifier can be trained,

classifier.fit(train_x, train_y)

Then the trained classifier can be uploaded to Google Cloud Storage,

model = 'model.joblib'
joblib.dump(classifier, model)
model_remote_path = os.path.join('gs://', bucket_name, datetime.datetime.now().strftime('model_%Y%m%d_%H%M%S'), model)
subprocess.check_call(['gsutil', 'cp', model, model_remote_path], stderr=sys.stdout)

Then a Model and a Version can be created, either through the Google Cloud Console or programmatically, with the Version pointing at the uploaded 'model.joblib' file.
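For the programmatic route, a minimal sketch of the Version request body might look like the following. The helper name `build_version_body` and its parameters are hypothetical. The field names are the ones I understand the `ml` v1 API to use for scikit-learn models, so check them against the current API reference before relying on them.

```python
def build_version_body(version_name, model_dir_uri, runtime_version, python_version):
    """Build the request body for creating a scikit-learn model Version.

    model_dir_uri is the Cloud Storage directory that contains model.joblib,
    not the path of the file itself.
    """
    if not model_dir_uri.startswith('gs://'):
        raise ValueError('model_dir_uri must be a gs:// path')
    return {
        'name': version_name,
        'deploymentUri': model_dir_uri.rstrip('/'),
        'framework': 'SCIKIT_LEARN',
        'runtimeVersion': runtime_version,
        'pythonVersion': python_version,
    }

# Usage (needs credentials and network access, so it is left commented out):
# ml = discovery.build('ml', 'v1')
# ml.projects().models().create(
#     parent='projects/{}'.format(project_name),
#     body={'name': model_name}).execute()
# ml.projects().models().versions().create(
#     parent='projects/{}/models/{}'.format(project_name, model_name),
#     body=build_version_body(...)).execute()
```

The runtime and Python versions are left as required arguments because the right values depend on the environment the model was trained in.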

The deployed classifier can then be used to predict new data by calling the model's predict endpoint,

from googleapiclient import discovery
ml = discovery.build('ml', 'v1')
# Despite its name, project_id holds the full resource name of the model (or version).
project_id = 'projects/{}/models/{}'.format(project_name, model_name)
if version_name is not None:
    project_id += '/versions/{}'.format(version_name)
request_dict = {'instances': ['Test data']}
ml_request = ml.projects().predict(name=project_id, body=request_dict).execute()
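The value returned by `execute()` is a plain dictionary. As far as I know, it carries the results under a `predictions` key, or an `error` key if the request failed. A small helper along these lines (the name `extract_predictions` is hypothetical) makes that explicit:

```python
def extract_predictions(response):
    """Return the list of predictions from a predict response, or raise on error."""
    if 'error' in response:
        raise RuntimeError(response['error'])
    return response['predictions']
```

With the request above this would be called as `extract_predictions(ml_request)`.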

The Google Cloud ML-engine calls the predict function of the classifier and returns the predicted class. However, I would like to be able to return the confidence score. Normally this could be achieved by calling the predict_proba function of the classifier, but there doesn't seem to be an option to change which function is called. My question is: is it possible to return the confidence score for a scikit-learn classifier when using the Google Cloud ML-engine? If not, do you have any recommendations for how else to achieve this result?

Update: I've found a hacky solution. It involves overwriting the predict function of the classifier with its own predict_proba function,

nb = naive_bayes.MultinomialNB()
nb.predict = nb.predict_proba
classifier = Pipeline([
    ('vect', CountVectorizer()),
    ('clf', nb)])

Surprisingly, this works. If anyone knows of a neater solution, please let me know.
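A likely reason the override works is that Pipeline.predict delegates to the predict attribute of its final step, and an attribute assigned on an instance takes precedence over the method defined on its class. The toy classes below are hypothetical stand-ins that illustrate only this lookup behaviour. They are not scikit-learn code.

```python
class ToyClassifier:
    """Hypothetical stand-in for a scikit-learn classifier."""

    def predict(self, X):
        return ['spam' for _ in X]

    def predict_proba(self, X):
        return [[0.9, 0.1] for _ in X]


class ToyPipeline:
    """Hypothetical stand-in for Pipeline: delegates to its final step."""

    def __init__(self, final_step):
        self.final_step = final_step

    def predict(self, X):
        return self.final_step.predict(X)


clf = ToyClassifier()
clf.predict = clf.predict_proba  # instance attribute shadows the class method
pipeline = ToyPipeline(clf)
result = pipeline.predict(['Test data'])  # probabilities, not labels
```

Because the override lives on the instance, any other classifier created from the same class keeps its normal predict behaviour.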

Update: Google has released a new feature (currently in beta) called Custom prediction routines. It lets you define the code that runs when a prediction request comes in. It adds more code to the solution, but it is certainly less hacky.
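As a rough sketch of what such a routine could look like, the class below returns probabilities from its predict method. The class name `ProbabilityPredictor` is hypothetical, and the interface shown (a `predict(instances, **kwargs)` method plus a `from_path(model_dir)` class method) is my understanding of what custom prediction routines expect. Packaging and deployment are not shown.

```python
import os


class ProbabilityPredictor(object):
    """Returns class probabilities instead of predicted labels."""

    def __init__(self, model):
        self._model = model

    def predict(self, instances, **kwargs):
        probabilities = self._model.predict_proba(instances)
        # Convert to plain lists of floats so the result is JSON serializable.
        return [[float(p) for p in row] for row in probabilities]

    @classmethod
    def from_path(cls, model_dir):
        # Imported lazily: joblib is only needed when loading from disk.
        import joblib
        model = joblib.load(os.path.join(model_dir, 'model.joblib'))
        return cls(model)
```

With this approach the classifier itself stays unmodified, so its own predict function still returns labels.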

Recommended answer

The ML Engine API you are using only has the predict method, as you can see in the documentation, so it will only run the prediction (unless you force it to do something else with the hack you mentioned).

If you want to do something else with your trained model, you'll have to load it and use it as a normal scikit-learn object. To use the model stored in Cloud Storage, you can do something like:

from google.cloud import storage
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23

bucket_name = "<BUCKET_NAME>"
gs_model = "path/to/model.joblib"  # path in your Cloud Storage bucket
local_model = "/path/to/model.joblib"  # path in your local machine

client = storage.Client()
bucket = client.get_bucket(bucket_name)
blob = bucket.blob(gs_model)
blob.download_to_filename(local_model)

model = joblib.load(local_model)
model.predict_proba(test_data)
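Each row returned by predict_proba has one column per class, in the order given by the model's `classes_` attribute. A small helper (the name `top_class_with_confidence` is hypothetical) can turn that into a label plus confidence score for each sample:

```python
def top_class_with_confidence(classes, probabilities):
    """Pair each row of probabilities with its most likely class label."""
    results = []
    for row in probabilities:
        best = max(range(len(row)), key=lambda i: row[i])
        results.append((classes[best], float(row[best])))
    return results
```

With the loaded model this would be called as `top_class_with_confidence(model.classes_, model.predict_proba(test_data))`.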
