Google Cloud ML-engine scikit-learn prediction probability 'predict_proba()'
Question
Google Cloud ML-engine supports deploying scikit-learn Pipeline objects. For example, a text classification Pipeline could look like the following,
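from sklearn import naive_bayes
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
classifier = Pipeline([
    ('vect', CountVectorizer()),
    ('clf', naive_bayes.MultinomialNB())])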
The classifier can be trained,
classifier.fit(train_x, train_y)
Then the classifier can be uploaded to Google Cloud Storage,
model = 'model.joblib'
joblib.dump(classifier, model)
model_remote_path = os.path.join('gs://', bucket_name, datetime.datetime.now().strftime('model_%Y%m%d_%H%M%S'), model)
subprocess.check_call(['gsutil', 'cp', model, model_remote_path], stderr=sys.stdout)
Then a Model and Version can be created, either through the Google Cloud Console or programmatically, linking the 'model.joblib' file to the Version.
This classifier can then be used to predict new data by calling the deployed model's predict endpoint,
from googleapiclient import discovery

ml = discovery.build('ml', 'v1')
project_id = 'projects/{}/models/{}'.format(project_name, model_name)
if version_name is not None:
    project_id += '/versions/{}'.format(version_name)
request_dict = {'instances': ['Test data']}
ml_request = ml.projects().predict(name=project_id, body=request_dict).execute()
The Google Cloud ML-engine calls the predict function of the classifier and returns the predicted class. However, I would like to be able to return the confidence score. Normally this could be achieved by calling the predict_proba function of the classifier, but there doesn't seem to be an option to change which function is called. My question is: is it possible to return the confidence score for a scikit-learn classifier when using the Google Cloud ML-engine? If not, would you have any recommendations as to how else to achieve this result?
Update:
I've found a hacky solution. It involves overwriting the predict function of the classifier with its own predict_proba function,
nb = naive_bayes.MultinomialNB()
nb.predict = nb.predict_proba
classifier = Pipeline([
    ('vect', CountVectorizer()),
    ('clf', nb)])
Surprisingly, this works. If anyone knows of a neater solution then please let me know.
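To see why the override works, it can be checked locally before deploying. The sketch below uses a tiny made-up training set (the sentences and labels are purely illustrative). Pipeline.predict delegates to the final step's predict, so it picks up the instance attribute that now points at predict_proba.

```python
from sklearn import naive_bayes
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

# Toy training data (hypothetical, for illustration only).
train_x = ["good movie", "great film", "bad movie", "terrible film"]
train_y = ["pos", "pos", "neg", "neg"]

nb = naive_bayes.MultinomialNB()
# The instance attribute shadows the class-level predict method.
nb.predict = nb.predict_proba

classifier = Pipeline([("vect", CountVectorizer()), ("clf", nb)])
classifier.fit(train_x, train_y)

# predict now returns one row of class probabilities per input text.
scores = classifier.predict(["good film"])
# The column order of the probabilities follows classes_.
labels = classifier.classes_
print(dict(zip(labels, scores[0])))
```

One caveat: the override lives on that one estimator instance, so anything that clones the estimator (for example sklearn.base.clone, which cross-validation utilities use) will produce a copy with the normal predict behaviour.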
Update: Google have released a new feature (currently in beta) called Custom prediction routines. This allows you to define what code is run when a prediction request comes in. It adds more code to the solution, but it is certainly less hacky.
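A custom prediction routine for this case might look roughly like the sketch below. It assumes the predictor interface as I understand it from Google's documentation (a predict method plus a from_path classmethod). The class name ProbabilityPredictor is hypothetical, and the packaging and deployment steps are omitted.

```python
import os


class ProbabilityPredictor(object):
    """Hypothetical custom prediction routine returning class probabilities."""

    def __init__(self, model):
        self._model = model

    def predict(self, instances, **kwargs):
        # Call predict_proba instead of predict, then convert the result to
        # plain Python floats so the response can be serialized as JSON.
        probabilities = self._model.predict_proba(instances)
        return [[float(p) for p in row] for row in probabilities]

    @classmethod
    def from_path(cls, model_dir):
        # joblib is imported here so the class has no import-time dependency
        # on it; 'model.joblib' is the file name used in the question.
        import joblib
        model = joblib.load(os.path.join(model_dir, "model.joblib"))
        return cls(model)
```

With this in place the classifier itself stays untouched, and the choice of predict_proba lives in the serving code rather than in a patched estimator.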
Recommended answer
The ML Engine API you are using only has the predict method, as you can see in the documentation, so it will only do the prediction (unless you force it to do something else with the hack you mentioned).
If you want to do something else with your trained model, you'll have to load it and use it normally. If you want to use the model stored in Cloud Storage, you can do something like:
from google.cloud import storage
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
bucket_name = "<BUCKET_NAME>"
gs_model = "path/to/model.joblib" # path in your Cloud Storage bucket
local_model = "/path/to/model.joblib" # path in your local machine
client = storage.Client()
bucket = client.get_bucket(bucket_name)
blob = bucket.blob(gs_model)
blob.download_to_filename(local_model)
model = joblib.load(local_model)
model.predict_proba(test_data)