Save Apache Spark mllib model in Python


Question

I am trying to save a fitted model to a file in Spark. I have a Spark cluster that trains a RandomForest model. I would like to save the fitted model and reuse it on another machine. I read some posts on the web that recommend Java serialization. I am doing the equivalent in Python, but it does not work. What is the trick?

from pyspark.mllib.tree import RandomForest
import pickle

# Train a regression forest, then try to pickle the fitted model to a local file
model = RandomForest.trainRegressor(trainingData, categoricalFeaturesInfo={},
                                    numTrees=nb_tree, featureSubsetStrategy="auto",
                                    impurity='variance', maxDepth=depth)
output = open('model.ml', 'wb')
pickle.dump(model, output)

I get this error:

TypeError: can't pickle lock objects

I am using Apache Spark 1.2.0.

Answer

If you look at the source code, you'll see that RandomForestModel inherits from TreeEnsembleModel, which in turn inherits from the JavaSaveable class that implements the save() method, so you can save your model as in the example below:

model.save([spark_context], [file_path])

So it will save the model into the file_path using the spark_context. You cannot (at least for now) use Python's native pickle to do that. If you really want to do that, you'll need to implement the __getstate__ and __setstate__ methods manually. See the pickle documentation for more information.
