What is the right way to save\load models in Spark\PySpark

Problem description

I'm working with Spark 1.3.0 using PySpark and MLlib, and I need to save and load my models. I use code like this (taken from the official documentation):

from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating

# Load and parse the data (sc is an existing SparkContext)
data = sc.textFile("data/mllib/als/test.data")
ratings = data.map(lambda l: l.split(',')).map(lambda l: Rating(int(l[0]), int(l[1]), float(l[2])))

# Build the recommendation model using ALS
rank = 10
numIterations = 20
model = ALS.train(ratings, rank, numIterations)

# Predict on the training data, then save the model
testdata = ratings.map(lambda p: (p[0], p[1]))
predictions = model.predictAll(testdata).map(lambda r: ((r[0], r[1]), r[2]))
predictions.collect() # shows me some predictions
model.save(sc, "model0")

# Trying to load saved model and work with it
model0 = MatrixFactorizationModel.load(sc, "model0")
predictions0 = model0.predictAll(testdata).map(lambda r: ((r[0], r[1]), r[2]))

When I try to use model0 I get a long traceback, which ends with this:

Py4JError: An error occurred while calling o70.predict. Trace:
py4j.Py4JException: Method predict([class org.apache.spark.api.java.JavaRDD]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
    at py4j.Gateway.invoke(Gateway.java:252)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)

So my question is - am I doing something wrong? As far as I can tell from debugging, the models are stored (both locally and on HDFS) and they consist of several files containing data. I have a feeling the models are saved correctly but aren't being loaded correctly. I also googled around but found nothing related.

It looks like this save\load feature was added recently in Spark 1.3.0, which raises another question - what was the recommended way to save\load models before the 1.3.0 release? I haven't found any nice way to do this, at least for Python. I also tried Pickle, but ran into the same issues described here: Save Apache Spark mllib model in python
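
For illustration, one manual approach for older releases is to persist the model's latent factor RDDs and recompute predictions from them. This is only a sketch, assuming userFeatures()/productFeatures() are exposed by this PySpark version and that the factor matrices fit in driver memory; the paths are hypothetical.

import numpy as np

# Persist the latent factor matrices ourselves (paths are illustrative).
model.userFeatures().saveAsPickleFile("model0_user_features")
model.productFeatures().saveAsPickleFile("model0_product_features")

# Later (e.g. in another job): load the factors back.
# Collecting into dicts assumes the factors fit in driver memory.
user_features = dict(sc.pickleFile("model0_user_features").collect())
product_features = dict(sc.pickleFile("model0_product_features").collect())

def predict(user, product):
    # An ALS rating estimate is the dot product of the two factor vectors.
    return float(np.dot(user_features[user], product_features[product]))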

Recommended answer

As of this pull request, merged on Mar 28, 2015 (a day after your question was last edited), this issue has been resolved.

You just need to clone/fetch the latest version from GitHub (git clone git://github.com/apache/spark.git -b branch-1.3), then build it (following the instructions in spark/README.md) with $ mvn -DskipTests clean package.
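
With that build in place, the save/load code from your question should work unchanged; a minimal round-trip check (a sketch reusing sc, model and testdata from the question, with an illustrative path) could look like this:

from pyspark.mllib.recommendation import MatrixFactorizationModel

# Save and immediately reload the model, then predict with the reloaded copy.
model.save(sc, "model0")
model0 = MatrixFactorizationModel.load(sc, "model0")
predictions0 = model0.predictAll(testdata).map(lambda r: ((r[0], r[1]), r[2]))
print(predictions0.take(3))  # should show a few predictions instead of raising Py4JError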

Note: I ran into trouble building Spark because Maven was being wonky. I resolved that issue by using $ update-alternatives --config mvn and selecting the 'path' that had Priority: 150, whatever that means. Explanation here.
