Unable to serialize logistic regression in MLeap
Problem description
java.lang.AssertionError: assertion failed: This op only supports binary logistic regression
I am trying to serialize a Spark pipeline with MLeap.
I am using Tokenizer, HashingTF and LogisticRegression in my pipeline.
When I try to serialize my pipeline, I get the above error. Here is the code I am using to serialize the pipeline:
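import ml.combust.bundle.BundleFile
import ml.combust.bundle.serializer.SerializationFormat
import ml.combust.mleap.spark.SparkSupport._ // provides model.writeBundle
import org.apache.spark.ml.Pipeline
import resource._ // scala-arm, provides managed(...)
// pipelineConfig: Array[PipelineStage] holding the Tokenizer, HashingTF and LogisticRegression stages
val pipeline = new Pipeline().setStages(pipelineConfig)
val model = pipeline.fit(data)
(for(bf <- managed(BundleFile("jar:file:/tmp/abc.model.twitter.zip"))) yield {
  model.writeBundle.format(SerializationFormat.Json).save(bf).get
}).tried.get
sc.stop()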
According to the documentation, MLeap supports logistic regression (LR), so I have no idea what I might be doing wrong here.
Recommended answer
yashdosi,
By default, MLeap targets Spark 2.0 (sorry, this isn't well documented). Spark 2.0 only supported binary logistic regression; Spark 2.1 introduced multinomial logistic regression. Because MLeap is meant to support Spark 2.0.0 and up, we built in a mechanism for selecting which version of Spark you are using (currently MLeap supports 2.0 and 2.1, but defaults to 2.0).
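One way to see whether the binary/multinomial distinction applies to your pipeline is to inspect the fitted logistic regression stage. This is a minimal sketch, assuming the fitted `model` from the question is a Spark `PipelineModel`; the helper name `lrClassCounts` is hypothetical. It uses only `PipelineModel.stages` and `LogisticRegressionModel.numClasses`: a value of 2 means a binary model, while a larger value means a multinomial model, which the binary-only op that raises this assertion does not handle.

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.classification.LogisticRegressionModel

// Hypothetical helper: returns the number of classes of every
// LogisticRegressionModel stage in a fitted pipeline.
def lrClassCounts(model: PipelineModel): Seq[Int] =
  model.stages.toSeq.collect { case lr: LogisticRegressionModel => lr.numClasses }

// Usage with the fitted `model` from the question:
// lrClassCounts(model).foreach(n => println(s"LogisticRegressionModel with $n classes"))
```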
Try adding this line to the application.conf file in your resources directory; it tells MLeap to use the Spark 2.1 transformers when serializing:
// application.conf in src/main/resources
ml.combust.mleap.spark.registry.default = ${ml.combust.mleap.spark.registry.v21}