Error training a logistic regression model on Apache Spark (SPARK-5063)


Question

I am trying to build a logistic regression model with Apache Spark. Here is the code.

from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.feature import StandardScaler
from pyspark.mllib.classification import LogisticRegressionWithSGD

parsedData = raw_data.map(mapper)  # mapper is a function that generates a (label, feature vector) pair as a LabeledPoint object
featureVectors = parsedData.map(lambda point: point.features)  # get feature vectors from the parsed data
scaler = StandardScaler(True, True).fit(featureVectors)  # this creates a standardization model to scale the features
scaledData = parsedData.map(lambda lp: LabeledPoint(lp.label, scaler.transform(lp.features)))  # transform the features to zero mean and unit std deviation
modelScaledSGD = LogisticRegressionWithSGD.train(scaledData, iterations = 10)

But I get this error:

Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.

I am not sure how to work around this. Any help would be greatly appreciated.

Answer

The problem you see is pretty much the same as the one described in How to use Java/Scala function from an action or a transformation? To transform a vector, the Python scaler model has to call into its underlying Scala/JVM object, and that requires access to the SparkContext. Since the call happens inside a map closure running on the workers, where no SparkContext is available, you get the error above.

The standard way to handle this is to process only the required part of your data and then zip the results.

labels = parsedData.map(lambda point: point.label)       # keep just the labels
featuresTransformed = scaler.transform(featureVectors)   # call transform on the driver, over the whole RDD

scaledData = (labels
    .zip(featuresTransformed)
    .map(lambda p: LabeledPoint(p[0], p[1])))             # re-attach labels to the scaled features

modelScaledSGD = LogisticRegressionWithSGD.train(...)
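
For completeness, the elided training call can be filled in with the same parameters the question used (iterations=10 is carried over from the question, not stated in the original answer), and the trained model can then be checked against the training set:

modelScaledSGD = LogisticRegressionWithSGD.train(scaledData, iterations=10)

# Example usage: the mllib LogisticRegressionModel predicts locally in Python,
# so it is safe to call inside a map closure.
labelsAndPreds = scaledData.map(lambda lp: (lp.label, modelScaledSGD.predict(lp.features)))
trainErr = labelsAndPreds.filter(lambda lp: lp[0] != lp[1]).count() / float(scaledData.count())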

If you don't plan to implement your own methods based on MLlib components, it could be easier to use the high-level ML (Pipeline) API.
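
A minimal sketch of that route with the DataFrame-based ML Pipeline API; the DataFrame conversion, the column names, and maxIter=10 are assumptions for illustration, not part of the original answer:

from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import StandardScaler

# Assumes an active SQLContext/SparkSession; an RDD of LabeledPoints converts to a
# DataFrame with "label" and "features" columns.
df = parsedData.toDF()

scaler = StandardScaler(inputCol="features", outputCol="scaledFeatures",
                        withMean=True, withStd=True)
lr = LogisticRegression(featuresCol="scaledFeatures", labelCol="label", maxIter=10)

pipeline = Pipeline(stages=[scaler, lr])
model = pipeline.fit(df)           # scaling and training happen inside the pipeline
predictions = model.transform(df)  # adds "prediction" and "probability" columns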

Edit

There are two possible problems here.


  1. At this point LogisticRegressionWithSGD supports only binomial classification (thanks to eliasah for pointing that out). If you need multiclass classification you can replace it with LogisticRegressionWithLBFGS (see the sketch below).
  2. StandardScaler supports only dense vectors, so its applications are limited.
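
A hedged sketch of both workarounds; the numClasses value and the dense-conversion step are illustrative assumptions, not part of the original answer:

from pyspark.mllib.classification import LogisticRegressionWithLBFGS
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.regression import LabeledPoint

# Workaround for (2): convert sparse feature vectors to dense before fitting the scaler.
denseParsedData = parsedData.map(
    lambda lp: LabeledPoint(lp.label, Vectors.dense(lp.features.toArray())))

# Workaround for (1): LogisticRegressionWithLBFGS handles more than two classes via numClasses.
# numClasses=3 is only a placeholder; set it to the number of labels in your data.
modelScaledLBFGS = LogisticRegressionWithLBFGS.train(scaledData, iterations=10, numClasses=3)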

