How to generate tuples of (original label, predicted label) on Spark with MLlib?


Question


I am trying to make predictions with the model that I got back from MLlib on Spark. The goal is to generate tuples of (originalLabelInData, predictedLabel). Those tuples can then be used for model evaluation purposes. What is the best way to achieve this? Thanks.

Assuming parsedTrainData is an RDD of LabeledPoint

from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.tree import DecisionTree, DecisionTreeModel
from pyspark.mllib.util import MLUtils

parsedTrainData = sc.parallelize([LabeledPoint(1.0, [11.0,-12.0,23.0]), 
                                  LabeledPoint(3.0, [-1.0,12.0,-23.0])])

model = DecisionTree.trainClassifier(parsedTrainData, numClasses=7,
                                     categoricalFeaturesInfo={}, impurity='gini',
                                     maxDepth=8, maxBins=32)

model.predict(parsedTrainData.map(lambda x: x.features)).take(1)

This gives back the predictions, but I am not sure how to match each prediction back to the original labels in data.

I tried

parsedTrainData.map(lambda x: (x.label, model.predict(x.features))).take(1)

However, it seems that sending the model to the workers this way is not a valid thing to do here:

/spark140/python/pyspark/context.pyc in __getnewargs__(self)
    250         # This method is called when attempting to pickle SparkContext, which is always an error:
    251         raise Exception(
--> 252             "It appears that you are attempting to reference SparkContext from a broadcast "
    253             "variable, action, or transforamtion. SparkContext can only be used on the driver, "
    254             "not in code that it run on workers. For more information, see SPARK-5063."

Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transforamtion. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063. 

Solution

Well, according to the official documentation, you can simply zip the predictions with the labels like this:

predictions = model.predict(parsedTrainData.map(lambda x: x.features))
labelsAndPredictions = parsedTrainData.map(lambda x: x.label).zip(predictions)
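The resulting labelsAndPredictions RDD can then feed an evaluation step such as a training-error count. As a plain-Python sketch of the same pairing logic (no Spark required; the list names here are illustrative stand-ins for the label RDD and the prediction RDD, and the element-wise zip matches RDD.zip only when both RDDs keep the same ordering and partitioning, as they do here since both derive from parsedTrainData):

```python
# Stand-ins for the two RDDs: labels extracted from the LabeledPoints,
# and the values model.predict would return, in the same order.
labels = [1.0, 3.0]
predictions = [1.0, 3.0]

# zip pairs them element-wise: each tuple is (original label, predicted label).
labels_and_predictions = list(zip(labels, predictions))

# Training error: fraction of examples whose prediction differs from the label.
train_err = sum(1 for lab, pred in labels_and_predictions
                if lab != pred) / float(len(labels))
```

In Spark the same count would be `labelsAndPredictions.filter(lambda lp: lp[0] != lp[1]).count() / float(parsedTrainData.count())`.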
