Why does LogisticRegressionModel fail at scoring of libsvm data?
Problem description
Load the data that you want to score. The data is stored in libsvm format in the following manner: label index1:value1 index2:value2 ... (the indices are one-based and in ascending order). Here is the sample data:
100 10:1 11:1 208:1 400:1 1830:1
    val unseendata: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc, unseendatafileName)
    val scores_path = results_base + run_id + "/" + "-scores"

    // Load the saved model. I had saved the model after training using the
    // save method. Here is the metadata for that model
    // (LogisticRegressionModel/mymodel/metadata/part-00000):
    // {"class":"org.apache.spark.mllib.classification.LogisticRegressionModel","version":"1.0","numFeatures":176894,"numClasses":2}
    val lrm = LogisticRegressionModel.load(sc, "logisticregressionmodels/mymodel")

    // Evaluate the model on unseen data
    val valuesAndPreds = unseendata.map { point =>
      val prediction = lrm.predict(point.features)
      (point.label, prediction)
    }

    // Store the scores
    valuesAndPreds.saveAsTextFile(scores_path)
Here is the error message that I get:
    16/04/28 10:22:07 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 5, ): java.lang.IllegalArgumentException: requirement failed
        at scala.Predef$.require(Predef.scala:221)
        at org.apache.spark.mllib.classification.LogisticRegressionModel.predictPoint(LogisticRegression.scala:105)
        at org.apache.spark.mllib.regression.GeneralizedLinearModel.predict(GeneralizedLinearAlgorithm.scala:76)
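The stack trace shows the failure originates in MLlib's predictPoint, which guards every prediction with a size check. A minimal, Spark-free sketch of that guard, using the feature counts from the metadata and the sample data above, reproduces the same exception:

```scala
// Sketch only: reproduces the require(...) guard from predictPoint without
// Spark. 176894 is from the model metadata; 1830 is the dimensionality
// inferred from the sample libsvm line.
val numFeatures = 176894    // dimensionality the model was trained with
val dataMatrixSize = 1830   // dimensionality of the incoming feature vector

// require throws java.lang.IllegalArgumentException: requirement failed
// when the condition is false, exactly as in the log above.
val thrown = try {
  require(dataMatrixSize == numFeatures)
  false
} catch {
  case _: IllegalArgumentException => true
}
```
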
Solution

The code that throws the exception is:

    require(dataMatrix.size == numFeatures)

My guess is that the model was fit with 176894 features (see "numFeatures":176894 in the model metadata) while the largest feature index in the libsvm file is 1830, so the loader infers only 1830 features. The numbers must match. Change the line where you load the libsvm file to pass the expected number of features explicitly:
val unseendata = MLUtils.loadLibSVMFile(sc, unseendatafileName, 176894)
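Why was the dimensionality inferred as 1830 in the first place? When the numFeatures argument is not supplied, loadLibSVMFile determines it from the largest one-based index that appears in the file. A self-contained sketch of that inference on the sample line above (plain Scala, no Spark needed):

```scala
// Sketch of the dimensionality inference: without an explicit numFeatures,
// the loader effectively takes the maximum one-based feature index it sees.
val sampleLine = "100 10:1 11:1 208:1 400:1 1830:1"

val maxIndex = sampleLine
  .split("\\s+")
  .drop(1)                      // drop the label ("100")
  .map(_.split(":")(0).toInt)   // keep only the one-based feature index
  .max                          // 1830, not the 176894 the model expects
```

Because 1830 != 176894, every call to predict fails the require check; passing 176894 explicitly pads the vectors to the training dimensionality.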