Why does LogisticRegressionModel fail at scoring libsvm data?


Question


Load the data that you want to score. The data is stored in libsvm format in the following manner: label index1:value1 index2:value2 ... (the indices are one-based and in ascending order). Here is the sample data:
100 10:1 11:1 208:1 400:1 1830:1

    val unseendata: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc, unseendatafileName)
    val scores_path = results_base + run_id + "/" + "-scores"

    // Load the saved model
    val lrm = LogisticRegressionModel.load(sc, "logisticregressionmodels/mymodel")

    // I had saved the model after training using the save method. Here is the
    // metadata for that model (logisticregressionmodels/mymodel/metadata/part-00000):
    // {"class":"org.apache.spark.mllib.classification.LogisticRegressionModel","version":"1.0","numFeatures":176894,"numClasses":2}

    // Evaluate the model on unseen data
    val valuesAndPreds = unseendata.map { point =>
      val prediction = lrm.predict(point.features)
      (point.label, prediction)
    }

    // Store the scores
    valuesAndPreds.saveAsTextFile(scores_path)

Here is the error message that I get:

16/04/28 10:22:07 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 5, ): java.lang.IllegalArgumentException: requirement failed
    at scala.Predef$.require(Predef.scala:221)
    at org.apache.spark.mllib.classification.LogisticRegressionModel.predictPoint(LogisticRegression.scala:105)
    at org.apache.spark.mllib.regression.GeneralizedLinearModel.predict(GeneralizedLinearAlgorithm.scala:76)

Solution

The code that throws the exception is require(dataMatrix.size == numFeatures).

My guess is that the model was fit with 176894 features (see "numFeatures":176894 in the model's metadata), while loadLibSVMFile, called without an explicit feature count, inferred the dimensionality from the highest index in the file, which is only 1830. The two numbers must match, so the require check fails.
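To illustrate the inference step, here is a minimal sketch in plain Scala (no Spark). The `parseLine` function is a hypothetical stand-in for the per-line parsing done by MLUtils.loadLibSVMFile, not the actual Spark implementation: when no feature count is supplied, the vector size is inferred from the highest one-based index seen, which is how the 1830-element vector arises.

```scala
object LibsvmSizeDemo {
  /** Parse one libsvm line "label idx1:val1 idx2:val2 ..." into
    * (label, dense feature vector). Indices are one-based. */
  def parseLine(line: String, numFeatures: Option[Int] = None): (Double, Array[Double]) = {
    val parts = line.trim.split("\\s+")
    val label = parts.head.toDouble
    val pairs = parts.tail.map { p =>
      val Array(i, v) = p.split(":")
      (i.toInt - 1, v.toDouble) // convert one-based libsvm index to zero-based
    }
    // Without an explicit count, infer the size from the highest index seen
    val size = numFeatures.getOrElse(pairs.map(_._1).max + 1)
    val vec = Array.fill(size)(0.0)
    for ((i, v) <- pairs) vec(i) = v
    (label, vec)
  }

  def main(args: Array[String]): Unit = {
    val sample = "100 10:1 11:1 208:1 400:1 1830:1"

    val (_, inferred) = parseLine(sample)
    println(s"inferred size: ${inferred.length}") // 1830, from the max index

    // The model expects 176894 features, and predictPoint enforces
    // require(dataMatrix.size == numFeatures), so a 1830-element vector fails.
    // Passing the model's feature count explicitly fixes the mismatch:
    val (_, padded) = parseLine(sample, Some(176894))
    println(s"explicit size: ${padded.length}") // 176894, matches the model
  }
}
```

This is why passing the model's numFeatures as the third argument to loadLibSVMFile resolves the error: the sparse vectors are then built with the dimensionality the model expects, regardless of the highest index present in the file.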

Change the line where you load libsvm to be:

val unseendata = MLUtils.loadLibSVMFile(sc, unseendatafileName, 176894)

