Why does LogisticRegressionModel fail at scoring of libsvm data?
Problem description
Load the data that you want to score. The data is stored in libsvm format in the following manner: label index1:value1 index2:value2 ... (the indices are one-based and in ascending order). Here is the sample data:
100 10:1 11:1 208:1 400:1 1830:1
    val unseendata: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc, unseendatafileName)
    val scores_path = results_base + run_id + "/" + "-scores"

    // Load the saved model. I had saved the model after training using the
    // save method. Here is the metadata for that model
    // (LogisticRegressionModel/mymodel/metadata/part-00000):
    // {"class":"org.apache.spark.mllib.classification.LogisticRegressionModel","version":"1.0","numFeatures":176894,"numClasses":2}
    val lrm = LogisticRegressionModel.load(sc, "logisticregressionmodels/mymodel")

    // Evaluate the model on unseen data
    val valuesAndPreds = unseendata.map { point =>
      val prediction = lrm.predict(point.features)
      (point.label, prediction)
    }

    // Store the scores
    valuesAndPreds.saveAsTextFile(scores_path)
Here is the error message that I get:
    16/04/28 10:22:07 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 5, ): java.lang.IllegalArgumentException: requirement failed
        at scala.Predef$.require(Predef.scala:221)
        at org.apache.spark.mllib.classification.LogisticRegressionModel.predictPoint(LogisticRegression.scala:105)
        at org.apache.spark.mllib.regression.GeneralizedLinearModel.predict(GeneralizedLinearAlgorithm.scala:76)
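The stack trace shows the failure originates in MLlib's predictPoint, which guards every prediction with a size check. A minimal, Spark-free sketch of that guard, using the feature counts from the metadata and the sample data above, reproduces the same exception:

```scala
// Sketch only: reproduces the require(...) guard from predictPoint without
// Spark. 176894 is from the model metadata; 1830 is the dimensionality
// inferred from the sample libsvm line.
val numFeatures = 176894    // dimensionality the model was trained with
val dataMatrixSize = 1830   // dimensionality of the incoming feature vector

// require throws java.lang.IllegalArgumentException: requirement failed
// when the condition is false, exactly as in the log above.
val thrown = try {
  require(dataMatrixSize == numFeatures)
  false
} catch {
  case _: IllegalArgumentException => true
}
```
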
Solution

The code that throws the exception is:

    require(dataMatrix.size == numFeatures)

My guess is that the model was fit with 176894 features (see "numFeatures":176894 in the model metadata) while the largest feature index in the libsvm file is 1830, so the loader infers only 1830 features. The numbers must match. Change the line where you load the libsvm file to pass the expected number of features explicitly:
val unseendata = MLUtils.loadLibSVMFile(sc, unseendatafileName, 176894)
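Why was the dimensionality inferred as 1830 in the first place? When the numFeatures argument is not supplied, loadLibSVMFile determines it from the largest one-based index that appears in the file. A self-contained sketch of that inference on the sample line above (plain Scala, no Spark needed):

```scala
// Sketch of the dimensionality inference: without an explicit numFeatures,
// the loader effectively takes the maximum one-based feature index it sees.
val sampleLine = "100 10:1 11:1 208:1 400:1 1830:1"

val maxIndex = sampleLine
  .split("\\s+")
  .drop(1)                      // drop the label ("100")
  .map(_.split(":")(0).toInt)   // keep only the one-based feature index
  .max                          // 1830, not the 176894 the model expects
```

Because 1830 != 176894, every call to predict fails the require check; passing 176894 explicitly pads the vectors to the training dimensionality.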