How to get p-value for logistic regression in Spark MLlib using Java


Question


How can I get the p-value for logistic regression in Spark MLlib using Java? And how can I find the probability of the predicted class? The following is the code I have tried:

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.classification.LogisticRegressionModel;
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.util.MLUtils;
import scala.Tuple2;

SparkConf sparkConf = new SparkConf().setAppName("GRP").setMaster("local[*]");
SparkContext ctx = new SparkContext(sparkConf);

LabeledPoint pos = new LabeledPoint(1.0, Vectors.dense(1.0, 0.0, 3.0));
String path = "dataSetnew.txt";

// Load the LIBSVM-format data set and split it 60/40 into training and test sets.
JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(ctx, path).toJavaRDD();
JavaRDD<LabeledPoint>[] splits = data.randomSplit(new double[] {0.6, 0.4}, 11L);
JavaRDD<LabeledPoint> training = splits[0].cache();
JavaRDD<LabeledPoint> test = splits[1];

// Train a binary logistic regression model with L-BFGS.
final LogisticRegressionModel model =
    new LogisticRegressionWithLBFGS()
        .setNumClasses(2)
        .setIntercept(true)
        .run(training.rdd());

// Pair each test point's prediction with its true label.
JavaRDD<Tuple2<Object, Object>> predictionAndLabels = test.map(
    new org.apache.spark.api.java.function.Function<LabeledPoint, Tuple2<Object, Object>>() {
        public Tuple2<Object, Object> call(LabeledPoint p) {
          Double prediction = model.predict(p.features());
         // System.out.println("prediction :"+prediction);
          return new Tuple2<Object, Object>(prediction, p.label());
        }
      }
    );

// Predict a single new observation and inspect the fitted model.
Vector denseVecnew = Vectors.dense(112,110,110,0,0,0,0,0,0,0,0);
Double prediction = model.predict(denseVecnew);
Vector weightVector = model.weights();
System.out.println("weights : " + weightVector);
System.out.println("intercept : " + model.intercept());
System.out.println("prediction : " + prediction);
ctx.stop();

Solution

For binary classification you can use the LogisticRegressionModel.clearThreshold method. After it is called, predict will return raw scores instead of labels. These are in the range [0, 1] and can be interpreted as probabilities.

See clearThreshold docs.
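To illustrate what that raw score is: after clearThreshold(), predict returns the logistic (sigmoid) of the linear combination of the model's weights, the feature vector, and the intercept. Here is a minimal plain-Java sketch of that computation; the weights, features, and intercept below are made-up values for illustration, not taken from the model above.

```java
// Sketch of the raw score predict() returns after clearThreshold():
// sigmoid(weights . features + intercept). All values here are hypothetical.
public class RawScoreSketch {

    // Logistic (sigmoid) function: maps any real z into (0, 1).
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Raw score = sigmoid of the dot product plus intercept.
    static double rawScore(double[] weights, double[] features, double intercept) {
        double z = intercept;
        for (int i = 0; i < weights.length; i++) {
            z += weights[i] * features[i];
        }
        return sigmoid(z);
    }

    public static void main(String[] args) {
        double[] weights  = {0.5, -0.25};  // hypothetical model.weights()
        double[] features = {2.0, 4.0};    // hypothetical input vector
        double intercept  = 0.1;           // hypothetical model.intercept()

        double p = rawScore(weights, features, intercept);
        System.out.println("probability of class 1: " + p);
    }
}
```

Thresholding this score at 0.5 reproduces the default label that predict returns before clearThreshold() is called.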
