How to use the linear regression of MLlib in Apache Spark?


Problem description

I'm new to Apache Spark, and in the MLlib documentation I found an example in Scala, but I don't really know Scala. Does anyone know of an example in Java? Thanks! The example code is:

import org.apache.spark.mllib.regression.LinearRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint

// Load and parse the data
val data = sc.textFile("mllib/data/ridge-data/lpsa.data")
val parsedData = data.map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(0).toDouble, parts(1).split(' ').map(x => x.toDouble).toArray)
}

// Building the model
val numIterations = 20
val model = LinearRegressionWithSGD.train(parsedData, numIterations)

// Evaluate model on training examples and compute training error
val valuesAndPreds = parsedData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
val MSE = valuesAndPreds.map{ case(v, p) => math.pow((v - p), 2)}.reduce(_ + _)/valuesAndPreds.count
println("training Mean Squared Error = " + MSE)

from the MLlib documentation. Thanks!

Answer

As the documentation states:

All of MLlib's methods use Java-friendly types, so you can import and call them there the same way you do in Scala. The only caveat is that the methods take Scala RDD objects, while the Spark Java API uses a separate JavaRDD class. You can convert a Java RDD to a Scala one by calling .rdd() on your JavaRDD object.

This is not easy, since you still have to reproduce the Scala code in Java, but it works (at least in this case).

Having said that, here is a Java implementation:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.regression.LinearRegressionModel;
import org.apache.spark.mllib.regression.LinearRegressionWithSGD;
import scala.Tuple2;

public void linReg() {
    String master = "local";
    SparkConf conf = new SparkConf().setAppName("csvParser").setMaster(
            master);
    JavaSparkContext sc = new JavaSparkContext(conf);
    JavaRDD<String> data = sc.textFile("mllib/data/ridge-data/lpsa.data");
    JavaRDD<LabeledPoint> parseddata = data
            .map(new Function<String, LabeledPoint>() {
            // I see no ways of just using a lambda, hence more verbosity than with scala
                @Override
                public LabeledPoint call(String line) throws Exception {
                    String[] parts = line.split(",");
                    String[] pointsStr = parts[1].split(" ");
                    double[] points = new double[pointsStr.length];
                    for (int i = 0; i < pointsStr.length; i++)
                        points[i] = Double.valueOf(pointsStr[i]);
                    return new LabeledPoint(Double.valueOf(parts[0]),
                            Vectors.dense(points));
                }
            });

    // Building the model
    int numIterations = 20;
    LinearRegressionModel model = LinearRegressionWithSGD.train(
            parseddata.rdd(), numIterations); // notice the .rdd()

    // Evaluate model on training examples and compute training error
    JavaRDD<Tuple2<Double, Double>> valuesAndPred = parseddata
            .map(point -> new Tuple2<Double, Double>(point.label(), model
                    .predict(point.features())));
    // important point here is the Tuple2 explicit creation.

    double MSE = valuesAndPred.mapToDouble(
            tuple -> Math.pow(tuple._1 - tuple._2, 2)).mean();
    // you can compute the mean with this function, which is much easier
    System.out.println("training Mean Squared Error = "
            + String.valueOf(MSE));
}
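For reference, the training error computed above is just the mean of the squared (label − prediction) differences. A minimal plain-Java sketch of that same formula, without Spark, using hypothetical sample values (the class and method names here are illustrative, not part of MLlib):

```java
public class MseSketch {
    // Mean squared error: average of (label - prediction)^2,
    // the same quantity the Spark code computes in a distributed way.
    static double mse(double[] labels, double[] preds) {
        double sum = 0.0;
        for (int i = 0; i < labels.length; i++) {
            double diff = labels[i] - preds[i];
            sum += diff * diff;
        }
        return sum / labels.length;
    }

    public static void main(String[] args) {
        double[] labels = {1.0, 2.0, 3.0};  // hypothetical labels
        double[] preds  = {1.5, 2.0, 2.0};  // hypothetical predictions
        System.out.println(mse(labels, preds)); // (0.25 + 0.0 + 1.0) / 3
    }
}
```

In the Spark version, `mapToDouble` followed by `mean()` performs the same squaring and averaging across the RDD's partitions.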

It is far from perfect, but I hope it helps you better understand how to use the Scala examples in the MLlib documentation.

