Spark MLlib linear regression (linear least squares) giving random results

Views: 208 | Published: 2016/5/22 15:18:31 | Tags: apache-spark machine-learning apache-spark-mllib


Question

I'm new to Spark and to machine learning in general. I have successfully followed some of the MLlib tutorials, but I can't get this one working:

I found the sample code here: https://spark.apache.org/docs/latest/mllib-linear-methods.html#linear-least-squares-lasso-and-ridge-regression

(section LinearRegressionWithSGD)

Here is the code:

import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.regression.LinearRegressionModel
import org.apache.spark.mllib.regression.LinearRegressionWithSGD
import org.apache.spark.mllib.linalg.Vectors

// Load and parse the data
val data = sc.textFile("data/mllib/ridge-data/lpsa.data")
val parsedData = data.map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(' ').map(_.toDouble)))
}.cache()

// Building the model
val numIterations = 100
val model = LinearRegressionWithSGD.train(parsedData, numIterations)

// Evaluate model on training examples and compute training error
val valuesAndPreds = parsedData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
val MSE = valuesAndPreds.map{case(v, p) => math.pow((v - p), 2)}.mean()
println("training Mean Squared Error = " + MSE)

// Save and load model
model.save(sc, "myModelPath")
val sameModel = LinearRegressionModel.load(sc, "myModelPath")

(That is exactly what is on the website.)

The result is

training Mean Squared Error = 6.2087803138063045

and

valuesAndPreds.collect

gives

    Array[(Double, Double)] = Array((-0.4307829,-1.8383286021929077),
 (-0.1625189,-1.4955700806407322), (-0.1625189,-1.118820892849544), 
(-0.1625189,-1.6134108278724875), (0.3715636,-0.45171266551058276), 
(0.7654678,-1.861316066986158), (0.8544153,-0.3588282725617985), 
(1.2669476,-0.5036812148225209), (1.2669476,-1.1534698170911792), 
(1.2669476,-0.3561392231695041), (1.3480731,-0.7347031705813306), 
(1.446919,-0.08564658011814863), (1.4701758,-0.656725375080344), 
(1.4929041,-0.14020483324910105), (1.5581446,-1.9438858658143454), 
(1.5993876,-0.02181165554398845), (1.6389967,-0.3778677315868635), 
(1.6956156,-1.1710092824030043), (1.7137979,0.27583044213064634), 
(1.8000583,0.7812664902440078), (1.8484548,0.94605507153074), 
(1.8946169,-0.7217282082851512), (1.9242487,-0.24422843221437684),...

My problem is that the predictions look totally random (and wrong). Since this is an exact copy of the website example, with the same input data (training set), I don't know where to look. Am I missing something?

Please give me some advice or clues about where to search; I can read and experiment.

Thanks

Recommended answer

Linear regression here is SGD-based and requires tuning the step size; see http://spark.apache.org/docs/latest/mllib-optimization.html for more details.

In your example, if you set the step size to 0.1, you get better results (MSE = 0.5).

import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.regression.LinearRegressionModel
import org.apache.spark.mllib.regression.LinearRegressionWithSGD
import org.apache.spark.mllib.linalg.Vectors

// Load and parse the data
val data = sc.textFile("data/mllib/ridge-data/lpsa.data")
val parsedData = data.map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(' ').map(_.toDouble)))
}.cache()

// Build the model with an intercept term and an explicit SGD step size
val regression = new LinearRegressionWithSGD().setIntercept(true)
regression.optimizer.setStepSize(0.1)
val model = regression.run(parsedData)

// Evaluate model on training examples and compute training error
val valuesAndPreds = parsedData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
val MSE = valuesAndPreds.map{case(v, p) => math.pow((v - p), 2)}.mean()
println("training Mean Squared Error = " + MSE)
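
To see why the step size matters so much, here is a minimal sketch in plain Scala (no Spark required). It is not MLlib's implementation: it runs ordinary batch gradient descent on a toy one-weight model y = w * x, and it assumes a step size that decays as stepSize / sqrt(iteration), which I believe is similar to what MLlib's default updater does. The object name, data, and helper functions are all hypothetical and exist only for illustration.

```scala
// Toy illustration only: plain batch gradient descent on y = w * x.
// This is NOT MLlib code; every name below is hypothetical.
object StepSizeDemo {
  // Features on a fairly large scale (1..10); labels generated with true weight 2.0
  val xs: Vector[Double] = (1 to 10).map(_.toDouble).toVector
  val ys: Vector[Double] = xs.map(_ * 2.0)

  /** Mean squared error of the one-weight model y = w * x on the toy data. */
  def mse(w: Double): Double =
    xs.zip(ys).map { case (x, y) => math.pow(w * x - y, 2) }.sum / xs.size

  /** Gradient descent from w = 0, using step size stepSize / sqrt(iteration) (assumed decay). */
  def fit(stepSize: Double, numIterations: Int = 100): Double =
    (1 to numIterations).foldLeft(0.0) { (w, iter) =>
      // Gradient of 0.5 * mean((w * x - y)^2) with respect to w
      val gradient = xs.zip(ys).map { case (x, y) => (w * x - y) * x }.sum / xs.size
      w - (stepSize / math.sqrt(iter)) * gradient
    }

  def main(args: Array[String]): Unit =
    for (stepSize <- Seq(1.0, 0.1, 0.01)) {
      val w = fit(stepSize)
      println(s"stepSize = $stepSize -> w = $w, training MSE = ${mse(w)}")
    }
}
```

On this toy data, a step size of 1.0 overshoots on every iteration and the weight moves ever further from 2.0, while 0.1 and 0.01 both settle close to it. The exact threshold depends on the scale of the features, which is why the step size usually has to be tuned per dataset.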

For another example on a more realistic dataset, see:

https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/datasets/winequalityred_linearregression.md

https://github.com/selvinsource/spark-pmml-exporter-validator/blob/master/src/main/resources/spark_shell_exporter/linearregression_winequalityred.scala
