Spark not utilizing all the cores while running LinearRegressionWithSGD


Problem description


I am running Spark on my local machine (16G RAM, 8 CPU cores). I was trying to train a linear regression model on a dataset of size 300MB. I checked the CPU statistics and the running programs, and it only executes one thread. The documentation says they have implemented a distributed version of SGD. http://spark.apache.org/docs/latest/mllib-linear-methods.html#implementation-developer

from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD, LinearRegressionModel
from pyspark import SparkContext


def parsePoint(line):
  values = [float(x) for x in line.replace(',', ' ').split(' ')]
  return LabeledPoint(values[0], values[1:])

sc = SparkContext("local", "Linear Reg Simple")
data = sc.textFile("/home/guptap/Dropbox/spark_opt/test.txt")
data.cache()
parsedData = data.map(parsePoint)


model = LinearRegressionWithSGD.train(parsedData)

valuesAndPreds = parsedData.map(lambda p: (p.label,model.predict(p.features)))
# Note: lambda tuple unpacking (lambda (v, p): ...) is Python-2-only syntax
MSE = valuesAndPreds.map(lambda vp: (vp[0] - vp[1])**2).reduce(lambda x, y: x + y) / valuesAndPreds.count()
print("Mean Squared Error = " + str(MSE))


model.save(sc, "myModelPath")
sameModel = LinearRegressionModel.load(sc, "myModelPath")

Recommended answer


I think what you want to do is explicitly state the number of cores to use with the local context. As you can see from the comments here, "local" (which is what you're doing) instantiates a context on one thread whereas "local[4]" will run with 4 cores. I believe you can also use "local[*]" to run on all cores on your system.

