Spark not utilizing all the cores while running LinearRegressionWithSGD


Problem description


I am running Spark on my local machine (16G RAM, 8 CPU cores). I was trying to train a linear regression model on a dataset of size 300MB. I checked the CPU statistics and the running programs, and it only executes one thread. The documentation says they have implemented a distributed version of SGD. http://spark.apache.org/docs/latest/mllib-linear-methods.html#implementation-developer

from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD, LinearRegressionModel
from pyspark import SparkContext


def parsePoint(line):
  values = [float(x) for x in line.replace(',', ' ').split(' ')]
  return LabeledPoint(values[0], values[1:])

sc = SparkContext("local", "Linear Reg Simple")
data = sc.textFile("/home/guptap/Dropbox/spark_opt/test.txt")
data.cache()
parsedData = data.map(parsePoint)


model = LinearRegressionWithSGD.train(parsedData)

valuesAndPreds = parsedData.map(lambda p: (p.label,model.predict(p.features)))
# Note: lambda tuple unpacking (lambda (v, p): ...) is Python-2-only syntax
MSE = valuesAndPreds.map(lambda vp: (vp[0] - vp[1])**2).reduce(lambda x, y: x + y) / valuesAndPreds.count()
print("Mean Squared Error = " + str(MSE))


model.save(sc, "myModelPath")
sameModel = LinearRegressionModel.load(sc, "myModelPath")

Recommended answer


I think what you want to do is explicitly state the number of cores to use with the local context. As you can see from the comments here, "local" (which is what you're doing) instantiates a context on one thread whereas "local[4]" will run with 4 cores. I believe you can also use "local[*]" to run on all cores on your system.

