How to do prediction with Sklearn Model inside Spark?

Views: 974 | Published: 2020/9/4 8:02:08 | Tags: python apache-spark scikit-learn pyspark apache-spark-mllib

Problem Description

I have trained a model in Python using sklearn. How can we load the same model in Spark and use it to generate predictions on a Spark RDD?

Recommended Answer

Well, I will show an example of linear regression in sklearn and how to use it to predict the elements of a Spark RDD.

First, train the model, following the sklearn linear regression example:

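from sklearn import datasets, linear_model

# Load the diabetes data as in the sklearn example: one feature, last 20 rows held out
diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)
diabetes_X_train, diabetes_y_train = diabetes_X[:-20, [2]], diabetes_y[:-20]

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)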
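The question's model was trained in a separate Python program, so it first has to reach the Spark driver. A minimal sketch, assuming the model is persisted with joblib (the file name `regr.joblib` is hypothetical, and the toy data stands in for the real training set), of saving the fitted model and loading it back before broadcasting:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Train a small stand-in model that learns y = 2x
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([2.0, 4.0, 6.0, 8.0])
model = LinearRegression().fit(X_train, y_train)

# Persist it; "regr.joblib" is a hypothetical file name
path = os.path.join(tempfile.mkdtemp(), "regr.joblib")
joblib.dump(model, path)

# Later, e.g. on the Spark driver: load the same fitted model back
loaded = joblib.load(path)
predictions = loaded.predict(np.array([[5.0]]))
```

The loaded object is an ordinary fitted estimator, so it can be broadcast exactly like `regr`. Because executors unpickle the broadcast model, scikit-learn needs to be installed on every worker, ideally in the same version that was used for training.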

At this point we only have the fitted model; next, you need to predict each element of an RDD.

In this case, your RDD should be an RDD of X values, like this:

# sc is an existing SparkContext (e.g. SparkContext.getOrCreate())
rdd = sc.parallelize([1, 2, 3, 4])
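sklearn's `predict` expects a 2-D array of shape `(n_samples, n_features)`, so a single RDD element such as `1` has to become the one-row array `[[1]]` before it is passed in. A minimal sketch of a hypothetical helper `to_row` that accepts either a scalar (single-feature model) or a feature list (multi-feature model); the toy models are assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def to_row(x):
    """Hypothetical helper: turn one RDD element into a (1, n_features) array."""
    # A scalar becomes shape (1, 1); a list of n features becomes shape (1, n)
    return np.asarray(x, dtype=float).reshape(1, -1)

# Single-feature model: RDD elements are plain numbers such as 1, 2, 3, 4
single = LinearRegression().fit([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0])
y_single = single.predict(to_row(4))[0]

# Two-feature model: each RDD element would be a list like [x1, x2]
double = LinearRegression().fit([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 2.0, 3.0])
y_double = double.predict(to_row([2.0, 1.0]))[0]
```

Inside Spark the helper would be applied per element, e.g. `rdd.map(lambda x: (x, regr_bc.value.predict(to_row(x))[0]))`.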

So first, broadcast your sklearn model to the executors:

# Ship one read-only copy of the fitted model to each executor
regr_bc = sc.broadcast(regr)

Then you can use it to predict your data like this:

# predict() expects a 2-D array, so wrap each scalar x as [[x]] and take the single result
rdd.map(lambda x: (x, regr_bc.value.predict([[x]])[0])).collect()

Each resulting element is a pair: the first item is your X and the second is the predicted Y. collect() will return something like this (the numbers are only illustrative; the actual values depend on the trained model):

[(1, 2), (2, 4), (3, 6), ...]
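Calling `predict` once per element works, but each call carries some overhead. A common alternative is to predict a whole partition at once with `rdd.mapPartitions`. The sketch below shows the per-partition function (the name `predict_partition` is hypothetical) and exercises it on a plain Python iterator with a toy model, so it runs without a Spark cluster:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def predict_partition(rows, model):
    """Hypothetical helper: predict every element of one partition in a single call.

    rows is the iterator Spark passes to mapPartitions; each element is a scalar X.
    Returns an iterator of (x, predicted_y) pairs.
    """
    xs = list(rows)
    if not xs:  # partitions can be empty
        return iter([])
    # Stack the partition into one (n_rows, 1) array -> one vectorized predict() call
    preds = model.predict(np.asarray(xs, dtype=float).reshape(-1, 1))
    return iter(zip(xs, preds.tolist()))

# Local check with a toy model that learns y = 2x
model = LinearRegression().fit([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0])
result = list(predict_partition(iter([1, 2, 3, 4]), model))
```

With the broadcast model from the answer this would be used as `rdd.mapPartitions(lambda rows: predict_partition(rows, regr_bc.value)).collect()`. Materializing a partition with `list(rows)` assumes each partition fits in executor memory.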
