How to map over a DataFrame in Spark to extract RowData and make predictions using an H2O MOJO model

Views: 118 | Published: 2020/9/4 9:06:47 | Tags: scala, apache-spark, h2o, sparkling-water


Problem description

I have a saved H2O model in MOJO format, and I am trying to load it and use it to make predictions on a new dataset (df) as part of a Spark app written in Scala. Ideally, I would like to append a new column to the existing DataFrame containing the class probability predicted by this model.

I can see how to apply a MOJO to an individual row that is already in RowData format (as per the answer here), but I am not sure how to map over an existing DataFrame so that each row is in the right format for the MOJO model to score. I have worked with DataFrames a fair bit, but never with the underlying RDDs.

Also, should this model be serialised or broadcast so that predictions can be made in parallel on a cluster, or will it be available to all executors as part of the map?

This is how far I have got:

import hex.genmodel.MojoModel
import hex.genmodel.easy.EasyPredictModelWrapper

// Load the MOJO model and create the EasyPredictModelWrapper
val mojo = MojoModel.load("loca/path/to/mojo/mojo.zip")
val easyModel = new EasyPredictModelWrapper(mojo)

// Map over the Spark DataFrame as an RDD and predict on each row.
// This does not compile: predictBinomial expects an H2O RowData, not a Spark Row.
df.rdd.map { row =>
  val prediction = easyModel.predictBinomial(row).classProbabilities
  println(prediction)
}

But my row variable is a Spark Row rather than a RowData, so this does not work. Any suggestions on what to try next?

My DataFrame consists of 70 predictive feature columns, which are a mixture of integer columns and category/factor columns. A very simple sample DataFrame:

val df = Seq(
  (0, 3, "cat1"),
  (1, 2, "cat2"),
  (2, 6, "cat1")
).toDF("id", "age", "category")
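
One possible direction, offered as a minimal sketch rather than a confirmed solution: copy each Spark Row into an H2O RowData (categorical values as String, numeric values as Double) and load the MOJO once per partition inside mapPartitions, so that nothing has to be serialised or broadcast from the driver. The object name MojoScoringSketch, the helper toRowData, the output column name p1, the choice of feature columns and the path /path/to/mojo.zip are all hypothetical. The sketch assumes a binomial model and a MOJO file readable at the same path on every executor.

```scala
import hex.genmodel.MojoModel
import hex.genmodel.easy.{EasyPredictModelWrapper, RowData}
import org.apache.spark.sql.{Row, SparkSession}

object MojoScoringSketch {

  // Hypothetical helper: copy the named feature columns of a Spark Row into a RowData.
  // Strings are passed through, numbers are widened to Double, nulls are left out.
  def toRowData(row: Row, featureCols: Seq[String]): RowData = {
    val rowData = new RowData()
    featureCols.foreach { name =>
      val idx = row.fieldIndex(name)
      if (!row.isNullAt(idx)) {
        row.get(idx) match {
          case s: String           => rowData.put(name, s)
          case n: java.lang.Number => rowData.put(name, java.lang.Double.valueOf(n.doubleValue()))
          case other               => rowData.put(name, other.toString)
        }
      }
    }
    rowData
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mojo-scoring-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample DataFrame from the question
    val df = Seq((0, 3, "cat1"), (1, 2, "cat2"), (2, 6, "cat1"))
      .toDF("id", "age", "category")

    val featureCols = Seq("age", "category") // assumption: the columns the model was trained on
    val mojoPath = "/path/to/mojo.zip"       // hypothetical path, must exist on every executor

    val scored = df.rdd.mapPartitions { rows =>
      // Load the model once per partition instead of once per row
      val model = new EasyPredictModelWrapper(MojoModel.load(mojoPath))
      rows.map { row =>
        val probs = model.predictBinomial(toRowData(row, featureCols)).classProbabilities
        // Assumption: index 1 is the probability of the second class label
        (row.getAs[Int]("id"), probs(1))
      }
    }.toDF("id", "p1")

    // Attach the probability as a new column by joining back on the key
    df.join(scored, "id").show()
    spark.stop()
  }
}
```

Loading inside mapPartitions trades a little repeated I/O for not having to depend on whether the wrapper serialises cleanly. If the model file only exists on the driver, broadcasting the loaded wrapper may be an alternative, but that is not verified here.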
