Apache Spark MLlib Model File Format

Views: 52 Published: 2021/11/14 20:58:52 Tags: apache-spark apache-spark-mllib


Question

Apache Spark MLlib algorithms (e.g., decision trees) save a model to a location (e.g., myModelPath), where they create two directories: myModelPath/data and myModelPath/metadata. Each directory contains multiple files, and they are not text files. Some of them are *.parquet files.

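I have a few questions:

What is the format of these files?
Which file or files contain the actual model?
Can I save the model somewhere else, for example in a database?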

Recommended answer

Spark >= 2.4

Spark 2.4 and later provide format-agnostic writer interfaces, and selected models already implement them. For example, LinearRegressionModel:

val lrm: org.apache.spark.ml.regression.LinearRegressionModel = ???
val path: String = ???

lrm.write.format("pmml").save(path)

This creates a directory with a single file containing the PMML representation.
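
To make the snippet above concrete, here is a minimal end-to-end sketch that fills in the two placeholders. It assumes Spark 2.4 or later running in local mode. The toy training data, the temporary output directory, and the object name PmmlExportSketch are all illustrative assumptions, not part of the original answer.

```scala
import java.nio.file.Files

import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

object PmmlExportSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pmml-export-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical toy data: label = 2 * x + 1.
    val training = Seq(
      (1.0, Vectors.dense(0.0)),
      (3.0, Vectors.dense(1.0)),
      (5.0, Vectors.dense(2.0))
    ).toDF("label", "features")

    val lrm = new LinearRegression().fit(training)

    // Assumed output location: a not-yet-existing subdirectory of a temp dir.
    val path = Files.createTempDirectory("lr-pmml").resolve("model").toString

    // Same call as in the answer: request the PMML writer by format name.
    lrm.write.format("pmml").save(path)

    // Print the exported PMML as plain text lines to see what was written.
    spark.read.text(path).show(false)

    spark.stop()
  }
}
```

Only models that implement the general writer interface accept format("pmml"), so this sketch should not be expected to work unchanged for other model classes.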

Spark < 2.4

What is the format of these files?

data/*.parquet files are in the Apache Parquet columnar storage format
metadata/part-* files look like JSON
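
One way to check both observations is to open the two directories with Spark itself. The sketch below assumes a model has already been saved under myModelPath (the example location from the question) and that Spark 2.0 or later is available. The object name InspectSavedModel is hypothetical, and the printed schema will differ by model type.

```scala
import org.apache.spark.sql.SparkSession

object InspectSavedModel {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("inspect-saved-model")
      .master("local[*]")
      .getOrCreate()

    // Assumption: a model was previously saved to this location.
    val modelPath = "myModelPath"

    // data/ holds Parquet files, so it loads as a regular DataFrame.
    val data = spark.read.parquet(s"$modelPath/data")
    data.printSchema()
    data.show(5, false)

    // metadata/part-* is read here as raw text lines, without assuming
    // anything about its structure beyond being line-oriented text.
    val metadata = spark.read.text(s"$modelPath/metadata")
    metadata.show(false)

    spark.stop()
  }
}
```

Reading the metadata as raw text avoids relying on it being valid JSON; if it is, spark.read.json on the same directory would parse it into columns instead.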

Which file or files contain the actual model?

data/*.parquet

Can I save the model somewhere else, for example in a database?

I am not aware of any direct method, but you can load the model as a DataFrame and then store it in a database:

val modelDf = spark.read.parquet("/path/to/data/")
modelDf.write.jdbc(...)
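
As a rough illustration of what the JDBC write could look like, the sketch below passes a URL, a table name, and connection properties. Everything database-specific here is an assumption for illustration: the PostgreSQL URL, the table name model_rows, the user name, the DB_PASSWORD environment variable, and the presence of the PostgreSQL JDBC driver on the classpath. Model data often contains nested columns that a JDBC sink may not be able to map, so this sketch serializes each row to a JSON string first.

```scala
import java.util.Properties

import org.apache.spark.sql.{SaveMode, SparkSession}

object ModelToJdbcSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("model-to-jdbc-sketch")
      .master("local[*]")
      .getOrCreate()

    // Path placeholder kept exactly as in the answer.
    val modelDf = spark.read.parquet("/path/to/data/")

    // Flatten every row into a single JSON string column so that nested
    // struct or array columns do not have to be mapped to SQL types.
    val rowsAsJson = modelDf.toJSON.toDF("row_json")

    // Hypothetical connection settings; replace with real ones.
    val props = new Properties()
    props.setProperty("user", "spark_user")
    props.setProperty("password", sys.env.getOrElse("DB_PASSWORD", ""))
    props.setProperty("driver", "org.postgresql.Driver")

    rowsAsJson.write
      .mode(SaveMode.Overwrite)
      .jdbc("jdbc:postgresql://localhost:5432/models", "model_rows", props)

    spark.stop()
  }
}
```

Storing rows as JSON text keeps the write simple but gives up typed columns on the database side. If the model DataFrame happens to be flat, writing modelDf directly is the more natural choice.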
