Using Spark ML Pipelines just for Transformations

Question

I am working on a project where configurable pipelines and lineage tracking of alterations to Spark DataFrames are both essential. The endpoints of this pipeline are usually just modified DataFrames (think of it as an ETL task). What made the most sense to me was to leverage the already existing Spark ML Pipeline API to track these alterations. In particular, the alterations (adding columns based on others, etc.) are implemented as custom Spark ML Transformers.
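
As a concrete illustration, a purely ETL-oriented stage might look like the following sketch. The class and column names (AddTotalColumn, price, tax, total) are invented for this example, not taken from the actual project:

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// Hypothetical ETL-only Transformer: derives a `total` column from two
// existing numeric columns. No Estimator or Evaluator is involved.
class AddTotalColumn(override val uid: String) extends Transformer {
  def this() = this(Identifiable.randomUID("addTotal"))

  override def transform(dataset: Dataset[_]): DataFrame =
    dataset.withColumn("total", col("price") + col("tax"))

  // Declaring the output schema up front is what lets a Pipeline validate
  // the whole chain of stages before any data is processed.
  override def transformSchema(schema: StructType): StructType =
    StructType(schema.fields :+ StructField("total", DoubleType, nullable = true))

  override def copy(extra: ParamMap): AddTotalColumn = defaultCopy(extra)
}
```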

However, we are now having an internal debate about whether or not this is the most idiomatic way of implementing this pipeline. The other option would be to implement these transformations as a series of UDFs and to build our own lineage tracking based on a DataFrame's schema history (or Spark's internal DF lineage tracking). The argument for this side is that Spark's ML Pipelines are not intended for just ETL jobs, and should always be implemented with the goal of producing a column that can be fed to a Spark ML Evaluator. The argument against this side is that it requires a lot of work to mirror already existing functionality.
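
For contrast, a minimal sketch of the UDF route follows; the schema-snapshot history used here is an invented stand-in for whatever lineage scheme we would actually build, and the column names are hypothetical:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

// One hand-rolled ETL step: apply a plain UDF, then record the resulting
// schema so a lineage history can be reconstructed later.
val toUpper = udf((s: String) => if (s == null) null else s.toUpperCase)

def upperCaseName(df: DataFrame, history: Seq[String]): (DataFrame, Seq[String]) = {
  val out = df.withColumn("name_upper", toUpper(col("name")))
  (out, history :+ out.schema.treeString) // schema snapshot as crude lineage
}
```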

Is there any problem with leveraging Spark's ML Pipelines strictly for ETL tasks? Tasks that only make use of Transformers and don't include Evaluators?

Answer

For me, it seems like a great idea, especially if you can compose the different Pipelines you generate into new ones, since a Pipeline can itself be made up of other Pipelines: Pipeline extends PipelineStage up the class tree (source: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.Pipeline).
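
To make the composition point concrete, here is a minimal sketch that nests pipelines as stages. It reuses the hypothetical AddTotalColumn from the question sketch above; SQLTransformer is a built-in stage, but the SQL statement and toy input are illustrative:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.SQLTransformer
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Toy input with the columns the hypothetical AddTotalColumn expects.
val inputDF = Seq((10.0, 1.5), (20.0, 3.0)).toDF("price", "tax")

// Because Pipeline extends PipelineStage, whole pipelines nest as stages.
val cleaning = new Pipeline().setStages(Array(new AddTotalColumn()))
val enrichment = new Pipeline().setStages(Array(
  new SQLTransformer().setStatement("SELECT *, total * 0.1 AS discount FROM __THIS__")
))
val etl = new Pipeline().setStages(Array(cleaning, enrichment))

// With only Transformers as stages, fit learns nothing; it simply assembles
// a PipelineModel that replays the stages in order.
val result = etl.fit(inputDF).transform(inputDF)
result.show()
```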

But keep in mind that you will probably be doing the same thing under the hood as explained here (https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-mllib/spark-mllib-transformers.html):

Internally, transform method uses Spark SQL’s udf to define a function (based on createTransformFunc function described above) that will create the new output column (with appropriate outputDataType). The UDF is later applied to the input column of the input DataFrame and the result becomes the output column (using DataFrame.withColumn method).
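
That is the UnaryTransformer pattern: you supply only createTransformFunc and outputDataType, and the base class handles the udf and withColumn plumbing. A minimal sketch with an invented class:

```scala
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{DataType, StringType}

// Hypothetical single-column transformer: the base class wraps
// createTransformFunc in a udf and applies it via withColumn.
class LowerCaser(override val uid: String)
    extends UnaryTransformer[String, String, LowerCaser] {

  def this() = this(Identifiable.randomUID("lowerCaser"))

  override protected def createTransformFunc: String => String = _.toLowerCase

  override protected def outputDataType: DataType = StringType
}

// Usage: new LowerCaser().setInputCol("name").setOutputCol("name_lower")
```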

If you decide on another approach or find a better way, please comment. It's nice to share knowledge about Spark.
