For "iterative algorithms," what is the advantage of converting to an RDD then back to a Dataframe


Question

I am reading High Performance Spark and the author makes the following claim:

While the Catalyst optimizer is quite powerful, one of the cases where it currently runs into challenges is with very large query plans. These query plans tend to be the result of iterative algorithms, like graph algorithms or machine learning algorithms. One simple workaround for this is converting the data to an RDD and back to DataFrame/Dataset at the end of each iteration, as shown in Example 3-58.

Example 3-58 is labeled "Round trip through RDD to cut query plan" and is reproduced below:

val rdd = df.rdd
rdd.cache()
sqlCtx.createDataFrame(rdd, df.schema)

Does anyone know what is the underlying reason that makes this workaround necessary?

For reference, a bug report has been filed for this issue and is available at the following link: https://issues.apache.org/jira/browse/SPARK-13346

There does not appear to be a fix, but the maintainers have closed the issue and do not seem to believe they need to address it.

Answer

From my understanding, the lineage keeps on growing in iterative algorithms, i.e.

step 1: read DF1, DF2

step 2: update DF1 based on DF2 value

step 3: read DF3

step 4: update DF1 based on DF3 value

... and so on ...

In this scenario, DF1's lineage keeps on growing, and unless it is truncated using DF1.rdd, it will crash the driver after 20 or so iterations.
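The pattern described above can be sketched as follows. This is a minimal sketch, not the book's code: it assumes a SQLContext `sqlCtx` is in scope, that `df1`, `df2`, and `df3` are existing DataFrames, and that `update` is a hypothetical per-iteration transformation.

import org.apache.spark.sql.DataFrame

// Round trip through an RDD to cut the accumulated query plan.
def truncatePlan(df: DataFrame): DataFrame = {
  val rdd = df.rdd
  rdd.cache() // keep the materialized rows so the cut-off lineage is not recomputed
  sqlCtx.createDataFrame(rdd, df.schema) // fresh DataFrame with an empty logical plan
}

// Hypothetical iterative loop: each update grows DF1's Catalyst plan,
// so the plan is truncated at the end of every iteration.
var current: DataFrame = df1
for (other <- Seq(df2, df3)) {
  current = update(current, other) // plan grows here
  current = truncatePlan(current)  // plan is reset here
}

Without the `truncatePlan` call, each iteration's transformation would be appended to one ever-growing logical plan, and the driver would eventually spend unbounded time and memory analyzing it.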
