Spark: Exception in thread "main" org.apache.spark.sql.catalyst.errors.package

Problem description

When I run my code with spark-submit, it fails with the error shown below.

The code is a Scala file that performs joins.

I would like to understand what this TreeNodeException is and why it occurs. Please share any ideas about it:

Exception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:

Recommended answer

I also encountered this exception when joining DataFrames:

Exception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:

To fix it, I simply reversed the order of the join. That is, instead of df1.join(df2, on_col="A"), I did df2.join(df1, on_col="A"). I am not sure why this works, but my intuition is that the logical tree Spark has to follow is messy with the former command and not with the latter. In my toy example, you can think of it in terms of the number of comparisons Spark has to make on column "A" to join the two DataFrames. I know this is not a definitive answer, but I hope it helps.
