How can I resolve "SparkException: Exception thrown in Future.get" issue?
Question
I'm working on two pyspark dataframes and doing a left-anti join on them to track everyday changes and then send an email.
The first time I tried:
diff = Table_a.join(
    Table_b,
    [Table_a.col1 == Table_b.col1, Table_a.col2 == Table_b.col2],
    how='left_anti'
)
The expected output is a PySpark dataframe with some or no data.
This diff dataframe gets its schema from Table_a. The first time I ran it, it showed no data, as expected, with the schema representation. From the next run onwards it just throws a SparkException:
Exception thrown in Future.get
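For reference, the left-anti semantics the join above relies on can be sketched in plain Python (the sample rows below are made up for illustration, not the actual tables):

```python
# Left-anti join: keep rows of the left table whose (col1, col2) key
# has no match in the right table.
table_a = [
    {"col1": 1, "col2": "x", "val": 10},
    {"col1": 2, "col2": "y", "val": 20},
    {"col1": 3, "col2": "z", "val": 30},
]
table_b = [
    {"col1": 1, "col2": "x"},
    {"col1": 3, "col2": "z"},
]

# Build the set of join keys present on the right side.
right_keys = {(r["col1"], r["col2"]) for r in table_b}

# Keep only left rows whose key is absent from the right side.
diff = [r for r in table_a if (r["col1"], r["col2"]) not in right_keys]

print(diff)  # only the row with col1=2 survives
```

When the two tables are identical on the join keys, `diff` is empty, which matches the "some or no data" expectation above.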
Recommended Answer
I use Scala, but in my experience this happens when one of the underlying tables has been changed somehow. My advice would be to try simply running display(Table_a) and display(Table_b) and see whether either command fails. That should give you a hint about where the problem is.
In any case, to actually fix the issue, my advice would be to clear the cache by running
%sql
REFRESH TABLE my_schema.table_a
REFRESH TABLE my_schema.table_b
and then redefine those variables, as in
Table_a = spark.table("my_schema.table_a")
Table_b = spark.table("my_schema.table_b")
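The mechanism behind the fix is a cached snapshot going stale: the first read is cached, the table changes underneath, and later reads keep serving the old snapshot until it is invalidated. A toy plain-Python analogy of that behavior (this is not Spark code; `storage`, `load_table`, and `refresh` are made-up names for illustration):

```python
storage = {"my_schema.table_a": ["row-v1"]}  # simulated backing store
_cache = {}                                  # simulated metadata/file cache

def load_table(name):
    # Analogous to spark.table(): serves the cached snapshot if one exists.
    if name not in _cache:
        _cache[name] = list(storage[name])
    return _cache[name]

def refresh(name):
    # Analogous to REFRESH TABLE: drop the cached snapshot so the next
    # read goes back to the backing store.
    _cache.pop(name, None)

load_table("my_schema.table_a")              # first read populates the cache
storage["my_schema.table_a"] = ["row-v2"]    # table changes underneath

stale = load_table("my_schema.table_a")      # still the old snapshot
refresh("my_schema.table_a")
fresh = load_table("my_schema.table_a")      # now sees the new data
```

This is why both steps matter: REFRESH TABLE invalidates the cached state, and re-assigning Table_a / Table_b re-reads the tables afterwards.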
This worked for me - hope it helps you too.