How can I resolve "SparkException: Exception thrown in Future.get" issue?


Problem description

I'm working on two pyspark dataframes and doing a left-anti join on them to track everyday changes and then send an email.
The first time I tried:

diff = Table_a.join(
    Table_b, 
    [Table_a.col1 == Table_b.col1, Table_a.col2 == Table_b.col2],
    how='left_anti'
)

Expected output is a pyspark dataframe with some or no data.
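For intuition, a left-anti join keeps only the rows of the left table that have no matching (col1, col2) pair in the right table. A minimal plain-Python sketch of that semantics (the sample rows here are made up for illustration, not taken from the question):

```python
# Plain-Python sketch of left-anti join semantics (illustration only;
# the sample rows are hypothetical, not from the actual tables).
table_a = [("a", 1), ("b", 2), ("c", 3)]
table_b = [("a", 1), ("b", 2)]

# Keep rows of table_a whose (col1, col2) pair has no match in table_b.
b_keys = {(col1, col2) for col1, col2 in table_b}
diff = [row for row in table_a if (row[0], row[1]) not in b_keys]

print(diff)  # → [('c', 3)]  -- the only row with no match in table_b
```

When the two tables are identical, `diff` is empty, which is why the question expects a DataFrame with some or no data.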

This diff dataframe gets its schema from Table_a. The first time I ran it, it showed no data (as expected) along with the schema. From the next run onwards it just throws a SparkException:

Exception thrown in Future.get

Recommended answer

I use Scala, but, from my experience, this happens when one of the underlying tables has been changed somehow. My advice would be to simply run display(Table_a) and display(Table_b), and see if either of those commands fails. That should give you a hint about where the problem is.

In any case, to actually fix the issue, my advice would be to clear the cache by running

%sql
REFRESH my_schema.table_a
REFRESH my_schema.table_b

and, then, redefining those variables, as in

Table_a = spark.table("my_schema.table_a")
Table_b = spark.table("my_schema.table_b")

This worked for me - hope it helps you too.
