What happens if SparkSession is not closed?

Question

What is the difference between the following two?

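import org.apache.spark.sql.SparkSession

object Example1 {
    def main(args: Array[String]): Unit = {
        // Create the session before `try` so that it is in scope in `finally`.
        val spark = SparkSession.builder.getOrCreate()
        try {
            // spark code here
        } finally {
            // Runs whether the Spark code above succeeds or throws.
            spark.close()
        }
    }
}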

object Example2 {
    val spark = SparkSession.builder.getOrCreate
    def main(args: Array[String]): Unit = {
        // spark code here
    }
}    

I know that SparkSession implements Closeable, which hints that it needs to be closed. However, I can't think of any issues if the SparkSession is just created as in Example2 and never closed explicitly.

Whether the Spark application succeeds or fails (and exits the main method), the JVM will terminate and the SparkSession will be gone with it. Is this correct?

IMO: The fact that SparkSession is a singleton should not make a big difference either.

Recommended answer

You should always close your SparkSession when you are done with it (even if the only outcome were to follow the good practice of giving back what you've been given).
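One way to make "always close" hard to forget is a small loan-pattern helper. The following is only a minimal sketch: `withResource` and `FakeSession` are hypothetical names introduced here, and `FakeSession` is a stand-in so that the example runs without Spark on the classpath.

```scala
import java.io.Closeable

object LoanPattern {
  // Runs `body` with the resource and always closes it afterwards,
  // whether `body` returns normally or throws.
  def withResource[R <: Closeable, A](resource: R)(body: R => A): A =
    try body(resource)
    finally resource.close()

  // Hypothetical stand-in for SparkSession (an assumption for illustration only).
  final class FakeSession extends Closeable {
    var closed = false
    def run(query: String): Int = query.length
    override def close(): Unit = { closed = true }
  }

  def main(args: Array[String]): Unit = {
    val session = new FakeSession
    val n = withResource(session)(_.run("select 1"))
    println(s"result=$n closed=${session.closed}")
  }
}
```

Since SparkSession implements Closeable, the same helper could presumably be called as withResource(SparkSession.builder.getOrCreate()) { spark => ... } in a real application.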

Closing a SparkSession may trigger freeing cluster resources that could be given to some other application.

SparkSession is a session and as such maintains some resources that consume JVM memory. You can have as many SparkSessions as you want (see SparkSession.newSession to create a fresh session), but you don't want unused sessions to hold on to memory, so close the ones you no longer need.
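One caveat is worth checking before closing an individual session. As far as I know, in the Spark 2.x and 3.x releases a session obtained from newSession shares the SparkContext of its parent, and close is an alias for stop, which stops that shared SparkContext. The sketch below checks both points in local mode; the object name, the `check` helper, and the `local[*]` master are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession

object SharedContextSketch {
  // Returns (sessions share one SparkContext, context is stopped after closing one session).
  def check(): (Boolean, Boolean) = {
    val first = SparkSession.builder
      .master("local[*]") // assumption: run locally, no cluster needed
      .appName("shared-context-sketch")
      .getOrCreate()

    // A fresh session: separate SQL configuration and temporary views.
    val second = first.newSession()

    // Reference equality: do both sessions wrap the same SparkContext?
    val shared = first.sparkContext eq second.sparkContext

    // Close only the second session, then inspect the first session's context.
    second.close()
    val stopped = first.sparkContext.isStopped

    (shared, stopped)
  }

  def main(args: Array[String]): Unit = {
    val (shared, stopped) = check()
    println(s"shared context: $shared, stopped after closing one session: $stopped")
  }
}
```

If both values come back true on your Spark version, closing any one session ends the whole application's context, so it is safer to close once, at the very end.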

SparkSession is Spark SQL's wrapper around Spark Core's SparkContext, so under the covers (as in any Spark application) you have cluster resources, i.e. vcores and memory, assigned to your SparkSession (through SparkContext). That means that as long as your SparkContext is in use (through the SparkSession), those cluster resources won't be assigned to other tasks (not only Spark's, but also those of non-Spark applications submitted to the cluster). These cluster resources are yours until you say "I'm done", which translates to calling close.

If, however, closing the session would be immediately followed by exiting the Spark application, you don't have to worry about calling close, since the resources will be released automatically anyway. The JVMs for the driver and executors terminate, and so does the (heartbeat) connection to the cluster, so eventually the resources are given back to the cluster manager, which can then offer them to some other application.
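The general JVM mechanism for running cleanup at exit is a shutdown hook, and as far as I know SparkContext registers one of its own that calls stop. The sketch below shows only the mechanism, not Spark's implementation: `closeOnExit` and `FakeSession` are hypothetical names, and `FakeSession` is a stand-in so that the example runs without Spark.

```scala
import java.io.Closeable

object ShutdownHookSketch {
  // Hypothetical stand-in for a session-like resource.
  final class FakeSession extends Closeable {
    @volatile var closed = false
    override def close(): Unit = { closed = true }
  }

  // Registers a JVM shutdown hook that closes the resource when the JVM
  // shuts down in an orderly way (main returns, System.exit, SIGTERM).
  // Hooks do not run if the process is killed forcibly (e.g. kill -9).
  def closeOnExit(resource: Closeable): scala.sys.ShutdownHookThread =
    sys.addShutdownHook {
      resource.close()
    }

  def main(args: Array[String]): Unit = {
    val session = new FakeSession
    closeOnExit(session)
    println("main is returning without an explicit close")
  }
}
```

A hook like this is a safety net for the exit path only; in a long-running JVM that outlives its Spark work, nothing is released until someone calls close.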
