Purpose of SparkContext.getOrCreate()


Question

What is the purpose of the getOrCreate method of the SparkContext class? I don't understand when this method should be used.

If I have two Spark applications that are run with spark-submit, and in each main method I instantiate the Spark context with SparkContext.getOrCreate, will both applications have the same context?

Or is the purpose simpler: when I create a Spark application and don't want to pass the Spark context as a parameter to a method, I can get it as a singleton object instead?

Recommended answer

If I have two Spark applications that are run with spark-submit, and in each main method I instantiate the Spark context with SparkContext.getOrCreate, will both applications have the same context?

No. SparkContext is a local object. It is not shared between applications.

When I create a Spark application and don't want to pass the Spark context as a parameter to a method, can I get it as a singleton object?

This is exactly the reason. SparkContext (or SparkSession) is ubiquitous in Spark applications and in core Spark's source, and passing it around would be a huge burden.
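The convenience described here can be illustrated with a small toy model. This is only a sketch in plain Python and not Spark's code: the class ToyContext, its get_or_create method, and the helper functions are hypothetical names invented for the illustration. It shows helpers looking up a shared instance themselves instead of receiving it as a parameter.

```python
class ToyContext:
    """Hypothetical stand-in for a per-process context; not Spark's class."""

    _active = None  # the single active instance for this process

    def __init__(self, app_name):
        self.app_name = app_name

    @classmethod
    def get_or_create(cls, app_name="demo"):
        # Reuse the active instance if one exists; otherwise create and register it.
        if cls._active is None:
            cls._active = cls(app_name)
        return cls._active


def load_numbers():
    # No context parameter: the helper looks up the shared instance itself.
    ctx = ToyContext.get_or_create()
    return ctx.app_name, [1, 2, 3]


def summarize():
    # A second helper, also without a context parameter.
    ctx = ToyContext.get_or_create()
    name, numbers = load_numbers()
    return ctx.app_name, name, sum(numbers)


# The entry point creates the instance once; later calls only retrieve it.
main_ctx = ToyContext.get_or_create("word-count")
result = summarize()
print(result)
```

In this model, the app_name argument is ignored once an instance exists, so every helper sees the instance created at the entry point.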

It is also useful for multithreaded applications where an arbitrary thread can initialize the context.
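The multithreaded case can be sketched the same way. The following is a toy model in plain Python, not Spark's implementation: ToyContext, get_or_create, and the created counter are hypothetical names. It assumes that a lock guards creation, so whichever thread arrives first creates the instance and every other thread receives that same instance.

```python
import threading


class ToyContext:
    """Hypothetical stand-in for a per-process context; not Spark's class."""

    _active = None
    _lock = threading.Lock()
    created = 0  # counts how many instances were ever constructed

    @classmethod
    def get_or_create(cls):
        # The lock makes the check-then-create step atomic across threads.
        with cls._lock:
            if cls._active is None:
                cls._active = cls()
                cls.created += 1
            return cls._active


seen = []


def worker():
    # Any thread may be the first caller; none of them needs to know which.
    seen.append(ToyContext.get_or_create())


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

all_same = all(ctx is seen[0] for ctx in seen)
print(len(seen), all_same, ToyContext.created)
```

Without the lock, two threads could both observe that no instance exists and each construct one, which is the situation a get-or-create accessor is meant to rule out.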

Regarding the documentation:

This function may be used to get or instantiate a SparkContext and register it as a singleton object. Because we can only have one active SparkContext per JVM, this is useful when applications may wish to share a SparkContext.

The driver runs in its own JVM, and there is no built-in mechanism to share it between multiple full-fledged Java applications (that is, proper applications each executing their own main; see "Is there one JVM per Java application?" and "Why have one JVM per application?" for related general questions). "Application" here refers to a "logical application" in which multiple modules execute their own code. One example is SparkJob on spark-jobserver. This scenario is no different from passing a SparkContext to a function.
