Share SparkContext between Java and R Apps under the same Master
Problem Description
Here is the setup.
Currently I have two Spark applications initialized. I need to pass data between them (preferably through a shared SparkContext/SQLContext so I can just query a temp table). I currently use Parquet files for DataFrame transfer, but is there any other way?
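For reference, the current Parquet-based hand-off looks roughly like the sketch below. This is not code from the question; the path and the SparkR call are illustrative assumptions.

// Java side: persist the DataFrame somewhere both applications can reach
// ("/tmp/shared/table.parquet" is a placeholder path).
df.write().parquet("/tmp/shared/table.parquet");

// SparkR side (shown as a comment to keep this sketch in one language):
// df <- read.df(sqlContext, "/tmp/shared/table.parquet", source = "parquet")

The obvious cost is a round trip through disk (or HDFS/S3) for every exchange, which is what the question is trying to avoid.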
MasterURL points to the same SparkMaster.
Start Spark via terminal:
/opt/spark/sbin/start-master.sh;
/opt/spark/sbin/start-slave.sh spark://`hostname`:7077
Java app setup:
// conf sets the master to MasterURL, 6G executor memory, and 4 cores
JavaSparkContext context = new JavaSparkContext(conf);
SQLContext sqlContext = new SQLContext(context.sc());
Then I register an existing DataFrame later on:
//existing dataframe to temptable
df.registerTempTable("table");
And the SparkR app setup:
sc <- sparkR.init(master='MasterURL', sparkEnvir=list(spark.executor.memory='6G', spark.cores.max='4'))
sqlContext <- sparkRSQL.init(sc)
# attempt to get temptable
df <- sql(sqlContext, "SELECT * FROM table"); # throws the error
Recommended Answer
As far as I know it is not possible given your current configuration. Tables created using registerTempTable are bound to the specific SQLContext that was used to create the corresponding DataFrame. Even if your Java and SparkR applications use the same master, their drivers run in separate JVMs and cannot share a single SQLContext.
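A minimal sketch of that binding, assuming Spark 1.x where each SQLContext keeps its own temp-table catalog (the class name and input file below are hypothetical): even a second SQLContext in the same JVM cannot see the table, let alone a driver in another application.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class TempTableScope {
    public static void main(String[] args) {
        JavaSparkContext context = new JavaSparkContext(
                new SparkConf().setMaster("local[*]").setAppName("temp-table-scope"));

        SQLContext sqlA = new SQLContext(context.sc());
        SQLContext sqlB = new SQLContext(context.sc()); // second context, same SparkContext

        DataFrame df = sqlA.read().json("people.json"); // placeholder input file
        df.registerTempTable("table");

        sqlA.sql("SELECT * FROM table").show(); // works: the table lives in sqlA's catalog
        sqlB.sql("SELECT * FROM table").show(); // fails: "Table not found: table"
        context.stop();
    }
}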
There are tools, like Apache Zeppelin, which take a different approach: a single SQLContext (and SparkContext) is exposed to the individual backends. This way you can register a table using, for example, Scala and read it from Python. There is a fork of Zeppelin which provides some support for SparkR and R; you can check how it starts and interacts with the R backend.