R Shiny and Spark: how to free Spark resources?
Problem description
Say we have a Shiny app which is deployed on a Shiny Server. We expect that the app will be used by several users via their web browser, as usual.
The Shiny app's server.R includes some sparklyr package code which connects to a Spark cluster for classic filter, select, mutate, and arrange operations on data located on HDFS.
Is it mandatory to disconnect from Spark, i.e. to include a spark_disconnect at the end of the server.R code to free resources? I think we should never disconnect, and should let Spark handle the load for each arriving and leaving user. Can somebody please confirm this?
Recommended answer
TL;DR: SparkSession and SparkContext are not lightweight resources which can be started on demand.
Putting aside all security considerations related to starting a Spark session directly from a user-facing application, maintaining a SparkSession inside server (starting the session on entry, stopping on exit) is simply not a viable option.
The server function will be executed every time there is an incoming event, effectively restarting the whole Spark application and rendering the project unusable. And this is only the tip of the iceberg. Since Spark reuses existing sessions (only one context is allowed per JVM), multiuser access could lead to random failures if the reused session has been stopped by another server call.
One possible solution is to register onSessionEnded with spark_disconnect, but I am pretty sure it will be useful only in a single-user environment.
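A minimal sketch of that idea, assuming a single-user setting: one global connection is opened, and the session$onSessionEnded callback tears it down when that user's Shiny session ends. The master = "local" value is a placeholder.

```r
library(shiny)
library(sparklyr)

# Hypothetical single-user setup; master = "local" is a placeholder
sc <- spark_connect(master = "local")

server <- function(input, output, session) {
  # Disconnect when this user's session ends; with multiple users,
  # the first session to end would kill the shared connection
  session$onSessionEnded(function() {
    spark_disconnect(sc)
  })
}
```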
Another possible approach is to use a global connection and wrap runApp with a function calling spark_disconnect_all on exit:
runApp <- function() {
  # Register cleanup first so it runs even if runApp() exits via error or interrupt
  on.exit({
    spark_disconnect_all()
  })
  shiny::runApp()
}
Although in practice the resource manager should free resources when the driver disassociates, without stopping the session explicitly.