R Shiny and Spark: how to free Spark resources?


Question

Say we have a Shiny app which is deployed on a Shiny Server. We expect that the app will be used by several users via their web browser, as usual.

The Shiny app's server.R includes some sparklyr package code which connects to a Spark cluster for classic filter, select, mutate, and arrange operations on data located on HDFS.
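
For illustration, a minimal sketch of the kind of sparklyr code involved (the master URL, table name, and HDFS path are assumptions, not part of the question):

library(sparklyr)
library(dplyr)

# Hypothetical connection and dataset; adjust the master and path to your cluster
sc <- spark_connect(master = "yarn-client")
flights <- spark_read_parquet(sc, "flights", "hdfs:///data/flights")

delayed <- flights %>%
  filter(dep_delay > 0) %>%                    # keep delayed flights only
  select(carrier, dep_delay) %>%               # project the needed columns
  mutate(dep_delay_hours = dep_delay / 60) %>% # derive a new column
  arrange(desc(dep_delay_hours))               # sort on the Spark side

head(delayed)  # triggers the lazily built pipeline on the cluster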

Is it mandatory to disconnect from Spark, i.e. to include a spark_disconnect call at the end of the server.R code, to free resources? I think we should never disconnect and should instead let Spark handle the load for each arriving and leaving user. Can somebody please help me confirm this?

Answer

TL;DR SparkSession and SparkContext are not lightweight resources which can be started on demand.

Putting aside all security considerations related to starting a Spark session directly from a user-facing application, maintaining a SparkSession inside the server function (starting a session on entry, stopping it on exit) is simply not a viable option.

The server function will be executed every time there is an incoming connection, effectively restarting the whole Spark application and rendering the project unusable. And this is only the tip of the iceberg. Since Spark reuses existing sessions (only one context is allowed per JVM), multi-user access could lead to random failures if the reused session has been stopped from another server call.

One possible solution is to register onSessionEnded with spark_disconnect, but I am pretty sure it will be useful only in a single-user environment.
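
A sketch of what that registration could look like inside the server function (the local master is an assumption, purely for illustration):

library(sparklyr)
library(shiny)

server <- function(input, output, session) {
  sc <- spark_connect(master = "local")

  # Disconnect when this user's session ends; as noted above,
  # this is only safe when a single user is connected at a time
  session$onSessionEnded(function() {
    spark_disconnect(sc)
  })
}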

Another possible approach is to use a global connection and wrap runApp with a function calling spark_disconnect_all on exit:

runApp <- function() {
  # Register the cleanup first, so connections are closed even if
  # shiny::runApp() exits with an error or is interrupted
  on.exit({
    spark_disconnect_all()
  })
  shiny::runApp()
}
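
The connection itself would then be created once at the top level, for example in global.R, so that all Shiny sessions share it (the master URL is again an assumption):

# global.R: one shared connection, reused by every server instance
library(sparklyr)
sc <- spark_connect(master = "spark://master:7077")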

although in practice the resource manager should free the resources when the driver disassociates, without stopping the session explicitly.
