Can we use multiple SparkSessions to access two different Hive servers?


Question

I have a scenario where I need to compare two different tables, a source and a destination, from two separate remote Hive servers. Can we use two SparkSessions, something like I tried below:

 val spark = SparkSession.builder().master("local")
  .appName("spark remote")
  .config("javax.jdo.option.ConnectionURL", "jdbc:mysql://192.168.175.160:3306/metastore?useSSL=false")
  .config("javax.jdo.option.ConnectionUserName", "hiveroot")
  .config("javax.jdo.option.ConnectionPassword", "hivepassword")
  .config("hive.exec.scratchdir", "/tmp/hive/${user.name}")
  .config("hive.metastore.uris", "thrift://192.168.175.160:9083")
  .enableHiveSupport()
  .getOrCreate()

SparkSession.clearActiveSession()
SparkSession.clearDefaultSession()

val sparkdestination = SparkSession.builder()
  .config("javax.jdo.option.ConnectionURL", "jdbc:mysql://192.168.175.42:3306/metastore?useSSL=false")
  .config("javax.jdo.option.ConnectionUserName", "hiveroot")
  .config("javax.jdo.option.ConnectionPassword", "hivepassword")
  .config("hive.exec.scratchdir", "/tmp/hive/${user.name}")
  .config("hive.metastore.uris", "thrift://192.168.175.42:9083")
  .enableHiveSupport()
  .getOrCreate() 

I tried SparkSession.clearActiveSession() and SparkSession.clearDefaultSession(), but it isn't working; it throws the error below:

Hive: Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

Is there any other way we can access the two Hive tables using multiple SparkSessions or SparkContexts?

Thanks

Answer

The SparkSession getOrCreate method states:

gets an existing [[SparkSession]] or, if there is no existing one, creates a new one based on the options set in this builder.

This method first checks whether there is a valid thread-local SparkSession, and if yes, return that one. It then checks whether there is a valid global default SparkSession, and if yes, return that one. If no valid global default SparkSession exists, the method creates a new SparkSession and assigns the newly created SparkSession as the global default. In case an existing SparkSession is returned, the config options specified in this builder will be applied to the existing SparkSession.

That's the reason it returns the first session and its configuration.
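A minimal sketch of that lookup order is below. The hostnames (source-host, destination-host) are placeholders, not the servers from the question; the point is only that, without clearing the active/default session, the second builder returns the already-created session and merely applies its options to it.

import org.apache.spark.sql.SparkSession

// A minimal sketch of the getOrCreate lookup order; the hosts are placeholders.
val source = SparkSession.builder()
  .master("local[*]")
  .appName("getOrCreate lookup demo")
  .config("hive.metastore.uris", "thrift://source-host:9083")
  .getOrCreate()

// This builder finds the existing default session, applies its options to it,
// and returns it; no second session (or SparkContext) is created.
val destination = SparkSession.builder()
  .config("hive.metastore.uris", "thrift://destination-host:9083")
  .getOrCreate()

// Both names refer to the same SparkSession object.
assert(source eq destination)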

Please go through the docs to find alternative ways to create a session.

I'm working on a Spark version < 2, so I'm not sure exactly how to create a new session without configuration collisions.

But here is a useful test case, SparkSessionBuilderSuite.scala, that you can use to work this out yourself (DIY).

An example method in that test case:

test("use session from active thread session and propagate config options") {
    val defaultSession = SparkSession.builder().getOrCreate()
    val activeSession = defaultSession.newSession()
    SparkSession.setActiveSession(activeSession)
    val session = SparkSession.builder().config("spark-config2", "a").getOrCreate()

    assert(activeSession != defaultSession)
    assert(session == activeSession)
    assert(session.conf.get("spark-config2") == "a")
    assert(session.sessionState.conf == SQLConf.get)
    assert(SQLConf.get.getConfString("spark-config2") == "a")
    SparkSession.clearActiveSession()

    assert(SparkSession.builder().getOrCreate() == defaultSession)
    SparkSession.clearDefaultSession()
  }
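Adapting that newSession() pattern to the question's scenario might look like the sketch below. The hostnames and table names (source-host, destination-host, db.source_table, db.destination_table) are placeholders I made up for illustration. Also note the caveat: newSession() isolates SQL configuration and temporary views but still shares the underlying SparkContext (and, with Hive support, the shared catalog state), so setting hive.metastore.uris on the new session is not guaranteed to actually point it at a different metastore; treat this as a starting point for experimentation rather than a working solution.

import org.apache.spark.sql.SparkSession

// Hypothetical adaptation of the test-case pattern; hosts and table names are placeholders.
val sparkSource = SparkSession.builder()
  .master("local[*]")
  .appName("compare source and destination tables")
  .config("hive.metastore.uris", "thrift://source-host:9083")
  .enableHiveSupport()
  .getOrCreate()

// newSession() gives an isolated SQL configuration and temporary-view registry,
// but it shares the SparkContext (and Hive catalog state) with sparkSource,
// so this may still resolve tables against the first metastore.
val sparkDestination = sparkSource.newSession()
sparkDestination.conf.set("hive.metastore.uris", "thrift://destination-host:9083")

val sourceDf = sparkSource.table("db.source_table")
val destinationDf = sparkDestination.table("db.destination_table")

// Rows present in the source table but missing from the destination table.
sourceDf.except(destinationDf).show()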
