Avoid starting HiveThriftServer2 with created context programmatically


Problem description





    We are trying to use the Thrift server to query data from Spark temp tables in Spark 2.0.0.

    First, we created a SparkSession with Hive support enabled. Currently, we start the Thrift server with that session's SQLContext like this:

    HiveThriftServer2.startWithContext(spark.sqlContext());
    

    We have a Spark streaming query that registers the temp table "spark_temp_table":

    // The memory sink keeps the output in an in-memory table named by queryName.
    StreamingQuery streamingQuery = streamedData.writeStream()
            .format("memory")
            .queryName("spark_temp_table")
            .start();
    

    With beeline we can see the temp tables (by running SHOW TABLES).

    When we want to run a second job (with a second SparkSession) using this approach, we have to start a second Thrift server on a different port.

    I have two questions here:

    1. Is there any way to have one ThriftServer on one port with access to all temp tables in different SparkSessions?

    2. HiveThriftServer2.startWithContext(spark.sqlContext()); is annotated with @DeveloperApi. Is there any way to start the Thrift server with an existing context without doing it programmatically in code?
      I saw that the configuration --conf spark.sql.hive.thriftServer.singleSession=true can be passed to the Thrift server on startup (sbin/start-thriftserver.sh), but I don't understand how to define this for a job. I tried setting this configuration property in the SparkSession builder, but beeline didn't display the temp tables.

    Solution

    Is there any way to have one ThriftServer on one port with access to all temp tables in different SparkSessions?

    No. The Thrift server uses a specific session, and temporary tables can be accessed only within that session. This is why beeline didn't display the temp tables when you started an independent server with sbin/start-thriftserver.sh: that server runs as a separate application with its own session, so tables registered in your job's session are not visible to it.

    spark.sql.hive.thriftServer.singleSession doesn't mean you get a single session shared across multiple servers. It means that all connections to a single Thrift server use the same session. A possible use case:

    • You start the Thrift server.
    • client1 connects to this server and creates the temp table foo.
    • client2 connects to this server and reads foo.
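
    The use case above can be illustrated with a small toy model. This is only a sketch of the scoping rule described in this answer, not Spark's implementation: ToyThriftServer, Session, connect, createTempTable and canSee are all hypothetical names invented for the example, and the boolean flag merely stands in for spark.sql.hive.thriftServer.singleSession.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: hypothetical classes, NOT Spark's actual code.
class ToyThriftServer {

    // A session owns its own set of temp table names.
    static final class Session {
        private final Set<String> tempTables = new HashSet<>();

        void createTempTable(String name) {
            tempTables.add(name);
        }

        boolean canSee(String name) {
            return tempTables.contains(name);
        }
    }

    // Stands in for spark.sql.hive.thriftServer.singleSession (assumption).
    private final boolean singleSession;
    private final Session shared = new Session();

    ToyThriftServer(boolean singleSession) {
        this.singleSession = singleSession;
    }

    // Each connection gets the server's shared session or a fresh one.
    Session connect() {
        return singleSession ? shared : new Session();
    }

    public static void main(String[] args) {
        // One server, single-session mode: client2 can read what client1 created.
        ToyThriftServer server = new ToyThriftServer(true);
        Session client1 = server.connect();
        Session client2 = server.connect();
        client1.createTempTable("foo");
        System.out.println("same server, client2 sees foo: " + client2.canSee("foo"));

        // A second server has its own session, whatever the flag says.
        ToyThriftServer other = new ToyThriftServer(true);
        System.out.println("other server sees foo: " + other.connect().canSee("foo"));
    }
}
```

    In this model the flag only changes how connections to one server are mapped to sessions. Nothing in it lets a second server, or a second SparkSession, reach the first one's temp tables, which is the point made above.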

