Avoid starting HiveThriftServer2 with created context programmatically
Question
We are trying to use the Thrift server to query data from Spark temp tables in Spark 2.0.0.
First, we created a SparkSession with Hive support enabled. Currently, we start the Thrift server with the sqlContext like this:
HiveThriftServer2.startWithContext(spark.sqlContext());
We have a Spark streaming query that registers the temp table "spark_temp_table":
StreamingQuery streamingQuery = streamedData.writeStream()
    .format("memory")
    .queryName("spark_temp_table")
    .start();
With beeline we are able to see the temp tables (by running SHOW TABLES).
When we want to run a second job (with a second SparkSession) using this approach, we have to start a second Thrift server on a different port.
I have two questions:
Is there any way to have one ThriftServer on one port with access to all temp tables in different SparkSessions?
HiveThriftServer2.startWithContext(spark.sqlContext()); is annotated with @DeveloperApi. Is there any way to start the Thrift server with a context without doing so programmatically in code?
I saw that the configuration --conf spark.sql.hive.thriftServer.singleSession=true can be passed to the Thrift server on startup (sbin/start-thriftserver.sh), but I don't understand how to define this for a job. I tried to set this configuration property in the SparkSession builder, but beeline didn't display the temp tables.
Recommended answer
Is there any way to have one ThriftServer on one port with access to all temp tables in different SparkSessions?
No. ThriftServer uses a specific session, and temporary tables can be accessed only within that session. This is why:
beeline didn't display temp tables.
when you start an independent server with sbin/start-thriftserver.sh.
spark.sql.hive.thriftServer.singleSession doesn't mean you get a single session for multiple servers. It uses the same session for all connections to a single Thrift server. Possible use case:
- You start the Thrift server.
- client1 connects to this server and creates a temp table foo.
- client2 connects to this server and reads foo.
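The use case above can be illustrated with a toy model in plain Java. This is only a sketch of the session-scoping idea, not Spark's implementation: Session, ToyThriftServer and all their methods are hypothetical names invented for this example. It also assumes that, without the singleSession setting, each connection gets its own isolated session.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: this is NOT Spark's implementation. All class and method
// names here are hypothetical and exist purely to illustrate session scoping.
public class SingleSessionSketch {

    // A session owns its own registry of temporary tables.
    static final class Session {
        private final Map<String, String> tempTables = new HashMap<>();

        void createTempTable(String name, String contents) {
            tempTables.put(name, contents);
        }

        boolean canSee(String name) {
            return tempTables.containsKey(name);
        }
    }

    // One server; each call to connect() models one client connection.
    static final class ToyThriftServer {
        private final boolean singleSession;
        private final Session shared = new Session();

        ToyThriftServer(boolean singleSession) {
            this.singleSession = singleSession;
        }

        Session connect() {
            // With singleSession every connection reuses the same session;
            // otherwise (assumed here) every connection gets a fresh one.
            return singleSession ? shared : new Session();
        }
    }

    public static void main(String[] args) {
        // Shared session: client2 can read the table client1 created.
        ToyThriftServer sharedServer = new ToyThriftServer(true);
        Session client1 = sharedServer.connect();
        Session client2 = sharedServer.connect();
        client1.createTempTable("foo", "rows");
        System.out.println("singleSession=true, client2 sees foo: " + client2.canSee("foo"));

        // Isolated sessions: the table stays private to the creating client.
        ToyThriftServer isolatedServer = new ToyThriftServer(false);
        Session client3 = isolatedServer.connect();
        Session client4 = isolatedServer.connect();
        client3.createTempTable("foo", "rows");
        System.out.println("singleSession=false, client4 sees foo: " + client4.canSee("foo"));
    }
}
```

In this model the sharing happens inside one server object, which mirrors the point of the answer: the setting affects connections to a single Thrift server and does nothing to link two separate servers or two separate SparkSessions.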