Querying on multiple Hive stores using Apache Spark


Problem Description

I have a Spark application which successfully connects to Hive and queries Hive tables using the Spark engine.

To build this, I just added hive-site.xml to the application's classpath, and Spark reads the hive-site.xml to connect to its metastore. This method was suggested on Spark's mailing list.
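
For reference, a minimal sketch of that single-metastore setup, assuming Spark 1.3 and a HiveContext that picks up the hive-site.xml from the classpath (the object name and table below are placeholders of mine, not from the original post):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object SingleHiveEnvironment {
      def main(args: Array[String]) {
        val conf = new SparkConf().setAppName("SingleHive").setMaster("local")
        val sc = new SparkContext(conf)

        // HiveContext reads the metastore location (hive.metastore.uris)
        // from the hive-site.xml found on the application classpath
        val hiveContext = new HiveContext(sc)
        hiveContext.sql("SELECT * FROM <db>.<tablename> LIMIT 10").show()
      }
    }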



So far so good. Now I want to connect to two Hive stores, and I don't think adding another hive-site.xml to my classpath will help. I referred to quite a few articles and Spark mailing lists but could not find anyone doing this.



Can someone suggest how I can achieve this?



Thanks.

Docs referred:

Solution

I think this is possible by making use of Spark SQL's capability to connect to and read data from remote databases using JDBC.

After exhaustive R&D, I was able to successfully connect to two different Hive environments using JDBC and load the Hive tables as DataFrames into Spark for further processing.

Environment details

hadoop-2.6.0
apache-hive-2.0.0-bin
spark-1.3.1-bin-hadoop2.6

Code Sample: HiveMultiEnvironment.scala

    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext
    import org.apache.spark.sql.SQLContext

    object HiveMultiEnvironment {
      def main(args: Array[String]) {
        val conf = new SparkConf().setAppName("JDBC").setMaster("local")
        val sc = new SparkContext(conf)
        val sqlContext = new SQLContext(sc)

        // load hive table (or) sub-query from Environment 1
        val jdbcDF1 = sqlContext.load("jdbc", Map(
          "url" -> "jdbc:hive2://<host1>:10000/<db>",
          "dbtable" -> "<db.tablename or subquery>",
          "driver" -> "org.apache.hive.jdbc.HiveDriver",
          "user" -> "<username>",
          "password" -> "<password>"))
        jdbcDF1.foreach { println }

        // load hive table (or) sub-query from Environment 2
        val jdbcDF2 = sqlContext.load("jdbc", Map(
          "url" -> "jdbc:hive2://<host2>:10000/<db>",
          "dbtable" -> "<db.tablename or subquery>",
          "driver" -> "org.apache.hive.jdbc.HiveDriver",
          "user" -> "<username>",
          "password" -> "<password>"))
        jdbcDF2.foreach { println }

        // todo: business logic
      }
    }
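
As a follow-up to the sample above, one way to start on the "business logic" step is to register both DataFrames as temporary tables on the same SQLContext and combine them in a single query. A rough sketch (this would sit inside main, and the table and column names are placeholders of mine):

    // combine data from both Hive environments in one query
    jdbcDF1.registerTempTable("env1_table")
    jdbcDF2.registerTempTable("env2_table")

    val joined = sqlContext.sql(
      "SELECT a.*, b.* FROM env1_table a JOIN env2_table b ON a.<key> = b.<key>")
    joined.show()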
    

Other parameters can also be set during the load through SqlContext, such as partitionColumn. Details can be found under the 'JDBC To Other Databases' section of the Spark reference doc: https://spark.apache.org/docs/1.3.0/sql-programming-guide.html
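
For example, a sketch of the same load with the partitioning options added (partitionColumn, lowerBound, upperBound and numPartitions have to be supplied together; the column name and bounds below are made-up values):

    // read the Environment 1 table in parallel, split on a numeric column
    val partitionedDF = sqlContext.load("jdbc", Map(
      "url" -> "jdbc:hive2://<host1>:10000/<db>",
      "dbtable" -> "<db.tablename or subquery>",
      "driver" -> "org.apache.hive.jdbc.HiveDriver",
      "user" -> "<username>",
      "password" -> "<password>",
      "partitionColumn" -> "<numeric_column>",
      "lowerBound" -> "1",
      "upperBound" -> "100000",
      "numPartitions" -> "4"))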

Build path from Eclipse:

What I Haven't Tried

Use of HiveContext for Environment 1 and SQLContext for Environment 2
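
A rough, untested sketch of that idea, assuming the hive-site.xml for Environment 1 is on the classpath (so HiveContext talks to metastore 1) while Environment 2 is still read over JDBC as in the sample above:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.hive.HiveContext

    val conf = new SparkConf().setAppName("TwoHiveEnvs").setMaster("local")
    val sc = new SparkContext(conf)

    // Environment 1: served by the metastore configured in the classpath hive-site.xml
    val hiveContext = new HiveContext(sc)
    val env1DF = hiveContext.sql("SELECT * FROM <db>.<tablename>")

    // Environment 2: plain SQLContext reading over JDBC
    val sqlContext = new SQLContext(sc)
    val env2DF = sqlContext.load("jdbc", Map(
      "url" -> "jdbc:hive2://<host2>:10000/<db>",
      "dbtable" -> "<db.tablename or subquery>",
      "driver" -> "org.apache.hive.jdbc.HiveDriver",
      "user" -> "<username>",
      "password" -> "<password>"))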

Hope this will be useful.

