How do I connect to Hive from Spark using Scala on IntelliJ?
Question
I am new to Hive and Spark and am trying to figure out a way to access tables in Hive so I can manipulate and query the data. How can this be done?
Answer
In Spark < 2.0:
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("Spark Hive Example") // conf must be defined before the context
val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
val myDataFrame = sqlContext.sql("select * from mydb.mytable")
In later versions of Spark (2.0+), use SparkSession:
SparkSession is now the new entry point of Spark that replaces the old SQLContext and HiveContext. Note that the old SQLContext and HiveContext are kept for backward compatibility. A new catalog interface is accessible from SparkSession - existing APIs for database and table access, such as listTables, createExternalTable, dropTempView, and cacheTable, are moved here. -- from the docs
import org.apache.spark.sql.SparkSession

val warehouseLocation = "spark-warehouse" // directory for the Hive warehouse; adjust to your setup

val spark = SparkSession
  .builder()
  .appName("Spark Hive Example")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .enableHiveSupport()
  .getOrCreate()

val myDataFrame = spark.sql("select * from mydb.mytable")
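As the docs quote above notes, catalog operations such as listTables and cacheTable now hang off the SparkSession. A minimal sketch of using that catalog interface and the resulting DataFrame, assuming `spark` is the session built above and that the placeholder database `mydb` and table `mytable` exist in your Hive metastore:

```scala
// Hypothetical names: mydb / mytable stand in for your own database and table.
val query = "select * from mydb.mytable"

spark.catalog.listTables("mydb").show()  // tables registered in the database
spark.catalog.cacheTable("mydb.mytable") // cache the table for repeated queries

val myDataFrame = spark.sql(query)
myDataFrame.printSchema()                // columns come from the Hive table definition
myDataFrame.show(10)                     // first 10 rows
```

Because `enableHiveSupport()` was set on the builder, these queries go through the Hive metastore; without it, only Spark's built-in catalog is visible.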