Hive table does not exist in Spark job
Question
I am using the Hive Metastore in EMR.
I am able to query the table manually through HiveQL or SparkSQL.
But when I use the same table in a Spark job, it says Table or view not found:
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
pyspark.sql.utils.AnalysisException:
u"Table or view not found: `logan_test`.`salary_csv`; line 1 pos 21;
'Aggregate [unresolvedalias(count(1), None)]
+- 'UnresolvedRelation `logan_test`.`salary_csv`
Here is my full code:
from pyspark import SparkContext
from pyspark.sql import SQLContext, HiveContext
from pyspark.sql import SparkSession
sc = SparkContext(appName = "test")
sqlContext = SQLContext(sparkContext=sc)
sqlContext.sql("select count(*) from logan_test.salary_csv").show()
print("done..")
I submitted my job as below to use the Hive catalog tables:
spark-submit test.py --files /usr/lib/hive/conf/hive-site.xml
Accepted answer
Looks like you're using Spark 2, therefore SQLContext and HiveContext should be replaced with SparkSession.sql() after you enableHiveSupport().
And instead of .sql(), you can use SparkSession.table() to get a DataFrame of the entire table, then follow it with a count(), then whatever other queries you want.
from pyspark.sql import SparkSession
spark = SparkSession.builder.enableHiveSupport().appName("Hive Example").getOrCreate()
salary_csv = spark.table("logan_test.salary_csv")
print(salary_csv.count())