Table loaded through Spark not accessible in Hive


Problem Description

A Hive table created through Spark (pyspark) is not accessible from Hive.

df.write.format("orc").mode("overwrite").saveAsTable("db.table")

Error when accessing the table from Hive:

Error: java.io.IOException: java.lang.IllegalArgumentException: bucketId out of range: -1 (state=,code=0)

The table is created successfully in Hive and can be read back in Spark. The table metadata is accessible in Hive, and the data files are present in the table's directory in HDFS.
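A minimal sketch of the check described above (assuming an existing SparkSession named spark, with the table name db.table taken from the question): reading the table back from Spark succeeds, even though querying it from Hive fails.

df_back = spark.read.table("db.table")  # succeeds from Spark
df_back.show(5)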

The TBLPROPERTIES of the Hive table are:

  'bucketing_version'='2',                         
  'spark.sql.create.version'='2.3.1.3.0.0.0-1634', 
  'spark.sql.sources.provider'='orc',              
  'spark.sql.sources.schema.numParts'='1',
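For completeness, these properties can also be listed from Spark itself; a minimal sketch (table name from the question, assuming an existing SparkSession named spark):

spark.sql("SHOW TBLPROPERTIES db.table").show(truncate=False)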

I also tried creating the table with other workarounds, but got an error while creating the table:

df.write.mode("overwrite").saveAsTable("db.table")

OR

df.createOrReplaceTempView("dfTable")
spark.sql("CREATE TABLE db.table AS SELECT * FROM dfTable")

Error:

AnalysisException: u'org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Table default.src failed strict managed table checks due to the following reason: Table is marked as a managed table but is not transactional.);'

Stack version details:

Spark 2.3

Hive 3.1

Hortonworks Data Platform HDP 3.0

Recommended Answer

As of HDP 3.0, the catalogs for Apache Hive and Apache Spark are separated and each engine uses its own catalog; that is, they are mutually exclusive: the Apache Hive catalog can only be accessed by Apache Hive or this library (the Hive Warehouse Connector), and the Apache Spark catalog can only be accessed by the existing APIs in Apache Spark. In other words, some features such as ACID tables or Apache Ranger with Apache Hive tables are only available via this library in Apache Spark. Those tables in Hive are not directly accessible through the Apache Spark APIs themselves.

  • The following article explains the required steps:

Integrating Apache Hive with Apache Spark - Hive Warehouse Connector
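As a rough illustration of the connector-based flow described in that article, here is a minimal PySpark sketch; it assumes the Hive Warehouse Connector jar and its pyspark_llap zip have been supplied to spark-submit/pyspark (for example via --jars and --py-files), that spark.sql.hive.hiveserver2.jdbc.url points at HiveServer2, and that db.table is the table name from the question.

from pyspark_llap import HiveWarehouseSession

# Build an HWC session on top of the existing SparkSession
hive = HiveWarehouseSession.session(spark).build()

# Write the DataFrame through the connector so the table is created in the Hive catalog
df.write.format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector") \
    .option("table", "db.table") \
    .save()

# Read the Hive-managed table back through the connector
hive.executeQuery("SELECT * FROM db.table LIMIT 10").show()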
