How to write a table to hive from spark without using the warehouse connector in HDP 3.1

Problem description

When trying to use Spark 2.3 on HDP 3.1 to write to a Hive table without the warehouse connector, directly into Hive's schema, using:

spark-shell --driver-memory 16g --master local[3] --conf spark.hadoop.metastore.catalog.default=hive
val df = Seq(1,2,3,4).toDF
spark.sql("create database foo")
df.write.saveAsTable("foo.my_table_01")

it fails with:

Table foo.my_table_01 failed strict managed table checks due to the following reason: Table is marked as a managed table but is not transactional

But:

import org.apache.spark.sql.SaveMode

val df = Seq(1,2,3,4).toDF.withColumn("part", col("value"))
df.write.partitionBy("part").option("compression", "zlib").mode(SaveMode.Overwrite).format("orc").saveAsTable("foo.my_table_02")

Spark with spark.sql("select * from foo.my_table_02").show works just fine. Now going to Hive / beeline:

0: jdbc:hive2://hostname:2181/> select * from my_table_02;
Error: java.io.IOException: java.lang.IllegalArgumentException: bucketId out of range: -1 (state=,code=0)

A

describe extended my_table_02;

returns

 +-----------------------------+----------------------------------------------------+----------+
|          col_name           |                     data_type                      | comment  |
+-----------------------------+----------------------------------------------------+----------+
| value                       | int                                                |          |
| part                        | int                                                |          |
|                             | NULL                                               | NULL     |
| # Partition Information     | NULL                                               | NULL     |
| # col_name                  | data_type                                          | comment  |
| part                        | int                                                |          |
|                             | NULL                                               | NULL     |
| Detailed Table Information  | Table(tableName:my_table_02, dbName:foo, owner:hive/bd-sandbox.t-mobile.at@SANDBOX.MAGENTA.COM, createTime:1571201905, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:value, type:int, comment:null), FieldSchema(name:part, type:int, comment:null)], location:hdfs://bd-sandbox.t-mobile.at:8020/warehouse/tablespace/external/hive/foo.db/my_table_02, inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.ql.io.orc.OrcSerde, parameters:{path=hdfs://bd-sandbox.t-mobile.at:8020/warehouse/tablespace/external/hive/foo.db/my_table_02, compression=zlib, serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[FieldSchema(name:part, type:int, comment:null)], parameters:{numRows=0, rawDataSize=0, spark.sql.sources.schema.partCol.0=part, transient_lastDdlTime=1571201906, bucketing_version=2, spark.sql.create.version=2.3.2.3.1.0.0-78, totalSize=740, spark.sql.sources.schema.numPartCols=1, spark.sql.sources.schema.part.0={\"type\":\"struct\",\"fields\":[{\"name\":\"value\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"part\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}}]}, numFiles=4, numPartitions=4, spark.sql.partitionProvider=catalog, spark.sql.sources.schema.numParts=1, spark.sql.sources.provider=orc, transactional=true}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false, catName:hive, ownerType:USER, writeId:-1) |

How can I use Spark to write to Hive without using the warehouse connector, but still write to the same metastore so that the table can later be read by Hive? To the best of my knowledge external tables should be possible (they are not managed, not ACID, not transactional), but I am not sure how to tell saveAsTable to create one.
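One direction that might work (not from the original post, and untested on this exact setup) is to pass an explicit path to saveAsTable: Spark registers tables with a user-supplied location as EXTERNAL, which should sidestep the strict managed/ACID check. The HDFS location below is purely illustrative:

import org.apache.spark.sql.SaveMode

val df = Seq(1,2,3,4).toDF

// Supplying an explicit location should make Spark create an EXTERNAL table
// in the metastore instead of a managed one (the path below is a placeholder).
df.write
  .mode(SaveMode.Overwrite)
  .format("orc")
  .option("path", "/warehouse/tablespace/external/hive/foo.db/my_table_03")
  .saveAsTable("foo.my_table_03")

Whether Hive can then read the resulting ORC files without further tweaks is a separate question; see the answer below.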

Related questions:

  • https://community.cloudera.com/t5/Support-Questions/In-hdp-3-0-can-t-create-hive-table-in-spark-failed/td-p/202647
  • Table loaded through Spark not accessible in Hive
    • Setting the properties suggested in the answers there does not solve my issue

A workaround might be something like https://github.com/qubole/spark-acid, similar in spirit to https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.4/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html, but I do not like the idea of adding more duct tape when I have not seen any large-scale performance tests of it yet. Also, this would mean changing all existing Spark jobs.

In fact, Cant save table to hive metastore, HDP 3.0 reports issues with large data frames and the warehouse connector.

I just found https://community.cloudera.com/t5/Support-Questions/Spark-hive-warehouse-connector-not-loading-data-when-using/td-p/243613

Also:

execute() vs executeQuery()

executeQuery() will always use HiveServer2-interactive/LLAP, as it uses the fast Arrow protocol. Using it when the JDBC URL points to a non-LLAP HiveServer2 will yield an error.

execute() uses JDBC and does not have this dependency on LLAP, but it has a built-in restriction of returning at most 1,000 records. For most queries (INSERT INTO ... SELECT, count, sum, average) that is not a problem.
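For illustration only, a rough sketch of the two call paths with the Hive Warehouse Connector, assuming the HWC assembly jar is on the classpath and the usual spark.sql.hive.hiveserver2.jdbc.url / spark.datasource.hive.warehouse.* settings are configured (not taken from the original post):

import com.hortonworks.hwc.HiveWarehouseSession

val hive = HiveWarehouseSession.session(spark).build()

// executeQuery(): goes through HiveServer2-interactive/LLAP using the Arrow transport
val viaLlap = hive.executeQuery("SELECT * FROM foo.my_table_02")

// execute(): plain JDBC against HiveServer2, capped at roughly 1,000 returned rows by default
val viaJdbc = hive.execute("SELECT count(*) AS cnt FROM foo.my_table_02")

viaLlap.show()
viaJdbc.show()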

But doesn't this kill any high-performance interoperability between Hive and Spark? Especially if there are not enough LLAP nodes available for large-scale ETL.

In fact, this is true. This setting can be configured at https://github.com/hortonworks-spark/spark-llap/blob/26d164e62b45cfa1420d5d43cdef13d1d29bb877/src/main/java/com/hortonworks/spark/sql/hive/llap/HWConf.java#L39, though I am not sure of the performance impact of increasing this value.
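If that limit ever needs to be raised, it appears to be exposed as a connector option; the key name below is my reading of HWConf and should be treated as an assumption to verify against your HWC build:

// Hypothetical: raise the execute() result cap before building the HWC session.
// Key name derived from HWConf (EXEC_RESULTS_MAX); verify it for your connector version.
spark.conf.set("spark.datasource.hive.warehouse.exec.results.max", "100000")

val hive = com.hortonworks.hwc.HiveWarehouseSession.session(spark).build()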

Recommended answer

    "如何在不使用仓库连接器的情况下使用 spark 写入 hive,但仍然写入相同的 Metastore,以便稍后由 hive 读取?"

    "How can I use spark to write to hive without using the warehouse connector but still writing to the same metastore which can later on be read by hive?"

We are working with the same setup (HDP 3.1 and Spark 2.3). Using the code below we were getting the same error message you got, "bucketId out of range: -1". The solution was to run set hive.fetch.task.conversion=none; in the Hive shell before querying the table.
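The same session setting can also be applied programmatically over plain Hive JDBC; this is only a sketch, and the connection URL and credentials below are placeholders (in practice you can simply run the SET statement in beeline):

import java.sql.DriverManager

// Requires the Hive JDBC driver (org.apache.hive:hive-jdbc) on the classpath.
val conn = DriverManager.getConnection(
  "jdbc:hive2://hostname:10000/foo",  // placeholder HiveServer2 URL
  "hive",                             // placeholder user
  "")                                 // placeholder password
val stmt = conn.createStatement()

// Disable the fetch-task optimisation for this session before reading the table
stmt.execute("SET hive.fetch.task.conversion=none")

val rs = stmt.executeQuery("SELECT * FROM my_table_02")
while (rs.next()) {
  println(s"value=${rs.getInt("value")} part=${rs.getInt("part")}")
}
rs.close(); stmt.close(); conn.close()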

The code to write data into Hive without the HWC:

import java.io.File
import org.apache.spark.sql.{SaveMode, SparkSession}

// Location for Spark's default warehouse directory
val warehouseLocation = new File("spark-warehouse").getAbsolutePath

case class Record(key: Int, value: String)

// Hive-enabled SparkSession writing through the regular Spark catalog
val spark = SparkSession.builder()
  .master("yarn")
  .appName("SparkHiveExample")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .enableHiveSupport()
  .getOrCreate()

spark.sql("USE databaseName")
val recordsDF = spark.createDataFrame((1 to 100).map(i => Record(i, s"val_$i")))
recordsDF.write.mode(SaveMode.Overwrite).format("orc").saveAsTable("sparkhive_records")

[Example from https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html]
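As a quick sanity check (not part of the original answer), the table can be read back from the same Spark session; reading it from Hive still needs the hive.fetch.task.conversion=none workaround mentioned above:

// Read back what was just written, using the same Hive-enabled session
val readBack = spark.table("sparkhive_records")
readBack.printSchema()
println(readBack.count())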
