Apache Spark Parquet: Cannot build an empty group

Views: 228 Published: 2020/9/4 7:59:26 apache-spark parquet

Problem description

I am using Apache Spark 2.1.1 (I was on 2.1.0 until today and saw the same behavior there). I have a dataset with the following schema:

root
|-- muons: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- reco::Candidate: struct (nullable = true)
|    |    |-- qx3_: integer (nullable = true)
|    |    |-- pt_: float (nullable = true)
|    |    |-- eta_: float (nullable = true)
|    |    |-- phi_: float (nullable = true)
|    |    |-- mass_: float (nullable = true)
|    |    |-- vertex_: struct (nullable = true)
|    |    |    |-- fCoordinates: struct (nullable = true)
|    |    |    |    |-- fX: float (nullable = true)
|    |    |    |    |-- fY: float (nullable = true)
|    |    |    |    |-- fZ: float (nullable = true)
|    |    |-- pdgId_: integer (nullable = true)
|    |    |-- status_: integer (nullable = true)
|    |    |-- cachePolarFixed_: struct (nullable = true)
|    |    |-- cacheCartesianFixed_: struct (nullable = true)

As you can see, there are 3 empty structs in this schema: reco::Candidate, cachePolarFixed_ and cacheCartesianFixed_. I am 100% sure that I can read, manipulate and otherwise work with this dataset. However, when I try to write it to disk as Parquet, I get the following exception:

dsReduced.write.format("parquet").save(outputPathName)

java.lang.IllegalStateException: Cannot build an empty group
at org.apache.parquet.Preconditions.checkState(Preconditions.java:91)
at org.apache.parquet.schema.Types$BaseGroupBuilder.build(Types.java:622)
at org.apache.parquet.schema.Types$BaseGroupBuilder.build(Types.java:497)
at org.apache.parquet.schema.Types$Builder.named(Types.java:286)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:535)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:321)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertField$1.apply(ParquetSchemaConverter.scala:534)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertField$1.apply(ParquetSchemaConverter.scala:533)

So, basically, I would like to understand whether this is a bug or intended behavior. I also assume that it is related to the empty structs. Any help would be really appreciated!

Update: I have quickly created a stripped-down version, and that version works fine! Any insights would be really helpful!
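
One way to check the assumption that the empty structs are involved is to list every struct in the schema that has no fields. The following is only a minimal sketch and not part of the original question: the names `EmptyStructs` and `findEmptyStructs` are hypothetical, the small `schema` value is an abbreviated stand-in for the real one, and the code relies only on the classes in `org.apache.spark.sql.types`.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper (not a Spark API): walks a schema and reports
// the path of every struct that has no fields.
object EmptyStructs {
  def findEmptyStructs(dt: DataType, path: String = "root"): Seq[String] = dt match {
    // A struct with zero fields is what the schema printout shows as an empty struct.
    case s: StructType if s.fields.isEmpty => Seq(path)
    // Recurse into each child field of a non-empty struct.
    case s: StructType =>
      s.fields.toSeq.flatMap(f => findEmptyStructs(f.dataType, s"$path.${f.name}"))
    // Arrays and maps can also contain structs, so look inside them as well.
    case a: ArrayType => findEmptyStructs(a.elementType, s"$path.element")
    case m: MapType =>
      findEmptyStructs(m.keyType, s"$path.key") ++
        findEmptyStructs(m.valueType, s"$path.value")
    // Primitive types cannot contain structs.
    case _ => Seq.empty
  }
}

// Abbreviated stand-in for the schema above (assumption: only a few fields kept).
val empty = StructType(Seq.empty[StructField])
val muon = StructType(Seq(
  StructField("reco::Candidate", empty),
  StructField("pt_", FloatType),
  StructField("cachePolarFixed_", empty)
))
val schema = StructType(Seq(StructField("muons", ArrayType(muon))))

EmptyStructs.findEmptyStructs(schema).foreach(println)
```

On the real dataset the same call would presumably be `EmptyStructs.findEmptyStructs(dsReduced.schema)`. If it reports the three structs named above, that supports the assumption, since the stack trace shows Parquet's schema builder refusing to build a group with no fields.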
