How many partitions Spark creates when loading a Hive table

Published: 2021/11/14 23:01:27. Tags: apache-spark, hadoop, pyspark, apache-spark-sql


Question


Whether the source is a Hive table or plain HDFS files, I expected that when Spark reads the data and creates a DataFrame, the number of partitions in the RDD/DataFrame would equal the number of part files in HDFS. But when I tested this with a Hive external table, the two numbers differed: the DataFrame had 119 partitions, while the table, a partitioned Hive table, contained 150 part files ranging in size from 30 MB to 118 MB. So what determines the number of partitions?
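One way to see why the count can differ from the number of files is to model how Spark's built-in file sources split and pack files. The sketch below is a simplified, assumption-based Python model of the Spark 3.x logic for file-based scans (for example when a Parquet/ORC Hive table is read through Spark's native reader): every file is charged its size plus an "open cost", a target split size is derived from `spark.sql.files.maxPartitionBytes` (assumed default 128 MB), `spark.sql.files.openCostInBytes` (assumed default 4 MB) and the default parallelism, large splittable files are cut into chunks, and the chunks are packed largest-first into partitions. The function names are hypothetical and this is not Spark's actual code; Hive SerDe tables read through Hadoop `InputFormat` splits follow different rules.

```python
"""Simplified model of how Spark 3.x file-based scans pack files into
read partitions. Hypothetical sketch under stated assumptions, not Spark's code."""

MB = 1024 * 1024


def max_split_bytes(file_sizes, default_parallelism,
                    max_partition_bytes=128 * MB,  # assumed spark.sql.files.maxPartitionBytes
                    open_cost_bytes=4 * MB):       # assumed spark.sql.files.openCostInBytes
    # Each file is charged its length plus a fixed "open cost".
    total_bytes = sum(size + open_cost_bytes for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism
    # Target split size: capped at maxPartitionBytes, never below the open cost.
    return min(max_partition_bytes, max(open_cost_bytes, bytes_per_core))


def split_files(file_sizes, max_split):
    # Assumes all files are splittable: cut each into chunks of at most max_split bytes.
    splits = []
    for size in file_sizes:
        offset = 0
        while offset < size:
            splits.append(min(max_split, size - offset))
            offset += max_split
    return splits


def count_partitions(file_sizes, default_parallelism,
                     max_partition_bytes=128 * MB, open_cost_bytes=4 * MB):
    max_split = max_split_bytes(file_sizes, default_parallelism,
                                max_partition_bytes, open_cost_bytes)
    # Sort chunks largest first, then pack them greedily ("next fit decreasing").
    splits = sorted(split_files(file_sizes, max_split), reverse=True)
    partitions = 0
    current_size = 0
    current_count = 0
    for split in splits:
        # Close the current partition if this chunk would overflow it.
        if current_count > 0 and current_size + split > max_split:
            partitions += 1
            current_size = 0
            current_count = 0
        current_size += split + open_cost_bytes
        current_count += 1
    if current_count > 0:
        partitions += 1
    return partitions


if __name__ == "__main__":
    # Ten 10 MB files on a hypothetical cluster with default parallelism 4.
    print(count_partitions([10 * MB] * 10, default_parallelism=4))
```

Under this model, when the 128 MB cap applies, part files of 30 MB to 118 MB are never split, but smaller files can share a partition with others, which is consistent with seeing fewer partitions (119) than files (150). The exact count depends on the real file sizes and the cluster's parallelism; in PySpark the actual value can be checked with `spark.table("db.table").rdd.getNumPartitions()`, where `db.table` is a placeholder name.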
