How many partitions Spark creates when loading a Hive table

Views: 94  Published: 2021/4/8 19:54:50  Tags: apache-spark hadoop pyspark apache-spark-sql

Problem description

I assumed that when Spark reads data and creates a DataFrame, whether from a Hive table or from HDFS files, the number of partitions in the RDD/DataFrame would equal the number of part files in HDFS. But when I tested with a Hive external table, the number of partitions differed from the number of part files. The DataFrame had 119 partitions, while the table was a Hive partitioned table with 150 part files, the smallest 30 MB and the largest 118 MB. So what determines the number of partitions?
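One possible explanation for a partition count lower than the file count is bin-packing of small files. The sketch below is a plain-Python approximation of how, to my understanding, Spark's native file source groups files into partitions using `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes`. It is an assumption that this path applies here: it is only relevant when the Hive table is read through Spark's own Parquet/ORC reader (for example via `spark.sql.hive.convertMetastoreParquet`), not through a Hive SerDe, where Hadoop `InputFormat` splits decide instead. The function name `estimate_file_partitions` and the sample sizes are hypothetical, and the sketch ignores non-splittable formats, bucketing, and version-specific details.

```python
MB = 1024 * 1024


def estimate_file_partitions(file_sizes,
                             max_partition_bytes=128 * MB,
                             open_cost_bytes=4 * MB,
                             default_parallelism=8):
    """Approximate the number of read partitions for a list of file sizes.

    Hypothetical helper: mimics the split-size and bin-packing logic of
    Spark's file source as I understand it, not an official API.
    """
    # Each file is charged an extra "open cost" when sizing the splits.
    total_bytes = sum(size + open_cost_bytes for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism
    max_split_bytes = min(max_partition_bytes,
                          max(open_cost_bytes, bytes_per_core))

    # Cut every file into chunks of at most max_split_bytes
    # (assumes the file format is splittable).
    chunks = []
    for size in file_sizes:
        offset = 0
        while offset < size:
            length = min(max_split_bytes, size - offset)
            chunks.append(length)
            offset += length

    # Pack chunks, largest first, into partitions until one would overflow.
    chunks.sort(reverse=True)
    partitions = 0
    current_size = 0
    has_chunks = False
    for length in chunks:
        if has_chunks and current_size + length > max_split_bytes:
            partitions += 1
            current_size = 0
            has_chunks = False
        current_size += length + open_cost_bytes
        has_chunks = True
    if has_chunks:
        partitions += 1
    return partitions


if __name__ == "__main__":
    # Ten 30 MB files: several small files share one partition.
    print(estimate_file_partitions([30 * MB] * 10, default_parallelism=2))
    # Four 100 MB files: two files no longer fit under the 128 MB limit.
    print(estimate_file_partitions([100 * MB] * 4, default_parallelism=2))
```

Under these assumptions, files well below the split size get combined, so a table with 150 part files between 30 MB and 118 MB can plausibly end up with fewer than 150 partitions. The exact figure depends on the individual file sizes and the cluster's default parallelism, neither of which is given in the question.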
