Number of partitions of a Spark DataFrame created by reading data from a Hive table

Published: 2021/11/14 22:42:55. Tags: hive, apache-spark-sql

I have a question about the number of partitions in a Spark DataFrame.

Suppose I have a Hive table, employee, with columns (name, age, id, location):

CREATE TABLE employee (name String, age String, id Int) PARTITIONED BY (location String);

Suppose the employee table has 10 distinct locations. The data will then be split into 10 partitions in HDFS, one subdirectory per location value.
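To make the HDFS side concrete, here is a minimal sketch of the directory layout Hive uses for a table partitioned by location: one subdirectory named `location=<value>` per distinct value. It assumes the table lives under Hive's default warehouse directory (`/user/hive/warehouse`) and ignores Hive's escaping of special characters in partition values; `HiveLayoutSketch` and `partitionDirs` are hypothetical names used only for illustration.

```scala
object HiveLayoutSketch {
  // Assumed table location: Hive's default warehouse directory plus the table name.
  val tableDir = "/user/hive/warehouse/employee"

  final case class Employee(name: String, age: String, id: Int, location: String)

  // One directory per distinct value of the partition column, named key=value.
  // The location value is encoded in the directory name, not in the data files.
  // (Real Hive also escapes special characters in values; this sketch does not.)
  def partitionDirs(rows: Seq[Employee]): Seq[String] =
    rows.map(_.location).distinct.sorted.map(loc => s"$tableDir/location=$loc")
}
```

With 10 distinct locations, `partitionDirs` returns 10 paths, matching the 10 Hive partitions described in the question. How many files each directory holds is a separate matter.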

Now suppose I create a Spark DataFrame, df, by reading all of the data in the employee table.

How many partitions will Spark create for df?

df.rdd.partitions.size = ??
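One way to see why the answer is not necessarily 10 is to model how Spark sizes read partitions. The sketch below is a hedged approximation of the split-and-pack logic in Spark 3.x's built-in file source (`FilePartition`). It only applies if the table's files are Parquet or ORC and Spark reads them through that file source, which is the default conversion for those formats. The DDL in the question has no `STORED AS` clause and would create a text table, which Spark reads through a different, Hadoop InputFormat-based path that this sketch does not model. The object and function names are hypothetical. The two constants are the documented defaults of `spark.sql.files.maxPartitionBytes` (128 MB) and `spark.sql.files.openCostInBytes` (4 MB), and `parallelism` stands in for the cluster's default parallelism.

```scala
object SplitSketch {
  val MB: Long = 1024L * 1024L
  // Defaults of spark.sql.files.maxPartitionBytes and spark.sql.files.openCostInBytes.
  val maxPartitionBytes: Long = 128 * MB
  val openCostInBytes: Long = 4 * MB

  // Target split size: capped at maxPartitionBytes, but shrunk toward
  // totalBytes / parallelism (each file also charged openCostInBytes)
  // so that small tables still spread across the available cores.
  def maxSplitBytes(fileSizes: Seq[Long], parallelism: Int): Long = {
    val totalBytes = fileSizes.map(_ + openCostInBytes).sum
    val bytesPerCore = totalBytes / parallelism
    math.min(maxPartitionBytes, math.max(openCostInBytes, bytesPerCore))
  }

  // Number of read partitions for splittable files: cut every file (from all
  // Hive partition directories together) into chunks of at most maxSplit bytes,
  // sort the chunks largest first, then pack them greedily into partitions.
  def numPartitions(fileSizes: Seq[Long], parallelism: Int): Int = {
    val maxSplit = maxSplitBytes(fileSizes, parallelism)
    val chunks = fileSizes.flatMap { len =>
      (0L until len by maxSplit).map(offset => math.min(maxSplit, len - offset))
    }.sortBy(-_)
    var partitions = 0
    var currentSize = 0L
    var currentCount = 0
    for (chunk <- chunks) {
      // Close the current partition if this chunk would overflow it.
      if (currentSize + chunk > maxSplit && currentCount > 0) {
        partitions += 1; currentSize = 0L; currentCount = 0
      }
      currentSize += chunk + openCostInBytes
      currentCount += 1
    }
    if (currentCount > 0) partitions += 1
    partitions
  }
}
```

Under these assumptions, with `parallelism = 8`, ten 1 MB files (one per location) pack into 5 partitions, while ten 300 MB files are each cut into 128 MB, 128 MB and 44 MB chunks and give 25 partitions. The count follows file sizes and parallelism rather than the number of Hive partitions; on a real cluster, checking `df.rdd.getNumPartitions` is the reliable answer.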